python - Python: Extracting numerical data in a specific range

Question

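I'm a Python beginner trying to count the entries of a particular size in a large dataset. The original data is in a tab-delimited text file. It holds the "name" (a string, although each line looks like a list) and the "size" (an integer) of various animals in separate columns. I want to count all animals that fall within a specific size range, between 10 and 30.

So far I have managed to count the occurrences of each "name", but I have failed to restrict the count by "size". The code I have is below. It raises no error, but the size condition is simply ignored. Can you tell me why that part of the code is being ignored? Thanks in advance for your help!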

import csv, collections

reader=csv.reader(open('C:\Users\Owl\Desktop\Data.txt','rb'), delimiter='\t')
counts=collections.Counter()

for line in reader:
   Name=line[1]
   Size=line[10]
   counts[Name]+=1

for (Name, count) in counts.iteritems():
   if 10<=Size<=30:
      print '%s: %s' % (Name, count)

score 3 · Accepted Answer

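As written, Size ends up permanently set to the last size value in the file, and it has no connection to Name.

On each pass of the for loop, Size is set to line[10], but it is not stored anywhere outside the loop. Name, by contrast, is stored indirectly in the counter. So the next time the loop runs, the value of Size is overwritten with the next animal's size.

Does each animal appear more than once in the data?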

You will either need a slightly more complex data structure, or you will need to check the size while looping over the file.
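As a rough sketch of the first option (a slightly richer data structure), one could keep every size seen for each name and filter afterwards. This is written for Python 3 rather than the Python 2 used in the question, and the sample rows and column positions 0 and 1 are made up for illustration (the question reads columns 1 and 10 of a real file).

```python
import collections
import csv
import io

# Hypothetical tab-separated sample; the real file has more columns.
sample = "Cat\t12\nDog\t45\nCat\t8\nOwl\t30\n"

sizes_by_name = collections.defaultdict(list)
for row in csv.reader(io.StringIO(sample), delimiter="\t"):
    name, size = row[0], float(row[1])
    sizes_by_name[name].append(size)  # keep every size, not just the last one

# Count, per animal, how many of its sizes fall within 10..30 inclusive.
in_range = {
    name: sum(1 for s in sizes if 10 <= s <= 30)
    for name, sizes in sizes_by_name.items()
}
print(in_range)
```

Keeping the raw sizes costs more memory than a plain counter, but it lets you try different ranges without reading the file again.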

If you don't mind ignoring animals outside the size range:

for line in reader:
    size = float(line[10])
    if 10 <= size <= 30:
        name = line[1]
        counts[name] += 1

for name, count in counts.iteritems():
    print '%s: %s' % (name, count)

(Note: I changed the capitalization in the original code to match PEP 8, Python's recommended style guide.)

score 2 · Accepted Answer

Size=line[10]

makes Size a string.

10<=Size<=30

compares ints with a string (Size).

In [3]: 10 <= '20' <= 30
Out[3]: False

To fix this use:

try:
    Size = float(line[10])
except (ValueError, IndexError):
    continue

The try...except above will cause your program to skip lines in your csv file that either do not have an 11th column or have a string there which cannot be converted to a float.
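To see the skipping behaviour end to end, here is a minimal Python 3 sketch (the thread itself uses Python 2). The in-memory sample, the use of columns 0 and 1 instead of 1 and 10, and the `skipped` counter are assumptions made only for this illustration.

```python
import collections
import csv
import io

# Hypothetical data: name in column 0, size in column 1.
# "Dog" has a non-numeric size and "Owl" has no size column at all.
sample = "Cat\t12\nDog\tunknown\nOwl\nCat\t25.5\nWhale\t85\n"

counts = collections.Counter()
skipped = 0
for row in csv.reader(io.StringIO(sample), delimiter="\t"):
    try:
        size = float(row[1])
    except (ValueError, IndexError):
        skipped += 1  # non-numeric text (ValueError) or missing column (IndexError)
        continue
    if 10 <= size <= 30:
        counts[row[0]] += 1

print(dict(counts), skipped)
```

Counting the skipped rows is optional, but it makes it easier to notice when a file is more malformed than expected.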

In Python 2, ints compare less than strings.

In [4]: 10 <= '1'
Out[4]: True

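(Believe it or not, this is a CPython 2 implementation detail: numbers sort before objects of every other type, and the remaining types are ordered by type name. So "i as in int comes before s as in str" happens to give the right answer here, but it is not the actual rule for numbers.)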

In Python 3, a TypeError is raised.

Python 3.2.2 (default, Sep  5 2011, 22:09:30) 
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 10 <= '1'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: int() <= str()

Hallelujah.

score 1 · Accepted Answer

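One of the nice things about Python is that dictionary keys can be fairly sophisticated objects, such as tuples (or dates, or many other things... as long as they are hashable, as J.F. Sebastian pointed out; nothing illegal about the hash here :-) ). Combine that with regular expressions and you get a pretty fancy "size classifier" :-) :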

sizesFromFile = [
    "Name: Cat, Size: 3.2",
    "Name: Dog, Size: 4.2",
    "Name: BigFoot, Size: 12",
    "Name: Elephant, Size: 31.4",
    "Name: Whale, Size: 85.99",
]

import re
import sys
regex = re.compile(r"^Name:\s*(?P<name>\w+),\s+Size:\s+(?P<size>[\d\.]+)")

myRanges = {
    (0, 10): list(),
    (11, 20): list(),
    (21, 30): list(),
    (31, sys.maxint): list()
}

for line in sizesFromFile:
    match = regex.match(line)
    if match is not None:
        print "Success parsing %s, %s" % (match.groupdict()["name"], match.groupdict()["size"])
        name = match.groupdict()["name"]
        size = float(match.groupdict()["size"])
        for myRange in myRanges:
            if size >= myRange[0] and size <= myRange[1]:
                myRanges[myRange].append(name)

print "This is what I got: %s" % (myRanges)

Its output:

This is what I got: {(21, 30): [], (11, 20): ['BigFoot'], (0, 10): ['Cat', 'Dog'], (31, 2147483647): ['Elephant', 'Whale']}

I'm sure this is far from optimal when it comes to speed, but... it's still kind of cool, right?
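On the speed point, one possible alternative is to look up the bucket with the standard-library bisect module instead of scanning every range. The sketch below is Python 3 (where sys.maxint no longer exists); the bucket boundaries and labels are assumptions chosen to mirror the ranges above. It also closes the gaps between those ranges: a size such as 10.5 matches neither (0, 10) nor (11, 20) in the dictionary version.

```python
import bisect

# Upper bounds of the first three buckets (assumed, mirroring the ranges above).
bounds = [10, 20, 30]
labels = ["0-10", "10-20", "20-30", "30+"]

animals = [
    ("Cat", 3.2),
    ("Dog", 4.2),
    ("BigFoot", 12),
    ("Elephant", 31.4),
    ("Whale", 85.99),
]

buckets = {label: [] for label in labels}
for name, size in animals:
    # bisect_left returns the index of the first bound >= size,
    # so each upper bound is inclusive and no value falls between buckets.
    index = bisect.bisect_left(bounds, size)
    buckets[labels[index]].append(name)

print(buckets)
```

Each lookup is a binary search over the sorted bounds, so this stays cheap even with many buckets.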
