python - Combine data from two columns and count how many unique identifiers there are?

Question

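I am trying to write a very simple counting script, which I think calls for defaultdict (I can't get my head around how to use defaultdict, so if someone could comment a snippet of code for me, I would greatly appreciate it).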

My goal is to take element 0 and element 1, merge them into a single string, and then count how many unique strings there are...

For example, the data below has 15 rows made up of 3 classes and 4 classids; when they are merged, we only have 3 unique classes. The merged data for the first row (ignoring the title row) is: Class01CD2

CSV data:

uniq1,uniq2,three,four,five,six
Class01,CD2,data,data,data,data
Class01,CD2,data,data,data,data
Class01,CD2,data,data,data,data
Class01,CD2,data,data,data,data
Class02,CD3,data,data,data,data
Class02,CD3,data,data,data,data
Class02,CD3,data,data,data,data
Class02,CD3,data,data,data,data
Class02,CD3,data,data,data,data
Class02,CD3,data,data,data,data
Class02,CD3,data,data,data,data
DClass2,DE2,data,data,data,data
DClass2,DE2,data,data,data,data
Class02,CD1,data,data,data,data
Class02,CD1,data,data,data,data

The idea is simply to print out how many unique classes are available. Can anyone help me work this out?

Regards
- Hyflex

score 1 · Accepted Answer

Since you are dealing with CSV data, you can use the csv module together with a dictionary.

import csv

uniq = {}  # Empty dictionary mapping each joined string to its number of occurrences.

# 'data.csv' is a placeholder; use whatever your CSV file is named.
with open('data.csv', 'r', newline='') as ifile:
    reader = csv.reader(ifile)
    for row in reader:
        joined = row[0] + row[1]  # The joined string is simply the first and second columns of the row.
        # If the key already exists, increment its count by 1.
        if joined in uniq:
            uniq[joined] += 1
        else:
            uniq[joined] = 1  # The key doesn't exist yet, so add it with a count of 1.

print(uniq)  # Output the results.

This outputs:

{'Class01CD2': 4, 'Class02CD3': 7, 'DClass2DE2': 2, 'Class02CD1': 2}

Note: This assumes that the CSV has no header row (uniq1,uniq2,three,four,five,six).
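
Since the question asks about defaultdict and the sample file does have a title row, here is a minimal sketch of the same count using collections.defaultdict. It assumes Python 3. The function name count_joined, the has_header flag, and the shortened inline SAMPLE are illustrative only and not part of the original answer; in practice you would pass an open file object instead of the StringIO.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-in for data.csv: a shortened sample in the question's format.
SAMPLE = """uniq1,uniq2,three,four,five,six
Class01,CD2,data,data,data,data
Class01,CD2,data,data,data,data
Class02,CD3,data,data,data,data
DClass2,DE2,data,data,data,data
Class02,CD1,data,data,data,data
"""


def count_joined(lines, has_header=True):
    """Count how often each (column 0 + column 1) string occurs."""
    # defaultdict(int) creates a missing key with the value int() == 0,
    # so no "does the key exist?" check is needed before incrementing.
    counts = defaultdict(int)
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # skip the title row; the default avoids an error on empty input
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or too-short rows
        counts[row[0] + row[1]] += 1
    return counts


counts = count_joined(io.StringIO(SAMPLE))
print(dict(counts))  # occurrences per merged string
print(len(counts))   # number of unique merged strings
```

If only the number of unique strings matters and not how often each occurs, collecting the joined strings in a set and taking its len() would also work.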

References:

http://docs.python.org/2/library/stdtypes.html#dict
