python - Compare strings in a column across two different text files

Question

I have two text files, "animals.txt" and "colors.txt", shown below. The two strings on each line are separated by a tab.

"animals.txt"

12345  dog

23456  sheep

34567  pig

"colors.txt"

34567  pink

12345  black

23456  white

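I want to write Python code that does the following:

For every line in "animals.txt", take the string in the first column (12345, then 23456, then 34567)
Compare this string with the strings in the first column of "colors.txt"
If a match is found (e.g. 12345 == 12345), write two output files:

output1, containing the line from animals.txt plus the second-column value from colors.txt that corresponds to the query value (12345):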

12345 dog   black
23456 sheep white
34567 pig   pink

output2, containing the list of second-column values from colors.txt that correspond to the query values (12345, then 23456, then 34567):

black
white
pink

score 5 · Accepted Answer

If order doesn't matter, this becomes a pretty easy problem.

with open('animals.txt') as f1, open('colors.txt') as f2:
    animals = {}
    for line in f1:
        # strip the trailing newline so it does not end up in the stored value
        animal_id, animal_type = line.rstrip('\n').split('\t')
        animals[animal_id] = animal_type

    #animals = dict(map(str.split,f1)) would work instead of the above loop if there are no multi-word entries.

    colors = {}
    for line in f2:
        color_id, color_name = line.rstrip('\n').split('\t')
        colors[color_id] = color_name

    #colors = dict(map(str.split,f2)) would work instead of the above loop if there are no multi-word entries.
    #Thanks @Sven for pointing this out.

common = set(animals.keys()) & set(colors.keys())  #set intersection.
with open('output1.txt','w') as f1, open('output2.txt','w') as f2:
    for i in common:  #sorted(common,key=int) #would work here to sort.
        f1.write("%s\t%s\t%s\n" % (i, animals[i], colors[i]))
        f2.write("%s\n" % colors[i])

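You might be able to do this a little more elegantly with a defaultdict, appending to a list whenever a particular key is encountered and then, when writing, testing that the list has length 2 before outputting. I'm not convinced that approach is any better, though.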
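For what it's worth, the defaultdict idea could look roughly like the sketch below. The function name merge_by_id and the in-memory sample lines are made up for illustration, and it assumes each id appears at most once per file (otherwise the length-2 test would also pass for an id repeated within a single file).

```python
from collections import defaultdict

def merge_by_id(animal_lines, color_lines):
    """Group second-column values by first-column id; keep ids with two values."""
    grouped = defaultdict(list)
    for line in list(animal_lines) + list(color_lines):
        line = line.rstrip('\n')
        if not line:
            continue  # skip blank lines
        key, value = line.split('\t')
        grouped[key].append(value)
    # two entries means the id was seen once in each input (assumption: unique ids per file)
    return {key: values for key, values in grouped.items() if len(values) == 2}

# hypothetical sample data mirroring the files in the question
animals = ["12345\tdog\n", "23456\tsheep\n", "34567\tpig\n"]
colors = ["34567\tpink\n", "12345\tblack\n", "23456\twhite\n"]

matched = merge_by_id(animals, colors)
for key in sorted(matched, key=int):
    animal, color = matched[key]
    print("%s\t%s\t%s" % (key, animal, color))
```

Because the animal lines are processed first, the animal is always the first element of each list and the color the second.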

score 3 · Accepted Answer

Do you have to use Python? If you're using bash and your inputs are not sorted, do this:

$ join -t $'\t' <( sort animals.txt ) <( sort colors.txt ) > output1
$ cut -f 3 output1 > output2

If you don't have a shell that supports process substitution, sort your input files first and then do:

$ join -t '<tab>' animals.txt colors.txt > output1
$ cut -f 3 output1 > output2

Here <tab> is an actual tab character. Depending on your shell, you may be able to enter it by pressing Ctrl-V followed by the Tab key. (Or use a different delimiter for cut.)

score 1 · Accepted Answer

I would use pandas.

from pandas import read_table  # assumes pandas is installed
animals, colors = read_table('animals.txt', index_col=0), read_table('colors.txt', index_col=0)
df = animals.join(colors)

Result:

animals.join(colors)
Out[73]: 
       animal  color
id
12345  dog     black
23456  sheep   white
34567  pig     pink

Then write the colors to a file in id order:

df.color.to_csv(r'out.csv', index=False, header=False)  # header=False keeps the column name out of the file

If you can't add column headers to the text files, you can add them on import:

animals = read_table('animals.txt', index_col=0, names=['id','animal'])

score 0 · Accepted Answer

Assuming every line in the input files is structured exactly as in the example:

with open("c:\\python27\\output1.txt","w") as out1, \
     open("c:\\python27\\output2.txt","w") as out2:

    # pair every animals.txt line with every colors.txt line and keep the pairs whose ids match
    for outline in [animal[0]+"\t"+animal[1]+"\t"+color[1] \
                    for animal in [line.strip('\n').split("\t") \
                    for line in open("c:\\python27\\animals.txt","r").readlines()] \
                    for color in [line.strip('\n').split("\t") \
                    for line in open("c:\\python27\\colors.txt","r").readlines()] \
                    if animal[0] == color[0]]:

        out1.write(outline+'\n')
        # the color is everything after the last tab
        out2.write(outline[outline.rfind('\t')+1:]+'\n')

I think that will do it for you.

Probably not the most elegant, fastest, or clearest way, but it is fairly short. Technically, I think it's four lines.
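For comparison, here is a rough sketch of the same id matching written with a dictionary lookup instead of nested loops, which also keeps the output in animals.txt order. The function name join_in_animal_order and the sample lines are hypothetical. It assumes two tab-separated columns per line, and unlike the nested comprehension it keeps only the last color when an id is repeated in colors.txt.

```python
def join_in_animal_order(animal_lines, color_lines):
    """Return (output1 rows, output2 rows), following the order of animal_lines."""
    colors = {}
    for line in color_lines:
        fields = line.rstrip('\n').split('\t')
        if len(fields) == 2:  # ignore blank or malformed lines
            colors[fields[0]] = fields[1]

    rows, matched_colors = [], []
    for line in animal_lines:
        fields = line.rstrip('\n').split('\t')
        if len(fields) == 2 and fields[0] in colors:
            color = colors[fields[0]]
            rows.append('\t'.join(fields + [color]))
            matched_colors.append(color)
    return rows, matched_colors

# hypothetical sample data mirroring the files in the question
animals = ["12345\tdog\n", "23456\tsheep\n", "34567\tpig\n"]
colors = ["34567\tpink\n", "12345\tblack\n", "23456\twhite\n"]

rows, matched_colors = join_in_animal_order(animals, colors)
print('\n'.join(rows))
print('\n'.join(matched_colors))
```

Open file objects can be passed in place of the lists, since they also yield one line at a time. The two returned lists can then be written to output1 and output2 with one line per entry.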
