python - List in a text file

Question

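I need some help.

I have a text file that contains lines of lists. Each line represents a list of items. I need to extract every item with a frequency >= 2 and write the results to another file. Here is an example: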

['COLG-CAD-406', 'CSAL-CAD-030', 'COLG-CAD-533', 'COLG-CAD-188']

['COLG-CAD-188']

['CSAL-CAD-030']

['EPHAG-JAE-004']

['COLG-CAD-188', 'CEM-SEV-004']

['COL-CAD-188', 'COLG-CAD-406']

The output should look like this:

['COLG-CAD-406'], 2

['CSAL-CAD-030'], 2

['COLG-CAD-188'], 3

and so on, until the end of the file.

Thanks in advance.

score 2 · Accepted Answer

How about:

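import ast

count = {}  # created once, so frequencies accumulate across all lines
for x in f:
    if not x.strip():
        continue  # skip blank lines between the lists
    words = ast.literal_eval(x)
    for w in words:
        count[w] = count.get(w, 0) + 1
for word, freq in count.items():
    if freq >= 2:
        print("%s %d" % (word, freq))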

where f is your open file object.
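
For reference, here is the same idea as a self-contained sketch. The function name `count_items` and the sample lines are made up for illustration; the function accepts any iterable of lines, so an open file object should work as well. It assumes every non-blank line is a valid Python list literal.

```python
import ast


def count_items(lines):
    """Count how often each item occurs across all list-literal lines."""
    count = {}
    for line in lines:
        if not line.strip():
            continue  # the sample input has blank lines between lists
        for item in ast.literal_eval(line):
            count[item] = count.get(item, 0) + 1
    return count


# Hypothetical sample data standing in for the lines of a real file.
sample = [
    "['COLG-CAD-406', 'CSAL-CAD-030']\n",
    "\n",
    "['COLG-CAD-406']\n",
]
counts = count_items(sample)
frequent = {item: n for item, n in counts.items() if n >= 2}
print(frequent)
```

With a real file, something like `with open(path) as f: counts = count_items(f)` would do the same job, where `path` is whatever your input file is called.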

score 0 · Accepted Answer

If you are using Python 2.7 or later, given this input (called list1.txt):

['COLG-CAD-406', 'CSAL-CAD-030', 'COLG-CAD-533', 'COLG-CAD-188']
['COLG-CAD-188']
['CSAL-CAD-030']
['EPHAG-JAE-004']
['COLG-CAD-188', 'CEM-SEV-004']
['COLG-CAD-188', 'COLG-CAD-406']

and this Python program:

from collections import Counter
import ast

cnt = Counter()

with open("list1.txt") as lfile:
    for line in lfile:
        # eval() could lead to python code injection so use literal_eval
        # the result is a list that we can directly use to update cnt keys
        cnt.update(ast.literal_eval(line))

for k, v in cnt.items():
    if v >= 2:
        print("%s: %d" % (k, v))

you get what you want:

CSAL-CAD-030: 2
COLG-CAD-406: 2
COLG-CAD-188: 4
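
The question also asks for the result to go into another file, in the form `['ITEM'], N`. Below is a minimal sketch of that step, assuming a `Counter` like `cnt` above. The helper name `write_frequent` and the hard-coded counts are hypothetical, and an in-memory buffer stands in for the output file so the example has no side effects.

```python
import io
from collections import Counter


def write_frequent(counter, out, min_freq=2):
    """Write "['ITEM'], N" lines for items seen at least min_freq times."""
    for item, n in counter.items():
        if n >= min_freq:
            # %r on a one-element list renders as ['ITEM']
            out.write("%r, %d\n" % ([item], n))


# Hypothetical counts standing in for the Counter built from the input file.
cnt = Counter({"COLG-CAD-188": 4, "COLG-CAD-406": 2, "CEM-SEV-004": 1})
buffer = io.StringIO()  # stands in for an opened output file
write_frequent(cnt, buffer)
print(buffer.getvalue(), end="")
```

To write to disk instead, pass a file opened for writing, for example `with open("out.txt", "w") as out: write_frequent(cnt, out)`, where `out.txt` is a placeholder name.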

score 0 · Accepted Answer

Here is a complete script that does exactly what you need, using regular expressions.

from collections import defaultdict
import re

myarch = 'C:/code/test5.txt'   #this is your archive
mydict = defaultdict(int)

with open(myarch) as f:
    for line in f:
        codes = re.findall(r"'(\S*)'", line)
        for key in codes:
            mydict[key] +=1

out = []
for key, value in mydict.items():
    if value > 1:
        text = "['%s'], %s" % (key, value)
        out.append(text)

#save to a file
with open('C:/code/fileout.txt', 'w') as fo:
    fo.write('\n'.join(out))

This can be simplified to:

from collections import defaultdict
import re

myarch = 'C:/code/test5.txt'
mydict = defaultdict(int)

with open(myarch) as f:
    for line in f:
        for key in re.findall(r"'(\S*)'", line):
            mydict[key] +=1

out = ["['%s'], %s" % (key, value) for key, value in mydict.items() if value > 1]

#save to a file
with open('C:/code/fileout.txt', 'w') as fo:
    fo.write('\n'.join(out))
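
One thing to keep in mind with the regular expression approach: `\S*` matches only non-whitespace characters, so it works for codes like the ones in the question but would miss an item that contains a space. The sketch below compares it with `ast.literal_eval` on a made-up line (the item `'A B'` is hypothetical and does not come from the question's data).

```python
import ast
import re

line = "['A B', 'COLG-CAD-188']"  # made-up line; 'A B' contains a space

# The regex only captures quoted runs without whitespace.
with_regex = re.findall(r"'(\S*)'", line)
# literal_eval parses the line as a real Python list.
with_literal_eval = ast.literal_eval(line)

print(with_regex)
print(with_literal_eval)
```

For the data shown in the question the two approaches should give the same items, so this only matters if the real file contains codes with spaces.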

score 0 · Accepted Answer

Input:

['COLG-CAD-406', 'CSAL-CAD-030', 'COLG-CAD-533', 'COLG-CAD-188']

['COLG-CAD-188']

['CSAL-CAD-030']

['EPHAG-JAE-004']

['COLG-CAD-188', 'CEM-SEV-004']

['COL-CAD-188', 'COLG-CAD-406']

Output:

>>> from collections import Counter
>>> from ast import literal_eval
>>> with open('input.txt') as f:
        c = Counter(word for line in f if line.strip() for word in literal_eval(line))


>>> print('\n'.join('{0}, {1}'.format([word], freq) for word, freq in c.items() if freq >= 2))
['CSAL-CAD-030'], 2
['COLG-CAD-406'], 2
['COLG-CAD-188'], 3
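
If the output should be ordered by frequency, `Counter.most_common()` returns the pairs sorted from most to least frequent. Here is a small sketch that uses the input above as an in-memory list instead of reading `input.txt`; the variable names `lines` and `frequent` are illustrative.

```python
from ast import literal_eval
from collections import Counter

# The input from above, held in memory; the empty string mimics a blank line.
lines = [
    "['COLG-CAD-406', 'CSAL-CAD-030', 'COLG-CAD-533', 'COLG-CAD-188']",
    "",
    "['COLG-CAD-188']",
    "['CSAL-CAD-030']",
    "['EPHAG-JAE-004']",
    "['COLG-CAD-188', 'CEM-SEV-004']",
    "['COL-CAD-188', 'COLG-CAD-406']",
]

c = Counter(word for line in lines if line.strip() for word in literal_eval(line))

# most_common() yields (item, count) pairs, highest count first.
frequent = [(word, freq) for word, freq in c.most_common() if freq >= 2]
for word, freq in frequent:
    print("{0}, {1}".format([word], freq))
```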
