python - Determine the frequency of strings and process them further

Question

I have several text files that contain a variable number of columns separated by \t (tab). Something like this:

value1x1 . . . . . . value1xn
   .     . . . . . . value2xn
   .     . . . . . .     .
valuemx1 . . . . . . valuemxn

I can scan the values and determine their frequency with the following code:

with open("input_raw", 'r') as f:
    whole_content = f.read()
list_content = whole_content.split()
freq = {}  # renamed from `dict` to avoid shadowing the built-in
for one_word in list_content:
    freq[one_word] = freq.get(one_word, 0) + 1
# `func` was not shown in the original; sorting by count is an assumption.
a = str(sorted(freq.items(), key=lambda item: item[1]))
with open("out_freq.txt", 'w') as f2:
    f2.write(a)

This produces the following output:

('26047', 13), ('42810', 13), ('61080', 13), ('106395', 13), ('102395', 13)...

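The format is ('value', occurrence_number), and it works as expected. What I am trying to achieve is:

To convert the output to the format ('value', occurrence_number, column_number), where column_number is the column of input_raw.txt in which the value occurred.
To group the values that have the same occurrence count, separated by column, and write these groups to different files.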

score 0 · Accepted Answer

If I understand correctly, you want something like this:

import itertools as it
from collections import Counter

with open("input_raw", 'r') as fin, open("out_freq.txt", 'w') as fout:
    # Count (column_index, value) pairs, one pair per field of every line.
    counts = Counter(it.chain.from_iterable(enumerate(line.split())
                                            for line in fin))
    # Most frequent pairs first.
    sorted_items = sorted(counts.items(), key=lambda x: x[1], reverse=True)
    # Emit (value, occurrence_number, column_number) tuples.
    a = ', '.join(str((int(key[1]), val, key[0])) for key, val in sorted_items)
    fout.write(a)

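Note that this code uses (column_index, value) tuples as keys, so that equal values appearing in different columns are counted separately. It is not clear from your question whether that situation can occur, or what should be done if it does.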
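
To make the key structure concrete, here is a minimal sketch (Python 3 syntax; the sample `lines` are made up for illustration and are not from the question's data). It shows that `enumerate(line.split())` yields `(column_index, value)` pairs, so the same value in two different columns gets two separate counts:

```python
import itertools as it
from collections import Counter

# Hypothetical sample rows: "9" appears once in column 0 and twice in column 1.
lines = ["10 9 7", "9 9 7", "10 8 12"]

# Each row contributes (column_index, value) pairs, e.g. (0, '10'), (1, '9'), (2, '7').
counts = Counter(it.chain.from_iterable(enumerate(line.split()) for line in lines))

print(counts[(0, "9")])  # 1
print(counts[(1, "9")])  # 2
```

If equal values should instead be merged regardless of column, counting `line.split()` items directly (without `enumerate`) would do that, at the cost of losing the column number.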

Example usage:

>>> import itertools as it
>>> from collections import Counter
>>> def get_sorted_items(fileobj):
...     counts = Counter(it.chain.from_iterable(enumerate(line.split()) for line in fileobj))
...     return sorted(counts.items(), key=lambda x:x[1], reverse=True)
... 
>>> data = """
... 10 11 12 13 14
... 10 9  7  6  4
... 9  8  12 13 0
... 10 21 33 6  1
... 9  9  7  13 14
... 1  21 7  13 0
... """
>>> with open('input.txt', 'wt') as fin:  #write data to the input file
...     fin.write(data)
... 
>>> with open('input.txt', 'rt') as fin:
...     print ', '.join(str((int(key[1]), val, key[0])) for key, val in get_sorted_items(fin))
... 
(13, 4, 3), (10, 3, 0), (7, 3, 2), (14, 2, 4), (6, 2, 3), (9, 2, 0), (0, 2, 4), (9, 2, 1), (21, 2, 1), (12, 2, 2), (8, 1, 1), (1, 1, 4), (1, 1, 0), (33, 1, 2), (4, 1, 4), (11, 1, 1)
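
The second goal in the question (grouping values with the same occurrence count, per column, and writing each group to its own file) is not covered above. The following is only a sketch under one reading of that goal, written in Python 3 syntax: the helper names `group_by_count_and_column` and `write_groups`, the sample `rows`, and the `count{n}_col{c}.txt` file naming are all assumptions made for illustration.

```python
import os
from collections import Counter, defaultdict

def group_by_count_and_column(lines):
    """Map (count, column) -> sorted list of values with that count in that column."""
    counts = Counter(
        (col, value)
        for line in lines
        for col, value in enumerate(line.split())
    )
    groups = defaultdict(list)
    for (col, value), n in counts.items():
        groups[(n, col)].append(value)
    # Values are strings here, so the sort is lexicographic.
    return {key: sorted(values) for key, values in groups.items()}

def write_groups(groups, out_dir):
    """Write one file per (count, column) group; the naming scheme is hypothetical."""
    for (n, col), values in groups.items():
        path = os.path.join(out_dir, "count{}_col{}.txt".format(n, col))
        with open(path, "w") as fout:
            fout.write("\n".join(values) + "\n")

# Hypothetical sample rows, not the question's data.
rows = ["10 11 12", "10 9 7", "9 8 12"]
groups = group_by_count_and_column(rows)
print(groups[(2, 0)])  # ['10']
print(groups[(1, 0)])  # ['9']
```

Calling `write_groups(groups, some_directory)` would then produce one file per group; passing an open file object instead of `rows` works the same way, since both are iterables of lines.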
