python - Count occurrences of a character in Python when the preceding character is known

Question

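I have some code that is supposed to count the occurrences of a character when the preceding character is known.
This is what I tried, but it doesn't work.

The file contains only words made of the characters "K", "L", "G", "A", "S" and " ".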

text = open("fichier_a_compresser 1.txt", 'r')
alphabet = ("K", "L", "G", "A", "S", " ")
for i in text:
    characterlist  = list(i)

j = 0
cont = 0
for i in alphabet:
    for k in alphabet:
        while j < len(characterlist):
            if (characterlist[j-1]==k and characterlist[j]==i):
                cont = cont + 1
            j = j + 1 
        print str(i) + " appears after the character " + str(k) + " " + str(cont) + " times."
        cont = 0

The output is always 0, so I think the "cont" part is wrong. Thanks in advance.

score 1 · Accepted Answer

The following code:

for i in text:
    characterlist = list(i)

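probably does not do what you think it does. It reassigns characterlist once for each line of the file, so when the loop finishes it holds only the last line and every other line has been discarded. Even if you did intend to work with just the last line, there is no need to convert it to a list, which I assume was the intent behind list(i). Strings can already be indexed and iterated like lists.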
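To illustrate the point about the loop, here is a minimal sketch that uses a made-up list of lines in place of the real file. The helper names last_line_only and whole_text and the sample data are assumptions for illustration only:

```python
def last_line_only(lines):
    """Mimic the loop from the question: each pass overwrites the previous value."""
    characterlist = []
    for line in lines:
        characterlist = list(line)
    return characterlist


def whole_text(lines):
    """Keep every line by joining them into a single string."""
    return "".join(lines)


# Hypothetical sample standing in for the lines of the file.
sample_lines = ["KLG\n", "AS K\n", "GAL\n"]

print(last_line_only(sample_lines))  # only the characters of the final line survive
text = whole_text(sample_lines)
print(text[0], text[1])  # a str supports indexing without calling list()
```

Calling text = open(...).read() on the real file would give the same kind of single string as whole_text does here.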

As for the algorithm itself, I have trouble following it. I think this may be close to what you want:

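# Count occurrences of the two-character string b + a, i.e. a right after b.
# Note: split() matches do not overlap, so "KKK" counts "KK" only once.
freqs = [(a, b, len(line.split(b + a)) - 1) for a in alphabet for b in alphabet]
for (a, b, f) in freqs:
    print('{} appears after {} {} times.'.format(a, b, f))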

Here, line is a string containing the text to analyze.
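One caveat worth checking with the split-based count: str.split consumes non-overlapping matches, so a run of the same letter can be undercounted. The sketch below compares it with a count that advances one character at a time. The helper name count_after and the sample string are assumptions, not part of the original answer:

```python
def count_after(line, prev, cur):
    """Count positions where `cur` immediately follows `prev` in `line`."""
    return sum(1 for p, c in zip(line, line[1:]) if p == prev and c == cur)


sample = "KKK LG"  # made-up text for illustration

# Stepping one character at a time sees both overlapping "KK" pairs.
print(count_after(sample, "K", "K"))

# split() removes each match as it goes, so the overlapping pair is missed.
print(len(sample.split("KK")) - 1)
```

For pairs of two different letters the two approaches should agree, since such pairs cannot overlap with themselves.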

score 0 · Accepted Answer

Using Python's excellent data structures from the collections module makes this job easier.

from collections import defaultdict, Counter

# Read the whole file into one string.
with open("fichier_a_compresser 1.txt") as f:
    txt = f.read()

# counts[prev][cur] is the number of times cur directly follows prev.
counts = defaultdict(Counter)

for i in range(len(txt) - 1):
    counts[txt[i]][txt[i + 1]] += 1

for first, counter in counts.items():
    for second, count in counter.items():
        print('{} appears after the character {} {} times.'.format(second, first, count))
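If a report over every pair in the alphabet is wanted, including pairs that never occur, a variation along these lines might help. It is only a sketch: pair_counts is a hypothetical helper, the sample string is made up rather than read from the file, and the alphabet tuple is copied from the question:

```python
from collections import Counter


def pair_counts(txt):
    """Map (previous, current) character pairs to how often they occur."""
    return Counter(zip(txt, txt[1:]))


alphabet = ("K", "L", "G", "A", "S", " ")
sample = "KLG AS KLA"  # made-up text, not the real file
pairs = pair_counts(sample)

for prev in alphabet:
    for cur in alphabet:
        # A Counter returns 0 for keys it has never seen, so no special case is needed.
        print('{} appears after the character {} {} times.'.format(cur, prev, pairs[(prev, cur)]))
```

Looping over the alphabet rather than over the counted keys also keeps stray characters such as newlines out of the report.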
