python - Tabulating and writing out frequency distributions with NLTK

Translated from: https://stackoverflow.com/questions/26183357 2014-10-03T16:58:33.343


I am trying to use NLTK to count trigrams across an entire corpus of 12,000 text files and write the frequency distribution of the trigrams to a file, but I get the following error:

Traceback (most recent call last):
  File "TPNngrams2.py", line 19, in <module>
    fdisttab = fdist.tabulate()
  File "/Library/Python/2.7/site-packages/nltk/probability.py", line 281, in tabulate
     print("%4s" % samples[i], end=' ')
TypeError: not all arguments converted during string formatting

Here is the code:

import nltk
import re
from nltk.corpus.reader.plaintext import PlaintextCorpusReader
from nltk import FreqDist

#this imports the text files in the folder into corpus called speeches
corpus_root = '/Users/root'
speeches = PlaintextCorpusReader(corpus_root, '.*\.txt')

print "Finished importing corpus"
fdist = nltk.FreqDist()  # Empty distribution

for filename in speeches.fileids():
    (str(trigram) for trigram in nltk.trigrams(speeches.words(filename)))
    fdist.update(nltk.trigrams(speeches.words(filename)))

fdisttab = fdist.tabulate()
print fdisttab
f = open('freqdists.txt', 'w+')
f.write(fdisttab)
f.close()

print "All done. Check file."

Thanks in advance. I'm not sure how to even get started on this.
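
A note on the likely cause, going by the traceback: `tabulate()` formats each sample with `"%4s" % sample`. When the sample is a 3-tuple (a trigram), the `%` operator treats the tuple as three separate arguments for a single placeholder, which raises this TypeError. Separately, `tabulate()` prints to stdout and returns None, so `f.write(fdisttab)` would fail even once the first error is gone. The sketch below shows one way around both issues, under some assumptions: it is written for Python 3, it uses `collections.Counter` in place of `nltk.FreqDist` (in NLTK 3, `FreqDist` subclasses `Counter`, so `update` and `most_common` should behave the same way), and the helper names `trigrams`, `count_trigrams`, `write_counts` and the sample token lists are hypothetical stand-ins for `nltk.trigrams` and `speeches.words(filename)`.

```python
from collections import Counter
import io


def trigrams(tokens):
    """Return consecutive 3-token tuples (stand-in for nltk.trigrams)."""
    tokens = list(tokens)
    return zip(tokens, tokens[1:], tokens[2:])


def count_trigrams(documents):
    """Count trigrams across all documents.

    Each trigram tuple is joined into one string, so a formatter
    that does "%4s" % sample receives a single argument.
    """
    counts = Counter()
    for tokens in documents:
        counts.update(' '.join(tri) for tri in trigrams(tokens))
    return counts


def write_counts(counts, out):
    """Write one 'trigram<TAB>count' line per entry, most frequent first."""
    for trigram, n in counts.most_common():
        out.write('%s\t%d\n' % (trigram, n))


# Hypothetical sample data standing in for speeches.words(filename).
docs = [
    ['we', 'the', 'people', 'of', 'the', 'united', 'states'],
    ['we', 'the', 'people', 'are', 'here'],
]

counts = count_trigrams(docs)

# An in-memory buffer is used here; an open file object works the same way.
buf = io.StringIO()
write_counts(counts, buf)
report = buf.getvalue()
```

With the real corpus, the loop would pass `speeches.words(filename)` for each file ID and `write_counts` would receive the handle from `open('freqdists.txt', 'w')`. Writing the counts directly avoids relying on `tabulate()`, which is meant for console display rather than for producing a string.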
