python - Counting punctuation in text using Python and regular expressions

Question

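I'm trying to count how many times punctuation marks appear in a novel. For example, I want to find the occurrences of question marks and periods, along with all other non-alphanumeric characters, and then insert the counts into a CSV file. I don't have much experience with Python, so I'm not sure how to write the regex. Can anyone help?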

texts=string.punctuation
counts=dict(Counter(w.lower() for w in re.findall(r"\w+", open(cwd+"/"+book).read())))
writer = csv.writer(open("author.csv", 'a'))
writer.writerow([counts.get(fieldname,0) for fieldname in texts])

score 7 · Accepted Answer

In [1]: from string import punctuation

In [2]: from collections import Counter

In [3]: counts = Counter(open('novel.txt').read())

In [4]: punctuation_counts = {k: v for k, v in counts.items() if k in punctuation}
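
For reference, a minimal self-contained sketch of the same idea on an in-memory string, written for Python 3 (where `dict.items()` replaces `iteritems()`). The sample text is made up for illustration and simply stands in for the contents of novel.txt:

```python
from collections import Counter
from string import punctuation

# Hypothetical sample text standing in for the contents of novel.txt.
sample_text = "Really? Yes. No, wait... really?"

# Count every character, then keep only the ASCII punctuation characters.
counts = Counter(sample_text)
punctuation_counts = {k: v for k, v in counts.items() if k in punctuation}

print(punctuation_counts)
```

Note that `string.punctuation` only covers ASCII punctuation, so typographic quotes or dashes in a real novel would not be counted by this filter.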

score 1 · Accepted Answer

import re
def count_puncts(x):
  # Replace every non-word character (\W) with '' and return the number of
  # replacements. Note that \W also matches whitespace, not only punctuation.
  new_str, count = re.subn(r'\W', '', x)
  return count
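
One thing to watch with the `\W` pattern is that it matches spaces and newlines as well as punctuation. The sketch below compares it with a pattern that skips whitespace; the function names and the sample string are assumptions made for illustration, not part of the answer above:

```python
import re

def count_non_word(text):
    # Counts every character matched by \W, which includes whitespace.
    return len(re.findall(r"\W", text))

def count_puncts_no_space(text):
    # Hypothetical variant: counts characters that are neither word
    # characters nor whitespace. The underscore still counts as a word
    # character, so it is not included.
    return len(re.findall(r"[^\w\s]", text))

sample = "real, and? or, and? what."
print(count_non_word(sample), count_puncts_no_space(sample))
```

Which of the two is appropriate depends on whether "all non-alphanumeric characters" is meant to include whitespace.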

score 0 · Accepted Answer

Using curses:

import curses.ascii
str1 = "real, and? or, and? what."
t = (c for c in str1 if curses.ascii.ispunct(c))
d = dict()
for p in t:
    d[p] = 1 if p not in d else d[p] + 1

score 0 · Accepted Answer

The code you have is very close to what you would need if you were counting words. If that were your goal, the only change you would need is probably to replace the last line with:

writer.writerows(counts.items())

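Unfortunately, you aren't trying to count words here. If you're looking for counts of single characters, I would avoid regex and use count directly. Your code would then look like this: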

book_text = open(cwd+"/"+book).read()
counts = {}
for character in texts:
    counts[character] = book_text.count(character)
writer.writerows(counts.items())

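As you can probably see, this builds a dictionary with the characters as keys and the number of times each character appears in the text as values. We then write it out the same way we would for word counts.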
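
Putting the pieces together, a self-contained sketch might look like the following. The helper names, the header row, and the sample string are assumptions added for illustration, and an in-memory buffer stands in for the real CSV file so the snippet runs without touching the disk:

```python
import csv
import io
from string import punctuation

def punctuation_counts(text):
    # One str.count call per punctuation character, as in the loop above.
    return {ch: text.count(ch) for ch in punctuation}

def write_counts(counts, out_file):
    # Write one (character, count) row per entry. The header row is an
    # addition for readability and is not part of the original code.
    writer = csv.writer(out_file)
    writer.writerow(["character", "count"])
    writer.writerows(counts.items())

sample = "real, and? or, and? what."  # hypothetical stand-in for the book text
counts = punctuation_counts(sample)

# io.StringIO stands in for open("author.csv", "a", newline="").
buffer = io.StringIO()
write_counts(counts, buffer)
print(buffer.getvalue())
```

With a real file, opening it with `newline=""` lets the csv module control line endings, and the module quotes characters such as `,` and `"` automatically.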
