python - Computing the frequency of dictionary keys in a text

Question

I have a dict of words. For each key in the dict, I want to find its frequency in an article.

After opening the article, this is what I do:

for k, v in sourted_key.items():
    for token in re.findall(k, data)
        token[form] += 1

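In re.findall(k, data) the key must be a string, but the keys in this dict are not. I want to search for the keys. Is there another solution? Note that the keys contain a lot of punctuation.

For example, if the key is "hand", it should match only hand, not handy or chandler.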

score 6 · Accepted Answer

In Python 2.7+, you can use collections.Counter for this:

import re, collections

text = '''Nullam euismod magna et ipsum tristique suscipit. Aliquam ipsum libero, cursus et rutrum ut, suscipit id enim. Maecenas vel justo dolor. Integer id purus ante. Aliquam volutpat iaculis consectetur. Suspendisse justo sapien, tincidunt ut consequat eget, fringilla id sapien. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae; Praesent mattis velit vitae libero luctus posuere. Vestibulum ac erat nibh, vel egestas enim. Ut ac eros ipsum, ut mattis justo. Praesent dignissim odio vitae nisl hendrerit sodales. In non felis leo, vehicula aliquam risus. Morbi condimentum nunc sit amet enim rutrum a gravida lacus pharetra. Ut eu nisi et magna hendrerit pharetra placerat vel turpis. Curabitur nec nunc et augue tristique semper.'''

c = collections.Counter(w.lower() for w in re.findall(r'\w+|[.,:;?!]', text))
words = set(('et', 'ipsum', ',', '?'))
for w in words:
  print('%s: %d' % (w, c.get(w, 0)))

score 2 · Accepted Answer

my_text = 'abc,abc,efr,sdgret,er,ttt,'

my_dict = {'abc':0, 'er': 0}

for word in my_text.split(','):
    if word in my_dict:
        my_dict[word] += 1

Result:

>>> my_dict
{'abc': 2, 'er': 1}

Edit: a more general solution

For an ordinary article, you will need to use a regular expression:

import re

my_string = "Wow! Is this true? Really!?!? This is crazy!"
my_dict = {'IS': 0, 'TRUE': 0}

words = re.findall(r'\w+', my_string)
cap_words = [word.upper() for word in words]

for word in cap_words:
    if word in my_dict:
        my_dict[word] += 1

Result:

>>> my_dict
{'IS': 2, 'TRUE': 1}

score 2 · Accepted Answer

I would do it like this:

tokens = {} 
d= {"a":1,"b":2}
data = "abca"
for k in d.keys():
    tokens[k] = data.count(k)
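
One caveat with this approach: str.count counts non-overlapping substring occurrences, so a key such as "hand" is also counted inside "handy" and "chandler", which the question wants to avoid. A minimal sketch of the difference follows; the sample `data` and `keys` are made up for illustration and are not from the question.

```python
import re

data = "hand, handy chandler; a hand."
keys = ["hand"]

# str.count matches substrings, so "handy" and "chandler" are counted too
substring_counts = {k: data.count(k) for k in keys}

# re.escape plus \b restricts the match to whole words
# (\b only behaves this way when the key starts and ends with a word character)
whole_word_counts = {
    k: len(re.findall(r"\b" + re.escape(k) + r"\b", data)) for k in keys
}

print(substring_counts)   # {'hand': 4}
print(whole_word_counts)  # {'hand': 2}
```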

score 1 · Accepted Answer

Option A

import re

text = """Now is the time for all good men to come to the aid of their country.  A man is only as good as all his thoughts."""
words = dict()

for word in re.findall('[^ .;]+', text):
    if words.get(word.lower(), False):
        words[word.lower()] += 1
    else:
        words[word.lower()] = 1

print words

This gives...

{'a': 1, 'all': 2, 'good': 2, 'for': 1, 'their': 1, 'of': 1, 
'is': 2, 'men': 1, 'as': 2, 'country': 1, 'to': 2, 'only': 1, 
'his': 1, 'time': 1, 'aid': 1, 'the': 2, 'now': 1, 'come': 1, 
'thoughts': 1, 'man': 1}

Option B: using defaultdict

import re
from collections import defaultdict

text = """Now is the time for all good men to come to the aid of their country.  A man is only as good as all his thoughts."""
words = defaultdict(int)

for word in re.findall('[^ .;]+', text):
    words[word.lower()] += 1

print words

This gives...

defaultdict(<type 'int'>, {'a': 1, 'all': 2, 'good': 2, 'for': 1, 
'their': 1, 'of': 1, 'is': 2, 'men': 1, 'as': 2, 'country': 1, 'to': 2, 
'only': 1, 'his': 1, 'time': 1, 'aid': 1, 'the': 2, 'now': 1, 'come': 1, 
'thoughts': 1, 'man': 1})

score 1 · Accepted Answer

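Use re.findall(re.escape(k), data) so that special characters inside the "words" don't cause problems.

But my guess is that this is not your real problem. findall() returns a list of matched strings, not something you can store counts in, so indexing a token with [form] and incrementing it cannot work. (The match objects you get from finditer() are not a place to store counts either.)

You probably meant counts[token] += 1, where counts is a dictionary with a default value of 0.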
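
Putting these pieces together, here is a rough sketch of counting keys that contain punctuation. The function name `count_keys` and the sample text are hypothetical. It assumes a key should match only when it is not directly preceded or followed by a word character, and it uses lookarounds rather than \b because \b does not match between a punctuation character and a space (for example after the last "+" in "C++ ").

```python
import re
from collections import defaultdict

def count_keys(keys, data):
    """Count occurrences of each key in data (hypothetical helper).

    Assumption: a key matches only when it is not directly
    preceded or followed by a word character.
    """
    counts = defaultdict(int)
    for k in keys:
        # re.escape neutralises regex metacharacters inside the key
        pattern = r"(?<!\w)" + re.escape(k) + r"(?!\w)"
        for match in re.finditer(pattern, data):
            counts[match.group()] += 1
    return dict(counts)

data = "A hand is handy, said Chandler. C++ and c++ differ; hand!"
result = count_keys(["hand", "C++"], data)
print(result)  # {'hand': 2, 'C++': 1}
```

The match is case-sensitive, so "c++" is not counted for the key "C++"; passing flags=re.IGNORECASE to re.finditer would change that. A key made only of punctuation, such as ".", would need different boundary rules, since it usually sits right next to a word.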

score 0 · Accepted Answer

Since everyone is taking a swing at this...

The difference in this one is the regex used to separate the text from the punctuation. I use \b\w+\b

import re 

article='''Richard II (13671400) was King of England, a member of the House of Plantagenet and the last of its main-line kings. He ruled from 1377 until he was deposed in 1399. Richard was a son of Edward, the Black Prince, and was born during the reign of his grandfather, Edward III. Richard was tall, good-looking and intelligent. Although probably not insane, as earlier historians believed, he may have suffered from one or several personality disorders that may have become more apparent toward the end of his reign. Less of a warrior than either his father or grandfather, he sought to bring an end to the Hundred Years' War that Edward III had started. He was a firm believer in the royal prerogative, which led him to restrain the power of his nobility and rely on a private retinue for military protection instead. He also cultivated a courtly atmosphere where the king was an elevated figure, and art and culture were at the centre, in contrast to the fraternal, martial court of his grandfather. Richard's posthumous reputation has to a large extent been shaped by Shakespeare, whose play Richard II portrays Richard's misrule and Bolingbroke's deposition as responsible for the 15th-century Wars of the Roses. Most authorities agree that the way in which he carried his policies out was unacceptable to the political establishment, and this led to his downfall.'''
words = {}

for word in re.findall(r'\b\w+\b', article):
    word=word.lower()
    if word in words:
        words[word]+=1
    else:
        words[word]=1    

print [(k,v) for v, k in sorted(((v, k) for k, v in words.items()), reverse=True)]

This prints a list of (word, count) tuples sorted by frequency:

[('the', 15), ('of', 11), ('was', 8), ('and', 8), ('to', 7), ('his', 7), ('he', 7), 
 ('a', 7), ('richard', 6), ('in', 4), ('that', 3), ('s', 3), ('grandfather', 3), 
 ('edward', 3), ('which', 2), ('reign', 2), ('or', 2), ('may', 2), ('led', 2), 
 ('king', 2), ('iii', 2), ('ii', 2), ('have', 2), ('from', 2), ('for', 2), ('end', 2), 
 ('as', 2), ('an', 2), ('years', 1), ('whose', 1), ('where', 1), ('were', 1), ('way', 1), ('wars', 1), ('warrior', 1), ('war', 1), ('until', 1), ('unacceptable', 1), ('toward', 1), ('this', 1), ('than', 1), ('tall', 1), ('suffered', 1), ('started', 1), ('sought', 1), ('son', 1), ('shaped', 1), ('shakespeare', 1), ('several', 1), ('ruled', 1), ('royal', 1), ('roses', 1), ('retinue', 1), ('restrain', 1), ('responsible', 1), ('reputation', 1), ('rely', 1), ('protection', 1), ('probably', 1), ('private', 1), ('prince', 1), ('prerogative', 1), ('power', 1), ('posthumous', 1), ('portrays', 1), ('political', 1), ('policies', 1), ('play', 1), ('plantagenet', 1), ('personality', 1), ('out', 1), ('one', 1), ('on', 1), ('not', 1), ('nobility', 1), ('most', 1), ('more', 1), ('misrule', 1), ('military', 1), ('member', 1), ('martial', 1), ('main', 1), ('looking', 1), ('line', 1), ('less', 1), ('last', 1), ('large', 1), ('kings', 1), ('its', 1), ('intelligent', 1), ('instead', 1), ('insane', 1), ('hundred', 1), ('house', 1), ('historians', 1), ('him', 1), ('has', 1), ('had', 1), ('good', 1), ('fraternal', 1), ('firm', 1), ('figure', 1), ('father', 1), ('extent', 1), ('establishment', 1), ('england', 1), ('elevated', 1), ('either', 1), ('earlier', 1), ('during', 1), ('downfall', 1), ('disorders', 1), ('deposition', 1), ('deposed', 1), ('culture', 1), ('cultivated', 1), ('courtly', 1), ('court', 1), ('contrast', 1), ('century', 1), ('centre', 1), ('carried', 1), ('by', 1), ('bring', 1), ('born', 1), ('bolingbroke', 1), ('black', 1), ('believer', 1), ('believed', 1), ('been', 1), ('become', 1), ('authorities', 1), ('atmosphere', 1), ('at', 1), ('art', 1), ('apparent', 1), ('although', 1), ('also', 1), ('agree', 1), ('15th', 1), ('1399', 1), ('1377', 1), ('13671400', 1)]
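
For comparison, the same frequency-sorted listing can probably be had with less code through collections.Counter and its most_common method (Python 2.7+). This is a sketch on a short made-up sentence rather than the article above, and tie ordering may differ from the sorted() approach shown earlier.

```python
import re
from collections import Counter

# Illustrative sample text, not the article from the answer
article = "The king and the court; the king and the king."

counts = Counter(w.lower() for w in re.findall(r"\b\w+\b", article))

# most_common(n) returns the n highest (word, count) pairs, highest first
top = counts.most_common(3)
print(top)  # [('the', 4), ('king', 3), ('and', 2)]
```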

score 0 · Accepted Answer

article = "I have a dict of words. For each key in the dict, I want to find its frequency in an article"

words = {"dict", "i", "in", "key"} # set of words


wordsFreq = {}

wordsInArticle = tuple(word.lower() for word in article.split(" "))

for word in wordsInArticle:
  if word in words:  # only count the words we are interested in
    wordsFreq[word] = wordsFreq[word] + 1 if word in wordsFreq else 1

print(wordsFreq)
