csv - Counting two-word (word pair) frequencies with Python

Question

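There are plenty of online resources, such as this, this, and this, that show how to count the frequency of single words, but I couldn't find a concrete example of counting the frequency of two-word pairs.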

I have a CSV file that contains several strings, for example:

FileList = "I love TV show makes me happy, I love also comedy show makes me feel like flying"

I want the output to look like this:

wordscount =  {"I love": 2, "show makes": 2, "makes me" : 2 }

Of course, all commas, question marks and other punctuation have to be removed first: ! , " ' ? . ( ) [ ] ^ % # @ & * - _ ; / \ |
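A minimal sketch of the punctuation step, assuming the characters listed above are the complete set to strip: str.maketrans builds a table that maps each of them to a space, and str.split() then breaks the text into words. The PUNCTUATION constant is just that list (the standard library's string.punctuation is a broader ready-made alternative). Mapping to a space instead of deleting keeps text such as "happy,I" from fusing into a single token.

```python
# The characters listed in the question; each is mapped to a space so that
# words separated only by punctuation still end up as separate tokens.
PUNCTUATION = "!,\"'?.()[]^%#@&*-_;/\\|"
TABLE = str.maketrans(PUNCTUATION, " " * len(PUNCTUATION))


def strip_punctuation(text: str) -> str:
    """Return text with every listed punctuation character replaced by a space."""
    return text.translate(TABLE)


sentence = "I love TV show makes me happy, I love also comedy show makes me feel like flying"
words = strip_punctuation(sentence).split()
print(words[:7])  # → ['I', 'love', 'TV', 'show', 'makes', 'me', 'happy']
```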

I also want to remove some stopwords that I found here, so that the counts reflect more meaningful data from the text.
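Stopword removal could be sketched like this. The STOPWORDS set is a hypothetical placeholder, since the linked list isn't reproduced here. This version drops any pair that contains a stopword instead of deleting stopwords before pairing, so every counted pair was actually adjacent in the original text. Removing the stopwords first would produce pairs such as "happy love" that never occur in the sentence.

```python
import re
from collections import Counter

# Hypothetical stopword list for illustration only; substitute your own list.
STOPWORDS = {"i", "me", "also", "like"}


def bigram_counts(text: str, stopwords: set) -> Counter:
    """Count adjacent word pairs, skipping any pair that contains a stopword."""
    words = re.findall(r"\w+", text)
    return Counter(
        f"{a} {b}"
        for a, b in zip(words, words[1:])
        if a.lower() not in stopwords and b.lower() not in stopwords
    )


sentence = "I love TV show makes me happy, I love also comedy show makes me feel like flying"
counts = bigram_counts(sentence, STOPWORDS)
print({pair: n for pair, n in counts.items() if n > 1})  # → {'show makes': 2}
```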

How can I achieve this result in Python?

Thanks!

score 3 · Accepted Answer

>>> from collections import Counter
>>> import re
>>> 
>>> sentence = "I love TV show makes me happy, I love also comedy show makes me feel like flying"
>>> words = re.findall(r'\w+', sentence)
>>> two_words = [' '.join(ws) for ws in zip(words, words[1:])]
>>> wordscount = {w:f for w, f in Counter(two_words).most_common() if f > 1}
>>> wordscount
{'I love': 2, 'show makes': 2, 'makes me': 2}
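The answer works on a single string, but the question's data comes from a CSV file. One way to apply the same findall/zip/Counter approach to every cell is sketched below. The function name count_bigrams_in_csv is hypothetical, and io.StringIO stands in for a real file so that the example runs on its own. Pairs are built within each cell, so the last word of one cell is never joined to the first word of the next.

```python
import csv
import io
import re
from collections import Counter


def count_bigrams_in_csv(f) -> Counter:
    """Sum two-word counts over every cell of a CSV file object."""
    total = Counter()
    for row in csv.reader(f):
        for cell in row:
            words = re.findall(r"\w+", cell)
            # Pairs are formed inside one cell only.
            total.update(" ".join(p) for p in zip(words, words[1:]))
    return total


# Stand-in for a real file; on disk you would pass
# open("your_file.csv", newline="", encoding="utf-8") instead (hypothetical name).
data = io.StringIO(
    '"I love TV show makes me happy, I love also comedy show makes me feel like flying"\n'
    "I love this show\n"
)
counts = count_bigrams_in_csv(data)
print({w: f for w, f in counts.most_common() if f > 1})
# → {'I love': 3, 'show makes': 2, 'makes me': 2}
```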
