python - How to generate bi/tri-grams using spacy/nltk

Question

The input text is always a list of dish names, each containing 1 to 3 adjectives and a noun.

Input:

thai iced tea
spicy fried chicken
sweet chili pork
thai chicken curry

Output:

thai tea, iced tea
spicy chicken, fried chicken
sweet pork, chili pork
thai chicken, chicken curry, thai curry

Basically, I want to parse the sentence tree and generate bigrams by pairing each adjective with the noun.

I would like to achieve this with spacy or nltk.

score 1 · Accepted Answer

Something like this:

>>> from nltk import bigrams
>>> text = """thai iced tea
... spicy fried chicken
... sweet chili pork
... thai chicken curry"""
>>> lines = map(str.split, text.split('\n'))
>>> for line in lines:
...     ", ".join([" ".join(bi) for bi in bigrams(line)])
... 
'thai iced, iced tea'
'spicy fried, fried chicken'
'sweet chili, chili pork'
'thai chicken, chicken curry'
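
Note that plain bigrams only pair adjacent words, so this gives 'thai iced' rather than the 'thai tea' the question asks for. A minimal sketch of one way to get closer, assuming a hand-written noun set (the hypothetical `NOUNS` below) stands in for a real POS tagger: pair each word with every later word that is a noun. The helper name `modifier_noun_pairs` is also made up for illustration, and "chili" is deliberately left out of `NOUNS` so that it is treated as a modifier, as in the question.

```python
from itertools import combinations

# Hypothetical noun list standing in for a real POS tagger (assumption).
# "chili" is omitted on purpose so it counts as a modifier, as in the question.
NOUNS = {"tea", "chicken", "pork", "curry"}

def modifier_noun_pairs(dish):
    """Pair each word with every later word in the dish name that is a noun."""
    words = dish.split()
    # combinations() keeps the original word order within each pair
    return [f"{a} {b}" for a, b in combinations(words, 2) if b in NOUNS]

text = """thai iced tea
spicy fried chicken
sweet chili pork
thai chicken curry"""

for dish in text.split("\n"):
    print(", ".join(modifier_noun_pairs(dish)))
```

For the last dish this produces the same three pairs as in the question, only in a different order (thai chicken, thai curry, chicken curry). With spacy or nltk, the `b in NOUNS` check could be replaced by a test on each token's part-of-speech tag, though taggers may label short dish names inconsistently.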

Or use colibricore: https://proycon.github.io/colibri-core/doc/#installation ;P
