python - Python stemmer issue: wrong stem

Question

Hi, I'm trying to stem words with a Python stemmer. I tried Porter and Lancaster, but both have the same problem: they fail to correctly stem words ending in "er" or "e".

For example, they stem:

computer -->  comput

rotate   -->  rotat

This is part of the code:

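import re  # needed for re.sub below; stops, porter and st are defined elsewhere in the script
line = line.lower()
line = re.sub(r'[^a-z0-9 ]', ' ', line)  # replace anything but lowercase letters, digits and spaces
line = line.split()
line = [x for x in line if x not in stops]  # drop stop words
line = [porter.stem(word, 0, len(word) - 1) for word in line]
# or: line = [st.stem(word) for word in line]
return line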

Any ideas on how to solve this problem?

score 3 · Accepted Answer

To quote the Wikipedia page:

In computational linguistics, a stem is the part of the word that never changes even when morphologically inflected, whilst a lemma is the base form of the word. For example, given the word "produced", its lemma (linguistics) is "produce", however the stem is "produc": this is because there are words such as production.

So your code is probably giving the correct results. You seem to be expecting a lemma, which is not what a stemmer produces (except when the lemma happens to be equal to the stem).
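To make the stem/lemma distinction concrete, here is a minimal sketch. It is a toy, not the Porter or Lancaster algorithm: `toy_stem`, `toy_lemmatize`, the suffix list, and the small lemma dictionary are all hypothetical names and data made up for illustration. A real stemmer applies many more rules, and a real lemmatizer relies on a full lexicon.

```python
# Toy illustration only: the suffix list and lemma dictionary are made up.
# Neither function reproduces Porter, Lancaster, or a real lemmatizer.

SUFFIXES = ("tion", "ing", "ed", "er", "e", "s")  # hypothetical rule list


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical mini-dictionary mapping inflected forms to their base form.
LEMMAS = {
    "produced": "produce",
    "computers": "computer",
    "rotated": "rotate",
    "rotating": "rotate",
}


def toy_lemmatize(word):
    """Look the word up; fall back to the word itself if it is unknown."""
    return LEMMAS.get(word, word)


if __name__ == "__main__":
    for w in ["produced", "production", "computer", "rotate", "rotated"]:
        print(w, "| stem:", toy_stem(w), "| lemma:", toy_lemmatize(w))
```

The stem column ends up with truncated forms such as "comput" and "rotat", which need not be dictionary words, while the lemma column keeps "computer" and "rotate". If dictionary forms are what you are after, a lemmatizer (for example NLTK's WordNetLemmatizer, which needs the WordNet data to be downloaded) is likely the better fit than a stemmer.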
