python - Spell checker for Python

Question

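I'm fairly new to Python and NLTK. I'm working on an application that performs spell checking (replacing an incorrectly spelled word with the correct one). I'm currently using the Enchant library on Python 2.7, PyEnchant, and the NLTK library. The code below is a class that handles the correction/replacement.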

import enchant
from nltk.metrics import edit_distance

class SpellingReplacer:
    def __init__(self, dict_name='en_GB', max_dist=2):
        self.spell_dict = enchant.Dict(dict_name)
        # store the argument instead of a hard-coded 2
        self.max_dist = max_dist

    def replace(self, word):
        if self.spell_dict.check(word):
            return word
        suggestions = self.spell_dict.suggest(word)

        if suggestions and edit_distance(word, suggestions[0]) <= self.max_dist:
            return suggestions[0]
        else:
            return word

I have written a function that takes a list of words, runs replace() on each word, and returns the list of those words with the spelling corrected.

def spell_check(word_list):
    # build the replacer once; creating it per word reloads the dictionary each time
    replacer = SpellingReplacer()
    checked_list = []
    for item in word_list:
        r = replacer.replace(item)
        checked_list.append(r)
    return checked_list

>>> word_list = ['car', 'colour']
>>> spell_check(word_list)
['car', 'color']

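I don't really like this approach because it isn't very accurate, and I'm looking for a better way to check and replace misspelled words. I also need something that can pick up spelling mistakes like "caaaar". Are there better ways to perform spell checking? If so, what are they? How does Google do it? Their spelling suggester is very good.

Any suggestions?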

score 35 · Accepted Answer

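I recommend starting by carefully reading this post by Peter Norvig. (I had to do something similar and found it extremely useful.)

The following function in particular contains the ideas you need to make your spell checker more sophisticated: it splits the misspelled word and then deletes, transposes, replaces, and inserts letters to "correct" it.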

alphabet = 'abcdefghijklmnopqrstuvwxyz'

def edits1(word):
    # every way to cut the word into a left part a and a right part b
    splits     = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes    = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b)>1]
    replaces   = [a + c + b[1:] for a, b in splits for c in alphabet if b]
    inserts    = [a + c + b     for a, b in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)

Note: The above is one snippet from Norvig's spelling corrector.
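To show how a function like edits1 can be turned into a corrector, here is a minimal sketch in the spirit of Norvig's post. The tiny WORD_COUNTS table, the helper names known and edits2, and the tie-breaking by raw count are assumptions made for illustration; a real corrector would count words from a large corpus.

```python
from collections import Counter

# Toy frequency table (an assumption for illustration); a real corrector
# would count the words of a large text corpus instead.
WORD_COUNTS = Counter({'car': 50, 'care': 20, 'cart': 10,
                       'colour': 5, 'spelling': 30})
ALPHABET = 'abcdefghijklmnopqrstuvwxyz'


def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def edits2(word):
    """All strings two edits away from word."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}


def known(words):
    """The subset of words that appear in the frequency table."""
    return {w for w in words if w in WORD_COUNTS}


def correction(word):
    """Prefer the word itself, then known words 1 edit away, then 2 edits away."""
    candidates = (known([word]) or known(edits1(word))
                  or known(edits2(word)) or {word})
    # among the surviving candidates, pick the most frequent one
    return max(candidates, key=lambda w: WORD_COUNTS[w])


print(correction('caar'))     # car
print(correction('speling'))  # spelling
```

With this table, "caaaar" comes back unchanged because it is three edits away from "car", so inputs like that need either a larger edit radius or a pre-processing step that collapses repeated letters.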

The good news is that you can keep adding to your spell checker and improve it incrementally.

Hope that helps.

score 21 · Accepted Answer

The best ways to do spell checking in Python are SymSpell, a BK-tree, or Peter Norvig's method.

The fastest of these is SymSpell.
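A BK-tree can be sketched in a few lines of plain Python. The version below is only an illustration: the class name BKTree, the tuple-based node layout, and the levenshtein helper are assumptions, not code from any particular library. Each child hangs off its parent under the edit distance between the two words, and the triangle inequality lets a search skip whole subtrees.

```python
def levenshtein(a, b):
    """Edit distance between a and b, computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


class BKTree:
    """Hypothetical minimal BK-tree; a node is (word, {distance: child})."""

    def __init__(self, distance):
        self.distance = distance
        self.root = None

    def add(self, word):
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = self.distance(word, node[0])
            if d == 0:
                return  # word is already stored
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def search(self, word, max_dist):
        """Return sorted (distance, word) pairs within max_dist of word."""
        results = []
        stack = [self.root] if self.root else []
        while stack:
            node_word, children = stack.pop()
            d = self.distance(word, node_word)
            if d <= max_dist:
                results.append((d, node_word))
            # triangle inequality: only these edges can lead to a match
            for edge, child in children.items():
                if d - max_dist <= edge <= d + max_dist:
                    stack.append(child)
        return sorted(results)


tree = BKTree(levenshtein)
for w in ['car', 'care', 'cart', 'colour', 'spelling']:
    tree.add(w)
print(tree.search('caar', 1))  # [(1, 'car')]
```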

Method 1: pyspellchecker

This library is based on Peter Norvig's implementation.

pip install pyspellchecker

from spellchecker import SpellChecker

spell = SpellChecker()

# find those words that may be misspelled
misspelled = spell.unknown(['something', 'is', 'hapenning', 'here'])

for word in misspelled:
    # Get the one `most likely` answer
    print(spell.correction(word))

    # Get a list of `likely` options
    print(spell.candidates(word))

Method 2: SymSpell Python

pip install -U symspellpy
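The idea behind SymSpell, symmetric deletes, can be sketched without the library. The code below is not symspellpy's API; the function names deletes, build_index, and lookup are made up for this illustration, and real implementations add word frequencies and further optimizations. The dictionary is indexed by every variant obtained by deleting up to max_dist characters, the same deletes are generated for the input word, and the overlapping candidates are then verified with a true edit distance.

```python
def levenshtein(a, b):
    """Edit distance between a and b, computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def deletes(word, max_dist):
    """word plus every string made by deleting up to max_dist characters."""
    results = {word}
    frontier = {word}
    for _ in range(max_dist):
        frontier = {w[:i] + w[i + 1:] for w in frontier for i in range(len(w))}
        results |= frontier
    return results


def build_index(words, max_dist=2):
    """Map each delete-variant to the dictionary words that produce it."""
    index = {}
    for word in words:
        for variant in deletes(word, max_dist):
            index.setdefault(variant, set()).add(word)
    return index


def lookup(word, index, max_dist=2):
    """Return sorted (distance, word) pairs within max_dist of word."""
    candidates = set()
    for variant in deletes(word, max_dist):
        candidates |= index.get(variant, set())
    # sharing a delete-variant is necessary but not sufficient, so verify
    scored = ((levenshtein(word, c), c) for c in candidates)
    return sorted((d, c) for d, c in scored if d <= max_dist)


index = build_index(['car', 'care', 'cart', 'colour', 'spelling'])
print(lookup('speling', index))  # [(1, 'spelling')]
```

The index costs memory up front, but a lookup only generates deletes of the input word instead of all inserts and replacements over the alphabet, which is where the speed comes from.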

score 2 · Accepted Answer

Try jamspell. It works very well for automatic spelling correction:

import jamspell

corrector = jamspell.TSpellCorrector()
corrector.LoadLangModel('en.bin')

corrector.FixFragment('Some sentnec with error')
# u'Some sentence with error'

corrector.GetCandidates(['Some', 'sentnec', 'with', 'error'], 1)
# ('sentence', 'senate', 'scented', 'sentinel')

score 1 · Accepted Answer

Spell corrector ->

You need to put the corpus file on your desktop; if you store it elsewhere, change the path in the code. I have also added a simple GUI using tkinter. This only handles non-word errors!!

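def min_edit_dist(word1, word2):
    """Return the Levenshtein distance between word1 and word2."""
    len_1 = len(word1)
    len_2 = len(word2)
    # x[i][j] holds the edit distance between word1[:i] and word2[:j]
    x = [[0] * (len_2 + 1) for _ in range(len_1 + 1)]
    # base cases: reaching or leaving the empty prefix costs one edit per character
    for i in range(len_1 + 1):
        x[i][0] = i
    for j in range(len_2 + 1):
        x[0][j] = j
    for i in range(1, len_1 + 1):
        for j in range(1, len_2 + 1):
            if word1[i - 1] == word2[j - 1]:
                x[i][j] = x[i - 1][j - 1]
            else:
                x[i][j] = min(x[i][j - 1], x[i - 1][j], x[i - 1][j - 1]) + 1
    # index the last cell explicitly instead of relying on leftover loop variables
    return x[len_1][len_2]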
from tkinter import Tk, Label, Entry, Button  # Python 3; the module is named Tkinter on Python 2


def retrieve_text():
    word1 = app_entry.get()
    # raw string so the backslashes in the Windows path are not treated as escapes
    path = r"C:\Documents and Settings\Owner\Desktop\Dictionary.txt"
    with open(path, 'r') as ffile:
        # strip the trailing newline so it does not count as an extra edit
        lines = [line.strip() for line in ffile]
    print("Suggestions coming right up count till 10")
    # loop over whatever the file contains instead of a hard-coded line count
    for candidate in lines:
        if min_edit_dist(word1, candidate) <= 2:
            print(candidate)
            print(" ")


if __name__ == "__main__":
    app_win = Tk()
    app_win.title("spell")
    app_label = Label(app_win, text="Enter the incorrect word")
    app_label.pack()
    app_entry = Entry(app_win)
    app_entry.pack()
    app_button = Button(app_win, text="Get Suggestions", command=retrieve_text)
    app_button.pack()
    # Initialize GUI loop
    app_win.mainloop()

score 1 · Accepted Answer

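pyspellchecker is one of the best solutions for this problem. The library is based on Peter Norvig's blog post. It uses the Levenshtein distance algorithm to find permutations within an edit distance of 2 from the original word. There are two ways to install this library. The official documentation strongly recommends using the pipenv package.

Install using pip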

pip install pyspellchecker

Install from source

git clone https://github.com/barrust/pyspellchecker.git
cd pyspellchecker
python setup.py install

The following code is the example provided in the documentation:

from spellchecker import SpellChecker

spell = SpellChecker()

# find those words that may be misspelled
misspelled = spell.unknown(['something', 'is', 'hapenning', 'here'])

for word in misspelled:
    # Get the one `most likely` answer
    print(spell.correction(word))

    # Get a list of `likely` options
    print(spell.candidates(word))

score 0 · Accepted Answer

You can also try:

pip install textblob

from textblob import TextBlob
txt="machne learnig"
b = TextBlob(txt)
print("after spell correction: "+str(b.correct()))

after spell correction: machine learning
