python - Pyenchant messes up foreign characters

Question

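Pyenchant messes up foreign characters and the spell check fails. My girlfriend is German, so I know that the word "häßlich" is a real German word. I have also checked the word with several spellchecking services.

The script file's encoding is ANSI as UTF-8. I have also tried encoding and decoding the word to and from various character encodings.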

#!/usr/bin/python
# -*- coding: utf-8 -*-

# Python bindings for the enchant spellcheck
import enchant

# Enchant dictionary
enchantdict = enchant.Dict("de_DE")

# Define german word for "ugly"
word = "häßlich"

# Print the original word and the spellchecked version of it
print word, "=", enchantdict.check(word)

And the output is: h├ñ├ƒlich = False

Also, if I change the script encoding to plain ANSI, I get the following:

hõ¯lich =
** (python.exe:1096): CRITICAL **: enchant_dict_check: assertion `g_utf8_validate(word, len, NULL)' failed

Traceback (most recent call last):
  File "C:\Temp\koe.py", line 14, in <module>
    print word, "=", enchantdict.check(word)
  File "C:\Python27\lib\site-packages\enchant\__init__.py", line 577, in check
    self._raise_error()
  File "C:\Python27\lib\site-packages\enchant\__init__.py", line 551, in _raise_error
    raise eclass(default)
enchant.errors.Error: Unspecified Error

What I'm using: pyenchant-1.6.5.win32.exe, python-2.7.3.msi, Windows 7

...and if you have a better spellchecker in mind, please tell me about it and I will test it :)

score 2 · Accepted Answer

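You are tripping over the fact that Python 2 has two kinds of strings: byte strings and Unicode strings. A plain "..." literal is a byte string, so the word reaches enchant as raw bytes in whatever encoding the script file uses. To make the literal a Unicode string, put a "u" in front of it: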

word = u"häßlich"
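The garbled output in the question is consistent with the UTF-8 bytes of the word being displayed by a console that uses an OEM code page. The following is a minimal sketch of that effect, assuming the console code page is cp437 (an assumption; cp850 maps these particular bytes to the same characters). The variable names are illustrative only.

```python
# -*- coding: utf-8 -*-
# Runs on Python 2.7 and Python 3.3+ (both accept the u"" prefix).
from __future__ import print_function

word = u"häßlich"                  # Unicode string: 7 characters
utf8_bytes = word.encode("utf-8")  # byte string: ä and ß take 2 bytes each

# Assumption: the Windows console decodes the bytes it receives as cp437.
mojibake = utf8_bytes.decode("cp437")

print(len(word), len(utf8_bytes))  # 7 9
print(mojibake == u"h├ñ├ƒlich")    # True
```

This is why the byte-string version prints as h├ñ├ƒlich: the console is shown nine bytes and interprets each one as a separate character.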

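Also, häßlich is the old spelling of hässlich (the latter is in the dictionary and is returned as a suggestion). If you want the old spelling to be treated as correct, you can add häßlich to your personal list of correctly spelled words:

enchantdict.add(word)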
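Both points can be combined in a small helper that always hands Unicode text to the dictionary and falls back to suggestions. This is only a sketch: `to_text` and `check_or_suggest` are hypothetical names, and the helper assumes byte strings are UTF-8 encoded unless told otherwise. The only dictionary methods it relies on are `check()` and `suggest()`.

```python
# -*- coding: utf-8 -*-
# Runs on Python 2.7 and Python 3.3+.

def to_text(value, encoding="utf-8"):
    """Return value as a Unicode string, decoding byte strings first."""
    if isinstance(value, bytes):  # on Python 2, bytes is the same type as str
        return value.decode(encoding)
    return value


def check_or_suggest(dictionary, word, encoding="utf-8"):
    """Return (is_correct, suggestions) for word.

    dictionary is any object with check() and suggest() methods,
    for example enchant.Dict("de_DE").
    """
    text = to_text(word, encoding)
    if dictionary.check(text):
        return True, []
    return False, dictionary.suggest(text)


# Usage (needs pyenchant and a German dictionary, so it is left commented out):
# import enchant
# enchantdict = enchant.Dict("de_DE")
# print(check_or_suggest(enchantdict, u"häßlich"))
```

Passing the dictionary in as an argument keeps the decoding logic separate from enchant itself, so it can be exercised without a dictionary installed.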
