python - Python's string.find() cannot handle special characters

Question

I believe the error is in the read function: it cannot read past the special character shown in the image. See the repr output below.

I am using string.find() in Python like this:

indexOfClosedDoc = temp.find("</DOC>",indexOfOpenDoc)

However, when the string contains text like this:

SUB
</DOC>

where SUB is a special character, temp.find cannot find the tag. Any suggestions on how to fix this?

Example:

[image]

Code that causes the failure:

import sys

# Python 2 code; the file is opened in text mode ('r')
handle = open("error.txt", 'r')
temp = handle.read()
handle.close()
index = temp.find("</DOC>", 0)
if index == -1:
    print "Error"
    sys.exit(1)

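Put the text from the image into a text file and run the code.

This is the repr of the temp variable for the example text. The text in error.txt is everything from line 29722 of the image onward.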

' </P>\n\n'

Note: the read() function never reads past SUB, so the search itself is not the problem.

score 2 · Accepted Answer

The answer is to open the file in 'rb' mode. On Windows, opening a file with just 'r' uses the old DOS behavior of stopping at 0x1A (the DOS EOF character). See also "Line reading chokes on 0x1A".
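As a minimal sketch of the 'rb' approach: the sample bytes, the temporary file, and the helper name find_tag are assumptions made for illustration and do not come from the question. The snippet is written so that it should also run on Python 3, where a file opened with 'rb' yields bytes, so the search pattern is a bytes literal.

```python
import os
import tempfile


def find_tag(path, tag=b"</DOC>", start=0):
    """Return the byte offset of tag in the file at path, or -1 if absent."""
    # Binary mode: no newline translation and no special treatment of 0x1A.
    with open(path, "rb") as handle:
        data = handle.read()
    return data.find(tag, start)


# Hypothetical sample: a SUB byte (0x1A) sits just before the closing tag.
sample = b" </P>\n\n\x1a\n</DOC>\n"
fd, sample_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(sample)

index = find_tag(sample_path)
os.remove(sample_path)
print(index)
```

Because the whole file is read as raw bytes, the returned index is a byte offset, not a character offset.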

score 0 · Accepted Answer

Note: if the file uses a multibyte encoding, .find() will not work even if the file contains no 0x1A. Example:

import codecs

with codecs.open('file.utf16', 'w', encoding='utf-16') as file:
    file.write(u"abcd") # write a string using utf-16 encoding

#XXX incorrect code don't use it
with open('file.utf16', 'r') as f:
    temp = f.read()
    i = temp.find('bc')
    print i #XXX -> -1 not found

with open('file.utf16', 'rb') as f:
    temp = f.read()
    i = temp.find('bc')
    print i #XXX -> -1 not found

# works
with codecs.open('file.utf16', encoding='utf-16') as f:
    temp = f.read()
    i = temp.find('bc')
    print i # -> 1 found
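To see why the byte-level search fails, it may help to compare the raw bytes with the decoded text. The following is only a sketch: the temporary file and the variable names are made up for illustration, and it uses io.open so that it should run on both Python 2 and Python 3.

```python
import io
import os
import tempfile

# Hypothetical temporary file holding "abcd" encoded as UTF-16.
fd, path = tempfile.mkstemp()
os.close(fd)
with io.open(path, "w", encoding="utf-16") as out:
    out.write(u"abcd")

with open(path, "rb") as f:
    raw = f.read()
os.remove(path)

# In UTF-16 every ASCII character is stored as two bytes, so the bytes
# for "b" and "c" are never adjacent in the raw data.
raw_index = raw.find(b"bc")

# Decoding first (the BOM is consumed) makes a character search possible.
text = raw.decode("utf-16")
text_index = text.find(u"bc")

print(raw_index, text_index)
```

The byte search returns -1, while the search on the decoded text finds the substring at character index 1.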

score -1 · Accepted Answer

Check the value of indexOfOpenDoc; I suspect it is greater than the position where the tag appears.
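A small sketch of how the start argument affects find(); the sample string here is made up for illustration:

```python
temp = "<DOC>text</DOC>"

# Searching from the beginning finds the closing tag.
from_start = temp.find("</DOC>", 0)

# A start index at or before the tag still finds it.
from_five = temp.find("</DOC>", 5)

# A start index past the beginning of the tag misses it, and find() returns -1.
past_tag = temp.find("</DOC>", 10)

print(from_start, from_five, past_tag)
```

If indexOfOpenDoc ends up larger than the offset of the closing tag, find() reports -1 even though the tag is present in the string.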
