python - Unexpected regex behavior

Question

Script:

import re

matches = ['hello', 'hey', 'hi', 'hiya']

def check_match(string):
    for item in matches:
        if re.search(item, string):
            print 'Match found: ' + string
        else:
            print 'Match not found: ' + string

check_match('hey')
check_match('hello there')
check_match('this should not match')
check_match('oh, hiya')

Output:

Match not found: hey
Match found: hey
Match not found: hey
Match not found: hey
Match found: hello there
Match not found: hello there
Match not found: hello there
Match not found: hello there
Match not found: this should not match
Match not found: this should not match
Match found: this should not match
Match not found: this should not match
Match not found: oh, hiya
Match not found: oh, hiya
Match found: oh, hiya
Match found: oh, hiya

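There are a few things I don't understand. First, each string is searched four times in this output, and some return two found matches while others return three. I'm not sure what in the code is causing this. Could someone take a look and see what's wrong?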

The expected output would be:

Match found: hey
Match found: hello there
Match not found: this should not match
Match found: oh, hiya

score 5 · Accepted Answer

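It isn't behaving incorrectly. The problem is a misunderstanding of how re.search(...) works: it is called once for every pattern in matches, and your loop prints one line for each of those calls.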

See the comments added to the output:

Match not found: hey                    # because 'hello' is not in 'hey'
Match found: hey                        # because 'hey' is in 'hey'
Match not found: hey                    # because 'hi' is not in 'hey'
Match not found: hey                    # because 'hiya' is not in 'hey'

Match found: hello there                # because 'hello' is in 'hello there'
Match not found: hello there            # because 'hey' is not in 'hello there'
Match not found: hello there            # because 'hi' is not in 'hello there'
Match not found: hello there            # because 'hiya' is not in 'hello there'

Match not found: this should not match  # because 'hello' is not in 'this should not match'
Match not found: this should not match  # because 'hey' is not in 'this should not match'
Match found: this should not match      # because 'hi' is in 'this should not match'
Match not found: this should not match  # because 'hiya' is not in 'this should not match'

Match not found: oh, hiya               # because 'hello' is not in 'oh, hiya'
Match not found: oh, hiya               # because 'hey' is not in 'oh, hiya'
Match found: oh, hiya                   # because 'hi' is in 'oh, hiya'
Match found: oh, hiya                   # because 'hiya' is in 'oh, hiya'
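To make the annotations above concrete, here is a small sketch that pairs each pattern with its search result, so you can see that every one of the four output lines per string comes from a different pattern. The helper name explain_matches is hypothetical and only for illustration:

```python
import re

matches = ['hello', 'hey', 'hi', 'hiya']

def explain_matches(string):
    """Return (pattern, matched) for each pattern, one entry per line the original loop prints."""
    return [(item, re.search(item, string) is not None) for item in matches]

for pattern, matched in explain_matches('oh, hiya'):
    status = 'found' if matched else 'not found'
    print('%r in %r: %s' % (pattern, 'oh, hiya', status))
```

Both 'hi' and 'hiya' are substrings of 'oh, hiya', which is why that input produces two "Match found" lines.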

If you don't want the pattern hi to match the input oh, hiya, you need to wrap the pattern in word boundaries:

\bhi\b

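This makes it match only occurrences of hi that are not directly adjacent to other word characters (letters, digits or underscores). For example, well hiya there does not match the pattern \bhi\b, but well hi there does.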
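A quick way to check the word-boundary behaviour is to run the pattern against a few inputs, including the ones from the question. This is just a demonstration sketch; note that the hi inside this is also rejected, because it is part of a longer word:

```python
import re

# \b asserts a boundary between a word character and a non-word character
# (or the start/end of the string), so "hi" must stand on its own.
pattern = re.compile(r'\bhi\b')

for text in ['well hi there', 'well hiya there', 'this should not match', 'oh, hiya']:
    print('%-22s -> %s' % (text, bool(pattern.search(text))))
```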

score 2 · Accepted Answer

Try this. It's more concise and flags multiple matches:

import re

matches = ['hello', 'hey', 'hi', 'hiya']

def check_match(string):
    results = [item for item in matches if re.search(r'\b%s\b' % (item), string)]
    print 'Found %s' % (results) if len(results) > 0 else "No match found"

check_match('hey')
check_match('hello there')
check_match('this should not match')
check_match('oh, hiya')
check_match('xxxxx xxx')
check_match('hello and hey')

Gives:

Found ['hey']
Found ['hello']
No match found
Found ['hiya']
No match found
Found ['hello', 'hey']
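One possible refinement, offered as an assumption rather than part of the answer above: because each item is inserted into the regex as-is, an item containing a metacharacter such as '.' would be treated as regex syntax ('hi.there' would also match 'hiXthere'). Wrapping the item in re.escape() makes it match literally. For the four words in this question the output is unchanged. The helper name find_matches is hypothetical. Also keep in mind that \b only behaves as intended for items that start and end with word characters.

```python
import re

matches = ['hello', 'hey', 'hi', 'hiya']

def find_matches(string):
    # re.escape() makes each item match literally, so characters such as
    # '.' or '?' are not interpreted as regex syntax.
    return [item for item in matches
            if re.search(r'\b%s\b' % re.escape(item), string)]

def check_match(string):
    results = find_matches(string)
    print('Found %s' % results if results else 'No match found')

check_match('hello and hey')
```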

score 0 · Accepted Answer

You get four searches and four lines of output for each string because you loop over the list and, for every element in it, search and print something...
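One way to get exactly one line per string is to collapse the loop into a single yes/no decision with the built-in any(), and print only after that decision is made. The sketch below also borrows the word-boundary idea from the first answer, which is what keeps 'this should not match' from matching via the hi inside this. With both changes it produces the expected output from the question. The helper name has_match is hypothetical:

```python
import re

matches = ['hello', 'hey', 'hi', 'hiya']

def has_match(string):
    # any() stops at the first pattern that matches, so the decision
    # is made once per string rather than once per pattern.
    return any(re.search(r'\b%s\b' % re.escape(item), string) for item in matches)

def check_match(string):
    if has_match(string):
        print('Match found: ' + string)
    else:
        print('Match not found: ' + string)

for s in ['hey', 'hello there', 'this should not match', 'oh, hiya']:
    check_match(s)
```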

score 0 · Accepted Answer

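Your for loop checks the string against each "match" and prints whether it was found for every one of them. What you really want is to check whether any of the matches match, and then print a single "found" or "not found". I don't actually know Python, so the syntax might be off.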

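def check_match(string):
    found = False  # must be initialised before the loop
    for item in matches:
        if re.search(item, string):
            found = True
            break  # no need to check the remaining patterns
    if found:
        print 'Match found: ' + string
    else:
        print 'Match not found: ' + string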
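Python also has a for ... else construct that removes the need for a separate flag: the else block runs only when the loop finishes without hitting break. Here is a sketch of that variant. The word boundaries and re.escape() are additions not present in the answer above, and returning the message is only there to make the function easy to check:

```python
import re

matches = ['hello', 'hey', 'hi', 'hiya']

def check_match(string):
    for item in matches:
        if re.search(r'\b%s\b' % re.escape(item), string):
            message = 'Match found: ' + string
            break  # first hit is enough; skips the else clause below
    else:
        # Runs only if the loop completed without a break, i.e. no pattern matched.
        message = 'Match not found: ' + string
    print(message)
    return message

check_match('oh, hiya')
```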
