python - How do I do a case-insensitive string comparison?

Question

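How can I do a case-insensitive string comparison in Python?

I would like to encapsulate the comparison of a regular string with a repository string in a very simple and Pythonic way. I would also like to be able to look up values in a dict hashed by strings using regular Python strings.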

score 703 · Accepted Answer

Assuming ASCII strings:

string1 = 'Hello'
string2 = 'hello'

if string1.lower() == string2.lower():
    print("The strings are the same (case insensitive)")
else:
    print("The strings are NOT the same (case insensitive)")

As of Python 3.3, casefold() is a better alternative:

string1 = 'Hello'
string2 = 'hello'

if string1.casefold() == string2.casefold():
    print("The strings are the same (case insensitive)")
else:
    print("The strings are NOT the same (case insensitive)")
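To illustrate why casefold() can be the better choice, here is a minimal sketch using the German ß, which lower() leaves unchanged but casefold() expands to "ss". The variable names are made up for this example.

```python
# Illustrative strings (not from the original answer).
german_lower = "straße"
german_upper = "STRASSE"

# lower() keeps "ß" as it is, so the two lowercased strings differ.
lower_equal = german_lower.lower() == german_upper.lower()

# casefold() maps "ß" to "ss", so the two folded strings match.
casefold_equal = german_lower.casefold() == german_upper.casefold()

print(lower_equal)     # False
print(casefold_equal)  # True
```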

For a more comprehensive solution that handles more complex Unicode comparisons, see the other answers.
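As a rough idea of what such a more comprehensive comparison might look like, the sketch below combines casefold() with Unicode normalization from the standard unicodedata module. The helper names normalize_caseless and caseless_equal are hypothetical, and this is a simplification rather than a full implementation of Unicode caseless matching.

```python
import unicodedata


def normalize_caseless(text):
    # Hypothetical helper: case-fold first, then decompose with NFKD so
    # precomposed and combining-character spellings end up identical.
    return unicodedata.normalize("NFKD", text.casefold())


def caseless_equal(left, right):
    # Hypothetical helper: compare the normalized, case-folded forms.
    return normalize_caseless(left) == normalize_caseless(right)


composed = "\u00ea"      # "ê" as a single code point
decomposed = "e\u0302"   # "e" followed by a combining circumflex

# casefold() alone does not unify the two spellings.
print(composed.casefold() == decomposed.casefold())  # False
print(caseless_equal(composed, decomposed))          # True
```

Whether NFKD or another normalization form is appropriate depends on the data; compatibility decomposition also merges characters such as ligatures, which may or may not be wanted.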

score 64 · Accepted Answer

In Python 2, calling .lower() on each string or Unicode object...

string1.lower() == string2.lower()

...will work most of the time, but it does not work in the situations @tchrist has described.

Assume we have a file called unicode.txt containing the two strings Σίσυφος and ΣΊΣΥΦΟΣ. With Python 2:

>>> utf8_bytes = open("unicode.txt", 'r').read()
>>> print repr(utf8_bytes)
'\xce\xa3\xce\xaf\xcf\x83\xcf\x85\xcf\x86\xce\xbf\xcf\x82\n\xce\xa3\xce\x8a\xce\xa3\xce\xa5\xce\xa6\xce\x9f\xce\xa3\n'
>>> u = utf8_bytes.decode('utf8')
>>> print u
Σίσυφος
ΣΊΣΥΦΟΣ

>>> first, second = u.splitlines()
>>> print first.lower()
σίσυφος
>>> print second.lower()
σίσυφοσ
>>> first.lower() == second.lower()
False
>>> first.upper() == second.upper()
True

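The Σ character has two lowercase forms, ς and σ, and Python 2's .lower() always produces σ, so it does not help to compare these strings case-insensitively.

In Python 3, however, lower() lowercases a word-final Σ to ς, so calling lower() on both strings gives the same result and the comparison works correctly: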

>>> s = open('unicode.txt', encoding='utf8').read()
>>> print(s)
Σίσυφος
ΣΊΣΥΦΟΣ

>>> first, second = s.splitlines()
>>> print(first.lower())
σίσυφος
>>> print(second.lower())
σίσυφος
>>> first.lower() == second.lower()
True
>>> first.upper() == second.upper()
True

So if you care about edge cases like the three sigmas in Greek, use Python 3.

(For reference, Python 2.7.3 and Python 3.3.0b1 are shown in the interpreter printouts above.)
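The same Greek example can also be compared with casefold() on Python 3.3 or later. This is a small sketch that uses string literals instead of reading unicode.txt; casefold() maps both ς and σ to σ, so it does not depend on where the sigma sits in the word.

```python
# The two strings from unicode.txt, written as literals for this sketch.
first = "Σίσυφος"
second = "ΣΊΣΥΦΟΣ"

# Case folding is context-free: every sigma form becomes "σ".
folded_equal = first.casefold() == second.casefold()

# Python 3's lower() also agrees here, as shown in the transcript above.
lowered_equal = first.lower() == second.lower()

print(folded_equal)   # True
print(lowered_equal)  # True
```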

score 9 · Accepted Answer

I saw this solution here using regex.

import re
if re.search('mandy', 'Mandy Pande', re.IGNORECASE):
    # re.search returns a match object, which is truthy
    print("match")

It works well with accents:

In [42]: if re.search("ê","ê", re.IGNORECASE):
....:        print(1)
....:
1

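However, it does not handle case-insensitive matching for all Unicode characters. Thanks to @Rhymoid for pointing this out; as I understand it, the match only succeeds when the exact symbol is present. The output is as follows: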

In [36]: "ß".lower()
Out[36]: 'ß'
In [37]: "ß".upper()
Out[37]: 'SS'
In [38]: "ß".upper().lower()
Out[38]: 'ss'
In [39]: if re.search("ß","ßß", re.IGNORECASE):
....:        print(1)
....:
1
In [40]: if re.search("SS","ßß", re.IGNORECASE):
....:        print(1)
....:
In [41]: if re.search("ß","SS", re.IGNORECASE):
....:        print(1)
....:
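One possible workaround for the ß cases above, sketched here as an assumption rather than something from the original answer, is to case-fold both strings and use a plain substring test instead of re.IGNORECASE. The function name contains_caseless is hypothetical.

```python
def contains_caseless(needle, haystack):
    # Hypothetical helper: fold both sides, then do a plain substring
    # test. "ß" folds to "ss", so "SS" and "ß" can find each other.
    return needle.casefold() in haystack.casefold()


print(contains_caseless("SS", "ßß"))             # True
print(contains_caseless("ß", "SS"))              # True
print(contains_caseless("mandy", "Mandy Pande")) # True
```

This only covers literal substrings; it does not replace regular expressions when an actual pattern is needed.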

score 4 · Accepted Answer

The usual approach is to convert the strings to uppercase or lowercase for lookups and comparisons. For example:

>>> "hello".upper() == "HELLO".upper()
True
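For the lookup part, the same idea can be applied to dict keys by normalizing them on the way in and on the way out. The class below is a hypothetical sketch (the name CaseInsensitiveDict is made up here); it uses casefold() instead of upper() and only covers item assignment, item access, and the in operator, not the constructor, get(), or update().

```python
class CaseInsensitiveDict(dict):
    """Hypothetical helper: a dict that stores string keys case-folded."""

    def __setitem__(self, key, value):
        # Normalize the key before storing it.
        super().__setitem__(key.casefold(), value)

    def __getitem__(self, key):
        # Normalize the key before looking it up.
        return super().__getitem__(key.casefold())

    def __contains__(self, key):
        return super().__contains__(key.casefold())


repo = CaseInsensitiveDict()
repo["Hello"] = 1

print(repo["HELLO"])    # 1
print("hello" in repo)  # True
```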
>>>

score 1 · Accepted Answer

How about converting to lowercase first? You can use string.lower().

answered 2008-11-26T01:09:47.773

score 0 · Accepted Answer

With pandas, you can pass case=False to str.contains():

data['Column_name'].str.contains('abcd', case=False)

score -2 · Accepted Answer

def insenStringCompare(s1, s2):
    """Return True if the two strings are equal regardless of case,
    False otherwise. Prints a message and returns None if either
    argument is not a string."""
    try:
        return s1.lower() == s2.lower()
    except AttributeError:
        print("Please only pass strings into this function.")
        print("You passed a %s and %s" % (s1.__class__, s2.__class__))
