python - Python string 'in' operator: implementation algorithm and time complexity

Question

I am wondering how the 'in' operator is implemented, for example:

>>> s1 = 'abcdef'
>>> s2 = 'bcd'
>>> s2 in s1
True

In CPython, which algorithm is used to implement string matching, and what is its time complexity? Is there any official documentation or wiki page about this?

score 54 · Accepted Answer

It's a combination of Boyer-Moore and Horspool.

You can view the C code here:

Fast search/count implementation, based on a mix between Boyer-Moore and Horspool, with a few more bells and whistles on the top. For some more background, see: https://web.archive.org/web/20201107074620/http://effbot.org/zone/stringlib.htm.
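The hybrid described in that comment can be sketched in Python. This is a simplified illustration, not CPython's C code: the function name fastsearch_sketch, the 64-bucket character mask, and the exact shift rules are choices made here for readability. The idea is to test the last character of each window first, and to use a small bitmask ("bloom filter") of the needle's characters to jump past haystack characters that cannot occur in the needle.

```python
def fastsearch_sketch(haystack: str, needle: str) -> int:
    """Return the lowest index of needle in haystack, or -1 if absent.

    Hypothetical, simplified Boyer-Moore/Horspool hybrid in the spirit of
    the fastsearch comment; NOT a transcription of CPython's C code.
    """
    n, m = len(haystack), len(needle)
    if m == 0:
        return 0
    if m > n:
        return -1

    mlast = m - 1
    last = needle[mlast]

    # Horspool-style shift used when the window's last character matches
    # but the rest does not: distance from the rightmost earlier occurrence
    # of `last` to the end of the needle (m if it does not occur earlier).
    skip = m
    for j in range(mlast):
        if needle[j] == last:
            skip = mlast - j

    # Tiny "bloom filter" (assumed 64 buckets): one bit per ord(ch) % 64.
    # A clear bit means the character is definitely not in the needle.
    mask = 0
    for ch in needle:
        mask |= 1 << (ord(ch) & 63)

    def maybe_in_needle(ch: str) -> bool:
        return bool(mask & (1 << (ord(ch) & 63)))

    i = 0
    while i <= n - m:
        if haystack[i + mlast] == last:
            # Candidate window: check the remaining characters.
            if haystack[i:i + mlast] == needle[:mlast]:
                return i
            # If the character just past the window cannot be in the needle,
            # no window overlapping it can match, so jump past it.
            if i + m < n and not maybe_in_needle(haystack[i + m]):
                i += m + 1
            else:
                i += skip
        else:
            if i + m < n and not maybe_in_needle(haystack[i + m]):
                i += m + 1
            else:
                i += 1
    return -1


print(fastsearch_sketch("abcdef", "bcd"))  # 1
```

The bitmask test is what allows jumps of m + 1 positions in good cases; a false positive (two characters sharing a bucket) only costs a smaller shift, never a wrong answer.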

From the effbot article linked above:

When designing the new algorithm, I used the following constraints:

- should be faster than the current brute-force algorithm for all test cases (based on real-life code), including Jim Hugunin’s worst-case test
- small setup overhead; no dynamic allocation in the fast path (O(m) for speed, O(1) for storage)
- sublinear search behaviour in good cases (O(n/m))
- no worse than the current algorithm in worst case (O(nm))
- should work well for both 8-bit strings and 16-bit or 32-bit Unicode strings (no O(σ) dependencies)
- many real-life searches should be good, very few should be worst case
- reasonably simple implementation
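The O(n/m) good case and O(nm) worst case in these constraints can be made concrete by counting character comparisons. The sketch below uses plain Boyer-Moore-Horspool (a swapped-in, simpler relative of CPython's hybrid, not its actual code); the function name horspool_comparisons and the two test inputs are hypothetical choices for illustration.

```python
def horspool_comparisons(haystack: str, needle: str) -> tuple[int, int]:
    """Plain Horspool search that also counts character comparisons.

    Returns (index of first match or -1, number of comparisons made).
    """
    n, m = len(haystack), len(needle)
    if m == 0:
        return 0, 0
    # Bad-character table built from needle[:-1]; later (rightmost)
    # occurrences overwrite earlier ones, giving the smallest safe shift.
    # Characters missing from the table shift the window by m.
    shift = {ch: m - 1 - j for j, ch in enumerate(needle[:-1])}
    comparisons = 0
    i = 0
    while i <= n - m:
        j = m - 1
        # Compare the window right to left.
        while j >= 0:
            comparisons += 1
            if haystack[i + j] != needle[j]:
                break
            j -= 1
        if j < 0:
            return i, comparisons
        # Shift by the character aligned with the end of the window.
        i += shift.get(haystack[i + m - 1], m)
    return -1, comparisons


n, m = 10_000, 10
# Good case: the window's last character never occurs in the needle.
print(horspool_comparisons("x" * n, "abcdefghij"))       # (-1, 1000)
# Bad case: each window matches m - 1 characters, then shifts by 1.
print(horspool_comparisons("a" * n, "b" + "a" * (m - 1)))  # (-1, 99910)
```

In the good case each window costs one comparison and the window jumps m positions, so about n/m comparisons; in the bad case each window costs m comparisons and moves by one, so about n*m. These particular inputs are bad for plain Horspool; CPython's hybrid checks characters in a different order, so its worst-case inputs differ.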
