python - Python text search library

Question

I'm looking for a library that can do something like this:

matches(
    user_input="hello world how are you what are you doing",
    keywords='+world -tigers "how are" -"bye bye"'
)

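Basically, I want to match a string based on the presence of words, the absence of words, and sequences of words. I don't need a search engine like Solr, because the strings aren't known in advance and each one is searched only once. Does such a library already exist, and if so, where can I find it? Or am I doomed to write a regex generator?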

score 2 · Accepted Answer

The regex module (a third-party package, not the standard re module) supports named lists:

import regex

def match_words(words, string):
    return regex.search(r"\b\L<words>\b", string, words=words)

def match(string, include_words, exclude_words):
    return (match_words(include_words, string) and
            not match_words(exclude_words, string))

Example:

if match("hello world how are you what are you doing",
         include_words=["world", "how are"],
         exclude_words=["tigers", "bye bye"]):
    print('matches')
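
Note that match() as written succeeds when any one of the include words is present, because all of them go into a single alternation. If every include term should be required instead, something like the following may work. This is a minimal sketch, not part of the original answer: contains_phrase and match_all are hypothetical names, and it uses the standard re module so it runs without third-party packages.

```python
import re

def contains_phrase(phrase, string):
    # Whole-word match for a single word or a multi-word phrase.
    return re.search(r"\b{}\b".format(re.escape(phrase)), string) is not None

def match_all(string, include_words, exclude_words):
    # Hypothetical variant: every include term must appear,
    # and no exclude term may appear.
    return (all(contains_phrase(w, string) for w in include_words) and
            not any(contains_phrase(w, string) for w in exclude_words))

text = "hello world how are you what are you doing"
print(match_all(text, ["world", "how are"], ["tigers", "bye bye"]))  # True
print(match_all(text, ["world", "lions"], ["tigers"]))               # False
```

The trade-off is one search per term rather than one search per list, which should be fine for strings that are only searched once.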

You can also implement named lists with the standard re module, for example:

import re

def match_words(words, string):
    re_words = '|'.join(map(re.escape, sorted(words, key=len, reverse=True)))
    return re.search(r"\b(?:{words})\b".format(words=re_words), string)
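
Two details of the re version may be worth checking before relying on it. Alternation in re is ordered, which is why the words are sorted longest first, and an empty word list produces an empty alternation that still matches at a word boundary, so an empty exclude list would reject every non-empty string. The sketch below illustrates both; build_pattern and the empty-list guard are additions for illustration, not part of the answer.

```python
import re

def build_pattern(words):
    # Longest alternatives first, so "how are" is tried before "how".
    alternatives = '|'.join(map(re.escape, sorted(words, key=len, reverse=True)))
    return r"\b(?:{})\b".format(alternatives)

def match_words(words, string):
    # Guard (an assumption added here): an empty list matches nothing.
    if not words:
        return None
    return re.search(build_pattern(words), string)

text = "how are you"
unsorted = re.search(r"\b(?:how|how are)\b", text).group()
longest_first = match_words(["how", "how are"], text).group()
print(unsorted)       # how
print(longest_first)  # how are

# Without the guard, the empty alternation matches at a word boundary.
print(re.search(r"\b(?:)\b", "hello") is not None)  # True
print(match_words([], "hello"))                     # None
```

For a plain yes/no test the ordering does not change the result, only which text the match object reports.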

How do I build the lists of included and excluded words from the +, -, and "" syntax?

You can use shlex.split():

import shlex

include_words, exclude_words = [], []
for word in shlex.split('+world -tigers "how are" -"bye bye"'):
    (exclude_words if word.startswith('-') else include_words).append(word.lstrip('-+'))

print(include_words, exclude_words)
# -> ['world', 'how are'] ['tigers', 'bye bye']   (Python 3; Python 2 prints a tuple)
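
Putting the pieces together, the matches() call from the question could be sketched roughly as follows. This is an illustration under assumptions rather than a finished implementation: parse_keywords is a hypothetical helper name, the function keeps the "any include word, no exclude word" semantics of the answer, and it treats an empty include or exclude list as imposing no constraint.

```python
import re
import shlex

def parse_keywords(keywords):
    # shlex.split keeps quoted phrases together: -"bye bye" -> '-bye bye'.
    include_words, exclude_words = [], []
    for word in shlex.split(keywords):
        target = exclude_words if word.startswith('-') else include_words
        target.append(word.lstrip('-+'))
    return include_words, exclude_words

def match_words(words, string):
    # Assumption: an empty list matches nothing.
    if not words:
        return None
    re_words = '|'.join(map(re.escape, sorted(words, key=len, reverse=True)))
    return re.search(r"\b(?:{})\b".format(re_words), string)

def matches(user_input, keywords):
    include_words, exclude_words = parse_keywords(keywords)
    # Assumption: with no include words, only the exclusions apply.
    included = not include_words or match_words(include_words, user_input)
    excluded = match_words(exclude_words, user_input)
    return bool(included) and not excluded

print(matches(user_input="hello world how are you what are you doing",
              keywords='+world -tigers "how are" -"bye bye"'))  # True
```

One caveat: shlex.split() follows shell quoting rules, so a keyword string containing an unbalanced quote or apostrophe raises ValueError and may need handling.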
