python - Regex to match part of a file path when a keyword is not present

Question

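I'm trying to use a regular expression in Python to match part of a file path when a certain keyword is not present. For example, applying the regex to "/exclude/this/test/other" should not match, whereas "/this/test/other" should return the file path minus "other", i.e. "/this/test". Here "other" can be any directory. So far I'm using this: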

In [153]: re.findall("^(((?!exclude).)*(?=test).*)?", "/exclude/this/test/other")
Out[153]: [('', '')]

re.findall("^(((?!exclude).)*(?=test).*)?", "/this/test/other")
Out[152]: [('/this/test/other', '/')]

But I can't get the match to stop after "test", and I also get some empty matches. Any ideas?

score 2 · Accepted Answer

You're getting the extra results because (1) you're using findall() instead of search(), and (2) you're using capturing groups instead of non-capturing groups.

>>> import re
>>> re.search(r'^(?:(?:(?!exclude).)*(?=test)*)$', "/this/test").group(0)
'/this/test'

This works with findall() too, but findall() doesn't really make sense when you're matching the whole string. More importantly, the include part of your regex doesn't work. Check this out:

>>> re.search(r'^(?:(?:(?!exclude).)*(?=test)*)$', "/this/foo").group(0)
'/this/foo'

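That's because the * in (?=test)* makes the lookahead optional, which renders it pointless. But removing the * isn't really a solution either, because exclude and test could be part of longer words, like excludexx or yyytest. Here's a better regex: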

r'^(?=.*/test\b)(?!.*/exclude\b)(?:/\w+)+$'

Tested:

>>> re.search(r'^(?=.*/test\b)(?!.*/exclude\b)(?:/\w+)+$', '/this/test').group()
'/this/test'
>>> re.search(r'^(?=.*/test\b)(?!.*/exclude\b)(?:/\w+)+$', '/this/foo').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
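
To see how the \b anchors treat longer words such as excludexx and yyytest, here is a minimal sketch. Only the pattern comes from the answer; the names WHOLE_PATH and match_whole_path and the extra sample paths are made up for illustration.

```python
import re

# Pattern copied from the answer above; WHOLE_PATH is a hypothetical name.
WHOLE_PATH = re.compile(r'^(?=.*/test\b)(?!.*/exclude\b)(?:/\w+)+$')

def match_whole_path(path):
    """Return the matched text, or None when the path is rejected."""
    m = WHOLE_PATH.search(path)
    return m.group() if m else None

print(match_whole_path('/this/test'))          # has a "test" component, no "exclude"
print(match_whole_path('/exclude/this/test'))  # rejected: has an "exclude" component
print(match_whole_path('/excludexx/test'))     # accepted: "excludexx" is a different word
print(match_whole_path('/this/yyytest'))       # rejected: "yyytest" is not "test"
```

Checking the match object before calling .group() avoids the AttributeError shown in the traceback above.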

EDIT: I see you fixed the "optional lookahead" problem, but now the whole regex is optional.

EDIT: If you want it to stop matching after /test, try this:

r'^(?:/(?!test\b|exclude\b)\w+)*/test\b'

(?:/(?!test\b|exclude\b)\w+)* matches zero or more path components, as long as they're not /test or /exclude.
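
As a rough check against the two paths from the question, the pattern can be wrapped like this. The names UP_TO_TEST and path_up_to_test are hypothetical; the pattern itself is the one given above.

```python
import re

# Pattern copied from the edit above; UP_TO_TEST is a hypothetical name.
UP_TO_TEST = re.compile(r'^(?:/(?!test\b|exclude\b)\w+)*/test\b')

def path_up_to_test(path):
    """Return the path through the first /test component, or None."""
    m = UP_TO_TEST.search(path)
    return m.group() if m else None

print(path_up_to_test('/this/test/other'))          # stops right after /test
print(path_up_to_test('/exclude/this/test/other'))  # None: /exclude blocks the match
print(path_up_to_test('/this/foo'))                 # None: no /test component
```

Because each component is tested by the negative lookahead before \w+ consumes it, an /exclude component anywhere before /test prevents a match, and the match ends at the first /test component.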

score 2 · Accepted Answer

Use the in operator if you only need to check whether the keyword is present.

In [33]: s1="/exclude/this/test"

In [34]: s2="this/test"

In [35]: 'exclude' in s1
Out[35]: True

In [36]: 'exclude' in s2
Out[36]: False

EDIT: Or, if you only want the path up to test:

if 'exclude' not in s:
    re.findall(r'(.+test)',s)
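
A minimal sketch of the same idea wrapped in a function, so the result is returned instead of discarded. The name prefix_through_test is hypothetical. Note that in is a plain substring check, so it also rejects paths containing longer words such as excludexx.

```python
import re

def prefix_through_test(s):
    """Hypothetical helper: text up to the last 'test' in s, or None."""
    if 'exclude' in s:           # substring check, not a whole-component check
        return None
    m = re.search(r'.+test', s)  # greedy: extends to the last "test" in s
    return m.group() if m else None

print(prefix_through_test('/this/test/other'))
print(prefix_through_test('/exclude/this/test/other'))
```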

score 1 · Accepted Answer

If your matching is more complex than a simple in check can handle, running two regexes may be clearer:

import re
s1="/exclude/this/test"
s2="this/test"

for s in (s1, s2):
    # Skip any path that contains the excluded keyword.
    if re.search(r'exclude', s):
        print('excluding:', s)
        continue
    print(s, re.findall(r'test', s))

Prints:

excluding: /exclude/this/test
this/test ['test']

You can make the two regexes more compact, if that is your goal:

print([(s, re.findall(r'test', s)) for s in (s1, s2) if not re.search(r'exclude', s)])

EDIT

If I understand your edit, this works:

s1="/exclude/this/test/other"
s2="/this/test/other"

print([(s, re.search(r'(.*?)/[^/]+$', s).group(1)) for s in (s1, s2) if not re.search(r'exclude', s)])

Prints:

[('/this/test/other', '/this/test')]
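
The filter-then-extract steps can also be written as a small function that guards against a failed match instead of calling .group(1) on None. This is only a sketch; the name parent_unless_excluded is hypothetical.

```python
import re

def parent_unless_excluded(path):
    """Hypothetical helper: drop the last component unless 'exclude' appears."""
    if re.search(r'exclude', path):       # first regex: filter
        return None
    m = re.search(r'(.*?)/[^/]+$', path)  # second regex: strip the last component
    return m.group(1) if m else None

for p in ("/exclude/this/test/other", "/this/test/other"):
    result = parent_unless_excluded(p)
    if result is not None:
        print((p, result))
```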
