python - Alternative to the `match = re.match(); if match: ...` idiom?

Question

I want to check whether something matches a regular expression and, if it does, print the first group:

import re
match = re.match(r"(\d+)g", "123g")
if match is not None:
    print(match.group(1))

This is completely pedantic, but the intermediate `match` variable is a bit annoying.

Languages like Perl handle this by creating new `$1`..`$9` variables for the match groups:

if ($blah =~ /(\d+)g/) {
    print $1;
}

From this reddit comment:

with re_context.match('^blah', s) as match:
    if match:
        ...
    else:
        ...

..I thought this was an interesting idea, so I wrote a quick implementation:

#!/usr/bin/env python2.6
import re

class SRE_Match_Wrapper(object):
    """Wrap a match object (or None) so it can be used in a `with` statement."""

    def __init__(self, match):
        self.match = match

    def __enter__(self):
        # Bind the wrapped match object (possibly None) to the `as` target.
        return self.match

    def __exit__(self, exc_type, exc_value, tb):
        # Returning False lets any exception raised in the block propagate.
        return False

    def __getattr__(self, name):
        # Only called for attributes not found normally; delegate to the match.
        return getattr(self.match, name)

def rematch(pattern, inp):
    matcher = re.compile(pattern)
    return SRE_Match_Wrapper(matcher.match(inp))

if __name__ == '__main__':
    # Example:
    with rematch(r"(\d+)g", "123g") as m:
        if m:
            print(m.group(1))

    with rematch(r"(\d+)g", "123") as m:
        if m:
            print(m.group(1))

(In theory, this functionality could be patched onto the `_sre.SRE_Match` object itself.)
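For comparison, the same wrapper can be sketched with the standard library's `contextlib.contextmanager` decorator. This is a minimal sketch and `rematch_cm` is a hypothetical name; like the class above, it always enters the block, so the `if m:` test is still needed.

```python
import re
from contextlib import contextmanager


@contextmanager
def rematch_cm(pattern, inp):
    """Hypothetical generator-based version of rematch()."""
    # Yield exactly once, even when re.match() returns None, so the
    # `with` block always runs and the caller must still test `m`.
    yield re.match(pattern, inp)


if __name__ == '__main__':
    with rematch_cm(r"(\d+)g", "123g") as m:
        if m:
            print(m.group(1))
```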

It would be nice if the `with` statement's code block could be skipped entirely when there is no match. That would simplify this to:

with rematch(r"(\d+)g", "123") as m:
    print(m.group(1)) # only executed if the match occurred

..but based on what I can infer from PEP 343, this seems to be impossible.
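One way to see the limitation concretely, assuming `contextlib.contextmanager` is used: if the generator simply does not yield when there is no match, the block is not skipped; instead `__enter__` raises `RuntimeError`. The helper names below are hypothetical, for illustration only.

```python
import re
from contextlib import contextmanager


@contextmanager
def rematch_if(pattern, string):
    """Hypothetical helper: yield the match only if there is one."""
    m = re.match(pattern, string)
    if m is not None:
        yield m
    # On failure the generator ends without yielding.


def first_group(string):
    try:
        with rematch_if(r"(\d+)g", string) as m:
            return m.group(1)
    except RuntimeError as exc:
        # contextlib reports "generator didn't yield"; the block never ran.
        return "RuntimeError: %s" % exc
```

Here `first_group("123g")` returns `'123'`, while `first_group("123")` returns the `RuntimeError` message rather than silently skipping the body.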

Any ideas? As I said, this is really a trivial annoyance, almost code golf..

score 12 · Accepted Answer

I don't think it's trivial. If I'm writing code like that often, I don't want to have to sprinkle a redundant conditional around it.

It's a little odd, but you can do this with an iterator:

import re

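def rematch(pattern, inp):
    matcher = re.compile(pattern)
    matches = matcher.match(inp)
    # Yield once on a match and zero times otherwise, so the
    # body of a `for` loop runs only when the pattern matched.
    if matches:
        yield matches

if __name__ == '__main__':
    for m in rematch(r"(\d+)g", "123g"):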
        print(m.group(1))

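The odd part is that it uses an iterator for something that isn't iterating. It's closer to a conditional, and at first glance it might look like it could yield multiple results for each match.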

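It does seem odd that a context manager can't cause its managed block to be skipped entirely. That isn't explicitly one of the use cases for `with`, but it seems like a natural extension.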

score 3 · Answer

Another nice syntax would be something like this:

header = re.compile('(.*?) = (.*?)$')
footer = re.compile('(.*?): (.*?)$')

if header.match(line) as m:
    key, value = m.group(1,2)
elif footer.match(line) as m:
    key, value = m.group(1,2)
else:
    key, value = None, None
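Python 3.8 later added assignment expressions (PEP 572), which give almost exactly this syntax, spelled `:=` instead of `as`. A sketch using the same two patterns, wrapped in a hypothetical `parse` function so it can be exercised; it requires Python 3.8 or newer.

```python
import re

header = re.compile(r'(.*?) = (.*?)$')
footer = re.compile(r'(.*?): (.*?)$')


def parse(line):
    # `:=` binds the match object and tests it in a single expression.
    if m := header.match(line):
        key, value = m.group(1, 2)
    elif m := footer.match(line):
        key, value = m.group(1, 2)
    else:
        key, value = None, None
    return key, value
```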

score 0 · Answer

Not a perfect solution, but it does let you chain several match alternatives against the same string:

import re

class MatchWrapper(object):
    """Holds the most recent successful match and delegates to it."""

    def __init__(self):
        self._matcher = None

    def wrap(self, matcher):
        self._matcher = matcher

    def __getattr__(self, attr):
        # Forward e.g. .group() to the wrapped match object.
        return getattr(self._matcher, attr)

def _match(pattern, s, matcher):
    """Try to match; on success, store the match in `matcher` and return True."""
    m = re.match(pattern, s)
    if m:
        matcher.wrap(m)
        return True
    else:
        return False

matcher = MatchWrapper()
line = "123g"
if _match(r"(\d+)g", line, matcher):
    print(matcher.group(1))
elif _match(r"(\w+)g", line, matcher):
    print(matcher.group(1))
else:
    print("no match")

score 0 · Answer

I don't think `with` is the answer here. You'd have to raise an exception in the BLOCK part (the part the user supplies) and have the `__exit__` method return `True` to "swallow" it, so it would never look good.
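A minimal sketch of what that approach looks like, to show why it is awkward. All names here (`NoMatch`, `_NullMatch`, `rematch_or_skip`) are hypothetical. When there is no match, `__enter__` hands back a placeholder whose every attribute access raises `NoMatch`, and `__exit__` returns `True` for that exception so it is swallowed.

```python
import re


class NoMatch(Exception):
    """Raised when the block touches the result of a failed match."""


class _NullMatch(object):
    """Placeholder bound to `as m` when there is no match."""

    def __getattr__(self, name):
        # Any use of m.group, m.groups, ... aborts the block.
        raise NoMatch(name)


class rematch_or_skip(object):
    def __init__(self, pattern, string):
        self.match = re.match(pattern, string)

    def __enter__(self):
        if self.match is None:
            return _NullMatch()
        return self.match

    def __exit__(self, exc_type, exc_value, tb):
        # Returning True suppresses the exception; only swallow NoMatch.
        return exc_type is not None and issubclass(exc_type, NoMatch)


with rematch_or_skip(r"(\d+)g", "123g") as m:
    print(m.group(1))  # prints 123

with rematch_or_skip(r"(\d+)g", "123") as m:
    print(m.group(1))  # never prints: NoMatch is raised and swallowed
```

The block is not really skipped: any statements before the first use of `m` still run, which is the kind of surprise that makes this approach unattractive.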

I'd suggest a syntax closer to Perl's: write your own extended `re` module (let's call it `rex`) that sets variables in the module's namespace:

if rex.match(r'(\d+)g', '123g'):
    print rex._1

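As the comments below point out, this approach is neither scope-safe nor thread-safe. Only use it if you are completely sure that your application will never become multithreaded, and that no function called from the scope where you use it also uses it (a nested call would overwrite `rex._1`).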
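A rough sketch of the idea, assuming Python 3. Instead of a separate `rex` module (which would keep the same state in module globals), a single shared instance of a hypothetical `_Rex` class stands in for it; it has exactly the scope and thread-safety problems described above.

```python
import re


class _Rex(object):
    """Hypothetical stand-in for the `rex` module described above.

    A real module would keep this state in module globals; one shared
    instance behaves the same way, including not being thread-safe.
    """

    def match(self, pattern, string, flags=0):
        m = re.match(pattern, string, flags)
        # Forget the groups of the previous match (_1, _2, ...).
        for name in [n for n in vars(self) if n.startswith("_") and n[1:].isdigit()]:
            delattr(self, name)
        # Expose the new groups as _1, _2, ... (like Perl's $1, $2, ...).
        if m is not None:
            for i, group in enumerate(m.groups(), start=1):
                setattr(self, "_%d" % i, group)
        return m


rex = _Rex()  # one shared object, playing the role of the module

if rex.match(r'(\d+)g', '123g'):
    print(rex._1)
```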

score 0 · Answer

Here's another answer, for when you're doing a lot of these in one place:

import re
class Matcher(object):
    """Reusable holder for the result of the last match."""
    def __init__(self):
        self.matches = None
    def set(self, matches):
        self.matches = matches
    def __getattr__(self, name):
        # Delegate .group() etc. to the stored match object.
        return getattr(self.matches, name)

class re2(object):
    """Compiles the pattern once; match() also records the result in a Matcher."""
    def __init__(self, expr):
        self.re = re.compile(expr)

    def match(self, matcher, s):
        matches = self.re.match(s)
        matcher.set(matches)
        return matches

pattern = re2(r"(\d+)g")
m = Matcher()
if pattern.match(m, "123g"):
    print(m.group(1))
if not pattern.match(m, "x123g"):
    print("no match")

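This compiles the regex once, with the same thread safety as `re`, and creates a single reusable Matcher object for the whole function, after which it can be used very concisely. It also has the advantage that it can be negated in the obvious way (`if not pattern.match(...)`); to do that with an iterator, you'd have to pass a flag telling it to invert its result.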
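To illustrate the point about inverting: with the iterator approach from an earlier answer, a no-match branch needs something like the hypothetical `invert` flag below. This is only a sketch of that idea, not code from either answer.

```python
import re


def rematch(pattern, inp, invert=False):
    """Iterator-style matcher with a hypothetical `invert` flag.

    Normally yields the match once (or not at all). With invert=True it
    yields once, with None, only when the pattern does NOT match.
    """
    m = re.match(pattern, inp)
    if invert:
        if m is None:
            yield None
    elif m is not None:
        yield m


for m in rematch(r"(\d+)g", "123g"):
    print(m.group(1))
for _ in rematch(r"(\d+)g", "x123g", invert=True):
    print("no match")
```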

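It doesn't help much if you're only doing one match per function, though, since you don't want to keep Matcher objects around in any broader context than that; it would cause the same problems as Blixt's solution.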

score 0 · Answer

This isn't very pretty, but you can take advantage of the built-in `getattr(object, name[, default])` function like this:

>>> import re
>>> getattr(re.match(r"(\d+)g", "123g"), 'group', lambda n: '')(1)
'123'
>>> getattr(re.match(r"(\d+)g", "X23g"), 'group', lambda n: '')(1)
''

To mimic the `if match: print group` flow, you can (ab)use the `for` statement like this:

>>> for group in filter(None, [getattr(re.match(r"(\d+)g", "123g"), 'group', None)]):
...     print(group(1))
...
123
>>> for group in filter(None, [getattr(re.match(r"(\d+)g", "X23g"), 'group', None)]):
...     print(group(1))
...
>>>

Of course, you could define a small function to do the dirty work:

>>> matchgroup = lambda p, s: filter(None, [getattr(re.match(p, s), 'group', None)])
>>> for group in matchgroup(r"(\d+)g", "123g"):
...     print(group(1))
...
123
>>> for group in matchgroup(r"(\d+)g", "X23g"):
...     print(group(1))
...
>>>
