python - Simple router URL matcher in Python: how to match up to the first "/" occurrence with re

Question

I have a router module that compares a subject string against a regex and links the matched parts to the keys of a key mask. (Simple URL routing filtering, like Symfony's: http://symfony.com/doc/current/book/routing.html)

import re
from functools import partial

def to_named_groups(match, regexes):
    group_name = re.escape(match.group(0)[1:-1])
    group_regex = regexes.get(group_name, '.*')
    return '(?P<{}>{})'.format(group_name, group_regex)

def make_regex(key_mask, regexes):
    regex = re.sub(r'\{[^}]+\}', partial(to_named_groups, regexes=regexes),
                   key_mask)
    return re.compile(regex)

def find_matches(key_mask, text, regexes=None):
    if regexes is None:
        regexes = {}
    try:
        return make_regex(key_mask, regexes).search(text).groupdict()
    except AttributeError:
        return None


find_matches('foo/{one}/bar/{two}/hello/{world}', 'foo/test/bar/something/hello/xxx')

Output:

{'one': 'test', 'two': 'something', 'world': 'xxx'}

find_matches('hello/{city}/{phone}/world', 'hello/mycity/12345678/world', regexes={'phone': r'\d+'})

Output:

{'city': 'mycity', 'phone': '12345678'}

find_matches('hello/{city}/{phone}/world', 'hello/something/mycity/12345678/world', regexes={'phone': r'\d+'})

Output:

{'city': 'something/mycity', 'phone': '12345678'}

This should not match: it should return None rather than 'city': 'something/mycity'. How can I solve this? How can I match only up to the first "/" occurrence, or is there another way?

Thanks!

score 1 · Accepted Answer

Let's look at the regex you're building:

hello/(?P<city>.*)/(?P<phone>\d+)/world

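That .* will match anything, including text that contains slashes, as long as enough slashes are left over to match the rest of the pattern.

If you don't want it to match slashes, you already know how to do that, because you do exactly the same thing in your re.sub call, where [^}]+ matches everything except a closing brace. Use a character class that excludes slashes, [^/]*, as the default: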
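To see this concretely, here is a small check that uses only the re module. The pattern is the one shown above, and the input is the third example from the question.

```python
import re

# The regex generated from 'hello/{city}/{phone}/world' with the default '.*'.
pattern = re.compile(r'hello/(?P<city>.*)/(?P<phone>\d+)/world')

# '.*' is greedy and is allowed to cross '/' boundaries, so it backtracks
# only as far as needed for the rest of the pattern to fit.
match = pattern.search('hello/something/mycity/12345678/world')
groups = match.groupdict()
print(groups)  # {'city': 'something/mycity', 'phone': '12345678'}
```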

def to_named_groups(match, regexes):
    group_name = re.escape(match.group(0)[1:-1])
    group_regex = regexes.get(group_name, '[^/]*')
    return '(?P<{}>{})'.format(group_name, group_regex)
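As a rough check, the sketch below plugs that default into the question's helpers and reruns two of its examples. The bodies of make_regex and find_matches are condensed here for brevity, and the variable names ok and bad are only for illustration.

```python
import re
from functools import partial

def to_named_groups(match, regexes):
    group_name = re.escape(match.group(0)[1:-1])
    # Default to "anything except a slash" instead of '.*'.
    group_regex = regexes.get(group_name, '[^/]*')
    return '(?P<{}>{})'.format(group_name, group_regex)

def make_regex(key_mask, regexes):
    replacer = partial(to_named_groups, regexes=regexes)
    return re.compile(re.sub(r'\{[^}]+\}', replacer, key_mask))

def find_matches(key_mask, text, regexes=None):
    match = make_regex(key_mask, regexes or {}).search(text)
    return match.groupdict() if match else None

mask = 'hello/{city}/{phone}/world'
ok = find_matches(mask, 'hello/mycity/12345678/world', regexes={'phone': r'\d+'})
bad = find_matches(mask, 'hello/something/mycity/12345678/world', regexes={'phone': r'\d+'})
print(ok)   # {'city': 'mycity', 'phone': '12345678'}
print(bad)  # None
```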

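But on the other hand, if you don't understand the regexes you're building, why are you building them? You can easily parse this into slash-separated path components with .split('/'). For example, leaving out the extra regexes argument, I think this is what you're after: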

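def find_matches(key_mask, text):
    keys, values = key_mask.split('/'), text.split('/')
    # zip() stops at the shorter sequence, so reject length mismatches first.
    if len(keys) != len(values):
        return None
    mapping = {}
    for key, value in zip(keys, values):
        # startswith/endswith are safe on empty components (e.g. 'a//b').
        if key.startswith('{') and key.endswith('}'):
            mapping[key[1:-1]] = value
        elif key != value:
            return None
    return mapping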

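And regexes is just a way of adding some validation checks. (As written, it can also be used to break the normal slash-separated scheme, but I think that's a bug, not a feature. In fact, I think it's exactly the bug that drove you to StackOverflow in the first place.) So just do those checks explicitly: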

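def find_matches(key_mask, text, regexes=None):
    regexes = regexes or {}  # avoid a mutable default argument
    keys, values = key_mask.split('/'), text.split('/')
    # zip() stops at the shorter sequence, so reject length mismatches first.
    if len(keys) != len(values):
        return None
    mapping = {}
    for key, value in zip(keys, values):
        if key.startswith('{') and key.endswith('}'):
            key = key[1:-1]
            # fullmatch (Python 3.4+) validates the whole component, not just a prefix.
            if key in regexes and not re.fullmatch(regexes[key], value):
                return None
            mapping[key] = value
        elif key != value:
            return None
    return mapping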
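One detail worth checking in a validator like this is how the per-key pattern is anchored. re.match anchors only at the start of the string, while re.fullmatch (available since Python 3.4) requires the whole component to match. A minimal illustration, with a made-up value:

```python
import re

value = '12345678abc'  # hypothetical path component, not from the question

# re.match succeeds as soon as a prefix matches the pattern.
prefix_ok = re.match(r'\d+', value) is not None

# re.fullmatch succeeds only if the entire string matches.
whole_ok = re.fullmatch(r'\d+', value) is not None

print(prefix_ok, whole_ok)  # True False
```

In the regex-building version, each group is bounded by the surrounding slashes, so a whole-component check is the closer equivalent.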

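The second version already prevents the regexes from matching "/", because the text is split on slashes before they are applied. So you don't need the sanitizing you asked for in a comment.

Either way, the simplest way to sanitize the regexes is to sanitize them before you use them, rather than building everything into one big regex and then trying to sanitize that. For example: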

regexes = {key: regex.replace('.*', '[^/]*') for key, regex in regexes.items()}
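A short sketch of what that substitution does to a sample dictionary. The keys and patterns below are hypothetical. Note that this is a plain string replacement, so it only rewrites the literal text '.*' and leaves other slash-matching constructs such as '.+' untouched.

```python
# Hypothetical per-key patterns, for illustration only.
regexes = {'city': '.*', 'phone': r'\d+', 'slug': '.+'}

# Replace the literal '.*' with a class that cannot cross a slash.
sanitized = {key: regex.replace('.*', '[^/]*') for key, regex in regexes.items()}

print(sanitized['city'])   # [^/]*
print(sanitized['phone'])  # \d+
print(sanitized['slug'])   # .+  (unchanged, can still match '/')
```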

score 0 · Accepted Answer

Consider changing group_regex to something a bit more restrictive, such as [^/]* (which allows only non-slash characters), or relaxing the greediness with .*?

Source: http://docs.python.org/2/library/re.html
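The two options do not appear to behave the same on the question's third input. A non-greedy group prefers the shortest match, but it can still expand across slashes when that is the only way for the rest of the pattern to fit. A quick check, using the pattern that the question's key mask would produce:

```python
import re

text = 'hello/something/mycity/12345678/world'

# Non-greedy: starts short, but is still allowed to grow past '/'.
lazy = re.search(r'hello/(?P<city>.*?)/(?P<phone>\d+)/world', text)

# Negated character class: can never contain '/'.
strict = re.search(r'hello/(?P<city>[^/]*)/(?P<phone>\d+)/world', text)

lazy_city = lazy.group('city')
print(lazy_city)  # something/mycity
print(strict)     # None
```

So for the result the question asks for (None on this input), the [^/]* variant is the one that fits.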
