python - Regular expression recurses too often

Question

I'm trying to write a regex that matches an optionally quoted value (valid quote characters are ", ' and `). The rule is that two consecutive quote characters represent an escaped quote.

This is the regex I came up with:

(?P<quote>["'`])?(?P<value>(?(quote)((?!(?P=quote).)|((?=(?P=quote)).){2})*|[^\s;]*))(?(quote)(?P=quote)|)

And here it is in readable form, with comments showing what I think it does:

(?P<quote>["'`])?                   #named group Quote (any quoting character?)

    (?P<value>                      #name this group "value", what I am interested in
        (?(quote)               #if quoted 
            ((?!(?P=quote).)|((?=(?P=quote)).){2})* #see below
                                    #match either anything that is not the quote
                                    #or match 2 quotes
        |
            [^\s;]*         #match anything that is not whitespace or ; (my separators if there are no quotes)
        )
    )

(?(quote)(?P=quote)|)               #if we had a leading quote we need to consume a closing quote

It runs fine on unquoted strings, but quoted strings crash like this:

    match = re.match(regexValue, line)
  File "****/jython2.5.1/Lib/re.py", line 137, in match
    return _compile(pattern, flags).match(string)
RuntimeError: maximum recursion depth exceeded

What am I doing wrong?

Edit: example input => output (of the captured group 'value', as desired):

text    => text
'text'  => text
te xt   => te
'te''xt'=> te''xt   #quote=' => strreplace("''","'") => desired result: te'xt
'te xt' => te xt
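Python's re.VERBOSE flag accepts a commented layout like the one above as the pattern itself. A minimal sketch, assuming CPython 3 (the question runs Jython 2.5.1) and with the dot moved outside the lookahead, as in the second edit below. The helper parse_value is hypothetical; it applies the strreplace step from the table by collapsing a doubled quote into a single one.

```python
import re

# The question's pattern, with the dot outside the negative lookahead,
# laid out with re.VERBOSE so the comments are part of the regex.
VALUE_RE = re.compile(r"""
    (?P<quote>["'`])?                 # optional opening quote
    (?P<value>
        (?(quote)                     # if there was an opening quote:
            (   (?!(?P=quote)) .      #   any character that is not the quote
            |   ((?=(?P=quote)) .){2} #   or the quote twice (an escaped quote)
            )*
        |                             # otherwise:
            [^\s;]*                   #   everything up to whitespace or ';'
        )
    )
    (?(quote)(?P=quote)|)             # consume the closing quote, if any
""", re.VERBOSE)


def parse_value(line):
    """Hypothetical helper: return the value at the start of `line`,
    with doubled quotes collapsed to a single quote."""
    m = VALUE_RE.match(line)  # never None: the unquoted branch may match ''
    quote, value = m.group("quote"), m.group("value")
    return value.replace(quote * 2, quote) if quote else value


for line in ["text", "'text'", "te xt", "'te''xt'", "'te xt'"]:
    print(repr(line), "=>", repr(parse_value(line)))
```

Unescaping happens after matching because the regex alone can only capture te''xt; turning it into te'xt needs a replace on the captured text.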

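Edit 2: Looking at it again, I noticed a mistake, see below. I still think the above holds (it may be a Jython bug), but the fixed version still doesn't do what I want :( The difference is very subtle: the dot moved out of the lookahead group.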

new:(?P<quote>["'`])?(?P<value>(?(quote)((?!(?P=quote)).|((?=(?P=quote)).){2})*|[^\s;]*))(?(quote)(?P=quote)|)
old:(?P<quote>["'`])?(?P<value>(?(quote)((?!(?P=quote).)|((?=(?P=quote)).){2})*|[^\s;]*))(?(quote)(?P=quote)|)
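The effect of moving the dot can be checked in isolation. A minimal sketch, assuming CPython 3 and fixing the quote character to ' for readability; old_alt and new_alt are illustrative names for the first alternative of each version.

```python
import re

# Old first alternative: the dot is inside the negative lookahead,
# so the whole alternative is an assertion and consumes nothing.
old_alt = re.compile(r"(?!'.)")
# New first alternative: assert "not a quote here", then consume one char.
new_alt = re.compile(r"(?!').")

print(old_alt.match("abc").span())  # (0, 0): succeeds without consuming anything
print(new_alt.match("abc").span())  # (0, 1): consumes "a"
```

In the old version the first alternative never consumes a character and the second consumes only pairs of quotes, so the quoted branch cannot match ordinary text at all. A repeated group whose alternative can match the empty string is also a common trigger for runaway backtracking in regex engines; that this is what exhausts Jython 2.5.1's recursion limit is a plausible guess, not something verified here.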

score 3 · Accepted Answer

As suggested in the comments, I'd recommend being explicit and writing out every possibility:

import re

r = r"""
    ([^"'`]+)
    |
    " ((?:""|[^"])*) "
    |
    ' ((?:''|[^'])*) '
    |
    ` ((?:``|[^`])*) `
"""

When extracting the matches, use the fact that exactly one of the four groups is filled, so you can simply drop the empty ones:

r = re.compile(r, re.X)
for m in r.findall(''' "fo""o" and 'bar''baz' and `quu````x` '''):
    print(''.join(m))
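A variation on the same idea that also applies the escape rule: Match.lastindex reports which group matched, which identifies the quote character to un-double. A sketch assuming CPython 3; the names PATTERN, QUOTE_FOR_GROUP and quoted_values are illustrative, not part of the answer above.

```python
import re

# Same alternation as above: group 1 is unquoted text, groups 2-4 hold
# the contents of "...", '...' and `...` (doubled quotes still doubled).
PATTERN = re.compile(r"""
    ([^"'`]+)
    |
    " ((?:""|[^"])*) "
    |
    ' ((?:''|[^'])*) '
    |
    ` ((?:``|[^`])*) `
""", re.X)

# Which quote character belongs to which capturing group.
QUOTE_FOR_GROUP = {2: '"', 3: "'", 4: '`'}


def quoted_values(text):
    values = []
    for m in PATTERN.finditer(text):
        q = QUOTE_FOR_GROUP.get(m.lastindex)
        if q is not None:  # skip unquoted chunks (group 1)
            values.append(m.group(m.lastindex).replace(q * 2, q))
    return values


print(quoted_values(''' "fo""o" and 'bar''baz' and `quu````x` '''))
```

Only one group can participate in each match, so lastindex is exactly that group here; with nested capturing groups it would be less reliable.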

score 0 · Accepted Answer

If I understand your question correctly, you are given a string containing three different types of quotes, such as

"Hello, sir", said the 'monkey', `nonchalantly`.

and want to extract the quoted values, which in this case are:

Hello, sir

monkey

nonchalantly

The following expression extracts them:

>>> expr = "\"(.*?)\"|'(.*?)'|`(.*?)`"

Observe:

>>> s = """
"Hello, sir", said the 'monkey', `nonchalantly`. 
"""
>>> import re
>>> m = re.finditer(expr, s)
>>> for match in m:
...     print(match.groups())
...
('Hello, sir', None, None)
(None, 'monkey', None)
(None, None, 'nonchalantly')
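A quick check of how this expression treats the doubled-quote escape rule from the question, assuming CPython 3 (the transcript above is Python 2):

```python
import re

expr = "\"(.*?)\"|'(.*?)'|`(.*?)`"

# On the example sentence, each match fills exactly one of the three groups.
s = '"Hello, sir", said the \'monkey\', `nonchalantly`.'
print([m.groups() for m in re.finditer(expr, s)])

# The lazy (.*?) stops at the first closing quote, so a doubled quote
# splits one value into two instead of being read as an escaped quote.
print(re.findall(expr, "'te''xt'"))
```

So the expression works for plain quoted values, but it does not implement the "two quotes are an escaped quote" rule; the explicit (?:''|[^'])* form in the other answer does.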

Incidentally, your own regex seems to run on my version of Python (CPython 2.7.2 on Mac OS X 10.7.4), but it produces the wrong results.

score 0 · Accepted Answer

I found a solution after some tinkering:

good:(?P<quote>["'`])?(?P<value>(?(quote)((?!(?P=quote)).|((?=(?P=quote)).){2})*|[^;\s]*))(?(quote)(?P=quote)|)
bad :(?P<quote>["'`])?(?P<value>(?(quote)((?!(?P=quote)).|((?=(?P=quote)).){2})*|[^\s;]*))(?(quote)(?P=quote)|)

No, I don't understand what the difference is.
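The only difference between the two lines is the order of \s and ; inside the character class, and a character class is a set, so the two patterns should behave identically. A quick check, assuming CPython 3 (the question targets Jython 2.5.1, where a different outcome would point at an engine quirk rather than at the regex):

```python
import re

good = re.compile(r"""(?P<quote>["'`])?(?P<value>(?(quote)((?!(?P=quote)).|((?=(?P=quote)).){2})*|[^;\s]*))(?(quote)(?P=quote)|)""")
bad = re.compile(r"""(?P<quote>["'`])?(?P<value>(?(quote)((?!(?P=quote)).|((?=(?P=quote)).){2})*|[^\s;]*))(?(quote)(?P=quote)|)""")

# Both classes mean "any character except ';' or whitespace".
for c in " \t\n;ab'\"`":
    assert bool(re.match(r"[^;\s]", c)) == bool(re.match(r"[^\s;]", c))

# Both full patterns capture the same 'value' on the question's examples.
for line in ["text", "'text'", "te xt", "'te''xt'", "'te xt'"]:
    print(repr(line), good.match(line).group("value"), bad.match(line).group("value"))
```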
