python - Non-greedy dotall regex in Python

Question

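I need to parse annotations of methods written in PHP. I wrote a regex (see the simplified example below) to search for them, but it doesn't work as expected. Instead of matching the shortest part of the text between /** and */, it matches the maximum amount of source code (previous methods with annotations). I'm sure I'm using the correct .*? non-greedy version of *, and I found no evidence that DOTALL turns it off. Where could the problem be? Thank you.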

p = re.compile(r'(?:/\*\*.*?\*/)\n\s*public', re.DOTALL)
methods = p.findall(text)

score 1 · Accepted Answer

I think you're trying to get this.

>>> text = """ /** * comment */ class MyClass extens Base { /** * comment */ public function xyz """
>>> m = re.findall(r'\/\*\*(?:(?!\*\/).)*\*\/\s*public', text, re.DOTALL)
>>> m
['/** * comment */ public']

If you don't want public in the final match, use the regex below, which uses a positive lookahead.

>>> m = re.findall(r'\/\*\*(?:(?!\*\/).)*\*\/(?=\s*public)', text, re.DOTALL)
>>> m
['/** * comment */']
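
To see why the pattern from the question over-matches, here is a small comparison, as a sketch only. The multi-line sample text is made up for illustration. A non-greedy .*? still starts at the leftmost /** and simply grows until the rest of the pattern can match, so it is free to run across an intermediate */. The (?:(?!\*/).)* form can never step over a */, so the first comment fails and the search moves on to the next /**.

```python
import re

# Hypothetical sample: two doc comments, only the second precedes "public".
text = (
    "/**\n * comment\n */\n"
    "class MyClass extends Base {\n\n"
    "/**\n * comment\n */\n"
    "public function xyz\n"
)

# Pattern from the question: lazy, but it still starts at the first "/**"
# and expands until "*/\n\s*public" can finally match.
lazy = re.findall(r'(?:/\*\*.*?\*/)\n\s*public', text, re.DOTALL)

# Pattern from this answer: the comment body may never contain "*/".
tempered = re.findall(r'/\*\*(?:(?!\*/).)*\*/\s*public', text, re.DOTALL)

print(len(lazy), 'class MyClass' in lazy[0])  # the single match spans both comments
print(tempered)                               # only the comment right before "public"
```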

score 0 · Accepted Answer

You should be able to use this:

\/\*\*([^*]|\*[^/])*?\*\/\s*public

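This matches any character that is not an asterisk (*); if the character is an asterisk, it must not be followed by a slash. That means it should capture only comments that are closed immediately before public.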

Example: http://regexr.com/398b3

Explanation: http://tinyurl.com/lcewdmo

Disclaimer: this won't work if the comment itself contains */.
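
For completeness, a Python sketch of this pattern. One change on my side, so treat it as an assumption: the group is written as non-capturing, (?:...), because re.findall returns the contents of the group rather than the whole match when the pattern contains a capturing group. The sample text is made up. re.DOTALL is not needed here, since [^*] already matches newlines.

```python
import re

# Hypothetical sample: two doc comments, only the second precedes "public".
text = (
    "/**\n * comment\n */\n"
    "class MyClass extends Base {\n\n"
    "/**\n * other comment\n */\n"
    "public function xyz\n"
)

# Body of the comment: any non-asterisk, or an asterisk not followed by "/".
pattern = re.compile(r'/\*\*(?:[^*]|\*[^/])*?\*/\s*public')

matches = pattern.findall(text)
print(matches)
```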

score 0 · Accepted Answer

# Some examples, assuming that the annotation you want to parse
# starts with /** and ends with */. It may be spread over
# several lines.

text = """
/**
 @Title(value='Welcome', lang='en')
 @Title(value='Wilkommen', lang='de')
 @Title(value='Vitajte', lang='sk')
 @Snippet
    ,*/
class WelcomeScreen {}

   /** @Target("method") */
  class Route extends Annotation {}

/** @Mapping(inheritance = @SingleTableInheritance,
    columns = {@ColumnMapping('id'), @ColumnMapping('name')}) */
public Person {}

"""

text2 = """ /** * comment */
CLASS MyClass extens Base {

/** * comment */
public function xyz
"""


import re

# Match a PHP annotation and the word following class or public
# function.
annotations = re.findall(r"""/\*\*             # Starting annotation
                                               # 
                            (?P<annote>.*?)    # Named, non-greedy match
                                               # including newline
                                               #
                             \*/               # Ending annotation
                                               #
                             (?:.*?)           # Non-capturing non-greedy
                                               # including newline
                 (?:public[ ]+function|class)  # Match either
                                               # of these
                             [ ]+              # One or more spaces
                             (?P<name>\w+)     # Match a word
                         """,
                         text + text2,
                         re.VERBOSE | re.DOTALL | re.IGNORECASE)

for txt in annotations:
     print("Annotation: "," ".join(txt[0].split()))
     print("Name: ", txt[1])
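
Since the pattern names its groups, re.finditer lets you read them by name instead of by tuple index. A minimal sketch, assuming a compact variant of the pattern above (it allows only whitespace between the comment and the keyword, which is my simplification) and a made-up sample string:

```python
import re

# Hypothetical sample built from two of the cases above.
sample = (
    '/** @Target("method") */\n'
    'class Route extends Annotation {}\n\n'
    '/** * comment */\n'
    'public function xyz\n'
)

# Compact variant: only whitespace is allowed between "*/" and the keyword.
pattern = re.compile(
    r'/\*\*(?P<annote>.*?)\*/\s*(?:public\s+function|class)\s+(?P<name>\w+)',
    re.DOTALL | re.IGNORECASE,
)

# Collapse runs of whitespace in the annotation, then pair it with the name.
found = [
    (" ".join(m.group("annote").split()), m.group("name"))
    for m in pattern.finditer(sample)
]

for annote, name in found:
    print("Annotation:", annote)
    print("Name:", name)
```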
