pyparsing - A nested parenthesis parsing case leading to multiple sequence results

Question

I'd like to parse a string with nested parentheses under these conditions:

Elements are delimited by a comma , or a bar |.
An element of a nested parenthesis may be a single alphanumeric character or another nested parenthesis.
Each alternative joined by a bar | inside a nested parenthesis produces a new sequence, which combines the preceding elements and the following elements joined by commas outside that nested parenthesis.

In order to clarify, let me give some examples of input strings and the results they should return:

(a, b, c) should return: a, b, c

(a, (b | c)) should return: a, b and a, c

(a, b, (c | (d, e)), f) should return: a, b, c, f and a, b, d, e, f

(a, b, (c | (d, e) | f), g) should return: a, b, c, g and a, b, d, e, g and a, b, f, g

(a, b, c, ((d, (e | f)) | (g, h)), i) should return: a, b, c, d, e, i and a, b, c, d, f, i and a, b, c, g, h, i

((a | b), c) should return: a, c and b, c

score 3 · Accepted Answer

Using infixNotation (formerly known as operatorPrecedence), from the pyparsing wiki, you can get the string parsed. Assuming that ',' takes precedence over '|', this would look like:

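from pyparsing import alphas, infixNotation, oneOf, opAssoc

# operators are listed from highest to lowest precedence,
# so ',' binds tighter than '|'
variable = oneOf(list(alphas.lower()))
expr = infixNotation(variable,
            [
            (',', 2, opAssoc.LEFT),
            ('|', 2, opAssoc.LEFT),
            ])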

Converting your test cases into a small test framework lets us at least test the parsing part:

tests = [
    ("(a, b, c)", ["abc"]),
    ("(a, b | c)", ["ab", "c"]),
    ("((a, b) | c)", ["ab", "c"]),
    ("(a, (b | c))", ["ab", "ac"]),
    ("(a, b, (c | (d, e)), f)", ["abcf","abdef"]),
    ("(a, b, (c | (d, e) | f), g)", ["abcg", "abdeg", "abfg"]),
    ("(a, b, c, ((d, (e | f)) | (g, h)), i)",
      ["abcdei", "abcdfi", "abcghi"]),
    ("((a | b), c)", ["ac", "bc"]),
    ]

for test,expected in tests:
    # if your expected values *must* be lists and not strings, then
    # add this line
    # expected = [list(ex) for ex in expected]
    result = expr.parseString(test)
    print(result[0].asList())

You will get something like this:

['a', ',', 'b', ',', 'c']
[['a', ',', 'b'], '|', 'c']
[['a', ',', 'b'], '|', 'c']
['a', ',', ['b', '|', 'c']]
['a', ',', 'b', ',', ['c', '|', ['d', ',', 'e']], ',', 'f']
['a', ',', 'b', ',', ['c', '|', ['d', ',', 'e'], '|', 'f'], ',', 'g']
['a', ',', 'b', ',', 'c', ',', [['d', ',', ['e', '|', 'f']], '|', ['g', ',', 'h']], ',', 'i']
[['a', '|', 'b'], ',', 'c']

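Parsing the string and having the operator precedence reflected in the resulting structure is the easy part. If you follow the regex inverter example, you will then need to attach objects to each parsed bit, like this: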

class ParsedItem(object):
    # tokens is the ParseResults passed to a parse action
    def __init__(self, tokens):
        self.tokens = tokens[0]

class Var(ParsedItem):
    """ TBD """

class BinaryOpn(ParsedItem):
    def __init__(self, tokens):
        # keep the operands only, dropping the operators in between
        self.tokens = tokens[0][::2]

class Sequence(BinaryOpn):
    """ TBD """

class Alternation(BinaryOpn):
    """ TBD """

variable = oneOf(list(alphas.lower())).setParseAction(Var)
expr = infixNotation(variable, 
            [
            (',', 2, opAssoc.LEFT, Sequence),
            ('|', 2, opAssoc.LEFT, Alternation),
            ])

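Here you need to implement the bodies of Var, Sequence, and Alternation. Instead of getting a list of values directly back from pyparsing, you will get one of these object types. Then, instead of calling asList() as I did in the sample above, you would call something like generate or makeGenerator to get a generator from that object. You would then invoke that generator to have the objects produce all the different results for you.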
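As a rough starting point, here is one way those bodies might look. This is only a sketch: the method name generate is one of the names suggested above, and using itertools.product to combine the operands of a Sequence is an assumption of this sketch rather than something the regex inverter example prescribes. So that it runs without pyparsing, the small tree at the end is built by hand in the same nested shape that the parse actions would receive.

```python
from itertools import product


class ParsedItem(object):
    def __init__(self, tokens):
        self.tokens = tokens[0]


class Var(ParsedItem):
    def generate(self):
        # A single variable yields exactly one one-element sequence.
        yield [self.tokens]


class BinaryOpn(ParsedItem):
    def __init__(self, tokens):
        # Keep the operands only; the operators sit at the odd positions.
        self.tokens = tokens[0][::2]


class Sequence(BinaryOpn):
    def generate(self):
        # Pick one result from every operand and concatenate them in order.
        for parts in product(*(t.generate() for t in self.tokens)):
            yield [name for part in parts for name in part]


class Alternation(BinaryOpn):
    def generate(self):
        # Each operand contributes its own results, one after another.
        for t in self.tokens:
            for seq in t.generate():
                yield seq


# Hand-built equivalent of "(a, (b | c))" (hypothetical stand-in for a parse).
tree = Sequence([[Var(["a"]), ",",
                  Alternation([[Var(["b"]), "|", Var(["c"])]])]])
print(list(tree.generate()))  # [['a', 'b'], ['a', 'c']]
```

Because the constructors keep the same signatures, these classes can be passed to setParseAction and infixNotation exactly as in the grammar above, after which you would call generate() on result[0] instead of asList().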

I leave the rest as an exercise for you.

-- Paul
