python - Parsing an expression into a list

Question

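I am given an expression built from parentheses and +, such as (((a+b)+c)+(d+e)).

I need to find its parse tree and print that tree in list form, like [ [ [a, b], c ], [d, e] ].

I was thinking of using something like ast and then ast2list. However, I don't fully understand these, so I keep getting syntax errors. This is what I have: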

import ast
import parser

a = ast.parse("(((a+b)+c)+(d+e))", mode='eval')
b = parser.ast2list(a)


print(b)

Can someone point me in the right direction? Thanks.

score 2 · Accepted Answer

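If you really want to write a parser, start by not writing any code, but by understanding how your grammar should work. Backus-Naur Form, or BNF, is the usual notation for defining a grammar. Infix notation is a common parsing topic in software engineering, and the basic BNF structure for infix notation looks like this: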

letter ::= 'a'..'z'
operand ::= letter+
term ::= operand | '(' expr ')'
expr ::= term ( '+' term )*

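The key is that a term is either an alphabetic operand or an entire subexpression wrapped in ()'s. Since that subexpression is defined exactly like the overall expression, this recursive definition handles all the parenthesis nesting. The expression is then a term followed by zero or more further terms, each added on with the binary '+' operator. (You could expand term to handle subtraction and multiplication/division too, but I'm not going to complicate this answer more than necessary.)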
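
To make the recursion concrete before bringing in a library, here is a minimal hand-written sketch of the same grammar as a recursive-descent parser. The function names (tokenize, parse_expr, parse_term, parse) are made up for this illustration, and the sketch assumes operands are lowercase letters only, as in the BNF; the tokenizer silently skips any other character.

```python
import re


def tokenize(text):
    # Split into '(', ')', '+', and runs of lowercase letters.
    # Assumption: anything else (spaces, digits, ...) is simply skipped.
    return re.findall(r"[()+]|[a-z]+", text)


def parse_expr(tokens, pos=0):
    """expr ::= term ( '+' term )*   Returns (list_of_terms, next_pos)."""
    item, pos = parse_term(tokens, pos)
    items = [item]
    while pos < len(tokens) and tokens[pos] == "+":
        item, pos = parse_term(tokens, pos + 1)
        items.append(item)
    return items, pos


def parse_term(tokens, pos):
    """term ::= operand | '(' expr ')'   Returns (term, next_pos)."""
    if pos >= len(tokens):
        raise ValueError("unexpected end of input")
    tok = tokens[pos]
    if tok == "(":
        # A parenthesized subexpression: recurse into expr.
        items, pos = parse_expr(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise ValueError("expected ')'")
        return items, pos + 1
    if tok.isalpha():
        return tok, pos + 1
    raise ValueError(f"unexpected token {tok!r}")


def parse(text):
    tokens = tokenize(text)
    items, pos = parse_expr(tokens, 0)
    if pos != len(tokens):
        raise ValueError("unexpected trailing input")
    return items


print(parse("(((a+b)+c)+(d+e))"))
```

The top-level expr is itself a list of terms, so the result carries one extra outer list around the fully parenthesized input.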

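Pyparsing is a package that makes it easy to translate a BNF into a working parser using Python objects (Ply, spark, and yapps are other parsers, which follow the more traditional lex/yacc model of parser creation). Here is that BNF implemented directly using pyparsing: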

from pyparsing import Suppress, Word, alphas, Forward, Group, ZeroOrMore

LPAR, RPAR, PLUS = map(Suppress, "()+")
operand = Word(alphas)

# forward declare our overall expression, necessary when defining a recursive grammar
expr = Forward()

# each term is either an alpha operand, or an expr in ()'s
term = operand | Group(LPAR + expr + RPAR)

# define expr as a term, with optional '+ term's
expr << term + ZeroOrMore(PLUS + term)

# try it out
s = "(((a+b)+c)+(d+e))"
print(expr.parseString(s))

This gives:

[[[['a', 'b'], 'c'], ['d', 'e']]]

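Infix notation with precedence of operations is a fairly common parser, or part of a larger parser, so pyparsing includes a built-in helper, operatorPrecedence (named infixNotation in newer pyparsing releases), that takes care of all the nesting, grouping, and recursion.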

from pyparsing import operatorPrecedence, opAssoc, Word, alphas, Suppress

# define an infix notation with precedence of operations
# you only define one operation '+', so this is a simple case
operand = Word(alphas)
expr = operatorPrecedence(operand,
    [
    ('+', 2, opAssoc.LEFT),
    ])

print(expr.parseString(s))

This gives the same result as before.

More detailed examples can be found online at the pyparsing wiki: the explicit implementation in fourFn.py and the operatorPrecedence implementation in simpleArith.py.

score 2 · Accepted Answer

Look at the documentation for the ast module, where the NodeVisitor class is described.

import ast
import sys

class MyNodeVisitor(ast.NodeVisitor):
    op_dict = {
        ast.Add : '+',
        ast.Sub : '-',
        ast.Mult : '*',
    }

    type_dict = {
        ast.BinOp: lambda s, n: s.handleBinOp(n),
        ast.Name: lambda s, n: n.id,
        # Python 3.8+ parses numeric literals as ast.Constant (formerly ast.Num)
        ast.Constant: lambda s, n: n.value,
    }

    def __init__(self, *args, **kwargs):
        ast.NodeVisitor.__init__(self, *args, **kwargs)
        self.ast = []

    def handleBinOp(self, node):
        # Build an (operator, left, right) tuple, recursing into both operands
        return (self.op_dict[type(node.op)], self.handleNode(node.left),
                    self.handleNode(node.right))

    def handleNode(self, node):
        value = self.type_dict.get(type(node), None)
        return value(self, node)

    def visit_BinOp(self, node):
        op = self.handleBinOp(node)
        self.ast.append(op)

    def visit_Name(self, node):
        self.ast.append(node.id)

    def visit_Constant(self, node):
        self.ast.append(node.value)

    def currentTree(self):
        return reversed(self.ast)

a = ast.parse(sys.argv[1])
visitor = MyNodeVisitor()
visitor.visit(a)
print(list(visitor.currentTree()))

This gives:

 $ ./ast_tree.py "5 + (1 + 2) * 3"
 [('+', 5, ('*', ('+', 1, 2), 3))]

Enjoy.
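
For the exact output the question asks for, the same ast idea can be reduced to a small recursive function. This is only a sketch under the assumption that the input contains nothing but names, + and parentheses; to_nested_list and parse_sum are hypothetical names chosen for this example.

```python
import ast


def to_nested_list(node):
    # Unwrap the top-level Expression node produced by mode="eval".
    if isinstance(node, ast.Expression):
        return to_nested_list(node.body)
    # Each '+' becomes a two-element list [left, right].
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
        return [to_nested_list(node.left), to_nested_list(node.right)]
    # Variable names become plain strings.
    if isinstance(node, ast.Name):
        return node.id
    raise ValueError(f"unsupported syntax: {type(node).__name__}")


def parse_sum(text):
    return to_nested_list(ast.parse(text, mode="eval"))


print(parse_sum("(((a+b)+c)+(d+e))"))
```

Because Python's + is left-associative, an unparenthesized a+b+c comes out as [['a', 'b'], 'c'] with this approach.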

score 2 · Accepted Answer

Colleen's comment can be implemented with something like this:

expr_str = "(((a+b)+c)+(d+e))"   # renamed from str to avoid shadowing the built-in


replacements = [
    ('(','['),
    (')',']'),
    ('+',','),
    # If a,b,c,d,e are defined variables, you don't need the following 5 lines
    ('a',"'a'"),
    ('b',"'b'"),
    ('c',"'c'"),
    ('d',"'d'"),
    ('e',"'e'"),
]

for (f,s) in replacements:
    expr_str = expr_str.replace(f,s)

# eval executes arbitrary code, so only use it on input you trust
obj = eval(expr_str)

print(expr_str)  # [[['a','b'],'c'],['d','e']]
print(obj)       # [[['a', 'b'], 'c'], ['d', 'e']]
# You can access the parsed elements as you would any iterable:
print(obj[0])    # [['a', 'b'], 'c']
print(obj[1])    # ['d', 'e']
print(obj[1][0]) # d
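
If the input is not fully trusted, one possible variation is to quote the names with a regular expression and hand the result to ast.literal_eval, which only accepts literals. This is a sketch, not part of the answer above; brackets_to_list is a made-up name, and it assumes the whole expression is wrapped in parentheses as in the question.

```python
import ast
import re


def brackets_to_list(text):
    # Quote every identifier so the string contains only literals.
    quoted = re.sub(r"[A-Za-z_]\w*", lambda m: repr(m.group(0)), text)
    # Map '(' -> '[', ')' -> ']' and '+' -> ',' in one pass.
    literal = quoted.translate(str.maketrans("()+", "[],"))
    # literal_eval raises an exception instead of running code
    # when the string is not a plain literal.
    return ast.literal_eval(literal)


print(brackets_to_list("(((a+b)+c)+(d+e))"))
```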

score 0 · Accepted Answer

I would write a translator as well. Doing it via ast was a bit cumbersome to implement for this purpose.

[tw-172-25-24-198 ~]$ cat a1.py 
import re

def multiple_replace(text, adict):
    rx = re.compile('|'.join(map(re.escape, adict)))
    def one_xlat(match):
        return adict[match.group(0)]
    return rx.sub(one_xlat, text)

# Closure based approach
def make_xlat(*args, **kwds):
    adict = dict(*args, **kwds)
    rx = re.compile('|'.join(map(re.escape, adict)))
    def one_xlat(match):
        return adict[match.group(0)]
    def xlat(text):
        return rx.sub(one_xlat, text)
    return xlat

if __name__ == "__main__":
    text = "((a+b)+c+(d+(e+f)))"
    adict = {
        "+":",",
        "(":"[",
        ")":"]",
    }
    translate = make_xlat(adict)
    print(translate(text))

This should give:

[[a,b],c,[d,[e,f]]]

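Note: I have had this snippet in my collection for a while. It is from the Python Cookbook. It performs multiple replacements on a string in a single pass, using the keys and values of a dictionary as the search and replacement strings.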
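
Since every key here is a single character, the built-in str.translate may be enough on Python 3. A minimal sketch, with to_brackets as a made-up name; the regex-based version above is still the one to use when keys are longer than one character.

```python
# Single-character replacements only: '+' -> ',', '(' -> '[', ')' -> ']'
TABLE = str.maketrans({"+": ",", "(": "[", ")": "]"})


def to_brackets(text):
    # str.translate applies all replacements in one pass over the string.
    return text.translate(TABLE)


print(to_brackets("((a+b)+c+(d+(e+f)))"))
```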

score 0 · Accepted Answer

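Since this is a simple problem, you could just write a solution from scratch. This assumes that all variable names are one character long, or that the expression has already been correctly converted into a list of tokens. I threw in checks to make sure all parentheses are matched; obviously you should swap out CustomError for whatever exception you want to throw or other action you want to take.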

def expr_to_list(ex):
    tree = []
    stack = [tree]
    for c in ex:
        if c == '(':
            new_node = []
            stack[-1].append(new_node)
            stack.append(new_node)
        elif c == '+' or c == ' ':
            continue
        elif c == ')':
            if stack[-1] is tree:  # identity check: already back at the root
                raise CustomError('Unmatched Parenthesis')
            stack.pop()
        else:
            stack[-1].append(c)
    if stack[-1] is not tree:
        raise CustomError('Unmatched Parenthesis')
    return tree

Tested:

>>> expr_to_list('a + (b + c + (x + (y + z) + (d + e)))')
['a', ['b', 'c', ['x', ['y', 'z'], ['d', 'e']]]]

For multi-character variable names, use a regular expression for tokenization:

>>> import re
>>> tokens = re.findall(r'\(|\)|\+|[\w]+',
...                     '(apple + orange + (banana + grapefruit))')
>>> tokens
['(', 'apple', '+', 'orange', '+', '(', 'banana', '+', 'grapefruit', ')', ')']
>>> expr_to_list(tokens)
[['apple', 'orange', ['banana', 'grapefruit']]]
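
The two pieces can also be folded into one self-contained function. The sketch below is one way to do that; expr_to_nested is a hypothetical name, and ValueError stands in for the CustomError placeholder as an assumption of this example.

```python
import re

# Tokens are '(', ')', '+', or a run of word characters; whitespace is skipped.
TOKEN_RE = re.compile(r"[()+]|\w+")


def expr_to_nested(text):
    tree = []
    stack = [tree]  # stack[-1] is the list currently being filled
    for tok in TOKEN_RE.findall(text):
        if tok == "(":
            node = []
            stack[-1].append(node)
            stack.append(node)
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unmatched ')'")
            stack.pop()
        elif tok != "+":
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unmatched '('")
    return tree


print(expr_to_nested("(apple + orange + (banana + grapefruit))"))
```

Checking the stack depth rather than comparing lists keeps the parenthesis check independent of the list contents.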
