python - Changing lexing rules in Python's PLY library

Question

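In my compilers class, the instructor told us that the grammar of the language we are going to implement will require lookahead in the parser. With a tool like flex, this is easy to do with foo/x.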

I am currently trying to write a sample program in Python with the PLY library, to see whether Python is a good fit for the project. I am trying to implement a simple version of FORTRAN's do loop.

-- Spaces are ignored in FORTRAN
DO 5 I=1,10   -- Loop
DO 5 I=1.10   -- Assignment (DO5I = 1.10)

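My current idea is to match the DO keyword and then check whether the rest of the input matches a loop. If it does, return the DO token. Otherwise, I want to "rewind" the input and fall through to the identifier rule. Something like: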

def t_do(t):
    'do'
    if re.match(do_loop_regex, t.lexer.lexdata[t.lexer.lexpos:]):
        return t
    else:
        t.rewind() # this is what I need to figure out
        return t_identifier(t)

def t_identifier(t):
    '[A-Z_][A-Z0-9_]*'
    return t

score 1 · Accepted Answer

It is possible in PLY, but it takes some data massaging and token building.

import re
from ply import lex

tokens = ('LOOP','ASSIGNMENT')
literals = '=,'

re_float = r'(\d+\.\d+)'
re_int = r'(\d+)'
re_ident = r'([A-Za-z]\w*)'
re_expr = '(%s)' % '|'.join([re_float, re_int, re_ident])

re_loop = 'DO%s%s=%s,%s' % (re_int, re_ident, re_expr, re_expr)
@lex.TOKEN(re_loop)
def t_LOOP(t):
    return t

re_assignment = '%s=%s' % (re_ident, re_expr)
@lex.TOKEN(re_assignment)
def t_ASSIGNMENT(t):
    return t

def t_newline(t):
    r'\n+'
    t.lexer.lineno += len(t.value)    # count newlines on the lexer, not the token

def t_error(t):
    print("syntax error at %s, line# %d" % (t.value, t.lineno))
    t.lexer.skip(1)    # advance past the bad character so lexing can continue

DATA = """-- Spaces are ignored in FORTRAN
DO 5 I=1,10   -- Loop
DO 5 I=1.10   -- Assignment (DO5I = 1.10)"""

def preprocess(data):
    re_spaces = re.compile(r'\s+')      # raw strings avoid invalid-escape warnings
    re_comment = re.compile(r'--.*$')
    lines = []
    for line in data.split('\n'):       # split into lines
        line = re_spaces.sub('', line)
        line = re_comment.sub('', line)
        if not line: continue           # skip blank lines
        line = line.upper()
        lines.append(line)
    return '\n'.join(lines)+'\n'

print(re_assignment)    # debug: show the combined assignment regex
lexer = lex.lex()
lexer.input(preprocess(DATA))
while True:
    tok = lexer.token()
    if not tok: break
    print(tok)

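First, you have to strip comments and spaces and force everything to uppercase by hand. Then you end up writing a lexer that essentially implements the grammar inline. I think that will get out of hand eventually. If it were me, I think it would be much easier to implement my own lexer that just does line-by-line regex matching. All PLY's lexer really does is build one giant regex out of all your small regexes and then match tokens against it one at a time.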
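
A minimal sketch of that line-by-line approach, assuming every statement fits on a single line and that only the two statement forms from the question exist. The names lex_lines, LINE_RULES and RE_EXPR are hypothetical and are not part of PLY; the patterns simply mirror the ones used above.

```python
import re

# Hypothetical patterns for this sketch, mirroring the regexes used above.
RE_COMMENT = re.compile(r'--.*$')
RE_SPACES = re.compile(r'\s+')
RE_EXPR = r'(?:\d+\.\d+|\d+|[A-Z]\w*)'   # float, int, or identifier

# Rules are tried in order, so LOOP gets the first chance at a line.
LINE_RULES = [
    ('LOOP', re.compile(r'DO(\d+)([A-Z]\w*)=(%s),(%s)$' % (RE_EXPR, RE_EXPR))),
    ('ASSIGNMENT', re.compile(r'([A-Z]\w*)=(%s)$' % RE_EXPR)),
]


def lex_lines(source):
    """Yield (kind, groups, lineno) for each non-blank statement line."""
    for lineno, line in enumerate(source.split('\n'), start=1):
        line = RE_COMMENT.sub('', line)          # drop the trailing comment
        line = RE_SPACES.sub('', line).upper()   # FORTRAN ignores spaces
        if not line:
            continue                             # blank or comment-only line
        for kind, pattern in LINE_RULES:
            match = pattern.match(line)
            if match:
                yield kind, match.groups(), lineno
                break
        else:
            raise SyntaxError('line %d: cannot classify %r' % (lineno, line))


if __name__ == '__main__':
    DATA = """-- Spaces are ignored in FORTRAN
DO 5 I=1,10   -- Loop
DO 5 I=1.10   -- Assignment (DO5I = 1.10)"""
    for item in lex_lines(DATA):
        print(item)
```

Because each rule is anchored to the whole line, the "lookahead" comes for free: a line is only classified as a loop if the comma is actually there, and otherwise the same text is retried as an assignment without any rewinding.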
