python - How do I parse a subset of a domain-specific language using Python Lepl?

Question

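I'm using Lepl as a parser, but the language I'm parsing is very complex and I only care about a small subset of it. I can't figure out how to get Lepl to parse the grammar I care about and return everything else as strings. If I add a rule like this: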

everything_else = ~newline & Regexp('.')[:]

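then it gets used in place of the rules I care about. I think this happens because it is a longer match than the other rules. Is there a configuration setting or something similar in Lepl that lets me use an incomplete parser?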

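Updated as requested to add more detail. I only want to parse top-level variable definitions that are equal to a number. Definitions that depend on other variables, or that are formulas, I want to ignore. The language also has many other constructs that I want to ignore, including anything inside a block definition. So here is an example: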

from lepl import *

class Variable(List): pass
import string

def parse_it(a_string):

    # Parser:  TODO: incomplete
    s = ~Space()[:] # zero or more spaces
    s1 = ~Space()[1:]  # 1 or more spaces
    newline = Newline() & s
    number_squote = ~Optional(Literal("'")) & s & Real() & s & ~Optional(Literal("'"))
    number_dquote = ~Optional(Literal('"')) & s & Real() & s & ~Optional(Literal('"'))
    number = number_squote | number_dquote | Real() >> float
    var_keyword = ~newline & ~Regexp(r'(?i)variable')
    var_name = Word() >> str.lower
    var_assignment = s1 & var_name & s & ~Literal('=') & s & number > Variable
    vars = var_keyword & var_assignment[1:]
    parser = vars[1:]
    return parser.parse(a_string)

input="""
VARIABLE abc=5 bbb='7' ddd='abc*bbb'
variable ccccc=7  // comment
block(1,2,3,4) of_type=cleaner abc=4 d=5 c=string('hi')

define_block block2 (3,4,5,6,7,a,b) var1=35 var2=36
variable ignore_this=5
block3(3,4,5,6) x='var1*ignore_this' y=var2
block4(4,5,6,7,a,b) x='var1*2' y="var2*3"
end_block

block2(1,2,3,4,5,6,3) abc=ccccc d=abc 

create_blocks  // comment: initialize memory
connect_blocks // connect blocks together
simulate // 

"""
for i in parse_it(input):
    print(i)

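So I only really care about information in the file of the form variable Word() = Real() that is defined outside of a block definition. I would like to keep the rest as strings, so that I can build an AST, change the variable values, and then write the control file out again.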

score 1 · Accepted Answer

so, if i understand correctly, you want to parse any line that starts with "variable" (ignoring case) and that is not inside a block.

the first thing we need to worry about is how much we need to understand about the bits we want to skip. for example, we could skip everything between define_block and end_block, but what if the text "end_block" happens to appear in some string? maybe to handle that case we also need to be aware of strings? or comments? these kinds of worries are why it is often not as easy as you might think to simply skip text - it turns out that to understand what we can skip we actually do need to parse the data.
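to make that worry concrete, here is a minimal sketch using only the standard re module (not lepl). the sample text and the names naive and leftover are made up for illustration. it skips from define_block to the first "end_block" found anywhere, and the skip ends too early because that text also appears inside a quoted string:

```python
import re

# Hypothetical input: "end_block" appears inside a quoted string
# before the block really ends.
text = """define_block b (1,2)
x='end_block is just text here'
variable hidden=1
end_block
variable visible=2
"""

# Naive skip: stop at the first "end_block" found anywhere in the text.
naive = re.compile(r'define_block.*?end_block', re.S)
leftover = naive.sub('', text)

# The match ended inside the string, so the rest of the block leaked out,
# including a variable line that should have been skipped.
print(leftover)
```

the leftover text still contains "variable hidden=1", which was inside the block and should have been dropped.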

but perhaps in this case we are ok. it looks like you have neither multi-line strings nor multi-line comments, and that define_block and end_block always occur at the start of a line. that gives us enough guarantees (i think) to be able to drop blocks without worrying about strings or comments (because a string or comment would start with // or " or similar, and so a misleading //define_block or "define_block" would not be at the start of the line).

we can do that outside of lepl:

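import re

# drop each block: from a line starting with define_block through the end
# of the line starting with end_block ([^\n]* stops at the line end)
block = re.compile(r'^\s*define_block.*?^\s*end_block[^\n]*', re.I | re.M | re.S)
input = block.sub('', input)
for line in input.split('\n'):
    if line.lower().startswith('variable'):
        print(line)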
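as a quick check of that idea, here is a self-contained sketch that wraps the same steps in a function and runs it on a shortened version of the sample input. the function name top_level_variable_lines and the extra line "variable after_block=9" are my own additions for illustration. the pattern ends the match at the end of the end_block line with [^\n]*, so lines after a block survive:

```python
import re

# Match from a line starting with define_block through the end of the
# line starting with end_block. re.S lets ".*?" cross line boundaries.
BLOCK = re.compile(r'^\s*define_block.*?^\s*end_block[^\n]*',
                   re.I | re.M | re.S)

def top_level_variable_lines(text):
    """Return the lines that start with 'variable' outside any block."""
    stripped = BLOCK.sub('', text)
    return [line for line in stripped.split('\n')
            if line.lower().startswith('variable')]

sample = """
VARIABLE abc=5 bbb='7' ddd='abc*bbb'
variable ccccc=7  // comment
block(1,2,3,4) of_type=cleaner abc=4 d=5 c=string('hi')

define_block block2 (3,4,5,6,7,a,b) var1=35 var2=36
variable ignore_this=5
block3(3,4,5,6) x='var1*ignore_this' y=var2
end_block

variable after_block=9
"""

found = top_level_variable_lines(sample)
for line in found:
    print(line)
```

this only finds the candidate lines. deciding which assignments on each line are plain numbers is still a job for the parser.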

or as a regexp inside:

block = Regexp(r'(?ims)^\s*define_block.*?^\s*end_block[^\n]*')

so your final solution will be something like

variable = ...
other_line = Regexp(r'^.*$')
parser = (variable | block | other_line)[:]
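to show the shape of that "try variable, then block, then any other line" idea without depending on lepl, here is a rough sketch using only the standard re module. it is not the lepl grammar itself. the names split_chunks and set_variable and the sample text are hypothetical. every chunk keeps its original text, so the file can be written back out after a value is changed:

```python
import re

BLOCK_START = re.compile(r'^\s*define_block', re.I)
BLOCK_END = re.compile(r'^\s*end_block', re.I)
VARIABLE = re.compile(r'^variable\b', re.I)

def split_chunks(text):
    """Split text into (kind, string) chunks, trying the alternatives in
    order: variable line, whole block, then any other line."""
    chunks = []
    lines = text.splitlines(keepends=True)
    i = 0
    while i < len(lines):
        line = lines[i]
        if VARIABLE.match(line):
            chunks.append(('variable', line))
            i += 1
        elif BLOCK_START.match(line):
            j = i
            # Advance to the end_block line (or to the end of the input).
            while j < len(lines) and not BLOCK_END.match(lines[j]):
                j += 1
            chunks.append(('block', ''.join(lines[i:j + 1])))
            i = j + 1
        else:
            chunks.append(('other', line))
            i += 1
    return chunks

def set_variable(chunks, name, value):
    """Rewrite name=<number> in top-level variable lines only."""
    pattern = re.compile(
        r'\b(%s\s*=\s*)[-+]?\d+(?:\.\d+)?' % re.escape(name), re.I)
    return [(kind,
             pattern.sub(lambda m: m.group(1) + str(value), s)
             if kind == 'variable' else s)
            for kind, s in chunks]

sample = (
    "VARIABLE abc=5 bbb='7'\n"
    "define_block block2 (3,4) var1=35\n"
    "variable abc=1\n"
    "end_block\n"
    "simulate // run\n"
)
chunks = split_chunks(sample)
updated = ''.join(s for _, s in set_variable(chunks, 'abc', 42))
print(updated)
```

joining the untouched chunks reproduces the input exactly, and the assignment inside the block is left alone because the whole block is kept as a single opaque string.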

hope that helps.

and finally, full disclosure, i should also point you to https://groups.google.com/group/lepl/browse_thread/thread/e305b5b559d93e9e which i posted today (sorry).
