parsing - Further parse an expression in a special case

Question

At the moment, my front end can parse ordinary expressions such as 123, "abcd", "=123" and "=TRUE+123". Here is the relevant code:

(* in `syntax.ml`: *)
and expression =
  | E_integer of int  
  | E_string of string

(* in `parser.mly`: *)
expression:
  | INTEGER { E_integer $1 }
  | STRING { E_string $1 }

Now I would like to refine the parser so that, when it meets a string beginning with =, it tries to evaluate it as a formula rather than as a literal string. So syntax.ml becomes:

(* in `syntax.ml`: *)
and expression =
  | E_integer of int  
  | E_string of string
  | E_formula of formula

and formula =
  | F_integer of int  
  | F_boolean of bool
  | F_Add of formula * formula

The problem is that I don't know how to change parser.mly. I tried this, but it did not work (This expression has type string but an expression was expected of type Syntax.formula):

(* in `parser.mly`: *)
expression:
  | INTEGER { E_integer $1 }
  | STRING { 
    if String.sub $1 1 1 <> "="
    then E_string $1
    else E_formula (String.sub $1 2 ((String.length $1) - 1)) }

I don't know how to tell the parser that, when a string begins with =, it needs to be parsed further according to the rules for formula... Could anyone help?

Following gasche's comment:

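I agree that I need a parser for formulas. The question now is whether I need a separate lexer.mll for formulas. I hope not, since it seems logical to lex the whole program only once. Also, can the grammar for formulas be added directly to the existing parser.mly?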

At the moment, in lexer.mll, I have:

let STRING = double_quote ([^ '\x0D' '\x0A' '\x22'])* double_quote
rule token = parse
  | STRING as s { STRING s }

I think I could do something directly here:

let STRING = double_quote ([^ '\x0D' '\x0A' '\x22'])* double_quote    
let FORMULA_STRING = double_quote '=' ([^ '\x0D' '\x0A' '\x22'])* double_quote
rule token = parse
  | FORMULA_STRING as fs { XXXXX }
  | STRING as s { STRING s }

I am not sure what to write in place of XXXXX. If I have a separate parser_formula.mly, should it be Parser_formula.formula token fs? And what if I have only one parser.mly that contains the whole grammar, including formulas?

score 4 · Accepted Answer

The problem is with your line

    else E_formula (String.sub $1 2 ((String.length $1) - 1))

Instead of (String.sub ...), which has type string, you should return a value of type Syntax.formula. If you had a function parse_formula : string -> Syntax.formula, you could write here

    else E_formula (parse_formula (String.sub $1 2 ((String.length $1) - 1)))

I think you could define such a function by defining the formula grammar as a separate parser first.
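To make the shape of such a function concrete, here is a minimal sketch. It is a hand-written stand-in for a generated formula parser, not the parser itself, and it assumes a tiny grammar (integers, TRUE, FALSE and +). The names formula_body and parse_atom are hypothetical helpers. It also assumes the STRING token still carries its surrounding double quotes, as the lexer regexp suggests. Note that String.sub s 2 (String.length s - 1) asks for more characters than remain after index 2, so it would raise Invalid_argument; the sketch uses a length of n - 3 to drop the opening quote, the '=' and the closing quote.

```ocaml
(* Same type as in syntax.ml, repeated so the sketch is self-contained. *)
type formula =
  | F_integer of int
  | F_boolean of bool
  | F_Add of formula * formula

(* Hypothetical helper: given a STRING token such as "\"=TRUE+123\"",
   return the text after the '=' without the quotes, or None if the
   token is an ordinary string. *)
let formula_body (s : string) : string option =
  let n = String.length s in
  if n >= 3 && s.[0] = '"' && s.[1] = '=' && s.[n - 1] = '"'
  then Some (String.sub s 2 (n - 3))
  else None

(* Assumed grammar: formula ::= atom ('+' atom)*
                    atom    ::= INTEGER | TRUE | FALSE *)
let parse_atom (a : string) : formula =
  match String.trim a with
  | "TRUE" -> F_boolean true
  | "FALSE" -> F_boolean false
  | digits ->
    (match int_of_string_opt digits with
     | Some n -> F_integer n
     | None -> failwith ("parse_formula: unexpected atom " ^ digits))

(* Split on '+' and fold to the left, so "1+2+3" becomes
   F_Add (F_Add (1, 2), 3). *)
let parse_formula (body : string) : formula =
  match List.map parse_atom (String.split_on_char '+' body) with
  | [] -> failwith "parse_formula: empty formula"
  | first :: rest -> List.fold_left (fun acc f -> F_Add (acc, f)) first rest
```

In the semantic action, one would then match on formula_body $1 and build E_formula (parse_formula body) in the Some case and E_string $1 otherwise. With a real generated parser, parse_formula would instead run that parser on Lexing.from_string body.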

Edit: following your own edit:

- If you go the route of calling a different parser for formulas, you don't need to define a different lexer.
- If you choose to handle the distinction between strings and formulas at the lexer level (are you sure that's correct? what about a real string that begins with '='?), then you don't need a separate parser for formulas; you can have them as rules in your current grammar. But to do that, your lexer needs to behave in a more fine-grained way on formulas: instead of just recognizing "=.*" as a single token, you should recognize "= as a beginning-of-formula token and lex the rest of the formula until you encounter the closing ". To avoid conflicts, you may want to handle simple strings with a lexing rule rather than a simple regexp as well.

If you can get the second approach to work, I think it is indeed a simpler idea.
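As a rough illustration of what "more fine-grained" lexing means, here is a hand-rolled tokenizer in plain OCaml. It is only a sketch of the idea, not ocamllex code: the token names FORMULA_START, FORMULA_END, BOOL and PLUS and every function name are assumptions made for this example. In a real lexer.mll the same split would be done with a second rule entered after matching "=. As cautioned above, a genuine string that happens to begin with '=' is treated as a formula here.

```ocaml
type token =
  | STRING of string   (* ordinary string literal, quotes removed *)
  | FORMULA_START      (* the two characters "= *)
  | FORMULA_END        (* the closing quote of a formula *)
  | INTEGER of int
  | BOOL of bool
  | PLUS

let is_digit c = '0' <= c && c <= '9'
let is_upper c = 'A' <= c && c <= 'Z'

(* First index >= i at which predicate p no longer holds. *)
let rec skip_while p s i =
  if i < String.length s && p s.[i] then skip_while p s (i + 1) else i

(* Lex the inside of a formula, starting just after "=.
   Returns the tokens and the index after the closing quote. *)
let rec formula_tokens s i =
  if i >= String.length s then failwith "unterminated formula"
  else match s.[i] with
    | '"' -> ([FORMULA_END], i + 1)
    | '+' ->
      let (ts, j) = formula_tokens s (i + 1) in
      (PLUS :: ts, j)
    | c when is_digit c ->
      let j = skip_while is_digit s i in
      let (ts, k) = formula_tokens s j in
      (INTEGER (int_of_string (String.sub s i (j - i))) :: ts, k)
    | c when is_upper c ->
      let j = skip_while is_upper s i in
      let b = match String.sub s i (j - i) with
        | "TRUE" -> true
        | "FALSE" -> false
        | w -> failwith ("unknown word " ^ w) in
      let (ts, k) = formula_tokens s j in
      (BOOL b :: ts, k)
    | c -> failwith (Printf.sprintf "unexpected character %c" c)

(* Lex one quoted item: a formula if it starts with "=, a plain string otherwise. *)
let tokens_of_literal s =
  let n = String.length s in
  if n < 2 || s.[0] <> '"' then failwith "expected a double quote"
  else if s.[1] = '=' then FORMULA_START :: fst (formula_tokens s 2)
  else
    let j = skip_while (fun c -> c <> '"') s 1 in
    if j >= n then failwith "unterminated string"
    else [STRING (String.sub s 1 (j - 1))]
```

With a token stream like this, the grammar can contain a rule along the lines of FORMULA_START formula FORMULA_END next to the plain STRING rule, and no second parser is needed.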

PS: please use menhir variable naming facilities instead of $1 as soon as the variables are not consecutive (because of intermediary terminals) or you need to repeat it more than once.

score 2 · Accepted Answer

Continuing on @gasche's answer.

You want to include new syntactic rules in your parser, which means that you need to change the grammar rules in parser.mly to accommodate these new rules.

The String.sub approach is somewhat in the right direction, but you are actually doing by hand what the mly file could let you automate.

Consider your formula type: the F_Add constructor lets you encode a binary sum formula, which therefore contains 2 formulas. In the mly file, you could describe it as:

formula:
  | INTEGER              { F_integer $1 }
  | BOOL                 { F_boolean $1 }
  | formula PLUS formula { F_Add ($1, $3) }
;

Note how the grammar rule definition mirrors the formula type definition. As you can see, the recursive property of formulas is nicely handled by the grammar rule for you.
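To see the correspondence from the OCaml side, here is a small sketch of the values such a grammar would build, with a recursive printer that follows the type in the same way the rule does. The function show is a hypothetical helper for illustration. The left-nested tree for 1+2+3 assumes PLUS is declared left-associative (for example with %left PLUS), which the rule above needs anyway to resolve the ambiguity of formula PLUS formula.

```ocaml
(* Same type as in syntax.ml, repeated so the sketch is self-contained. *)
type formula =
  | F_integer of int
  | F_boolean of bool
  | F_Add of formula * formula

(* One case per constructor, just as the grammar has one production
   per constructor; the F_Add case recurses on both sub-formulas. *)
let rec show (f : formula) : string =
  match f with
  | F_integer n -> string_of_int n
  | F_boolean b -> if b then "TRUE" else "FALSE"
  | F_Add (l, r) -> "(" ^ show l ^ " + " ^ show r ^ ")"

(* The tree expected for =TRUE+123. *)
let example1 = F_Add (F_boolean true, F_integer 123)

(* The tree expected for =1+2+3, assuming a left-associative PLUS. *)
let example2 = F_Add (F_Add (F_integer 1, F_integer 2), F_integer 3)
```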

Concerning lexer.mll, the regular expressions STRING and FORMULA_STRING are nearly the same: any text matched by FORMULA_STRING is also matched by STRING. If you use them both in the same lexer rule (as in your code snippet), it may not work as you expect it to. The lexer chooses between them by longest match and then by the order of the rules; it has no knowledge of what is going on in the parser, so it cannot choose to provide a STRING or a FORMULA_STRING when it's convenient for the parser to fill a specific rule in. With ocamlyacc (and with the tools it drew inspiration from), it works the other way round: the parser receives tokens which the lexer has recognized from the text stream, and tries to find the rule which corresponds to them, according to what it has already figured out before.

Note that the BOOL terminal must be recognized by lexer.mll (just like INTEGER), so you will need to amend it with the proper regular expression.

Also, you should ask yourself the following questions: in the =5 formula, isn't there somewhere an expression waiting to be discovered?

If so, could you reformulate the definition of a formula in terms of expressions and new tokens?
