parsing - 「?」の実装 (lazy "") コンビナトリアル GLR パーサーの正規表現パターン

Question

組み合わせ GLR パーサーを実装しました。その中には次のものがあります。

char(·)指定された文字または文字範囲を消費するパーサー。
many(·)指定されたパーサーをゼロから無限に繰り返すコンビネーター。

例:は、任意の数の-s"char('a').many()"を含む文字列に一致します。"a"

しかし、many(·)コンビネーターは貪欲であるため、たとえば、char('{') >> char('{') >> char('a'..'z').many() >> char('}') >> char('}')(">>"パーサーのシーケンシャルチェーンはどこにありますか) は文字列全体を正常に消費し"{{foo}}some{{bar}}"ます。

many(·)前の例で使用されている、消費"{{foo}}"のみの遅延バージョンを実装したいと思います。どうやってやるの？

編集：

私はあなたを混乱させたかもしれません。私のプログラムでは、パーサーは「ステップ」を受け入れて「ステップ」の森を返す関数 (または C++ の用語では「ファンクター」) です。「ステップ」には、OK タイプ (パーサーが入力の一部を正常に消費したことを意味する) と FAIL タイプ (パーサーがエラーに遭遇したことを意味する) があります。より多くの種類のステップがありますが、それらは補助的なものです。

Parser = f(Step) -> Collection of TreeNodes of Steps.

したがって、入力を解析すると、次のようになります。

単純な事前定義されたパーサー関数を構成して、必要な文法を表す複雑なパーサー関数を取得します。
入力から最初のステップを形成します。
複雑なパーサー関数に初期ステップを与えます。
Steps で TreeNodes をフィルタリングし、OK のものだけを残します (または、入力にエラーがあった場合は最小限の FAIL-s で)。
残されたステップから情報を集める。

score 4 · Accepted Answer

I have implemented and have been using GLR parsers for 15 years as language front ends for a program transformation system.

I don't know what a "combinatorial GLR parser" is, and I'm unfamiliar with your notation so I'm not quite sure how to interpret it. I assume this is some kind of curried function notation? I'm imagining your combinator rules are equivalent to definining a grammer in terms of terminal characters, where "char('a').many" corresponds to grammar rules:

 char = "a" ;
 char = char "a" ;

GLR parsers, indeed, produce all possible parses. The key insight to GLR parsing is its psuedo-parallel processing of all possible parses. If your "combinators" can propose multiple parses (that is, they produce grammar rules sort of equivalent to the above), and you indeed have them connected to a GLR parser, they will all get tried, and only those sequences of productions that tile the text will survive (meaning all valid parsess, e.g., ambiguous parses) will survive.

If you have indeed implemented a GLR parser, this collection of all possible parses should have been extremely clear to you. The fact that it is not hints what you have implemented is not a GLR parser.

Error recovery with a GLR parser is possible, just as with any other parsing technology. What we do is keep the set of live parses before the point of the error; when an error is found, we try (in psuedo-parallel, the GLR parsing machinery makes this easy if it it bent properly) all the following: a) deleting the offending token, b) inserting all tokens that essentially are FOLLOW(x) where x is live parse. In essence, delete the token, or insert one expected by a live parse. We then turn the GLR parser loose again. Only the valid parses (e.g., repairs) will survive. If the current token cannot be processed, the parser processing the stream with the token deleted survives. In the worst case, the GLR parser error recovery ends up throwing away all tokens to EOF. A serious downside to this is the GLR parser's running time grows pretty radically while parsing errors; if there are many in one place, the error recovery time can go through the roof.

score 1 · Accepted Answer

GLR パーサーは入力のすべての可能な解析を生成しませんか? 次に、あいまいさを解決することは、好みの解析を選択することです。そのためには、パースフォレストの要素に、生成されたコンビネータの種類 (熱心か怠惰か) に応じてラベルを付ける必要があると思います。(一般に、すべての入力を確認する前にあいまいさを段階的に解決することはできません。)

(この回答は、私のぼんやりとした記憶と、GLR 解析の漠然とした誤解に基づいています。専門家が来ることを願っています。)

score 0 · Accepted Answer

正規表現<.*?>と入力を考えてみましょう<a>bc<d>ef。これでが見つかり<a>、他に一致するものはありませんよね?

<.*?>e次に、同じ入力を持つ正規表現を考えてみましょう。これでが見つかるはず<a>bc<d>eですよね？

これはジレンマを引き起こします。ユーザーのために、コンビネータの動作を>>2 つのオペランドの観点から理解してもらいたいと考えています。しかし、最初のパーサーが見つけたものに関して、2 番目のパーサーの動作を生成する方法はありません。

1 つの答えは、各パーサーが、すべてのパーサーの順序付けられていないセットではなく、優先順に並べられたすべてのパースのシーケンスを生成することです。貪欲な一致は、最長から最短にソートされた一致を返します。貪欲ではなく、最短から最長まで。

score 0 · Accepted Answer

貪欲でない機能は、あいまいさを解消するメカニズムにすぎません。本当に一般化されたパーサー (結果を生成するために明確化を必要としないパーサー) を使用している場合、「非貪欲」は無意味です。演算子が「貪欲でない」かどうかにかかわらず、同じ結果が返されます。

一般化されたパーサーによって提供される結果の完全なセットに、貪欲でない明確化動作を適用できます。左から右に作業して、貪欲でない演算子に対応するあいまいなサブグループをフィルタリングして、残りの入力の解析に成功した最短の一致を使用します。

parsing - 「*?」の実装 (lazy "*") コンビナトリアル GLR パーサーの正規表現パターン

4 に答える 4

Related

Reference

parsing - 「?」の実装 (lazy "") コンビナトリアル GLR パーサーの正規表現パターン