antlr - Parsing sentences with different word types

Question

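I'm looking for a grammar to analyze two types of sentences, where a sentence means words separated by whitespace:

ID1: a sentence whose words do not start with a number
ID2: a sentence whose words may or may not start with a number

Basically, the structure of the grammar should look like this: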

ID1 separator ID2  

ID1: Word can contain number like Var1234 but not start with a number  

ID2: Same as above but 1234 is allowed  

separator: e. g. '='

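@Bart
I tried adding two tokens, '_' and '"', as the lexer rule Special, for later use in the lexer rule Word. Even though I don't use Special anywhere in the following grammar yet, ANTLRWorks 1.4.2 reports this error: "The following token definitions can never be matched because prior tokens match the same input: Special". If I mark Special as a fragment, I don't get that error. Why?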

grammar Sentence1b1;

tokens
{
  TCUnderscore  = '_' ;
  TCQuote       = '"' ;
}

assignment
  :  id1 '=' id2
  ;

id1
  :  Word+
  ;

id2
  :  ( Word | Int )+
  ;

Int
  :  Digit+
  ;

// A word must start with a letter
Word
  :  ( 'a'..'z' | 'A'..'Z') ('a'..'z' | 'A'..'Z' | Digit )*
  ;

Special
  : ( TCUnderscore | TCQuote )
  ;

Space
  :  ( ' ' | '\t' | '\r' | '\n' ) { $channel = HIDDEN; }
  ;

fragment Digit
  :  '0'..'9'
  ;

Later, I want to use the lexer rule Special in the lexer rule Word:

Word
  :  ( 'a'..'z' | 'A'..'Z' | Special ) ('a'..'z' | 'A'..'Z' | Special | Digit )*
  ;

score 1 · Accepted Answer

I'd go for something like this:

grammar Sentence;

assignment
  :  id1 '=' id2
  ;

id1
  :  Word+
  ;

id2
  :  (Word | Int)+
  ;

Int
  :  Digit+
  ;

// A word must start with a letter
Word
  :  ('a'..'z' | 'A'..'Z') ('a'..'z' | 'A'..'Z' | Digit)*
  ;

Space
  :  (' ' | '\t' | '\r' | '\n') {skip();}
  ;

fragment Digit
  :  '0'..'9'
  ;

which parses the input:

Word can contain number like Var1234 but not start with a number = Same as above but 1234 is allowed

as follows:

[image]
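To see how this grammar splits the input, here is a minimal Python sketch that approximates it with regular expressions. It is not ANTLR output: the token name `Eq` and the functions `tokenize` and `parse_assignment` are hypothetical names introduced for illustration only.

```python
import re

# Token patterns mirroring the lexer rules of the Sentence grammar.
# "Eq" is a made-up name for the literal '=' used in the parser rule.
TOKEN_SPEC = [
    ("Int", r"[0-9]+"),
    ("Word", r"[a-zA-Z][a-zA-Z0-9]*"),
    ("Eq", r"="),
    ("Space", r"[ \t\r\n]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return (type, text) pairs, dropping Space tokens like skip() does."""
    tokens, pos = [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "Space":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def parse_assignment(text):
    """Approximate `assignment : id1 '=' id2`; return (id1 items, id2 items)."""
    tokens = tokenize(text)
    kinds = [kind for kind, _ in tokens]
    if kinds.count("Eq") != 1:
        raise ValueError("expected exactly one '='")
    split = kinds.index("Eq")
    left, right = tokens[:split], tokens[split + 1:]
    # id1 : Word+  (no Int allowed on the left-hand side)
    if not left or any(kind != "Word" for kind, _ in left):
        raise ValueError("id1 must be one or more Word tokens")
    # id2 : (Word | Int)+
    if not right:
        raise ValueError("id2 must be one or more Word or Int tokens")
    return [value for _, value in left], [value for _, value in right]
```

Calling `parse_assignment` on the input sentence above returns the words before `=` and the words and numbers after it as two lists, while an input such as `1234 = foo` raises `ValueError` because an `Int` is not allowed in `id1`.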

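EDIT

To keep the lexer rules nicely together, I'd keep them all at the bottom of the grammar instead of putting some of them in a tokens { ... } block, which I only use for defining "imaginary tokens" (used in AST creation).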

// wrong!
Special      : (TCUnderscore | TCQuote);
TCUnderscore : '_';
TCQuote      : '"';

Now, with the rules above, TCUnderscore and TCQuote can never become a token, because whenever the lexer encounters a _ or a ", a Special token is created. Or in this case:

// wrong!
TCUnderscore : '_';
TCQuote      : '"';
Special      : (TCUnderscore | TCQuote);

a Special token can never be created, because the lexer would create TCUnderscore and TCQuote tokens first. Hence the error:

The following token definitions can never be matched because prior tokens match the same input: ...
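The shadowing described here can be reproduced with a small Python model. This is a simplified sketch, assuming a lexer that takes the longest match and, on a tie, the rule defined first; ANTLR's actual implementation differs, and `first_match_lexer` is a hypothetical helper.

```python
import re


def first_match_lexer(rules):
    """Build a tokenizer: longest match wins, ties go to the earliest rule."""
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]

    def tokenize(text):
        types, pos = [], 0
        while pos < len(text):
            best_name, best_end = None, pos
            for name, rx in compiled:
                m = rx.match(text, pos)
                # Only a strictly longer match replaces the current best,
                # so an earlier rule keeps priority on equal lengths.
                if m and m.end() > best_end:
                    best_name, best_end = name, m.end()
            if best_name is None:
                raise ValueError(f"unexpected character {text[pos]!r}")
            types.append(best_name)
            pos = best_end
        return types

    return tokenize


# Special defined first: TCUnderscore and TCQuote are never produced.
special_first = first_match_lexer(
    [("Special", r'[_"]'), ("TCUnderscore", r"_"), ("TCQuote", r'"')]
)

# Special defined last: Special is never produced.
special_last = first_match_lexer(
    [("TCUnderscore", r"_"), ("TCQuote", r'"'), ("Special", r'[_"]')]
)
```

In both orderings some rule can never win, which is the situation the error message reports.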

If you make TCUnderscore and TCQuote fragment rules, you don't have this problem, because fragment rules only "serve" other lexer rules. So this works:

// good!
Special               : (TCUnderscore | TCQuote);
fragment TCUnderscore : '_';
fragment TCQuote      : '"';

It also means that fragment rules can never be "visible" in any of your parser rules (the lexer will never create a TCUnderscore or TCQuote token!).

// wrong!
parse : TCUnderscore;

Special               : (TCUnderscore | TCQuote);
fragment TCUnderscore : '_';
fragment TCQuote      : '"';
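One way to picture fragment rules is as named sub-patterns that get inlined into real token rules and never produce tokens themselves. The Python sketch below models that idea only; `FRAGMENTS`, `TOKEN_RULES` and `token_types` are hypothetical names, not part of ANTLR.

```python
import re

# Fragments: reusable sub-patterns that never become tokens on their own.
FRAGMENTS = {"TCUnderscore": r"_", "TCQuote": r'"'}

# Token rules may reference fragments by name; the fragments are inlined here.
TOKEN_RULES = {
    "Special": re.compile("(?:{TCUnderscore}|{TCQuote})".format(**FRAGMENTS)),
}


def token_types(text):
    """Return the token types produced for text; only TOKEN_RULES can match."""
    types, pos = [], 0
    while pos < len(text):
        for name, rx in TOKEN_RULES.items():
            m = rx.match(text, pos)
            if m:
                types.append(name)
                pos = m.end()
                break
        else:
            raise ValueError(f"unexpected character {text[pos]!r}")
    return types
```

Because only `Special` is a token rule in this model, a parser working on the result never sees a `TCUnderscore` type, which is why a parser rule that refers to it cannot match.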

score 0 · Accepted Answer

I don't know whether it fits your needs, but with Bart's help in my post "ANTLR - identifier with whitespace" I arrived at this grammar:

grammar PropertyAssignment;

assignment
    : id_nodigitstart '=' id_digitstart EOF
    ;

id_nodigitstart
    :   ID_NODIGITSTART+
    ;

id_digitstart
    :   (ID_DIGITSTART|ID_NODIGITSTART)+
    ;

ID_NODIGITSTART
    :   ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'0'..'9')*
    ;           

ID_DIGITSTART
    :   ('0'..'9'|'a'..'z'|'A'..'Z')+
    ;

WS  :   (' ')+ {skip();}
    ;

"aname=my 4value" works, but "4a name=my4value" causes an exception.
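The difference between the two inputs can be traced with a rough Python model of this grammar. It is a sketch under the assumption of longest-match lexing with ties going to the rule defined first; `EQ`, `lex` and `is_assignment` are hypothetical names used for illustration.

```python
import re

# Lexer rules in grammar order; "EQ" stands in for the literal '='.
RULES = [
    ("ID_NODIGITSTART", re.compile(r"[a-zA-Z][a-zA-Z0-9]*")),
    ("ID_DIGITSTART", re.compile(r"[0-9a-zA-Z]+")),
    ("EQ", re.compile(r"=")),
    ("WS", re.compile(r" +")),
]


def lex(text):
    """Return token types; longest match wins, ties go to the earlier rule."""
    types, pos = [], 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, rx in RULES:
            m = rx.match(text, pos)
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise ValueError(f"unexpected character {text[pos]!r}")
        if best_name != "WS":  # WS is skipped, as in the grammar
            types.append(best_name)
        pos = best_end
    return types


def is_assignment(text):
    """Approximate `id_nodigitstart '=' id_digitstart EOF`."""
    types = lex(text)
    if types.count("EQ") != 1:
        return False
    split = types.index("EQ")
    left, right = types[:split], types[split + 1:]
    # The left side accepts ID_NODIGITSTART only; the right side accepts both.
    return bool(left) and bool(right) and all(
        t == "ID_NODIGITSTART" for t in left
    )
```

In this model "4a" is lexed as ID_DIGITSTART, which the left-hand side does not accept, so the second input is rejected while the first is accepted.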
