antlr - Parsing text without explicit end tags

Question

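The problem with parsing a dictionary entry (see the examples below) is that there are no explicit start and end tags. Instead:

the end tag of one element is already the start tag of the following element
or: the start tag is not a syntactic element at all, but the current parsing state (so it depends on what has already been "seen" in the input stream)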

Example 1, simple input:

wordWithoutSpace [phonetic information]
definition as everything until colon: example sentence until EOF

Example 2, entry with multiple definitions:

wordWithoutSpace [phonetic information]
1. first definition until colon: example sentence until second definition
2. second definition until colon: example sentence until EOF

In words, or as pseudocode, the structure looks like this:

dictionary-entry : 
     word = .+ ' ' // catch everything as word until you see a space
     phon = '[' .+ ']' // then follows phonetic, which is everything in brackets
     (MultipleMeaning | UniqueMeaning)

MultipleMeaning : Int '.' definition= .+ ':' // a MultipleMeaning has a number 
                                             // before the definition

UniqueMeaning : definition= .+ ':'
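
To make the intent of the pseudocode concrete, here is a minimal sketch in plain Java using regular expressions rather than ANTLR. The class and method names (`DictionaryEntrySketch`, `parse`, `Entry`, `Meaning`) are hypothetical and only serve the illustration. It assumes that a numbered meaning always starts at the beginning of a line and that the first colon in a meaning ends the definition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DictionaryEntrySketch {

    /** One meaning: the text before the first colon and the example after it. */
    static final class Meaning {
        final String definition;
        final String example;
        Meaning(String definition, String example) {
            this.definition = definition;
            this.example = example;
        }
    }

    /** A whole entry: word, phonetic information, and one or more meanings. */
    static final class Entry {
        final String word;
        final String phonetic;
        final List<Meaning> meanings = new ArrayList<>();
        Entry(String word, String phonetic) {
            this.word = word;
            this.phonetic = phonetic;
        }
    }

    // word = everything up to the first whitespace, phon = everything in brackets
    private static final Pattern HEAD =
            Pattern.compile("^(\\S+)\\s+\\[([^\\]]*)\\]\\s*");
    // assumption: a numbered meaning begins a line with digits followed by '.'
    private static final Pattern NUMBERED = Pattern.compile("(?m)^\\d+\\.\\s*");

    static Entry parse(String input) {
        Matcher head = HEAD.matcher(input);
        if (!head.find()) {
            throw new IllegalArgumentException("missing word or phonetic part");
        }
        Entry entry = new Entry(head.group(1), head.group(2));
        String body = input.substring(head.end());
        // The "end tag" of one meaning is simply the start of the next number.
        for (String part : NUMBERED.split(body)) {
            if (part.trim().isEmpty()) {
                continue; // empty piece in front of the first number
            }
            int colon = part.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("meaning without ':'");
            }
            entry.meanings.add(new Meaning(
                    part.substring(0, colon).trim(),
                    part.substring(colon + 1).trim()));
        }
        return entry;
    }

    public static void main(String[] args) {
        Entry e = parse("wordWithoutSpace [phonetic information]\n"
                + "1. first definition until colon: example sentence\n"
                + "2. second definition until colon: example sentence until EOF");
        System.out.println(e.word + " / " + e.phonetic);
        for (Meaning m : e.meanings) {
            System.out.println(m.definition + " -> " + m.example);
        }
    }
}
```

An entry without numbers (Example 1) goes through the same code path and ends up with a single meaning. The sketch breaks down if an example sentence itself contains a line that starts with a number and a dot, which is exactly the kind of context dependence described above.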

I tried a lexer with gated semantic predicates (ANTLR version: 3.2):

@members {
  int cs = 0; // current state
  }

@lexer::header {
  package main;
  }

Word :
  {cs==0}?=> .+ ' ' {cs=1;}     // in this state everything until 
  ;                             // Space belongs to the Word, now go to Phon-mode

Phon :
  {cs==1}?=> '[' .+ ']' {cs=2;} // everything in brackets is phonetic-information
;                               // after you have seen this go to next state

MultiDef : 
  {cs==2}?=> Int '.' .+ ':' {cs=3;}
  ;

Def : 
  {cs==2}?=> .+ ':' {cs=3;}
  ;

fragment
Digit :
  '0'..'9';

Int :
  Digit Digit*;
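
For comparison, the state transitions the grammar above is meant to perform can be written out by hand. The following is only a sketch in plain Java, not ANTLR-generated code: `StatefulScannerSketch` and `scan` are hypothetical names, the token labels are plain strings, and the `Example(...)` token for state 3 is an assumption, since the grammar defines no rule for that state.

```java
import java.util.ArrayList;
import java.util.List;

public class StatefulScannerSketch {

    /** Scans the input with the same states (cs = 0, 1, 2, 3) as the grammar. */
    static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        int cs = 0;  // current state, same meaning as in the grammar
        int pos = 0; // index of the next unread character
        while (pos < input.length()) {
            switch (cs) {
                case 0: { // Word: everything up to the first space
                    int end = input.indexOf(' ', pos);
                    if (end < 0) {
                        throw new IllegalStateException("Word: no space found");
                    }
                    tokens.add("Word(" + input.substring(pos, end) + ")");
                    pos = end + 1;
                    cs = 1;
                    break;
                }
                case 1: { // Phon: everything between '[' and ']'
                    int end = input.indexOf(']', pos);
                    if (input.charAt(pos) != '[' || end < 0) {
                        throw new IllegalStateException("Phon: expected [...]");
                    }
                    tokens.add("Phon(" + input.substring(pos + 1, end) + ")");
                    pos = end + 1;
                    cs = 2;
                    break;
                }
                case 2: { // MultiDef or Def: everything up to the next ':'
                    int end = input.indexOf(':', pos);
                    if (end < 0) {
                        throw new IllegalStateException("Def: no ':' found");
                    }
                    String text = input.substring(pos, end);
                    // a leading "<digits>." marks a numbered definition
                    boolean numbered = text.matches("(?s)\\d+\\..*");
                    tokens.add((numbered ? "MultiDef(" : "Def(") + text + ")");
                    pos = end + 1;
                    cs = 3;
                    break;
                }
                default: { // state 3 (assumed): the rest is example text
                    tokens.add("Example(" + input.substring(pos) + ")");
                    pos = input.length();
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : scan("Word [phon]1.definition:")) {
            System.out.println("Token: " + token);
        }
    }
}
```

Here the current state alone decides which rule may match, which is the behaviour the `{cs==N}?=>` predicates are supposed to enforce in the generated lexer.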

The test class for the lexer:

import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CharStream;
import org.antlr.runtime.Token;

public class TestLexer {

    public static void main(String[] args) {
        String str = "Word [phon]1.definition:";
        CharStream input = new ANTLRStringStream(str);
        DudenLexer lexer = new DudenLexer(input);
        Token token;
        // compare the token type so the loop ends on any EOF token instance
        while ((token = lexer.nextToken()).getType() != Token.EOF) {
            System.out.println("Token: " + token);
        }
    }
}

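The problems I have:

I get the error message: line 1:0 rule Def failed predicate: {cs==2}?
I am also not sure whether this is the right approach at all.

I have been stuck on this for about three days and would be very grateful for any help or hints.

Thanks, Tom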
