c# - How to create a parser (lex / yacc)?

Question

I have the following file, which I need to parse:

--TestFile
Start ASDF123
Name "John"
Address "#6,US" 
end ASDF123

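Lines starting with -- are treated as comment lines. The file starts with "Start" and ends with "end". The string after "Start" is the UserID, and the name and address that follow are enclosed in double quotes.

I need to parse the file and write the parsed data to an XML file.

So the resulting file will look like this: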

<ASDF123>
  <Name Value="John" />
  <Address Value="#6,US" />
</ASDF123>

Currently I am using pattern matching (regular expressions) to parse the above file. Here is my sample code.

    /// <summary>
    /// To Store the row data from the file
    /// </summary>
    List<String> MyList = new List<String>();

    String strName = "";
    String strAddress = "";
    String strInfo = "";

Method: ReadFile

    /// <summary>
    /// To read the file into a List
    /// </summary>
    private void ReadFile()
    {
        StreamReader Reader = new StreamReader(Application.StartupPath + "\\TestFile.txt");
        while (!Reader.EndOfStream)
        {
            MyList.Add(Reader.ReadLine());
        }
        Reader.Close();
    }

Method: FormateRowData

    /// <summary>
    /// To remove comments 
    /// </summary>
    private void FormateRowData()
    {
        MyList = MyList.Where(X => X != "").Where(X => X.StartsWith("--")==false ).ToList();
    }

Method: ParseData

    /// <summary>
    /// To Parse the data from the List
    /// </summary>
    private void ParseData()
    {
        Match l_mMatch;
        Regex RegData = new Regex("start[ \t\r\n]*(?<Data>[a-z0-9]*)", RegexOptions.IgnoreCase);
        Regex RegName = new Regex("name [ \t\r\n]*\"(?<Name>[a-z]*)\"", RegexOptions.IgnoreCase);
        Regex RegAddress = new Regex("address [ \t\r\n]*\"(?<Address>[a-z0-9 #,]*)\"", RegexOptions.IgnoreCase);
        for (int Index = 0; Index < MyList.Count; Index++)
        {
            l_mMatch = RegData.Match(MyList[Index]);
            if (l_mMatch.Success)
                strInfo = l_mMatch.Groups["Data"].Value;
            l_mMatch = RegName.Match(MyList[Index]);
            if (l_mMatch.Success)
                strName = l_mMatch.Groups["Name"].Value;
            l_mMatch = RegAddress.Match(MyList[Index]);
            if (l_mMatch.Success)
                strAddress = l_mMatch.Groups["Address"].Value;
        }
    }

Method: WriteFile

    /// <summary>
    /// To write parsed information into file.
    /// </summary>
    private void WriteFile()
    {
        XDocument XD = new XDocument(
                           new XElement(strInfo,
                                         new XElement("Name",
                                             new XAttribute("Value", strName)),
                                         new XElement("Address",
                                             new XAttribute("Value", strAddress))));
        XD.Save(Application.StartupPath + "\\File.xml");
    }

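I have heard about parser generators.

Please help me write a parser using lex and yacc. The reason is that my existing parser (pattern matching) is not flexible and, more importantly, I think it is not the right way to do this.

How do I use a parser generator? (I have read Code Project Sample One and Code Project Sample Two, but I am still not familiar with this.) Please suggest a parser generator that outputs C# parsers.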

score 5 · Accepted Answer

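Gardens Point LEX and the Gardens Point Parser Generator are strongly influenced by LEX and YACC, and output C# code.

Your grammar is simple enough that I think your current approach is fine, but kudos for wanting to learn the "real" way of doing it. :-) Here is my suggestion for a grammar. These are just the production rules, so this is far from a full example. In the actual GPPG file you need to replace each ... with C# code that builds the syntax tree, and you need token declarations and so on. Read the GPPG examples in the documentation. You also need a GPLEX file that describes the tokens.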

/* Your input file is a list of "top level elements" */
TopLevel : 
    TopLevel TopLevelElement { ... }
    | /* (empty) */

/* A top level element is either a comment or a block. 
   The COMMENT token must be described in the GPLEX file as 
   any line that starts with -- . */
TopLevelElement:
    Block { ... }
    | COMMENT { ... }

/* A block starts with the token START (which, in the GPLEX file, 
   is defined as the string "Start"), continues with some identifier 
   (the block name), then has a list of elements, and finally the token
   END followed by an identifier. If you want to validate that the
   END identifier is the same as the START identifier, you can do that
   in the C# code that analyses the syntax tree built by GPPG.
   The token Identifier is also defined with a regular expression in GPLEX. */
Block:
    START Identifier BlockElementList END Identifier { ... }

BlockElementList:
    BlockElementList BlockElement { ... }
    | /* empty */

BlockElement:
    NAME QuotedString { ... }
    | ADDRESS QuotedString { ... }
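
To make the productions above more concrete without depending on GPPG specifics, here is a minimal hand-written recursive-descent sketch in C#. It is not GPPG output. The `BlockParser` class and the token kind strings (`COMMENT`, `START`, `IDENT`, `NAME`, `ADDRESS`, `STRING`, `END`) are names assumed for this example, and the token list is written by hand where a GPLEX-generated scanner would normally supply it. The left-recursive list rules are implemented as loops, and the START/END identifier check mentioned in the grammar comment is done right in the block rule.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

// Hand-written recursive-descent sketch; BlockParser is a hypothetical name.
public sealed class BlockParser
{
    private readonly IReadOnlyList<(string Kind, string Text)> _tokens;
    private int _pos;

    public BlockParser(IReadOnlyList<(string Kind, string Text)> tokens) { _tokens = tokens; }

    // Kind of the current token, or "EOF" once the input is exhausted.
    private string Peek => _pos < _tokens.Count ? _tokens[_pos].Kind : "EOF";

    // Consume a token of the given kind and return its text, or fail.
    private string Expect(string kind)
    {
        if (Peek != kind)
            throw new FormatException("Expected " + kind + " but found " + Peek + " at token " + _pos);
        return _tokens[_pos++].Text;
    }

    // TopLevel : TopLevel TopLevelElement | (empty)
    public List<XElement> ParseTopLevel()
    {
        var blocks = new List<XElement>();
        while (Peek != "EOF")
        {
            // TopLevelElement : Block | COMMENT
            if (Peek == "COMMENT") { _pos++; continue; }
            blocks.Add(ParseBlock());
        }
        return blocks;
    }

    // Block : START Identifier BlockElementList END Identifier
    private XElement ParseBlock()
    {
        Expect("START");
        string id = Expect("IDENT");
        var block = new XElement(id); // throws if id is not a valid XML name
        // BlockElementList : BlockElementList BlockElement | (empty)
        while (Peek == "NAME" || Peek == "ADDRESS")
            block.Add(ParseBlockElement());
        Expect("END");
        string endId = Expect("IDENT");
        if (endId != id)
            throw new FormatException("Block '" + id + "' closed by '" + endId + "'");
        return block;
    }

    // BlockElement : NAME QuotedString | ADDRESS QuotedString
    private XElement ParseBlockElement()
    {
        string tag = Peek == "NAME" ? "Name" : "Address";
        _pos++;
        string quoted = Expect("STRING");
        return new XElement(tag, new XAttribute("Value", quoted.Trim('"')));
    }
}

public static class Program
{
    public static void Main()
    {
        // Tokens for the sample file, written by hand for this sketch.
        var tokens = new List<(string Kind, string Text)>
        {
            ("COMMENT", "--TestFile"),
            ("START", "Start"), ("IDENT", "ASDF123"),
            ("NAME", "Name"), ("STRING", "\"John\""),
            ("ADDRESS", "Address"), ("STRING", "\"#6,US\""),
            ("END", "end"), ("IDENT", "ASDF123"),
        };
        foreach (XElement block in new BlockParser(tokens).ParseTopLevel())
            Console.WriteLine(block);
    }
}
```

Each method corresponds to one production, which is roughly the structure a generated parser encodes in its tables. With GPPG, the bodies of these methods would instead go into the { ... } actions.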

score 1 · Accepted Answer

First you need to define the grammar for your parser (the yacc part).

It would look something like this:

file : record file
     | /* empty */
     ;

record: start identifier recordContent end identifier { /* rule to match the two identifiers */ }
      ;

recordContent: name value; /* Can be more detailed if you require order in the fields */

The lexical analysis is done with lex. I think your regular expressions will be useful for defining the tokens.
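
As a rough illustration of what the lex side does, the sketch below tokenizes the sample file with an ordered table of regular expressions. It is plain C#, not an actual lex or GPLEX specification, and the `Token` and `Lexer` classes as well as the token names (`COMMENT`, `START`, `END`, `NAME`, `ADDRESS`, `STRING`, `IDENT`) are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical token type for this sketch; not part of lex, GPLEX, or any library.
public sealed class Token
{
    public string Kind { get; }
    public string Text { get; }
    public Token(string kind, string text) { Kind = kind; Text = text; }
    public override string ToString() => Kind + "(" + Text + ")";
}

public static class Lexer
{
    // Ordered rule table: earlier rules win, similar to rule order in a lex file.
    // Keywords come before IDENT so that "Start" is not read as an identifier.
    private static readonly KeyValuePair<string, Regex>[] Rules =
    {
        Rule("COMMENT", @"--[^\r\n]*"),
        Rule("START",   @"start\b"),
        Rule("END",     @"end\b"),
        Rule("NAME",    @"name\b"),
        Rule("ADDRESS", @"address\b"),
        Rule("STRING",  "\"[^\"\\r\\n]*\""),
        Rule("IDENT",   @"[a-z0-9]+"),
    };

    private static KeyValuePair<string, Regex> Rule(string kind, string pattern)
    {
        // \G anchors each match at the current scan position.
        return new KeyValuePair<string, Regex>(
            kind, new Regex(@"\G(?:" + pattern + ")", RegexOptions.IgnoreCase));
    }

    public static List<Token> Tokenize(string input)
    {
        var tokens = new List<Token>();
        int pos = 0;
        while (pos < input.Length)
        {
            if (char.IsWhiteSpace(input[pos])) { pos++; continue; }
            Token token = null;
            foreach (var rule in Rules)
            {
                Match m = rule.Value.Match(input, pos);
                if (m.Success && m.Length > 0)
                {
                    token = new Token(rule.Key, m.Value);
                    pos += m.Length;
                    break;
                }
            }
            if (token == null)
                throw new FormatException("Unexpected character '" + input[pos] + "' at offset " + pos);
            if (token.Kind != "COMMENT") tokens.Add(token); // comments are dropped here
        }
        return tokens;
    }
}

public static class Program
{
    public static void Main()
    {
        string sample = "--TestFile\nStart ASDF123\nName \"John\"\nAddress \"#6,US\"\nend ASDF123\n";
        Console.WriteLine(string.Join(" ", Lexer.Tokenize(sample)));
    }
}
```

Unlike the per-line regular expressions in the question, this scanner only classifies pieces of text. Deciding which sequences of tokens are legal is left to the grammar.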

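My answer is only a draft. I suggest you find a more complete tutorial on lex/yacc or flex/bison on the internet, and come back here if you have a more focused problem.

I also do not know whether there is a C# implementation that would let you stay in managed code. You may have to use an unmanaged C/C++ import.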
