c# - Parsing chemical formulas with parentheses in C#

Question

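I am trying to write a chemical formula parser in C# that extracts a formula from a string input. I have found a way to do this for formulas without parentheses, such as H2O. However, I cannot figure out how to make it work for formulas that use parentheses, such as Al2(HPO4)3.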

Note that the parser outputs a list of a class called FormulaComponent, which has two members: an element (a string) and a number.

Any ideas?

Edit: here is my current attempt. It handles everything except parentheses.

public static Formula Parse(string input)
{
    var components = new List<FormulaComponent>();

    const string elementRegex = "([A-Z][a-z]*)([0-9]*)";
    const string validateRegex = "^(" + elementRegex + ")+$";

    if (!Regex.IsMatch(input, validateRegex))
        throw new FormatException("Input string was in an incorrect format.");

    foreach (Match match in Regex.Matches(input, elementRegex))
    {
        var name = match.Groups[1].Value;

        var count = match.Groups[2].Value != "" ?
            int.Parse(match.Groups[2].Value) :
            1;

        if (ElementManager.FindElementBySymbol(name) == null)
            throw new FormatException(name + " is not recognized as a valid element symbol.");

        components.Add(new FormulaComponent { Element = ElementManager.FindElementBySymbol(name), Quantity = count });
    }

    return new Formula { Components = components };
}

score 1 · Accepted Answer

This may be overkill, but at least it is clean: you can use a lexer + parser to do the job.

Lexer rules:

/[A-Z][a-z]*/ -> ATOM;
/[0-9]+/ -> NUM, Convert.ToInt32($text);
"(" -> LPAREN;
")" -> RPAREN;
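
For readers who do not want to pull in a lexer generator, the four lexer rules above can be approximated by hand with a single regular expression. This is only a minimal sketch, not NLT output: the names TokenKind, Token and FormulaLexer are hypothetical and chosen for illustration, and NUM tokens keep their text instead of being converted with Convert.ToInt32.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical token types mirroring ATOM, NUM, LPAREN and RPAREN.
public enum TokenKind { Atom, Num, LParen, RParen }

public sealed class Token
{
    public TokenKind Kind { get; }
    public string Text { get; }
    public Token(TokenKind kind, string text) { Kind = kind; Text = text; }
}

public static class FormulaLexer
{
    // One alternative per lexer rule; \G forces each match to start exactly
    // at the position passed to Match, so no character can be skipped.
    private static readonly Regex TokenRegex = new Regex(
        @"\G(?:(?<atom>[A-Z][a-z]*)|(?<num>[0-9]+)|(?<lparen>\()|(?<rparen>\)))");

    public static List<Token> Tokenize(string input)
    {
        var tokens = new List<Token>();
        int pos = 0;
        while (pos < input.Length)
        {
            Match m = TokenRegex.Match(input, pos);
            if (!m.Success)
                throw new FormatException(
                    $"Unexpected character '{input[pos]}' at position {pos}.");

            // The named group that succeeded tells us which rule matched.
            TokenKind kind =
                m.Groups["atom"].Success ? TokenKind.Atom :
                m.Groups["num"].Success ? TokenKind.Num :
                m.Groups["lparen"].Success ? TokenKind.LParen : TokenKind.RParen;

            tokens.Add(new Token(kind, m.Value));
            pos += m.Length; // every alternative consumes at least one character
        }
        return tokens;
    }
}
```

Calling FormulaLexer.Tokenize("Al2(HPO4)3") should yield the nine tokens Al, 2, (, H, P, O, 4, ), 3 in that order.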

And the parser rules:

s -> c:comp { c };

atom -> a:ATOM { new Atom(a,1) }
      | a:ATOM n:NUM { new Atom(a,n) }
      ;

comp -> LPAREN c:comp RPAREN n:NUM { new Compound(c,n) }
      | c:comp+ { new Compounds(c) }
      | a:atom { a }
      ;

These are just the rules (I have not tested anything here). You can use my NLT lexer+parser if you like, but there are plenty of other tools for C#; pick your favorite.
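
If a generator feels too heavy, roughly the same grammar can be written as a small hand-rolled recursive-descent parser. The sketch below is an assumption-laden illustration rather than the NLT approach: FormulaParser and its helper methods are hypothetical names, the result is a plain Dictionary of element symbol to count instead of the asker's Formula class, symbols are not checked against ElementManager, and, unlike the rules above, the number after a closing parenthesis is treated as optional.

```csharp
using System;
using System.Collections.Generic;

public static class FormulaParser
{
    public static Dictionary<string, int> Parse(string input)
    {
        int pos = 0;
        Dictionary<string, int> counts = ParseSequence(input, ref pos);
        if (pos != input.Length) // e.g. a stray ')'
            throw new FormatException($"Unexpected '{input[pos]}' at position {pos}.");
        return counts;
    }

    // sequence := item+   (stops at ')' or at the end of the input)
    private static Dictionary<string, int> ParseSequence(string s, ref int pos)
    {
        var total = new Dictionary<string, int>();
        int start = pos;
        while (pos < s.Length && s[pos] != ')')
        {
            foreach (KeyValuePair<string, int> kv in ParseItem(s, ref pos))
            {
                total.TryGetValue(kv.Key, out int old); // old is 0 when the key is missing
                total[kv.Key] = old + kv.Value;
            }
        }
        if (pos == start)
            throw new FormatException($"Expected an element or '(' at position {pos}.");
        return total;
    }

    // item := ATOM NUM? | '(' sequence ')' NUM?
    private static Dictionary<string, int> ParseItem(string s, ref int pos)
    {
        Dictionary<string, int> counts;
        if (s[pos] == '(')
        {
            pos++; // consume '('
            counts = ParseSequence(s, ref pos);
            if (pos >= s.Length || s[pos] != ')')
                throw new FormatException("Missing ')'.");
            pos++; // consume ')'
        }
        else if (s[pos] >= 'A' && s[pos] <= 'Z')
        {
            int start = pos++;
            while (pos < s.Length && s[pos] >= 'a' && s[pos] <= 'z') pos++;
            counts = new Dictionary<string, int> { [s.Substring(start, pos - start)] = 1 };
        }
        else
        {
            throw new FormatException($"Unexpected '{s[pos]}' at position {pos}.");
        }

        // Scale the atom or the whole group by the trailing number, if any.
        int multiplier = ParseOptionalNumber(s, ref pos);
        var scaled = new Dictionary<string, int>();
        foreach (KeyValuePair<string, int> kv in counts)
            scaled[kv.Key] = kv.Value * multiplier;
        return scaled;
    }

    private static int ParseOptionalNumber(string s, ref int pos)
    {
        int start = pos;
        while (pos < s.Length && s[pos] >= '0' && s[pos] <= '9') pos++;
        return pos == start ? 1 : int.Parse(s.Substring(start, pos - start));
    }
}
```

Because a parenthesized group recurses into the same sequence rule, nested groups fall out for free; mapping the dictionary onto FormulaComponent objects is left to the caller.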

Since there are no nested parentheses, a regex might be the easier option.
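
One possible reading of that regex route, sketched under the assumption that groups are never nested: first rewrite every "(...)n" group as a flat run of elements, then reuse the element regex from the question on the flattened string. FlatFormulaParser is a hypothetical name, the result is a plain dictionary rather than the asker's Formula class, and element symbols are not validated against ElementManager.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FlatFormulaParser
{
    // Same element pattern as in the question.
    private const string ElementPattern = "([A-Z][a-z]*)([0-9]*)";

    public static Dictionary<string, int> Parse(string input)
    {
        // Step 1: expand each non-nested group, e.g. "(HPO4)3" -> "H3P3O12".
        string flat = Regex.Replace(input, @"\(([^()]*)\)([0-9]*)", g =>
        {
            int multiplier = g.Groups[2].Value == "" ? 1 : int.Parse(g.Groups[2].Value);
            return Regex.Replace(g.Groups[1].Value, ElementPattern, e =>
            {
                int count = e.Groups[2].Value == "" ? 1 : int.Parse(e.Groups[2].Value);
                return e.Groups[1].Value + (count * multiplier);
            });
        });

        // Step 2: validate. Leftover parentheses (nested or unbalanced groups)
        // and any other stray characters fail here.
        if (!Regex.IsMatch(flat, "^(" + ElementPattern + ")+$"))
            throw new FormatException("Input string was in an incorrect format.");

        // Step 3: tally the elements of the flattened formula.
        var counts = new Dictionary<string, int>();
        foreach (Match match in Regex.Matches(flat, ElementPattern))
        {
            string name = match.Groups[1].Value;
            int count = match.Groups[2].Value == "" ? 1 : int.Parse(match.Groups[2].Value);
            counts.TryGetValue(name, out int old); // old is 0 when the key is missing
            counts[name] = old + count;
        }
        return counts;
    }
}
```

With this sketch, "Al2(HPO4)3" is flattened to "Al2H3P3O12" before tallying, while a nested input such as "Ca(Al(OH)4)2" is rejected with a FormatException because one pair of parentheses survives the single expansion pass.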

score 1 · Accepted Answer

I don't know what your Formula class looks like, so I put the results in a MessageBox.

    public static Double getElements(String _molecule)
    {
        // Find a single, non-nested parenthesized group such as "(HPO4)3".
        // Group 1 is the content between the parentheses, group 2 the number after ')'.
        Match paren = Regex.Match(_molecule, @"\(((?:[A-Z][a-z]?\d*)+)\)(\d*)");
        if (!paren.Success)
            return 0.0; // As before, formulas without parentheses are not reported

        Double endNumber = paren.Groups[2].Value == "" ? 1 : Double.Parse(paren.Groups[2].Value); // Number after the ')'
        int groupStart = paren.Groups[1].Index;
        int groupEnd = groupStart + paren.Groups[1].Length;

        foreach (Match i in Regex.Matches(_molecule, @"([A-Z][a-z]?)(\d*)")) // Get all elements
        {
            String element = i.Groups[1].Value; // Gets the element
            Double amountOfElement = i.Groups[2].Value == "" ? 1 : Double.Parse(i.Groups[2].Value);
            if (i.Index >= groupStart && i.Index < groupEnd)
                amountOfElement *= endNumber; // Elements inside the parentheses are multiplied by the end number
            MessageBox.Show(element + " - " + amountOfElement);
        }
        return endNumber;
    }
