これは a とはかけ離れてRegex
いますが、ネストされた関数の可能性と、これが構造化言語であり、レクサー/パーサー スキームがより適切であるという事実を組み合わせたものです。
これは、この性質のものを処理するシステムの例です
まず、入力に配置できるもの (変更する式) を定義します。
public interface ISourcePart
{
/// <summary>
/// Gets the string representation of the kind of thing we're working with
/// </summary>
string Kind { get; }
/// <summary>
/// Gets the position this information is found at in the original source
/// </summary>
int Position { get; }
/// <summary>
/// Gets a representation of this data as Token objects
/// </summary>
/// <returns>An array of Token objects representing the data</returns>
Token[] AsTokens();
}
次に、トークン (ソース テキストの識別可能な部分) を格納するための構造を定義します。
public class Token : ISourcePart
{
public int Position { get; set; }
public Token[] AsTokens()
{
return new[] {this};
}
public string Kind { get; set; }
/// <summary>
/// Gets or sets the value of the token
/// </summary>
public string Value { get; set; }
/// <summary>
/// Creates a new Token
/// </summary>
/// <param name="kind">The kind (name) of the token</param>
/// <param name="match">The Match the token is to be generated from</param>
/// <param name="index">The offset from the beginning of the file the index of the match is relative to</param>
/// <returns>The newly created token</returns>
public static Token Create(string kind, Match match, int index)
{
return new Token
{
Position = match.Index + index,
Kind = kind,
Value = match.Value
};
}
/// <summary>
/// Creates a new Token
/// </summary>
/// <param name="kind">The kind (name) of the token</param>
/// <param name="value">The value to assign to the token</param>
/// <param name="position">The absolute position in the source file the value is located at</param>
/// <returns>The newly created token</returns>
public static Token Create(string kind, string value, int position)
{
return new Token
{
Kind = kind,
Value = value,
Position = position
};
}
}
この例では、トークンを見つけるために es を使用Regex
します (以下 - 私のデモ プロジェクトの Program.cs からの抜粋)。
/// <summary>
/// Breaks an input string into recognizable tokens
/// </summary>
/// <param name="source">The input string to break up</param>
/// <returns>The set of tokens located within the string</returns>
static IEnumerable<Token> Tokenize(string source)
{
var tokens = new List<Token>();
var sourceParts = new[] { new KeyValuePair<string, int>(source, 0) };
tokens.AddRange(Tokenize(OpenParen, "\\(", ref sourceParts));
tokens.AddRange(Tokenize(CloseParen, "\\)", ref sourceParts));
tokens.AddRange(Tokenize(Semi, ";", ref sourceParts));
tokens.AddRange(Tokenize(Operator, "[\\^\\\\*\\+\\-/]", ref sourceParts));
tokens.AddRange(Tokenize(Literal, "\\w+", ref sourceParts));
return tokens.OrderBy(x => x.Position);
}
ご覧のとおり、開き括弧と閉じ括弧、セミコロン、基本的な数学演算子、文字と数字のパターンを定義しました。
メソッドは次のTokenize
ように定義されます (デモ プロジェクトの Program.cs から)。
/// <summary>
/// Performs tokenization of a collection of non-tokenized data parts with a specific pattern
/// </summary>
/// <param name="tokenKind">The name to give the located tokens</param>
/// <param name="pattern">The pattern to use to match the tokens</param>
/// <param name="untokenizedParts">The portions of the input that have yet to be tokenized (organized as text vs. position in source)</param>
/// <returns>The set of tokens matching the given pattern located in the untokenized portions of the input, <paramref name="untokenizedParts"/> is updated as a result of this call</returns>
static IEnumerable<Token> Tokenize(string tokenKind, string pattern, ref KeyValuePair<string, int>[] untokenizedParts)
{
//Do a bit of setup
var resultParts = new List<KeyValuePair<string, int>>();
var resultTokens = new List<Token>();
var regex = new Regex(pattern);
//Look through all of our currently untokenized data
foreach (var part in untokenizedParts)
{
//Find all of our available matches
var matches = regex.Matches(part.Key).OfType<Match>().ToList();
//If we don't have any, keep the data as untokenized and move to the next chunk
if (matches.Count == 0)
{
resultParts.Add(part);
continue;
}
//Store the untokenized data in a working copy and save the absolute index it reported itself at in the source file
var workingPart = part.Key;
var index = part.Value;
//Look through each of the matches that were found within this untokenized segment
foreach (var match in matches)
{
//Calculate the effective start of the match within the working copy of the data
var effectiveStart = match.Index - (part.Key.Length - workingPart.Length);
resultTokens.Add(Token.Create(tokenKind, match, part.Value));
//If we didn't match at the beginning, save off the first portion to the set of untokenized data we'll give back
if (effectiveStart > 0)
{
var value = workingPart.Substring(0, effectiveStart);
resultParts.Add(new KeyValuePair<string, int>(value, index));
}
//Get rid of the portion of the working copy we've already used
if (match.Index + match.Length < part.Key.Length)
{
workingPart = workingPart.Substring(effectiveStart + match.Length);
}
else
{
workingPart = string.Empty;
}
//Update the current absolute index in the source file we're reporting to be at
index += effectiveStart + match.Length;
}
//If we've got remaining data in the working copy, add it back to the untokenized data
if (!string.IsNullOrEmpty(workingPart))
{
resultParts.Add(new KeyValuePair<string, int>(workingPart, index));
}
}
//Update the untokenized data to contain what we couldn't process with this pattern
untokenizedParts = resultParts.ToArray();
//Return the tokens we were able to extract
return resultTokens;
}
sin(x)
トークン化されたデータを処理するためのメソッドと型が整ったので、単純な関数 ( など)、複雑な関数 ( などFunction1(a1;a2;a3)
)、基本的な数学演算 ( など)の呼び出しなど、より大きな意味の部分を認識できる必要があります。+
、-
、*
など)、など。これを処理するための簡単なパーサーを作成します。最初に、解析ノードの一致条件を定義します。
public class ParseNodeDefinition
{
/// <summary>
/// The set of parse node definitions that could be transitioned to from this one
/// </summary>
private readonly IList<ParseNodeDefinition> _nextNodeOptions;
/// <summary>
/// Creates a new ParseNodeDefinition
/// </summary>
private ParseNodeDefinition()
{
_nextNodeOptions = new List<ParseNodeDefinition>();
}
/// <summary>
/// Gets whether or not this definition is an acceptable ending point for the parse tree
/// </summary>
public bool IsValidEnd { get; private set; }
/// <summary>
/// Gets the name an item must have for it to be matched by this definition
/// </summary>
public string MatchItemsNamed { get; private set; }
/// <summary>
/// Gets the set of parse node definitions that could be transitioned to from this one
/// </summary>
public IEnumerable<ParseNodeDefinition> NextNodeOptions
{
get { return _nextNodeOptions; }
}
/// <summary>
/// Gets or sets the tag that will be associated with the data if matched
/// </summary>
public string Tag { get; set; }
/// <summary>
/// Creates a new ParseNodeDefinition matching items with the specified name/kind.
/// </summary>
/// <param name="matchItemsNamed">The name of the item to be matched</param>
/// <param name="tag">The tag to associate with matched items</param>
/// <param name="isValidEnd">Whether or not the element is a valid end to the parse tree</param>
/// <returns>A ParseNodeDefinition capable of matching items of the given name</returns>
public static ParseNodeDefinition Create(string matchItemsNamed, string tag, bool isValidEnd)
{
return new ParseNodeDefinition { MatchItemsNamed = matchItemsNamed, Tag = tag, IsValidEnd = isValidEnd };
}
public ParseNodeDefinition AddOption(string matchItemsNamed)
{
return AddOption(matchItemsNamed, string.Empty, false);
}
public ParseNodeDefinition AddOption(string matchItemsNamed, string tag)
{
return AddOption(matchItemsNamed, tag, false);
}
/// <summary>
/// Adds an option for a named node to follow this one in the parse tree the node is a part of
/// </summary>
/// <param name="matchItemsNamed">The name of the item to be matched</param>
/// <param name="tag">The tag to associate with matched items</param>
/// <param name="isValidEnd">Whether or not the element is a valid end to the parse tree</param>
/// <returns>The ParseNodeDefinition that has been added</returns>
public ParseNodeDefinition AddOption(string matchItemsNamed, string tag, bool isValidEnd)
{
var node = Create(matchItemsNamed, tag, isValidEnd);
_nextNodeOptions.Add(node);
return node;
}
public ParseNodeDefinition AddOption(string matchItemsNamed, bool isValidEnd)
{
return AddOption(matchItemsNamed, string.Empty, isValidEnd);
}
/// <summary>
/// Links the given node as an option for a state to follow this one in the parse tree this node is a part of
/// </summary>
/// <param name="next">The node to add as an option</param>
public void LinkTo(ParseNodeDefinition next)
{
_nextNodeOptions.Add(next);
}
}
ParseTree
これにより、単一の要素を名前 (後で定義されているかどうかにかかわらず) またはToken
両方がインターフェイスを実装しているため、一致させることがISourcePart
できます。次に、マッチングParseTreeDefinition
のシーケンスを指定できる を作成します。ParseNodeDefinitions
public class ParseTreeDefinition
{
/// <summary>
/// The set of parse node definitions that constitute an initial match to the parse tree
/// </summary>
private readonly IList<ParseNodeDefinition> _initialNodeOptions;
/// <summary>
/// Creates a new ParseTreeDefinition
/// </summary>
/// <param name="name">The name to give to parse trees generated from full matches</param>
public ParseTreeDefinition(string name)
{
_initialNodeOptions = new List<ParseNodeDefinition>();
Name = name;
}
/// <summary>
/// Gets the set of parse node definitions that constitute an initial match to the parse tree
/// </summary>
public IEnumerable<ParseNodeDefinition> InitialNodeOptions { get { return _initialNodeOptions; } }
/// <summary>
/// Gets the name of the ParseTreeDefinition
/// </summary>
public string Name { get; private set; }
/// <summary>
/// Adds an option for a named node to follow this one in the parse tree the node is a part of
/// </summary>
/// <param name="matchItemsNamed">The name of the item to be matched</param>
/// <returns>The ParseNodeDefinition that has been added</returns>
public ParseNodeDefinition AddOption(string matchItemsNamed)
{
return AddOption(matchItemsNamed, string.Empty, false);
}
/// <summary>
/// Adds an option for a named node to follow this one in the parse tree the node is a part of
/// </summary>
/// <param name="matchItemsNamed">The name of the item to be matched</param>
/// <param name="tag">The tag to associate with matched items</param>
/// <returns>The ParseNodeDefinition that has been added</returns>
public ParseNodeDefinition AddOption(string matchItemsNamed, string tag)
{
return AddOption(matchItemsNamed, tag, false);
}
/// <summary>
/// Adds an option for a named node to follow this one in the parse tree the node is a part of
/// </summary>
/// <param name="matchItemsNamed">The name of the item to be matched</param>
/// <param name="tag">The tag to associate with matched items</param>
/// <param name="isValidEnd">Whether or not the element is a valid end to the parse tree</param>
/// <returns>The ParseNodeDefinition that has been added</returns>
public ParseNodeDefinition AddOption(string matchItemsNamed, string tag, bool isValidEnd)
{
var node = ParseNodeDefinition.Create(matchItemsNamed, tag, isValidEnd);
_initialNodeOptions.Add(node);
return node;
}
/// <summary>
/// Adds an option for a named node to follow this one in the parse tree the node is a part of
/// </summary>
/// <param name="matchItemsNamed">The name of the item to be matched</param>
/// <param name="isValidEnd">Whether or not the element is a valid end to the parse tree</param>
/// <returns>The ParseNodeDefinition that has been added</returns>
public ParseNodeDefinition AddOption(string matchItemsNamed, bool isValidEnd)
{
return AddOption(matchItemsNamed, string.Empty, isValidEnd);
}
/// <summary>
/// Attempts to follow a particular branch in the parse tree from a given starting point in a set of source parts
/// </summary>
/// <param name="parts">The set of source parts to attempt to match in</param>
/// <param name="startIndex">The position to start the matching attempt at</param>
/// <param name="required">The definition that must be matched for the branch to be followed</param>
/// <param name="nodes">The set of nodes that have been matched so far</param>
/// <returns>true if the branch was followed to completion, false otherwise</returns>
private static bool FollowBranch(IList<ISourcePart> parts, int startIndex, ParseNodeDefinition required, ICollection<ParseNode> nodes)
{
if (parts[startIndex].Kind != required.MatchItemsNamed)
{
return false;
}
nodes.Add(new ParseNode(parts[startIndex], required.Tag));
return parts.Count > (startIndex + 1) && required.NextNodeOptions.Any(x => FollowBranch(parts, startIndex + 1, x, nodes)) || required.IsValidEnd;
}
/// <summary>
/// Attempt to match the parse tree definition against a set of source parts
/// </summary>
/// <param name="parts">The source parts to match against</param>
/// <returns>true if the parse tree was matched, false otherwise. parts is updated by this method to consolidate matched nodes into a ParseTree</returns>
public bool Parse(ref IList<ISourcePart> parts)
{
var partsCopy = parts.ToList();
for (var i = 0; i < parts.Count; ++i)
{
var tree = new List<ParseNode>();
if (InitialNodeOptions.Any(x => FollowBranch(partsCopy, i, x, tree)))
{
partsCopy.RemoveRange(i, tree.Count);
partsCopy.Insert(i, new ParseTree(Name, tree.ToArray(), tree[0].Position));
parts = partsCopy;
return true;
}
}
return false;
}
}
もちろん、これまでに定義したマッチャーの結果を保存する場所がなければ、これらはあまり役に立ちません。そのため、 aは単にオブジェクトのコレクションであり、 はor (またはそれ以上)のラッパーであると定義ParseTree
しましょう。一般的に任意)。ParseNode
ParseTree
ParseNode
ParseNode
ParseTree
Token
ISourcePart
public class ParseTree : ISourcePart
{
/// <summary>
/// Creates a new ParseTree
/// </summary>
/// <param name="kind">The kind (name) of tree this is</param>
/// <param name="nodes">The nodes the tree matches</param>
/// <param name="position">The position in the source file this tree is located at</param>
public ParseTree(string kind, IEnumerable<ISourcePart> nodes, int position)
{
Kind = kind;
ParseNodes = nodes.ToList();
Position = position;
}
public string Kind { get; private set; }
public int Position { get; private set; }
/// <summary>
/// Gets the nodes that make up this parse tree
/// </summary>
public IList<ISourcePart> ParseNodes { get; internal set; }
public Token[] AsTokens()
{
return ParseNodes.SelectMany(x => x.AsTokens()).ToArray();
}
}
public class ParseNode : ISourcePart
{
/// <summary>
/// Creates a new ParseNode
/// </summary>
/// <param name="sourcePart">The data that was matched to create this node</param>
/// <param name="tag">The tag data (if any) associated with the node</param>
public ParseNode(ISourcePart sourcePart, string tag)
{
SourcePart = sourcePart;
Tag = tag;
}
public string Kind { get { return SourcePart.Kind; } }
/// <summary>
/// Gets the tag associated with the matched data
/// </summary>
public string Tag { get; private set; }
/// <summary>
/// Gets the data that was matched to create this node
/// </summary>
public ISourcePart SourcePart { get; private set; }
public int Position { get { return SourcePart.Position; } }
public Token[] AsTokens()
{
return SourcePart.AsTokens();
}
}
必要な構成要素はこれで終わりです。構文解析ツリーの定義の構成に移ります。ここから先のコードは、デモの Program.cs からのものです。
上記の各トークンのパターンの宣言に関するブロックでお気づきかもしれませんが、参照されているが定義されていない値がいくつかありました。
private const string CloseParen = "CloseParen";
private const string ComplexFunctionCall = "ComplexFunctionCall";
private const string FunctionCallStart = "FunctionCallStart";
private const string Literal = "Literal";
private const string OpenParen = "OpenParen";
private const string Operator = "Operator";
private const string ParenthesisRequiredElement = "ParenthesisRequiredElement";
private const string ParenthesizedItem = "ParenthesizedItem";
private const string Semi = "Semi";
private const string SimpleFunctionCall = "SimpleFunctionCall";
\w+
開き括弧が後に続くリテラル ( pattern ) に一致するパターンを定義することから始めましょう。sin(
これを使用して、またはのようなものに一致させますFunction1(
。
static ParseTreeDefinition CreateFunctionCallStartTree()
{
var tree = new ParseTreeDefinition(FunctionCallStart);
var name = tree.AddOption(Literal);
name.AddOption(OpenParen, true);
return tree;
}
実際にはそれほど多くはありません。ツリーをセットアップし、最初に一致するものをリテラルとして追加し、次に一致するものを左括弧として追加し、解析ツリーを終了できると言います。もう少し複雑なバイナリ数学演算 (含める必要がある単項演算は考えられませんでした)。
static ParseTreeDefinition CreateBinaryOperationResultTree()
{
var tree = new ParseTreeDefinition(Literal);
var parenthesizedItem = tree.AddOption(ParenthesizedItem);
var literal = tree.AddOption(Literal);
var simpleCall = tree.AddOption(SimpleFunctionCall);
var complexCall = tree.AddOption(ComplexFunctionCall);
var @operator = parenthesizedItem.AddOption(Operator);
literal.LinkTo(@operator);
simpleCall.LinkTo(@operator);
complexCall.LinkTo(@operator);
@operator.AddOption(ParenthesizedItem, true);
@operator.AddOption(Literal, true);
@operator.AddOption(SimpleFunctionCall, true);
@operator.AddOption(ComplexFunctionCall, true);
return tree;
}
ここで、解析ツリーは、括弧で囲まれた項目 ( など(1/2)
)、リテラル (a5
または など3
)、単純な呼び出し ( などsin(4)
)、または複雑なもの ( など) で開始できると言います。Function1(a1;a2;a3)
)。要するに、左側のオペランドのオプションを定義しただけです。次に、括弧で囲まれた項目の後には演算子 (最初に宣言されたパターンの数学演算子の 1 つ) が続く必要があると言い、便宜上、左側のオペランドの他のすべてのオプションは同じ状態に進むことができます (オペレーターがいます)。次に、演算子には右側もある必要があるため、進行するオプションのセットを複製します。これらは左側のオペランドと同じ定義ではないことに注意してください。これらには、解析ツリーを終了できるようにフラグが設定されています。パース ツリーの名前が Literal になっていることに注意してください。これは、あらゆる場所で一致する別の種類の要素を指定する必要がないようにするためです。次に、括弧内の項目:
static ParseTreeDefinition CreateParenthesizedItemTree()
{
var tree = new ParseTreeDefinition(ParenthesizedItem);
var openParen = tree.AddOption(OpenParen);
var nestedSimpleCall = openParen.AddOption(SimpleFunctionCall);
var nestedComplexCall = openParen.AddOption(ComplexFunctionCall);
var arg = openParen.AddOption(Literal);
var parenthesizedItem = openParen.AddOption(ParenthesizedItem);
var closeParen = nestedSimpleCall.AddOption(CloseParen, true);
arg.LinkTo(closeParen);
parenthesizedItem.LinkTo(closeParen);
nestedComplexCall.LinkTo(closeParen);
return tree;
}
これは素晴らしく簡単です。括弧から始めて、ほとんど何でも続けて、別の括弧で閉じます。単純な呼び出し (のようなsin(x)
)
static ParseTreeDefinition CreateSimpleFunctionCallTree()
{
var tree = new ParseTreeDefinition(SimpleFunctionCall);
var openParen = tree.AddOption(FunctionCallStart);
var nestedItem = openParen.AddOption(ParenthesizedItem);
var nestedSimpleCall = openParen.AddOption(SimpleFunctionCall);
var nestedComplexCall = openParen.AddOption(ComplexFunctionCall);
var arg = openParen.AddOption(Literal);
var parenthesizedItem = openParen.AddOption(ParenthesizedItem);
var closeParen = nestedSimpleCall.AddOption(CloseParen, true);
arg.LinkTo(closeParen);
nestedItem.LinkTo(closeParen);
parenthesizedItem.LinkTo(closeParen);
nestedComplexCall.LinkTo(closeParen);
return tree;
}
複雑な呼び出し (などFunction1(a1;a2;a3)
)
static ParseTreeDefinition CreateComplexFunctionCallTree()
{
var tree = new ParseTreeDefinition(ComplexFunctionCall);
var openParen = tree.AddOption(FunctionCallStart);
var arg = openParen.AddOption(Literal, ParenthesisRequiredElement);
var simpleCall = openParen.AddOption(SimpleFunctionCall, ParenthesisRequiredElement);
var complexCall = openParen.AddOption(ComplexFunctionCall, ParenthesisRequiredElement);
var nested = openParen.AddOption(ParenthesizedItem);
var semi = arg.AddOption(Semi);
simpleCall.LinkTo(semi);
complexCall.LinkTo(semi);
nested.LinkTo(semi);
var arg2 = semi.AddOption(Literal, ParenthesisRequiredElement);
var simpleCall2 = semi.AddOption(SimpleFunctionCall, ParenthesisRequiredElement);
var complexCall2 = semi.AddOption(ComplexFunctionCall, ParenthesisRequiredElement);
var nested2 = semi.AddOption(ParenthesizedItem);
arg2.LinkTo(semi);
simpleCall2.LinkTo(semi);
complexCall2.LinkTo(semi);
nested2.LinkTo(semi);
var closeParen = arg2.AddOption(CloseParen, true);
arg2.LinkTo(closeParen);
simpleCall2.LinkTo(closeParen);
complexCall2.LinkTo(closeParen);
return tree;
}
必要なツリーはこれだけです。これらすべてを実行するコードを見てみましょう。
static void Main()
{
//The input string
const string input = @"Function1((a1^5-4)/2;1/sin(a2);a3;a4)+Function2(a1;a2;1/a3)";
//Locate the recognizable tokens within the source
IList<ISourcePart> tokens = Tokenize(input).Cast<ISourcePart>().ToList();
//Create the parse trees we'll need to be able to recognize the different parts of the input
var functionCallStartTree = CreateFunctionCallStartTree();
var parenthethesizedItemTree = CreateParenthesizedItemTree();
var simpleFunctionCallTree = CreateSimpleFunctionCallTree();
var complexFunctionCallTree = CreateComplexFunctionCallTree();
var binaryOpTree = CreateBinaryOperationResultTree();
//Parse until we can't parse anymore
while (functionCallStartTree.Parse(ref tokens) || binaryOpTree.Parse(ref tokens) || parenthethesizedItemTree.Parse(ref tokens) || simpleFunctionCallTree.Parse(ref tokens) || complexFunctionCallTree.Parse(ref tokens))
{ }
//Run our post processing to fix the parenthesis in the input
FixParenthesis(ref tokens);
//Collapse our parse tree(s) back to a string
var values = tokens.OrderBy(x => x.Position).SelectMany(x => x.AsTokens()).Select(x => x.Value);
//Print out our results and wait
Console.WriteLine(string.Join(string.Empty, values));
Console.ReadLine();
}
定義しなければならない唯一のことは、「複雑な」呼び出しの引数リストで要素を実際にラップする方法です。それはFixParenthesis
メソッドによって処理されます。
private static void FixParenthesis(ref IList<ISourcePart> items)
{
//Iterate through the set we're examining
for (var i = 0; i < items.Count; ++i)
{
var parseNode = items[i] as ParseNode;
//If we've got a parse node...
if (parseNode != null)
{
var nodeTree = parseNode.SourcePart as ParseTree;
//If the parse node represents a parse tree...
if (nodeTree != null)
{
//Fix parenthesis within the tree
var nodes = nodeTree.ParseNodes;
FixParenthesis(ref nodes);
nodeTree.ParseNodes = nodes;
}
//If this parse node required parenthesis, replace the subtree and add them
if (parseNode.Tag == ParenthesisRequiredElement)
{
var nodeContents = parseNode.AsTokens();
var combined = string.Join(string.Empty, nodeContents.OrderBy(x => x.Position).Select(x => x.Value));
items[i] = Token.Create(parseNode.Kind, string.Format("({0})", combined), parseNode.Position);
}
continue;
}
var parseTree = items[i] as ParseTree;
//If we've got a parse tree...
if (parseTree != null)
{
//Fix parenthesis within the tree
var nodes = parseTree.ParseNodes;
FixParenthesis(ref nodes);
parseTree.ParseNodes = nodes;
}
}
}
いずれにせよ、これが助けになったか、少なくとも楽しい気晴らしを提供したことを願っています.