c# - What is a clean way to parse multiple sections of a string in C#?

Question

I have a string that contains XML mixed with ordinary text. I need to parse out every <math>...</math> instance in the string. How can I parse these multiple sections (from <math> to </math>) out of the string?

Here is some content <math
xmlns="http://www.w3.org/1998/Math/MathML">  
<mi>a</mi><mo>&#x2260;</mo><mn>0</mn> </math>, that is mixed in with
this other content <math xmlns="http://www.w3.org/1998/Math/MathML">  
<mi>a</mi><msup><mi>x</mi><mn>2</mn></msup>   <mo>+</mo>
<mi>b</mi><mi>x</mi>   <mo>+</mo> <mi>c</mi> <mo>=</mo> <mn>0</mn>
</math> we want to be able to seperate this string

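Background: I tried to keep this question generic. The specifics of what I am trying to do concern encoding versus Raw in MVC3, which encodes everything by default. I do not want to encode the MathML, but I do want to encode everything else. So I want to render part of the string with Html.Raw (the MathML part) and the rest as a normal encoded string.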

score 1 · Accepted Answer

In general, if you can expect the XML to be well-formed, or at least consistently formatted, you should be able to use a regular expression to strip the XML out of the string.
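As a minimal sketch of that idea, the following collects each <math>...</math> block with Regex.Matches. The class and method names (MathMlExtractor, Extract) are hypothetical, and the pattern assumes that <math> elements are never nested and that every block is closed.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MathMlExtractor
{
    // Singleline lets '.' match newlines, because the blocks in the
    // question span several lines. The lazy '.*?' stops at the first
    // </math>, so adjacent blocks are not merged into one match.
    private static readonly Regex MathBlock =
        new Regex(@"<math\b[^>]*>.*?</math>", RegexOptions.Singleline);

    // Returns every <math ...>...</math> block in document order.
    public static List<string> Extract(string input)
    {
        var blocks = new List<string>();
        foreach (Match match in MathBlock.Matches(input))
        {
            blocks.Add(match.Value);
        }
        return blocks;
    }
}
```

Calling MathMlExtractor.Extract(text) on the sample string from the question should return two entries, one per <math> element.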

You can try Expresso to build and test the expression.

If you then want to parse the XML you have stripped out, that is a job for the .NET XML parser.
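One way to do that parsing step, sketched here with LINQ to XML (XElement.Parse), which is only one of several XML APIs in .NET and not necessarily the one this answer had in mind. The names MathMlReader and Identifiers are hypothetical. The sketch assumes the extracted block is well-formed XML and declares the MathML namespace as in the question; otherwise XElement.Parse throws an XmlException or the query finds nothing.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class MathMlReader
{
    // The namespace declared on the <math> elements in the question.
    private static readonly XNamespace MathMl = "http://www.w3.org/1998/Math/MathML";

    // Returns the text of every <mi> (identifier) element inside one
    // extracted <math> block, in document order.
    public static List<string> Identifiers(string mathBlock)
    {
        XElement root = XElement.Parse(mathBlock);
        return root.Descendants(MathMl + "mi")
                   .Select(element => element.Value)
                   .ToList();
    }
}
```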

score 0 · Accepted Answer

I am no regex boffin, but I tried this and it returned the correct result. Use it as a starting point and modify it as needed.

I got it from this Stack Overflow post.

string yourstring = "<math xmlns=\"http://www.w3.org/1998/Math/MathML\">   <mi>a</mi><mo>&#x2260;</mo><mn>0</mn> </math>, that is mixed in with this other content <math xmlns=\"http://www.w3.org/1998/Math/MathML\">   <mi>a</mi><msup><mi>x</mi><mn>2</mn></msup>   <mo>+</mo> <mi>b</mi><mi>x</mi>   <mo>+</mo> <mi>c</mi> <mo>=</mo> <mn>0</mn> </math>";

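// Requires: using System.Text.RegularExpressions;
try
{
     // Remove every <math ...>...</math> block. RegexOptions.Singleline lets '.'
     // match newlines, so blocks that span several lines (as in the question)
     // are removed too. [^>]* also accepts a bare <math> tag with no attributes.
     yourstring = Regex.Replace(yourstring, @"(<math\b[^>]*>.+?</math>)", "", RegexOptions.Singleline);
}
catch (ArgumentException ex)
{
     // Thrown only if the regular expression itself has a syntax error
}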

The resulting string is:

, that is mixed in with this other content
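The snippet above removes the MathML, whereas the question asks to keep it unencoded and encode everything else. A rough sketch of that variant follows, using Regex.Split with a capturing group so that the matched blocks are kept in the result array. The names MathMlRenderer and EncodeExceptMath are hypothetical, and System.Net.WebUtility.HtmlEncode (available from .NET 4.5) stands in for whatever encoder the MVC3 view would actually use.

```csharp
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

public static class MathMlRenderer
{
    // The capturing group makes Regex.Split keep each matched block,
    // so the array alternates: text, math, text, math, ..., text.
    private static readonly Regex MathBlock =
        new Regex(@"(<math\b[^>]*>.*?</math>)", RegexOptions.Singleline);

    // HTML-encodes everything except the <math>...</math> blocks.
    public static string EncodeExceptMath(string input)
    {
        string[] parts = MathBlock.Split(input);
        var result = new StringBuilder();
        for (int i = 0; i < parts.Length; i++)
        {
            // Odd indexes hold the captured math blocks; pass them through as-is.
            result.Append(i % 2 == 1 ? parts[i] : WebUtility.HtmlEncode(parts[i]));
        }
        return result.ToString();
    }
}
```

In a Razor view the returned string would presumably be wrapped in Html.Raw, since the encoding has already been applied. Note that the MathML blocks are passed through unchecked, so this is only appropriate when their source is trusted.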
