c# - Poorly defined XML: how do I get all child nodes and their content as a single string joined with spaces?

Question

Here's a lovely example of XML:

<root>
    <section>Here is some text<mightbe>a tag</mightbe>might <not attribute="be" />. Things are just<label>a mess</label>but I have to parse it because that's what needs to be done and I can't <font stupid="true">control</font> the source. <p>Why are there p tags here?</p>Who knows, but there may or may not be spaces around them so that's awesome. The point here is, there's node soup inside the section node and no definition for the document.</section>
</root>

I want to get the text of the section node and all of its subnodes as a single string. Note, however, that there may or may not be spaces around the subnodes, so I want to pad the subnodes with spaces.

Here is a more precise example of what the input looks like and what I want the output to be:

<root>
    <sample>A good story is the<book>Hitchhikers Guide to the Galaxy</book>. It was published<date>a long time ago</date>. I usually read at<time>9pm</time>.</sample>
</root>

I want the output to look like this:

A good story is the Hitchhikers Guide to the Galaxy. It was published a long time ago. I usually read at 9pm.

Note that there are no spaces around the child nodes, so I need to pad them; otherwise the words run together.
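To see the run-together problem concretely, here is a minimal sketch, assuming .NET 5+ with C# top-level statements and LINQ to XML, that reads XElement.Value on the second sample. Value concatenates every descendant text node exactly as written, so nothing is inserted where an element starts or ends.

```csharp
using System;
using System.Xml.Linq;

// The second sample from the question: no whitespace around the child elements.
var sample = XElement.Parse(
    "<sample>A good story is the<book>Hitchhikers Guide to the Galaxy</book>. " +
    "It was published<date>a long time ago</date>. I usually read at<time>9pm</time>.</sample>");

// XElement.Value concatenates the text of all descendant text nodes as-is,
// so nothing is inserted at element boundaries and the words run together.
string naive = sample.Value;

// Prints: A good story is theHitchhikers Guide to the Galaxy. It was publisheda long time ago. I usually read at9pm.
Console.WriteLine(naive);
```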

I was trying to use this sample code:

using System;
using System.Linq;
using System.Xml.Linq;

string output = "";
XDocument doc = XDocument.Parse(xml);
foreach (var node in doc.Root.Elements("section"))
{
    output += String.Join(" ", node.Nodes().Select(x => x.ToString()).ToArray()) + " ";
}

But the output includes the child tags, so it doesn't work.
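The tags appear because Nodes() returns both text nodes and element nodes, and XNode.ToString() serializes a node as XML. A minimal sketch, assuming .NET 5+ top-level statements; the short input is adapted from the question for illustration:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// A shortened piece of the question's input, adapted for illustration.
var section = XElement.Parse("<section>I usually read at<time>9pm</time>.</section>");

// Nodes() yields the mixed content in document order: text nodes and element nodes alike.
string[] pieces = section.Nodes().Select(n => n.ToString()).ToArray();

// XNode.ToString() returns XML, so the element comes back with its tags.
// Prints: I usually read at | <time>9pm</time> | .
Console.WriteLine(String.Join(" | ", pieces));

// Dispatching on the node type and taking .Value keeps only the text.
string[] texts = section.Nodes()
    .Select(n => n is XText t ? t.Value : n is XElement e ? e.Value : String.Empty)
    .ToArray();

// Prints: I usually read at | 9pm | .
Console.WriteLine(String.Join(" | ", texts));
```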

Any suggestions?

TL;DR: I'm given node-soup XML and want to turn it into a string, padding around the child nodes.

score 1 · Accepted Answer

If your tags are nested to an arbitrary depth (e.g. <date>a <i>long</i> time ago</date>), you may want to recurse so that the formatting is applied consistently. For example:

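using System;
using System.Linq;
using System.Xml.Linq;

private static string Parse(XElement root)
{
    return root
        .Nodes()
        // Text/CDATA nodes give their value; child elements are parsed recursively
        .Select(a => a is XText t ? t.Value : a is XElement e ? Parse(e) : String.Empty)
        .Select(s => s.Trim()).Where(s => s.Length > 0)
        // Seeded, so an element with no content (e.g. <not attribute="be" />) yields "" instead of throwing
        .Aggregate(String.Empty, (a, b) => a.Length == 0 ? b : String.Concat(a, b.StartsWith(".") ? String.Empty : " ", b));
}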
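A self-contained sketch of the same recursive idea, assuming .NET 5+ with C# top-level statements. Flatten is a hypothetical name; like the answer, it trims each piece and joins with a space except before a leading ".", and it seeds Aggregate so an empty element such as <not attribute="be" /> returns an empty string rather than throwing. The input adds a nested <i> tag to show that deeper levels are padded too.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: flattens mixed content, padding element boundaries with a space.
static string Flatten(XElement element) =>
    element.Nodes()
        // Text (and CDATA) nodes give their value; child elements are flattened recursively.
        .Select(n => n is XText t ? t.Value : n is XElement e ? Flatten(e) : String.Empty)
        .Select(s => s.Trim())
        .Where(s => s.Length > 0)
        // Seeded Aggregate: an element with no content yields "" instead of throwing.
        .Aggregate(String.Empty, (acc, s) =>
            acc.Length == 0 ? s
            : acc + (s.StartsWith(".") ? String.Empty : " ") + s);

var doc = XDocument.Parse(
    "<root><sample>A good story is the<book>Hitchhikers Guide to the Galaxy</book>. " +
    "It was published<date>a <i>long</i> time ago</date>. I usually read at<time>9pm</time>.</sample></root>");

string result = Flatten(doc.Root.Element("sample"));

// Prints: A good story is the Hitchhikers Guide to the Galaxy. It was published a long time ago. I usually read at 9pm.
Console.WriteLine(result);
```

The "." check only covers a period; other punctuation such as commas would still get a leading space, so the rule may need widening for real data.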

score 0 · Accepted Answer

You're looking at "mixed content" nodes. There's nothing especially special about them: take all the child nodes (text nodes are nodes too) and join their values with spaces.

Something like:

// Join with a space so the text of adjacent nodes doesn't run together
var result = String.Join(" ",
  root.Nodes().Select(x => x is XText t ? t.Value : x is XElement e ? e.Value : String.Empty));
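A minimal sketch, assuming .NET 5+ top-level statements, that runs this one-level join on the question's sample. JoinChildren is a hypothetical wrapper name. Two side effects are worth knowing: a space also appears before each period, and because XElement.Value does not pad nested elements, text inside a grandchild still runs together.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper wrapping the one-level join: text nodes give their value,
// child elements give XElement.Value (their descendant text, concatenated unpadded).
static string JoinChildren(XElement root) =>
    String.Join(" ", root.Nodes().Select(x => x is XText t ? t.Value : x is XElement e ? e.Value : String.Empty));

var sample = XElement.Parse(
    "<sample>A good story is the<book>Hitchhikers Guide to the Galaxy</book>. " +
    "It was published<date>a long time ago</date>. I usually read at<time>9pm</time>.</sample>");
string flat = JoinChildren(sample);

// Prints: A good story is the Hitchhikers Guide to the Galaxy . It was published a long time ago . I usually read at 9pm .
Console.WriteLine(flat);

// Nested tags are not padded: <i> sits two levels down, inside <date>.
string nested = JoinChildren(XElement.Parse("<p>read<date>a<i>long</i>time ago</date></p>"));

// Prints: read alongtime ago
Console.WriteLine(nested);
```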
