c# - Replace text that is not inside a tag, using either Regex or an XML parser

Question

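I know that using regular expressions to parse or manipulate HTML/XML is a bad idea, and I would normally never do it. I am considering it here only because I have no other option.

Using C#, I need to replace text in a string that is not already part of a tag (ideally a span tag with a specific ID).

For example, say I want to replace every instance of ABC that is not inside a span in the following text with alternative text (another span, in my case).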

ABC at start of line or ABC here must be replaced but, <span id="__publishingReusableFragment" >ABC inside span must not be replaced with anything. Another ABC here </span> this ABC must also be replaced

I tried regex with both lookahead and lookbehind assertions, in various combinations along the lines of

string regexPattern = "(?<!id=\"__publishingReusableFragment\").*?" + stringToMatch + ".*?(?!span)";

but gave up on that.

I also tried loading the text into an XElement and creating a writer from it to get at the text that is not inside a node, but I could not figure that out either.

XElement xel = XElement.Parse("<payload>" + inputString + @"</payload>");
XmlWriter requiredWriter = xel.CreateWriter();

I was hoping to somehow use the writer to get the strings that are not part of a node and replace them.
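For illustration only, the XElement idea described here could be sketched without a writer, by editing the text nodes that sit directly under the wrapper element. This is not code from the original post: `TopLevelTextReplacer` and `ReplaceOutsideElements` are hypothetical names, and the sketch assumes the input is well-formed XML once wrapped in `<payload>`.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class TopLevelTextReplacer
{
    // Hypothetical helper: replaces `find` only in text that is a direct
    // child of the wrapper element, i.e. text not inside any nested element.
    public static string ReplaceOutsideElements(string input, string find, string replacement)
    {
        // Wrap the fragment so it has a single root; PreserveWhitespace keeps spacing intact.
        XElement root = XElement.Parse("<payload>" + input + "</payload>",
                                       LoadOptions.PreserveWhitespace);

        // Text inside <span> lives in child nodes of the span, so it is not visited here.
        foreach (XText text in root.Nodes().OfType<XText>().ToList())
        {
            text.Value = text.Value.Replace(find, replacement);
        }

        // Serialize only the children, dropping the <payload> wrapper again.
        return string.Concat(root.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
    }
}
```

Calling `TopLevelTextReplacer.ReplaceOutsideElements(inputString, "ABC", "DEF")` would leave ABC untouched inside every child element, not only the span with the specific ID. Two caveats: the serializer may normalize tag whitespace (for example `" >` becomes `">`), and because the replacement is assigned as text, markup in it is escaped. Replacing with another span would need the text node split and real elements inserted instead.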

Basically, I am open to any suggestions or solutions to this problem.

Thanks in advance for your help.

score 2 · Accepted Answer

I know it's a bit ugly, but this should work

using System;
using System.Linq;

var s =
    @"ABC at start of line or ABC here must be replaced but, <span id=""__publishingReusableFragment"" >ABC inside span must not be replaced with anything. Another ABC here </span> this ABC must also be replaced";
// Split on the closing tag so each piece holds at most one "<span" (assumes no nested spans).
var newS = string.Join("</span>", s.Split(new[] {"</span>"}, StringSplitOptions.None)
    .Select(t =>
        {
            // bits[0] is the text before any "<span" in this piece, i.e. text outside the span.
            var bits = t.Split(new[] {"<span"}, StringSplitOptions.None);
            bits[0] = bits[0].Replace("ABC","DEF");
            return string.Join("<span", bits);
        }));

score 2 · Accepted Answer

resultString = Regex.Replace(subjectString, 
    @"(?<!              # assert that we can't match the following 
                        # before the current position: 
                        # An opening span tag with specified id
     <\s*span\s*id=""__publishingReusableFragment""\s*>
     (?:                # if it is not followed by...
      (?!<\s*/\s*span)  # a closing span tag
      .                 # at any position between the opening tag
     )*                 # and our text
    )                   # End of lookbehind assertion
    ABC                 # Match ABC", 
    "XYZ", RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace);

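All the caveats about parsing HTML with regex (which you seem to know, so I won't repeat them here) still apply.

The regex matches ABC unless it is preceded by an opening span tag with no closing span tag between the two. It will obviously fail if nested tags can occur.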
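As a rough check of that behaviour, the same pattern can be written on one line and wrapped in a small helper. This is only a sketch: `SpanAwareReplace` is a hypothetical name, and the one-line pattern is assumed to be equivalent to the commented version above once whitespace and comments are removed.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpanAwareReplace
{
    // One-line form of the pattern above: a negative lookbehind for the opening
    // span tag followed by any characters that do not start a closing span tag.
    private const string Pattern =
        @"(?<!<\s*span\s*id=""__publishingReusableFragment""\s*>(?:(?!<\s*/\s*span).)*)ABC";

    // Replaces every ABC that the lookbehind does not place inside the span.
    public static string Replace(string input) =>
        Regex.Replace(input, Pattern, "XYZ", RegexOptions.Singleline);
}
```

On the sample text from the question, this replaces the two ABCs before the span and the one after it, and leaves the two inside untouched. With a nested tag, as in `<span id="__publishingReusableFragment"><span>x</span> ABC </span> ABC`, the inner closing tag stops the lookbehind from reaching the outer opening tag, so the ABC that is still inside the outer span gets replaced as well.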
