c# - Regex does not match when the input contains a 0, using [^\000]*?

Question

Good day,

Is there an alternative way to get everything inside a tag using a regular expression? Here is my code:

   MatchCollection matches = Regex.Matches(chek, "<bib-parsed>([^\000]*?)</bib-parsed>");

Here is a sample input:

   <bib-parsed>
   <cite>
   <pubinfo>
   <pub-year><i>1984</i></pub-year>
   <pub-place>Albuquerque</pub-place>
   <pub-name>Maxwell Museum of Anthropology and the University of New Mexico Press        </pub-name>
   </pubinfo>
   <bkinfo>
   <btl>The Galaz Ruin: A Prehistoric Mimbres Village in Southwestern New Mexico</btl>
   </bkinfo>
   </bib-parsed>

The sample above matches, but if there is a "0" inside pub-year, as in "2001", the match fails. Is there an alternative to this? Thanks.

score 6 · Accepted Answer

It appears your input is valid XML. If this is the case, use the XML parsers in either System.Xml or System.Xml.Linq. They are extremely fast. For an input string containing multiple chunks like your example, using the System.Xml.Linq namespace objects:

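using System.Linq;      // Select
using System.Xml.Linq;  // XDocument

// XDocument.Parse needs well-formed XML with a single root element.
var bibChunks = XDocument.Parse(yourXmlString)
                         .Descendants("bib-parsed")
                         .Select(e => e.Value);  // concatenated text content, tags stripped

foreach (string chunk in bibChunks) {
    // do stuff
}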

That's all there is to it.
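For completeness, here is a rough sketch of why the original pattern fails and what a regex-only alternative might look like. In a regular (non-verbatim) C# string literal, `\0` is the escape for the NUL character, so `"[^\000]"` reaches the regex engine as a class that excludes NUL and the digit `0`. That would explain why "2001" breaks the match while "1984" does not. The class `BibExtractor` and its method names are invented for illustration and are not from the question. Also note that the sample as posted never closes `<cite>`, so the XML route only applies if the real data is well-formed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper; names are illustrative only.
public static class BibExtractor
{
    // Regex route: with RegexOptions.Singleline, '.' also matches newlines,
    // and the lazy quantifier stops at the first closing tag.
    // The verbatim string (@"...") keeps C# from processing backslash escapes.
    public static List<string> WithRegex(string input)
    {
        return Regex.Matches(input, @"<bib-parsed>(.*?)</bib-parsed>", RegexOptions.Singleline)
                    .Cast<Match>()
                    .Select(m => m.Groups[1].Value)   // inner markup, tags included
                    .ToList();
    }

    // XML route from the answer: assumes well-formed XML with one root element.
    public static List<string> WithXml(string xml)
    {
        return XDocument.Parse(xml)
                        .Descendants("bib-parsed")
                        .Select(e => e.Value)          // text content only
                        .ToList();
    }
}
```

Calling `BibExtractor.WithRegex(chek)` should return the inner markup of each `<bib-parsed>` block regardless of which digits appear in it. `WithXml` returns only the text content and throws an `XmlException` on malformed input, so the regex route may be the more forgiving choice if the data really does contain unclosed tags.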
