html - Parse HTML from a defined start point to a defined end point?

Question

I have some HTML:

<hr noshade>
<p><a href="#1">Some text here</a></p>
<p style="margin-top:0pt;margin-bottom:0pt;line-height:120%;"><span style="color:#000000;font-weight:bold;">This is some description</span></p>
<hr noshade> <!-- so <hr noshade> is the delimiter for me -->
<p><a href="#2">Some more text here</a></p>
<p style="margin-top:0pt;margin-bottom:0pt;line-height:120%;"><span style="color:#000000;font-weight:bold;">This is description for some more text</span></p>
<hr noshade>

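While parsing with nokogiri, I want to print the information between each set of these tags, which are separated by my own delimiter, <hr noshade>. So the first block should print the information inside all the p tags that sit between the first two hr noshade tags.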

score 1 · Accepted Answer

Using the accepted answer from "XPath: select all elements between two specific elements"

I only have a semi-satisfying solution.

You can use the following XPath expression:

.//hr[1][@noshade]
  /following-sibling::*[not(self::hr[@noshade])]
                       [count(preceding-sibling::hr[@noshade])=1]

for the first group, between <hr noshade> 1 and 2,

then

.//hr[2][@noshade]
  /following-sibling::*[not(self::hr[@noshade])]
                       [count(preceding-sibling::hr[@noshade])=2]

for the elements between <hr noshade> 2 and 3, and so on.

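What these expressions select:

all following siblings of the <hr noshade> at position N
that have exactly N preceding <hr noshade> siblings, i.e. that are located in the N-th group
and that are not themselves an <hr noshade>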

Since these expressions select several elements between two <hr noshade> tags, you may need to loop over the result and extract the data from each sibling element.

Does anyone know a more general solution?
