ruby - Nokogiri to select text and HTML between unique sets of tags

Question

I am using Nokogiri and trying to extract the text between two unique sets of tags.

What is the best way to get the text inside the p-tag between <h2 class="point">The problem</h2> and <h2 class="point">The solution</h2>, and then all of the HTML between <h2 class="point">The solution</h2> and <div class="frame box sketh">?

Sample of the full html:

<h2 class="point">The problem</h2>
<p>TEXT I WANT </p>
<h2 class="point">The solution</h2>
HTML I WANT with it's own set of tags (but never an <h2> or <div>)
<div class="frame box sketh"><img src="URL for Image I want later" alt="" /></div>

Thank you!

score 2 · Accepted Answer

require 'nokogiri'

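# DATA reads the sample markup placed after __END__ below.
doc = Nokogiri.HTML(DATA)
# Take the siblings that follow any h2, skipping h2 and div nodes.
doc.search('//h2/following-sibling::node()[name() != "h2" and name() != "div" and text() != "\n"]').each do |block|
  p block.text
end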

__END__
<h2 class="point">The problem</h2>
<p>TEXT I WANT</p>
<h2 class="point">The solution</h2>
<div>dont capture this</div>
<span>HTML I WANT with it's <p>own set <b>of</b> tags</p></span>
<div class="frame box sketh"><img src="URL for Image I want later" alt="" /></div>

Output:

"TEXT I WANT"
"HTML I WANT with it's own set of tags"

This XPath selects every following-sibling node of an h2 that is not an h2 or a div and does not contain only the string "\n".

score 1 · Accepted Answer

Here is how to get the p-tag text between the two h2 elements with class "point":

//h2[@class="point"][1]/following-sibling::p[./following-sibling::h2[@class="point"]]/text()

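For the second part, you will need to look through the XPath material on w3schools and work it out yourself, using the first expression as an example.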
