ruby - How do I extract a link by its text with Nokogiri?

Question

I want to use Nokogiri to extract a specific link from a web page by searching for its text:

<div class="links">
   <a href='http://example.org/site/1/'>site 1</a>
   <a href='http://example.org/site/2/'>site 2</a>
   <a href='http://example.org/site/3/'>site 3</a>
</div>

I need the href for "site 3", which should return:

http://example.org/site/3/

Or, for "site 1", it should return:

http://example.org/site/1/

How can I do this?

score 3 · Accepted Answer

Original:

require 'nokogiri'

text = <<TEXT
<div class="links">
  <a href='http://example.org/site/1/'>site 1</a>
  <a href='http://example.org/site/2/'>site 2</a>
  <a href='http://example.org/site/3/'>site 3</a>
</div>
TEXT

link_text = "site 1"

doc = Nokogiri::HTML(text)
p doc.xpath("//a[text()='#{link_text}']/@href").to_s

Update:

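As far as I know, Nokogiri's XPath implementation doesn't support regular expressions. For basic "starts with" matching there is a starts-with function, which you can use like this (links whose text starts with "s"):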

doc = Nokogiri::HTML(text)
array_of_hrefs = doc.xpath("//a[starts-with(text(), 's')]/@href").map(&:to_s)

score 3 · Accepted Answer

You might prefer CSS-style selectors:

doc.at('a[text()="site 1"]')[:href] # exact match
doc.at('a[text()^="site 1"]')[:href] # starts with
doc.at('a[text()*="site 1"]')[:href] # match anywhere

score 1 · Accepted Answer

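require 'nokogiri'

text = "site 1"
doc = Nokogiri::HTML(DATA) # DATA reads the text after __END__ below
p doc.xpath("//div[@class='links']//a[contains(text(), '#{text}')]/@href").to_s

__END__
<div class="links">
  <a href='http://example.org/site/1/'>site 1</a>
  <a href='http://example.org/site/2/'>site 2</a>
  <a href='http://example.org/site/3/'>site 3</a>
</div>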

score 1 · Accepted Answer

Just to document another way to do this in Ruby, using the URI module:

require 'uri'

html = %q[
<div class="links">
    <a href='http://example.org/site/1/'>site 1</a>
    <a href='http://example.org/site/2/'>site 2</a>
    <a href='http://example.org/site/3/'>site 3</a>
</div>
]

uris = Hash[URI.extract(html).map.with_index{ |u, i| [1 + i, u] }]

=> {
    1 => "http://example.org/site/1/'",
    2 => "http://example.org/site/2/'",
    3 => "http://example.org/site/3/'"
}

uris[1]
=> "http://example.org/site/1/'"

uris[3]
=> "http://example.org/site/3/'"

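Under the hood, URI.extract uses a regular expression. That isn't the most robust way to find links in a page, but it works reasonably well, because a useful URI is normally a string with no whitespace. Two caveats, visible in the output above: each extracted URL keeps the closing ' of its href attribute, and the hash keys are only positions in the page, not link text. Ruby also flags URI.extract as obsolete (it prints a warning when warnings are enabled).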
