html - How do I parse link tags from a URL entered by a user in Rails?

Question

I'd like to be able to check whether the page at a URL the user enters contains something like this:

<link rel="alternate" type="application/rss+xml" ... href="http://feeds.example.com/MyBlog"/>

That way I could eliminate one of the options for parsing the Atom or RSS feed URL.

Is there a good way to do this? Do I have to make the server parse the entire HTML of the user's URL and dig through all of it?

After parsing, I need the feed URL in a variable so I can use it.

score 2 · Accepted Answer

You can use the Nokogiri gem - http://www.nokogiri.org/

Here's an example that searches the document with CSS-style selector syntax:

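require 'nokogiri'
require 'open-uri'

# Fetch and parse the page (Ruby 3.0+ requires URI.open for URLs; Kernel#open no longer opens them)
doc = Nokogiri::HTML(URI.open('http://www.example.com/'))
# Select every <link rel="alternate" type="application/rss+xml"> element
rss_xml_nodes = doc.css('link[rel="alternate"][type="application/rss+xml"]')
# Pull the href attribute out of each matching node
rss_xml_hrefs = rss_xml_nodes.collect { |node| node[:href] }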

rss_xml_nodes contains the matching elements as a Nokogiri::XML::NodeSet (an array-like collection of Nokogiri XML elements).

rss_xml_hrefs contains an array of strings: the href attribute of each node.

rss_xml_nodes.empty?
=> false

rss_xml_hrefs
=> ["http://www.example.com/rss-feed.xml", "http://www.example.com/rss-feed2.xml"]

score 0 · Accepted Answer

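I think you really do have to parse the whole thing, because there is no way to get at the link tags other than fetching the entire page in a single HTTP request. You can use Ruby's Net::HTTP class for this, like so: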

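require 'net/http'
require 'uri'

url = URI.parse('http://www.example.com/index.html')
# request_uri falls back to "/" when the URL has no path (url.path would be empty)
req = Net::HTTP::Get.new(url.request_uri)
# use_ssl lets the same code handle https:// URLs
res = Net::HTTP.start(url.host, url.port, use_ssl: url.scheme == 'https') { |http|
  http.request(req)
}

# regex below grabs all the hrefs on link tags
# (scan returns one single-element array per match; puts prints its contents)
res.body.scan(/<link[^>]*href\s*=\s*["']([^"']*)/).each { |match|
  puts match
}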
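The regex above prints the href of every link tag, including stylesheets and icons. If you want to stay with the standard library but keep only feed links, one option is to capture each whole link tag, read its attributes with a second regex, and filter on rel and type. This is a rough sketch, not a real HTML parser: `feed_links` is a hypothetical name, only quoted attribute values are recognized, and a `>` inside an attribute value would cut a tag short. The Nokogiri answer is the more robust choice if you can add the gem.

```ruby
require 'uri'

# MIME types that identify RSS and Atom feed links
FEED_MIME_TYPES = ['application/rss+xml', 'application/atom+xml'].freeze

# Hypothetical helper: returns absolute URLs of every RSS/Atom <link> in +html+.
# Only double- or single-quoted attribute values are recognized.
def feed_links(html, page_url)
  html.scan(/<link\b[^>]*>/i).filter_map do |tag|
    # Map attribute name (lowercased) => value for this one tag
    attrs = tag.scan(/([\w-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/)
               .to_h { |name, dq, sq| [name.downcase, dq || sq] }
    # rel may hold several space-separated values, e.g. "alternate feed"
    rels = attrs.fetch('rel', '').downcase.split
    next unless rels.include?('alternate')
    next unless FEED_MIME_TYPES.include?(attrs['type'].to_s.downcase.strip)
    next if attrs['href'].to_s.empty?
    # Resolve relative hrefs against the page URL
    URI.join(page_url, attrs['href']).to_s
  end
end
```

Called as `feed_links(res.body, url.to_s)`, it returns an array of feed URLs (empty if the page advertises none), so `feed_links(res.body, url.to_s).first` gives a single URL or `nil`.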
