ruby - Parsing an HTML page with net/http

Question

In a previous question I found an answer for parsing a page's title. It's hacky, but it works:

 html = %x(curl http://google.com)
 simian = html.match(/<title>(.*)<\/title>/)[1]
 puts simian

Now I'd like to know whether there is a better way to fetch the URL with a Ruby standard library such as net/http, instead of shelling out to curl.
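A minimal sketch of the net/http route, using only the standard library. `fetch_html` and `extract_title` are hypothetical helper names, and the regex is still a quick hack rather than a real HTML parser. `Net::HTTP.get_response` does not follow redirects on its own, and `http://google.com` typically redirects to `www.google.com`, so the helper follows a few `Location` headers itself.

```ruby
require 'net/http'
require 'uri'

# Fetch a page with Net::HTTP, following up to `limit` redirects,
# since Net::HTTP.get_response does not follow them by itself.
def fetch_html(url, limit = 5)
  raise ArgumentError, 'too many redirects' if limit.zero?

  response = Net::HTTP.get_response(URI(url))
  case response
  when Net::HTTPSuccess
    response.body
  when Net::HTTPRedirection
    # Location may be relative, so resolve it against the current URL.
    fetch_html(URI.join(url, response['location']).to_s, limit - 1)
  else
    response.value # raises an error describing the non-2xx status
  end
end

# Pull the <title> text out of an HTML string with a regex (a hack, not a
# parser): non-greedy, case-insensitive, and allowed to span newlines.
def extract_title(html)
  match = html.match(%r{<title[^>]*>(.*?)</title>}im)
  match && match[1].strip
end

# Usage (performs a network request):
#   puts extract_title(fetch_html('http://google.com'))
```

Net::HTTP usually hands back the body as a binary (ASCII-8BIT) string, so the regex runs without encoding errors, but the title bytes still need to be decoded with the page's real charset before printing them as UTF-8.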

Another problem: if the page title contains non-standard characters, it isn't parsed and the `match` call fails. I tried

 html = %x(curl http://google.com).encode('UTF-8')
 simian = html.match(/<title>(.*)<\/title>/)[1]

but then I get weird characters like 1#.
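A probable cause: the string returned by `%x(...)` is tagged with Ruby's default external encoding (usually UTF-8) whatever charset the page really uses, and `encode('UTF-8')` on a string already tagged UTF-8 is a no-op, so a Latin-1 or Shift_JIS page still contains invalid UTF-8 bytes and `match` raises `ArgumentError`. The sketch below first relabels the bytes with the page's declared charset (`force_encoding`) and only then converts them (`encode`). `to_utf8` is a hypothetical helper, and it assumes you already know the charset, for example from the `Content-Type` header.

```ruby
# Reinterpret raw page bytes as the charset the server declared, then
# transcode to UTF-8. A missing or unknown charset falls back to UTF-8,
# and bytes that still cannot be converted become '?'.
def to_utf8(raw, charset)
  encoding = begin
    Encoding.find(charset || 'UTF-8')
  rescue ArgumentError # unknown encoding name
    Encoding::UTF_8
  end

  # force_encoding only relabels the bytes; encode actually converts them.
  text = raw.dup.force_encoding(encoding)
  unless encoding == Encoding::UTF_8
    text = text.encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: '?')
  end
  text.scrub('?') # replace any invalid UTF-8 sequences that remain
end

# Usage with Net::HTTP, taking the charset from the Content-Type header:
#   response = Net::HTTP.get_response(URI('http://www.google.com'))
#   html = to_utf8(response.body, response.type_params['charset'])
```

Once the HTML is valid UTF-8, the title regex no longer raises, and characters such as `é` print correctly instead of as mojibake.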

score 1 · Accepted Answer

Using nokogiri is probably the simplest solution:

require 'nokogiri'
require 'open-uri'

# URI.open is needed on Ruby 3.0+, where Kernel#open no longer fetches URLs
doc = Nokogiri::HTML(URI.open('http://www.google.com'))
elt = doc.xpath('//title').first
puts elt.text unless elt.nil?
