ruby - Finding a path between wiki pages doesn't work for longer paths

Question

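I'm trying to write a program that takes an input Wikipedia link and follows the first link on the page. The program keeps going until it reaches a page that matches the second input. Eventually I'll add a feature that exits the program when it hits a loop.

Right now my code works for examples that are only a few links apart, such as Bee -> History, but it throws an error on longer paths. The code is below. I only started learning Ruby yesterday, so I'd appreciate any feedback on mistakes I may have made.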

require 'open-uri'
require 'nokogiri'

puts "Enter starting page (full URL not needed): "
page1 = gets.chomp

puts "Enter ending page (full URL not needed): "
page2 = gets.chomp

until page1 == page2 do
  #open page
  doc = Nokogiri::HTML(open("http://en.wikipedia.org/wiki/" + page1))

  %w[.//table .//span .//sup .//i].map {|n| doc.xpath(n).map(&:remove) }

  #find href in first p
  fp = doc.css("p").first.search('a').map{ |a| a['href']}

  #make page1 = the end of the url. ex. /wiki/link = link
  page1 = fp.first[6,fp.first.length]
  puts page1
end

Update: here is the error I'm getting:

C:\Users\files>ruby 121.rb
Enter starting page (full URL not needed):
Cow
Enter ending page (full URL not needed):
Philosophy
Domestication
Latin_(language)
Classical_antiquity
History
121.rb:20:in `<main>': undefined method `length' for nil:NilClass (NoMethodError)

score 1 · Accepted Answer

Alternatively, to solve the task you can process all the links on each page until you reach page2.

require 'open-uri'
require 'nokogiri'

puts "Enter starting page (full URL not needed): "
start_page = gets.chomp

puts "Enter ending page (full URL not needed): "
end_page = gets.chomp

pages = [start_page]
next_page = pages.first

until next_page == end_page || pages.empty?
  next_page = pages.pop
  puts "Treat: #{next_page}"

  # URI.open is needed on Ruby 3.0+, where Kernel#open no longer fetches URLs
  doc = Nokogiri::HTML(URI.open("http://en.wikipedia.org/wiki/" + next_page))

  # drop tables, spans, superscripts and italics before collecting links
  %w[.//table .//span .//sup .//i].each { |n| doc.xpath(n).each(&:remove) }

  doc.css("p").each do |p|
    p.search('a').each do |a|
      href = a['href']
      # skip anchors without an href and links that are not /wiki/ pages
      pages.push(href[6..-1]) if href && href.start_with?('/wiki/')
    end
  end
end
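
The NoMethodError in the question most likely comes from fp.first being nil: when the first paragraph of a page has no usable link, fp is empty and fp.first.length is called on nil. The loop-exit feature mentioned in the question also needs some record of pages already seen. Below is a rough sketch of both ideas, not a definitive implementation. The names page_name, find_path, links_for and max_pages are made up for illustration. links_for stands in for the Nokogiri fetching code and is assumed to be any callable that returns the href strings found on a page. The sketch swaps the stack for a breadth-first queue, so the path it returns uses the fewest clicks among the pages it explored.

```ruby
WIKI_PREFIX = '/wiki/'.freeze

# Returns the page name for an internal /wiki/ link, or nil for anything
# else (missing href, external URL, in-page anchor such as "#cite_note-1").
def page_name(href)
  return nil unless href.is_a?(String) && href.start_with?(WIKI_PREFIX)
  name = href[WIKI_PREFIX.length..-1]
  name.empty? ? nil : name
end

# Breadth-first search from start_page to end_page.
# links_for: hypothetical callable mapping a page name to an array of hrefs.
# max_pages: assumed safety cap on how many pages get expanded.
# Returns the path as an array of page names, or nil if none was found.
def find_path(start_page, end_page, links_for, max_pages: 1000)
  parent = { start_page => nil } # also serves as the set of pages seen
  queue = [start_page]
  fetched = 0

  until queue.empty?
    page = queue.shift
    if page == end_page
      # walk the parent links back to the start to rebuild the path
      path = []
      while page
        path.unshift(page)
        page = parent[page]
      end
      return path
    end

    break if fetched >= max_pages
    fetched += 1

    links_for.call(page).each do |href|
      name = page_name(href)
      next if name.nil? || parent.key?(name) # unusable link or already seen
      parent[name] = page
      queue.push(name)
    end
  end
  nil
end

# Usage with a small in-memory graph instead of real HTTP requests
graph = {
  'Cow'           => ['/wiki/Domestication', '#cite_note-1', nil],
  'Domestication' => ['/wiki/Cow', '/wiki/History'],
  'History'       => ['http://example.org/', '/wiki/Philosophy'],
  'Philosophy'    => []
}
fetch = ->(page) { graph.fetch(page, []) }
p find_path('Cow', 'Philosophy', fetch)
# prints ["Cow", "Domestication", "History", "Philosophy"]
```

To use this against Wikipedia, links_for would wrap the Nokogiri code above and return the a['href'] values from the page's paragraphs. Because every page is recorded in parent before it is queued, a cycle such as Cow -> Domestication -> Cow cannot cause an endless loop.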
