ruby - Ruby EOFError with open-uri and a loop

Question

I'm trying to build a web crawler and have hit a snag. Basically, I extract the links from a web page and push each link onto a queue. Whenever the Ruby interpreter reaches this section of code:

links.each do |link|
  url_frontier.push(link)
end

I get the following error:

/home/blah/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/1.9.1/net/protocol.rb:141:in `read_nonblock': end of file reached (EOFError)

If I comment out the code block above, the error does not occur. Thanks in advance. Here is the rest of the code:

require 'open-uri'
require 'net/http'
require 'uri'

class WebCrawler
  def self.Spider(root)
    eNDCHARS = %{.,'?!:;}
    num_documents = 0
    token_list = []
    url_repository = Hash.new
    url_frontier = Queue.new

    url_frontier.push(root.to_s)
    while !url_frontier.empty? && num_documents < 10
      url = url_frontier.pop
      if !url_repository.has_key?(url)
        document = open(url)
        html = document.read

        # extract url's
        links = URI.extract(html, ['http']).collect { |u| eNDCHARS.index(u[-1]) ? u.chop : u }

        links.each do |link|
          url_frontier.push(link)
        end

        # tokenize
        Tokenizer.tokenize(document).each do |word|
          token_list.push(IndexStructures::Term.new(word, url))
        end

        # add to the repository
        url_repository[url] = true
        num_documents += 1
      end
    end

    # sort by term (primary) and document id (secondary) in reverse to aid in the construction of the inverted index
    return num_documents, token_list.sort_by! { |term| [term.term, term.document_id]}.reverse!
  end
end

score 0 · Accepted Answer

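I ran into the same error, but with Watir-webdriver running Firefox in headless mode. What I found was that when I ran two applications in parallel and destroyed the "headless" instance in one of them, the other application was automatically killed as well, with the exact error you quoted. My situation is not the same as yours, but I think the problem is related to a file handle being closed prematurely from outside while the application is still using it. Once I removed the destroy command from my application, the error went away.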

Hope this helps.
