
*Edit: Per my comment below, I guess a better question would be: what is the proper way to have Mechanize go through each URL and update its name column? (Each name is unique to its URL.) Below is what I've been basing my exercise on.*


I have a Postgres table laid out like this: | name (string) | url (text) |

The url column is already populated with various URLs, such as this one: http://www.a4apps.com/Websites/SampleCalendar/tabid/89/ctl/Register/Default.aspx

I am trying to write a Mechanize rake task that goes through each URL and updates the name based on the text it finds at a CSS selector.

namespace :db do
  desc "Fetch css from db urls"
  task :fetch_css => :environment do

    require 'rubygems'
    require 'mechanize'
    require 'open-uri'

    agent = Mechanize.new
    url = Mytable.pluck(:url)
    agent.get(url)
    agent.page.search('#dnn_ctr444_ContentPane').each do |item|
      name = item.css('.EventNextPrev:nth-child(1) a').text
      Mytable.update(:name => name)
    end 
  end
end

When I run the rake task it returns:

rake aborted!
bad URI(is not URI?): %255B%2522http://www.a4apps.com/Websites/SampleCalendar/tabid/89/Default.aspx%2522,%2520%2522http://www.a4apps.com/Websites/SampleCalendar/tabid/89/ctl/Privacy/Default.aspx%2522,%2520%2522http://www.a4apps.com/Websites/SampleCalendar/tabid/89/ctl/Terms/Default.aspx%2522,%2520%2522http://www.a4apps.com/Websites/SampleCalendar/tabid/89/ctl/Register/Default.aspx%2522%255D
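The "bad URI" in that error is not a single URL: it is the whole `pluck` result. `Mytable.pluck(:url)` returns an Array of every URL in the table, and that Array is what gets passed to `agent.get`. Percent-decoding the string from the error twice gives back exactly the Array's `to_s`. A small stdlib-only check, assuming the error string trimmed to its first two URLs (exactly which layer does the double encoding is not shown here):

```ruby
require 'uri'

# The bad URI from the error message, trimmed to the first two URLs.
bad = "%255B%2522http://www.a4apps.com/Websites/SampleCalendar/tabid/89/Default.aspx%2522," \
      "%2520%2522http://www.a4apps.com/Websites/SampleCalendar/tabid/89/ctl/Privacy/Default.aspx%2522%255D"

# Undo the two layers of percent-encoding (%255B -> %5B -> "[").
once  = URI.decode_www_form_component(bad)
twice = URI.decode_www_form_component(once)

# Mytable.pluck(:url) returns a plain Array of strings like this one.
urls = [
  "http://www.a4apps.com/Websites/SampleCalendar/tabid/89/Default.aspx",
  "http://www.a4apps.com/Websites/SampleCalendar/tabid/89/ctl/Privacy/Default.aspx"
]

puts twice                 # the Array printed as a string, brackets and quotes included
puts twice == urls.to_s    # prints "true"
```

So `agent.get` needs to be called once per URL rather than once with the whole column. Similarly, `Mytable.update(:name => name)` does not say which row to change, so each name has to be written back to the row its URL came from.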

Thanks for any help. If there's any way I can make the question easier to answer, please let me know. Mike


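It's been a bit lonely answering my own questions lately, but I'll post my answer in case anyone else ends up in the same bind. Also, maybe someone can tell me whether my solution has a fatal flaw I haven't spotted yet. Here is the final rake task, which seems to work: it pulls the URLs from the table, runs Mechanize against each one, and updates the table with the information found at that URL...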

namespace :db do
  desc "Fetch css from db urls"
  task :fetch_css => :environment do
    require 'mechanize'

    agent = Mechanize.new             # create one agent and reuse it for every row

    Mytable.find_each do |info|       # for each row (loaded in batches)...
      page = agent.get(info.url)      # fetch the URL stored in the current row...
      nombre = page.search('.EventNextPrev:nth-child(1) a').text  # take the text at the CSS selector...
      info.update_attributes(:name => nombre)  # and update this row with the result.
    end

  end
end
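One possible weakness in the task above: if any single request fails, Mechanize raises (for example `Mechanize::ResponseCodeError` on a 404) and the whole rake task aborts, leaving the remaining rows untouched. The sketch below shows per-row error isolation using only the standard library. `Row`, `FAKE_PAGES`, `fetch_name`, `update_names` and the example.com URLs are hypothetical stand-ins for the ActiveRecord model and the Mechanize call, not part of the original task:

```ruby
# Hypothetical stand-in for a Mytable row (an ActiveRecord model in the real task).
Row = Struct.new(:url, :name)

# Hypothetical stand-in for agent.get(url).search(SELECTOR).text:
# known URLs return a name, unknown ones raise, imitating a failed request.
FAKE_PAGES = {
  "http://example.com/a" => "Event A",
  "http://example.com/b" => "Event B"
}

def fetch_name(url)
  FAKE_PAGES.fetch(url) { raise IOError, "could not fetch #{url}" }
end

# Fetch one URL per row and update only that row. A failure is recorded
# and skipped instead of aborting the loop. In the real task the rescue
# might list Mechanize::ResponseCodeError, SocketError, etc. (assumption).
def update_names(rows)
  failures = []
  rows.each do |row|
    begin
      row.name = fetch_name(row.url)
    rescue IOError => e
      failures << [row.url, e.message]
    end
  end
  failures
end

rows = [
  Row.new("http://example.com/a"),
  Row.new("http://example.com/missing"),
  Row.new("http://example.com/b")
]
failures = update_names(rows)

p rows.map(&:name)   # prints ["Event A", nil, "Event B"]
p failures.size      # prints 1
```

Collecting failures and reporting them at the end keeps one bad URL from blocking every row after it; whether to skip, retry, or abort is a judgment call for the real data.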

Thanks. Mike

Answered 2012-11-09T14:01:59.177