I have a script that visits fcc.gov, then clicks a link which triggers a download:
require "mechanize"
docket_number = "12-268" #"96-128"
url = "http://apps.fcc.gov/ecfs/comment_search/execute?proceeding=#{docket_number}"
agent = Mechanize.new
agent.pluggable_parser.default = Mechanize::DirectorySaver.save_to 'downloads'
agent.get(url) do |page|
  link = page.link_with(:text => "Export to Excel file")
  xls = agent.click(link)
end
This works fine when docket_number is "12-268". But when I change it to "96-128", Mechanize downloads the HTML of the page instead of the desired spreadsheet.
The URLs for the two pages are:
- http://apps.fcc.gov/ecfs/comment_search/execute?proceeding=12-268 (works)
- http://apps.fcc.gov/ecfs/comment_search/execute?proceeding=96-128 (this is where I need help)
If you visit each page in a browser (I'm using Chrome) and click "Export to Excel file", a spreadsheet file is downloaded and there is no problem. However, "96-128" has many more rows, so clicking the Export link takes you to an interim page that refreshes every 10 seconds or so until the file begins downloading. How can I get around this, and why is there this inconsistency?
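For reference, here is a rough sketch of the kind of workaround I have in mind: keep re-requesting until the response stops being the interim HTML page. The helper name `poll_until_ready` and its parameters are made up for illustration, and the Mechanize wiring in the comments assumes that re-fetching the interim page's URL eventually returns the spreadsheet, which I have not confirmed. If the interim page uses a meta refresh tag, setting `agent.follow_meta_refresh = true` might be enough on its own.

```ruby
# Hypothetical helper (not part of Mechanize): call `fetch` repeatedly until
# the block says the response is no longer the "please wait" page.
# Returns the first ready response, or nil if max_attempts is exhausted.
def poll_until_ready(fetch, max_attempts: 30, delay: 10)
  max_attempts.times do |attempt|
    response = fetch.call
    # The block returns true while we are still looking at the interim page.
    return response unless yield(response)
    # Wait before retrying, but not after the final attempt.
    sleep(delay) if attempt < max_attempts - 1
  end
  nil
end

# Possible wiring with the script above (assumption: re-requesting the
# interim page's URI eventually yields the file instead of HTML):
#
#   interim = agent.click(link)
#   file = poll_until_ready(-> { agent.get(interim.uri) }) do |response|
#     response.is_a?(Mechanize::Page)  # still HTML, so still waiting
#   end
```

The 10-second default delay just mirrors the refresh interval I see in the browser; the attempt limit is arbitrary.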