
I'm having trouble downloading multiple files in a loop using Python mechanize. I'm also using Beautiful Soup 4. Neither package's documentation seems to have the answer.

Here is my code - feel free to skip ahead to the actual loop. I've included everything for reference:

import mechanize, cookielib, os, time
from bs4 import BeautifulSoup


fcList = ['abandoned mine land inventory points', 'abandoned mine land inventory polygons', \
          'abandoned mine land inventory sites', 'coal mining operations', 'coal pillar location-mining', \
          'industrial mineral mining operations', 'longwall mining panels', 'mine drainage treatment/land recycling project locations', \
          'mined out areas', 'residual waste operations', 'underground mining permit']

dlLink = 'FTP Download'
dloadPath = 'C:\\Users\\SomeGuy\\Downloads'

# Browser
br = mechanize.Browser()

# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)

# Select the first (index zero) form
br.select_form(nr=0)

# Input form data
br.form['Keyword']='mining'
br.submit()
html = br.response().read()

# Pass html to beautiful soup for parse
soup = BeautifulSoup(html)
htmlinks = soup.findAll("a")

# Find links with desired text
for htmlink in htmlinks:
    string = str(htmlink.string)
    if string.lower() in fcList:
        print "Matched link!", string + ". attempting download...\n"
        try:
            req = br.click_link(text = string)
            br.open(req)
            print "URL: " + str(br.geturl)
            html = br.response().read()
            soup = BeautifulSoup(html)
            the_tag = soup.find('a', text=dlLink)
            fileURL = the_tag.get('href')
            print fileURL
            # attempt download
            fnam = string.replace(" ", "_")
            fnam = fnam.replace("/", "_")
            f = br.retrieve(fileURL, os.path.join(dloadPath, fnam + ".zip"))
            print f + "\n"
            br.back()
        except:
            print "An unknown error occurred."

Output:

>>> 
Matched link! Abandoned Mine Land Inventory Points. attempting download...

URL: <bound method Browser.geturl of <mechanize._mechanize.Browser instance at 0x02D9D7B0>>
http://www.pasda.psu.edu/data/dep/AMLInventoryPoints2013_04.zip
An unknown error occurred.
Matched link! Abandoned Mine Land Inventory Polygons. attempting download...

An unknown error occurred.
Matched link! Abandoned Mine Land Inventory Sites. attempting download...

An unknown error occurred.
Matched link! Coal Mining Operations. attempting download...

An unknown error occurred.
Matched link! Coal Pillar Location-Mining. attempting download...

An unknown error occurred.
Matched link! Industrial Mineral Mining Operations. attempting download...

An unknown error occurred.
Matched link! Longwall Mining Panels. attempting download...

An unknown error occurred.
Matched link! Mine Drainage Treatment/Land Recycling Project Locations. attempting download...

An unknown error occurred.
Matched link! Mined Out Areas. attempting download...

An unknown error occurred.
Matched link! Residual Waste Operations. attempting download...

An unknown error occurred.
Matched link! Underground Mining Permit. attempting download...

An unknown error occurred.
>>> 

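I think the problem may be caused by the lack of a wait between downloads. The code successfully downloads the first file in the loop, no matter which file is selected. Or it could be some other bug I haven't noticed - I only downloaded mechanize and beautifulsoup yesterday!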
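
One way to tell a timing problem apart from an ordinary bug is to look at the exception that the bare `except:` is hiding. The following is a minimal sketch of that idea, not the mechanize code itself: `download_one` and `download_all` are hypothetical stand-ins for the loop body, and the failure is simulated. It is written to run on both Python 2 and Python 3.

```python
import traceback

def download_one(name):
    # Hypothetical stand-in for the loop body; fails on purpose for one input.
    if name == "bad":
        raise ValueError("simulated failure for %r" % name)
    return name.replace(" ", "_") + ".zip"

def download_all(names):
    """Try every name; collect results and full tracebacks instead of hiding them."""
    done, failed = [], []
    for name in names:
        try:
            done.append(download_one(name))
        except Exception:
            # format_exc() returns the traceback text of the exception being handled
            failed.append((name, traceback.format_exc()))
    return done, failed

done, failed = download_all(["coal mining operations", "bad"])
for name, tb in failed:
    print("Failed: %s\n%s" % (name, tb))
```

Catching `Exception` and printing the traceback keeps the loop going while still showing the exception type and the line that raised it, which "An unknown error occurred." cannot do.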


1 Answer


Try this:

f = br.retrieve(fileURL, os.path.join(dloadPath, fnam + ".zip"))[0]  

If that doesn't work, remove the try..except and post the actual error that occurs.
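
For context on why indexing with `[0]` helps: `br.retrieve` returns a `(filename, headers)` tuple, in the same way as `urllib.urlretrieve`, so `print f + "\n"` tries to add a string to a tuple and raises a `TypeError`. Because that happens before `br.back()` runs, the browser is probably still on the first dataset's page when the next `click_link` is attempted, which would explain why only the first file downloads. The sketch below reproduces just the tuple behaviour without mechanize; `fake_retrieve` and the example.com URL are hypothetical stand-ins.

```python
def fake_retrieve(url, filename):
    # Hypothetical stand-in that mimics the (filename, headers) return shape.
    headers = {"Content-Type": "application/zip"}
    return filename, headers

result = fake_retrieve("http://example.com/data.zip", "data.zip")

# Concatenating the whole tuple with a string fails, as in the question's loop.
error_text = None
try:
    message = result + "\n"
except TypeError as exc:
    error_text = str(exc)
    print("TypeError: " + error_text)

# Indexing (or unpacking) yields the local path as a plain string.
local_path = result[0]
unpacked_path, headers = result
print(local_path + "\n")
```

Unpacking into two names makes it more obvious at the call site that the second element (the response headers) is being ignored.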

answered 2013-05-18T15:03:12.907