
I have a Python script that downloads a GRIB file (weather forecast data) from the NOAA website based on a date, time, and number of hours to forecast ahead. Basically, the script pieces together a big URL request and sends it to the NOAA website. This works great on the computers at school, and it worked great for some previous Stack Overflow users who helped me with the code. However, the exact same script fails 9 out of 10 times using Python on my computer, even though when I have Python print out the URL and copy it into Firefox, it works fine every time. Switching the library to urllib2 doesn't change anything.

So I can say the following: somehow urllib is not able to get the data I want when I am using my computer, but the script works fine everywhere else. urllib can scrape HTML from other websites on my computer with no problem, but somehow this particular download gives it trouble.

I am running Ubuntu Precise and using Python 2.7.3 on a laptop with a wireless connection when I try to run the script at home. I have tested it on a wired computer with Ubuntu Precise and it works every time (also tested on Fedora, where it also works).

Please tell me some diagnostics I can do to figure out why urllib and my computer aren't playing nice. And thank you; this problem is standing in the way of the next generation of high altitude balloon launches.

Here's what it tells me 90% of the time:

Traceback (most recent call last):
File "/home/dantayaga/bovine_aerospace/dev/grib_get.py", line 67, in <module>
webf=urllib.urlopen(griburl, data='POST')
File "/usr/lib/python2.7/urllib.py", line 88, in urlopen
return opener.open(url, data)
File "/usr/lib/python2.7/urllib.py", line 209, in open
return getattr(self, name)(url, data)
File "/usr/lib/python2.7/urllib.py", line 344, in open_http
h.endheaders(data)
File "/usr/lib/python2.7/httplib.py", line 954, in endheaders
self._send_output(message_body)
File "/usr/lib/python2.7/httplib.py", line 814, in _send_output
self.send(msg)
File "/usr/lib/python2.7/httplib.py", line 776, in send
self.connect()
File "/usr/lib/python2.7/httplib.py", line 757, in connect
self.timeout, self.source_address)
File "/usr/lib/python2.7/socket.py", line 553, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
IOError: [Errno socket error] [Errno -2] Name or service not known

Here is the code I am using (credit to samy.vilar et al. for improved pythonicity). Note that you have to input today's date and a forecast time of 00, 06, 12 or 18 (GMT), otherwise you may get a 404 Not Found. Keep forecast_hours the same.

Get GRIB files

import urllib
#import os

#os.environ['http_proxy']='' #Doesn't seem to help!

forecast_time='06' #What time the forecast is (00, 06, 12, 18)
forecast_hours='12' #How many hours ahead to forecast (2 or 3 digits)
forecast_date='20120720' #What date the forecast is for yyyymmdd

top_lat=90 #Top of bounding box (North)
bottom_lat=-90 #Bottom of bounding box (South)
left_lon=-90 #Left of bounding box (West)
right_lon=90 #Right of bounding box (East)

griburl='http://nomads.ncep.noaa.gov/cgi-bin/filter_gfs_hd.pl?'
griburl=griburl+'file=gfs.t'+str(forecast_time)+'z.mastergrb2f'
griburl=griburl+forecast_hours

#Select atmospheric levels

levels = [1000, 975, 950, 925, 900, 850, 800, 750, 700, 650, 600, 550,
          500, 450, 400, 350, 300, 250, 200, 150, 100, 70, 30, 20, 10]
for level in levels:  #Request each pressure level (mb) in turn
    griburl = griburl + '&lev_' + str(level) + '_mb=on'

#Select variables

griburl=griburl+'&var_HGT=on'  #Height (geopotential m)
griburl=griburl+'&var_RH=on'  #Relative humidity (%)
griburl=griburl+'&var_TMP=on' #Temperature (K)
griburl=griburl+'&var_UGRD=on' #East-West component of wind (m/s)
griburl=griburl+'&var_VGRD=on' #North-South component of wind (m/s)
griburl=griburl+'&var_VVEL=on' #Vertical Windspeed (Pa/s)

#Select bounding box

griburl=griburl+'&leftlon='+str(left_lon)
griburl=griburl+'&rightlon='+str(right_lon)
griburl=griburl+'&toplat='+str(top_lat)
griburl=griburl+'&bottomlat='+str(bottom_lat)

#Select date and time

griburl=griburl+'&dir=%2Fgfs.'+forecast_date+forecast_time+'%2Fmaster'
print(griburl)
print('Downloading GRIB file for date '+forecast_date+' time ' +forecast_time + ', forecasting '+forecast_hours+' hours ahead...')
webf=urllib.urlopen(griburl, data='POST')
print("Download complete.  Saving...")
local_filename=forecast_date+'_'+forecast_time+'_'+forecast_hours+'.grib'
localf=open(local_filename, 'wb')
localf.write(webf.read())
print('Requested grib data written to file '+local_filename)
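The long chain of string concatenations above can also be expressed with urlencode, which builds the query string and handles the '&' separators and percent-encoding in one step (urllib.urlencode in Python 2, urllib.parse.urlencode in Python 3). A minimal sketch with a few of the parameters above:

```python
try:
    from urllib import urlencode          # Python 2
except ImportError:
    from urllib.parse import urlencode    # Python 3

# Build the query as (name, value) pairs; urlencode joins them with '&'
# and percent-encodes special characters such as the '/' in the dir value.
params = [('file', 'gfs.t06z.mastergrb2f12')]
params += [('lev_%d_mb' % lev, 'on') for lev in (1000, 500, 10)]
params += [('var_TMP', 'on'), ('var_UGRD', 'on'), ('var_VGRD', 'on')]
params += [('leftlon', -90), ('rightlon', 90),
           ('toplat', 90), ('bottomlat', -90)]
params += [('dir', '/gfs.2012072006/master')]

griburl = 'http://nomads.ncep.noaa.gov/cgi-bin/filter_gfs_hd.pl?' + urlencode(params)
print(griburl)
```

This also removes the risk of forgetting an '&' between parameters, since urlencode inserts the separators itself.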

2 Answers

IOError: [Errno socket error] [Errno -2] Name or service not known

This exception indicates that your laptop cannot resolve the hostname to an IP address. DNS lookups are handled by the socket library; this has nothing to do with whether you use urllib, urllib2, or anything else.

You should check your network configuration, especially your DNS servers. Firefox may be configured to use a proxy, in which case it delegates DNS lookups to the proxy.

It is strange that other sites work fine. I can't explain why HTML scraping with urllib succeeds on other sites (perhaps those scripts have a proxy enabled?), but the exception you are seeing is definitely DNS-related.
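The failing call in the traceback is socket.getaddrinfo, so the DNS lookup can be tested in isolation from urllib. A minimal sketch (the hostname is taken from the question's URL; None means the same lookup failure urllib is hitting):

```python
import socket

def resolve(host, port=80):
    """Return the set of IP addresses host resolves to, or None if DNS fails."""
    try:
        infos = socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM)
    except socket.gaierror:
        return None
    return set(info[4][0] for info in infos)

# None here reproduces the "Name or service not known" error from the traceback.
print(resolve('nomads.ncep.noaa.gov'))
```

Running this a dozen times in a row on the laptop should show whether the failure really is an intermittent DNS problem, independent of urllib.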

If you find that Firefox is using a proxy, try configuring your script to use the same one. A quick way is to invoke the Python script like this:

http_proxy=http://proxy:1234 python grib_get.py
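The same thing can be done from inside the script, since urllib reads the http_proxy environment variable when it opens a connection. A sketch, assuming the proxy address is a placeholder rather than a real host:

```python
import os

# Must be set before urllib opens any connection; the address is a placeholder.
os.environ['http_proxy'] = 'http://proxy:1234'
```

Python 2's urllib.urlopen also accepts an explicit proxies dict as its third argument, e.g. urlopen(url, proxies={'http': 'http://proxy:1234'}), if you prefer not to touch the environment.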

Or, for diagnostic purposes, you could temporarily hard-code the remote server's IP address into the URL:

griburl='http://140.90.33.62/cgi-bin/filter_gfs_hd.pl?'
answered 2012-07-31 01:15

I wonder whether this is a memory problem: you may be running out of memory, so the process gets paged out to disk and slows down.

In any case, you are printing "Download complete. Saving..." before anything has actually been downloaded.

Try this instead:

print('Downloading GRIB file for date '+forecast_date+' time ' +forecast_time + ', forecasting '+forecast_hours+' hours ahead...')
local_filename=forecast_date+'_'+forecast_time+'_'+forecast_hours+'.grib'
webf=urllib.urlopen(griburl, data='POST')
localf=open(local_filename, 'wb')
BLOCK_SIZE = 4096
while True:
    block = webf.read(BLOCK_SIZE)
    if not block:
        break
    localf.write(block)
localf.close()
webf.close()
print("Download complete.  Saving...")
answered 2012-07-31 01:01