
I'm trying to save an image from a web page using Mechanize. I use this code:



    @current_agent.get(image_url).save(save_path)

The error (I think the problem is with the timeout settings):



    I, [2013-03-25T14:42:13.924694 #31865]  INFO -- : Net::HTTP::Get: /i?path=b0312211141_img_id282431557272802821.jpg
    D, [2013-03-25T14:42:13.924757 #31865] DEBUG -- : request-header: accept => */*
    D, [2013-03-25T14:42:13.924828 #31865] DEBUG -- : request-header: user-agent => Mozilla/5.0 (iPad; CPU OS 5_1_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9B206 Safari/7534.48.3
    D, [2013-03-25T14:42:13.924858 #31865] DEBUG -- : request-header: accept-encoding => gzip,deflate,identity
    D, [2013-03-25T14:42:13.924884 #31865] DEBUG -- : request-header: accept-charset => ISO-8859-1,utf-8;q=0.7,*;q=0.7
    D, [2013-03-25T14:42:13.924915 #31865] DEBUG -- : request-header: accept-language => en-us,en;q=0.5
    D, [2013-03-25T14:42:13.924942 #31865] DEBUG -- : request-header: host => mdata.yandex.net
    I, [2013-03-25T14:42:14.151810 #31865]  INFO -- : status: Net::HTTPOK 1.0 200 OK
    D, [2013-03-25T14:42:14.151890 #31865] DEBUG -- : response-header: server => nginx/1.2.1
    D, [2013-03-25T14:42:14.151919 #31865] DEBUG -- : response-header: date => Mon, 25 Mar 2013 13:43:54 GMT
    D, [2013-03-25T14:42:14.151943 #31865] DEBUG -- : response-header: content-type => image/jpeg
    D, [2013-03-25T14:42:14.151967 #31865] DEBUG -- : response-header: content-length => 212187
    D, [2013-03-25T14:42:14.151991 #31865] DEBUG -- : response-header: last-modified => Tue, 12 Mar 2013 18:11:41 GMT
    D, [2013-03-25T14:42:14.152015 #31865] DEBUG -- : response-header: expires => Wed, 24 Apr 2013 13:43:54 GMT
    D, [2013-03-25T14:42:14.152039 #31865] DEBUG -- : response-header: cache-control => max-age=2592000
    D, [2013-03-25T14:42:14.152062 #31865] DEBUG -- : response-header: x-original-host => mdata.somesite.ru
    D, [2013-03-25T14:42:14.152086 #31865] DEBUG -- : response-header: accept-ranges => bytes
    D, [2013-03-25T14:42:14.152109 #31865] DEBUG -- : response-header: x-cache => MISS from parser.myapp.com.ua
    D, [2013-03-25T14:42:14.152133 #31865] DEBUG -- : response-header: x-cache-lookup => MISS from parser.notus.com.ua:1221
    D, [2013-03-25T14:42:14.152157 #31865] DEBUG -- : response-header: via => 1.0 parser.myapp.com.ua (squid/3.1.10)
    D, [2013-03-25T14:42:14.152180 #31865] DEBUG -- : response-header: connection => keep-alive
    D, [2013-03-25T14:42:14.152464 #31865] DEBUG -- : Read 2521 bytes (2521 total)
    D, [2013-03-25T14:42:14.152509 #31865] DEBUG -- : Read 598 bytes (3119 total)
    D, [2013-03-25T14:42:14.199787 #31865] DEBUG -- : Read 1448 bytes (6613 total)
    D, [2013-03-25T14:42:14.199887 #31865] DEBUG -- : Read 2648 bytes (9261 total)
    D, [2013-03-25T14:42:14.200125 #31865] DEBUG -- : Read 2896 bytes (12157 total)
    D, [2013-03-25T14:42:14.200286 #31865] DEBUG -- : Read 1200 bytes (13357 total)
    D, [2013-03-25T14:42:14.248204 #31865] DEBUG -- : Read 2896 bytes (16253 total)
    D, [2013-03-25T14:42:14.248436 #31865] DEBUG -- : Read 1200 bytes (17453 total)
    D, [2013-03-25T14:42:14.248510 #31865] DEBUG -- : Read 1448 bytes (18901 total)
    D, [2013-03-25T14:42:14.248609 #31865] DEBUG -- : Read 2648 bytes (21549 total)
    D, [2013-03-25T14:42:14.248864 #31865] DEBUG -- : Read 2896 bytes (24445 total)
    D, [2013-03-25T14:42:14.248985 #31865] DEBUG -- : Read 1200 bytes (25645 total)
    D, [2013-03-25T14:42:14.249174 #31865] DEBUG -- : Read 1448 bytes (27093 total)
    D, [2013-03-25T14:42:14.249354 #31865] DEBUG -- : Read 2648 bytes (29741 total)
    D, [2013-03-25T14:42:14.296443 #31865] DEBUG -- : Read 1448 bytes (31189 total)
    D, [2013-03-25T14:42:14.296583 #31865] DEBUG -- : Read 2648 bytes (33837 total)
    D, [2013-03-25T14:42:14.296756 #31865] DEBUG -- : Read 1448 bytes (35285 total)

You can see that scraping of the image starts ("Read 2896 bytes (12157 total)") and then it freezes. The download never completes and the image is not saved :(

How can I fix this?


1 Answer


You can set the agent's timeouts (both read and open) before making the request:

    @current_agent.open_timeout = 10 # in seconds
    @current_agent.read_timeout = 10 # in seconds
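
With timeouts set, a stalled download should raise an error instead of hanging, so it may be worth retrying the request a few times. The following is only a rough sketch: `with_timeout_retries` is a hypothetical helper (not part of Mechanize), and it assumes the stalled read surfaces as `Timeout::Error` or one of its subclasses such as `Net::ReadTimeout`. Depending on your Mechanize version the error may be wrapped in a different exception class, in which case the `rescue` clause would need adjusting.

```ruby
require 'timeout'

# Hypothetical helper, not part of Mechanize: runs the block and retries it
# when it raises Timeout::Error (Net::OpenTimeout and Net::ReadTimeout are
# subclasses). Re-raises the error once max_attempts is reached.
def with_timeout_retries(max_attempts: 3, wait: 0)
  attempts = 0
  begin
    attempts += 1
    yield attempts            # pass the attempt number to the block
  rescue Timeout::Error
    raise if attempts >= max_attempts
    sleep(wait)               # optional pause before the next attempt
    retry
  end
end

# Possible usage with the agent from the question (assumes Mechanize is loaded):
#
#   with_timeout_retries(max_attempts: 3, wait: 1) do
#     @current_agent.get(image_url).save(save_path)
#   end
```

The retry count and the wait between attempts are arbitrary here; tune them to how the proxy or remote server behaves.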
answered 2013-03-25T14:31:33.203