http - Rebol times out when downloading in quick succession

Question

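I've spent hours trying to figure this out, and it looks as if Rebol just can't do it. This is a program that downloads all the images from a web page. It was great to be able to write it in far fewer lines of code, but the performance is terrible. Rebol times out after downloading 4 to 5 files. Adding wait 5 at the end of the loop reduced the timeouts, but then it takes far too long!

An identical program written in C downloaded everything instantly. Here is the part of the Rebol code that downloads the images: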

http://pastebin.com/fTnq8A3m

score 2 · Accepted Answer

There are a lot of errors in the script at http://pastebin.com/fTnq8A3m.

For example, you have

write ... read/binary ...

which means you read the image as binary and then write it out as text. You are also handling URLs as text when they already exist as the url! datatype.

So in

read/binary join http://www.rebol.com/ %image.jpg

the join there keeps the datatype! intact. There is no need to do this:

read/binary to-url join "http://www.rebol.com/" %image.jpg
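A minimal sketch of the two points above, for R2. The URL is simply the one from the example in this answer and may not point at a real file; the local name %image.jpg and the word image-url are only illustrative.

```rebol
rebol []

; join keeps the datatype of its first argument, so no to-url is needed
image-url: join http://www.rebol.com/ %image.jpg
print type? image-url

; read as binary AND write as binary, so the bytes reach the disk unchanged
write/binary %image.jpg read/binary image-url
```

Using write/binary on the output side is the counterpart of read/binary on the input side; a plain write treats the data as text.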

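How big are these images?

Adding wait 5 will not affect the download, because you are attempting a blocking, synchronous download. Also, since you are using a button, you are inside VID, which means you are using a wait inside a wait.

Another way to do this is to set up an async handler and then start the download, so that you don't block the GUI as you do now.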

score 2 · Accepted Answer

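I've been using REBOL for commercial applications for years, most of which need networking in one way or another, and I can attest that REBOL's networking is pretty stable. In fact, you can build servers with months of uptime and no memory leaks.

But since you have a very specific goal in mind, I thought I'd write a small app that shows how to do it and get it working.

This definitely works in R2. One problem you may be running into is network port timeouts, but that would only happen if the servers or images you download take several seconds each and exceed the default 30-second timeout.

The app below takes a single URL as its parameter (you can set it to whatever you like near the top) and downloads all the <IMG> URLs it finds in the page. It supports http and https, and I tested it with a few sites such as Wikipedia, Bing, and Google image search, and it works quite well... download rates are pretty constant on each server. I added a speed report to the minimal GUI so you can get an idea of the download rates.

Note that this is a synchronous application which just downloads a list of images... you cannot expect to simply add a GUI and run it concurrently, since that requires a completely different network model (async http ports), which in turn requires more complex networking code.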
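If the default timeout does turn out to be the limiting factor, it can be raised before the downloads start. This is only a sketch: it assumes the R2 layout in which timeouts live under system/schemes, and the one-minute value is an arbitrary choice, not a recommendation from this answer.

```rebol
rebol []

; assumption: R2 stores the network timeout per scheme under system/schemes
system/schemes/default/timeout: 0:01:00   ; fallback for all schemes
system/schemes/http/timeout: 0:01:00      ; http downloads specifically

print system/schemes/http/timeout
```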

rebol [
    title: "webpage images downloader example"
    notes: "works with R2 only"
]

; the last page-url is the one to be used... feel free to change this
page-url: http://en.wikipedia.org/wiki/Dog
page-url: https://www.google.com/search?q=dogs&tbm=isch
page-url: http://www.bing.com/images/search?q=dogs&go=&qs=ds

;------
; automatically setup URL-based information
page-dir: copy/part page-url find/last/tail page-url "/"
; skip past "://" before looking for the first "/", so https:// URLs work too
page-host: copy/part page-url find/tail find/tail page-url "://" "/"

?? page-url
?? page-dir
?? page-host

output-dir: %downloaded-images/  ; save images in a subdir of current-directory
unless exists? output-dir [make-dir output-dir ]

images: []

;------
; read url (expecting an HTML document)
;
; Parse is used to collect and cleanup URLs, make them absolute URLs. 
parse/all read page-url [
    some [
        thru {<img } thru {src="} copy image to {"} (
            case [
                "https://" = copy/part image 8 [image: to-url image]
                "http://" = copy/part image 7 [image: to-url image]
                "//" = copy/part image 2 [image: join  http:// at image 3  ]
                #"/" = pick image 1 [image: join page-host image ]
                'default [image: join page-dir image]
            ]
            append images image
         )
    ]
]

;------
; pretty-print image list
new-line/all images yes
probe images

;------
; display report window
view/new layout [ field-info: text 500 para [wrap?: false]   speed-info: text 500    ]

;------
; download images and report all activity
i: bytes: 0
s: now/precise
foreach image images [
    unless attempt [
        i: i + 1 
        probe image
        legal-chars: charset [#"a" - #"z" #"A" - #"Z" "0123456789-_.="] 
        fname: to-string find/last/tail image "/" ; get filename from url

        parse/all fname [some [ legal-chars | letter: skip  (change letter "-") ] ] ; convert illegal filename chars

        fname: join output-dir to-file fname ; use url filename to build disk path
        write/binary fname read/binary image ; download file

        ; update GUI
        t: difference now/precise s

        field-info/text: rejoin ["Downloading: (" i "/" length? images ") "  fname]
        show field-info

        bytes: bytes + size? fname
        speed-info/text: rejoin ["bytes: "  bytes ",   time: "  t   ",   speed : " (bytes / 1000) / ( to-decimal t) "kb/s"]
        show speed-info

        true ; all is good, attempt should return a value
    ][
        print "^/^/---^/unable to download image:"
        print image
        print "---^/^/"
    ]
]

If you don't need the web page scanner and have a manual list of images to grab, just replace that code with a block of images, like so:

images: [ 
    http://server.com/img1.png
    http://server.com/img2.png
    http://server.com/img3.png
]

and let the download loop do its thing.

Hope this helps.

score 1 · Accepted Answer

Was the long wait really needed? In long loops Rebol needs to wait now and then to process GUI events, but IIRC wait 0 should do the trick. Is it possible that the event queuing is causing the problem?
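A rough sketch of that idea for R2, assuming wait 0 behaves as recalled above (returns immediately while letting pending GUI events be handled). The URLs are placeholders and the word status is only illustrative.

```rebol
rebol []

; placeholder URLs, not real images
images: [
    http://server.com/img1.png
    http://server.com/img2.png
]

view/new layout [status: text 400]

foreach image images [
    status/text: form image
    show status
    attempt [read/binary image]   ; blocking download; attempt swallows any error
    wait 0                        ; no real delay, just a chance to process GUI events
]
```

Compared with wait 5, this adds no fixed delay per file, so the total time stays close to the pure download time.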
