perl - What is the easiest way in pure Perl to stream from another HTTP resource?

Question

What is the easiest way to stream from another HTTP resource in Perl? I'm assuming here that the HTTP resource I'm reading from is a potentially infinite stream (or just really, really long).

score 9 · Accepted Answer

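Good old LWP lets you process the result as a stream.

For example, here is a callback to yourFunc that reads/passes byte_count bytes on each call to yourFunc (you can drop that parameter if you don't care how large the data in each call is and just want to process the stream as fast as possible):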

use LWP;
...
my $browser = LWP::UserAgent->new();
my $response = $browser->get($url,
                             ':content_cb'     => \&yourFunc,
                             ':read_size_hint' => $byte_count);
...
sub yourFunc {
   my($data, $response) = @_;
   # do your magic with $data
   # $response is the HTTP::Response object; get() returns it once the stream ends
}
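
One thing to keep in mind with the callback above is that the chunks LWP hands over do not line up with line boundaries. A minimal sketch of one way to deal with that, assuming the stream is line-oriented text: `make_line_callback` is a hypothetical helper (not part of LWP) that buffers chunks and calls your code once per complete line. The URL comes from the command line and the 4096-byte hint is an arbitrary choice.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper (not part of LWP): returns a content callback that
# buffers raw chunks and calls $on_line once for each complete line.
sub make_line_callback {
    my ($on_line) = @_;
    my $buffer = '';
    return sub {
        my ($chunk, $response, $protocol) = @_;
        $buffer .= $chunk;
        # Hand over every complete line; the unfinished tail stays buffered.
        while ($buffer =~ s/\A(.*?)\r?\n//) {
            my $line = $1;
            $on_line->($line);
        }
        return;
    };
}

# Usage: perl stream.pl URL
if (my $url = shift @ARGV) {
    my $ua = LWP::UserAgent->new;
    my $response = $ua->get(
        $url,
        ':content_cb'     => make_line_callback(sub { print "line: $_[0]\n" }),
        ':read_size_hint' => 4096,    # only a hint; chunks may be smaller
    );
    die $response->status_line, "\n" unless $response->is_success;
}
```

For a stream that never ends, calling `die` inside the callback should abort the request; LWP then reports the message in the `X-Died` response header.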

score 6 · Accepted Answer

HTTP::Lite's request method allows you to specify a callback.

The $data_callback parameter, if used, is a way to filter the data as it is received or to handle large transfers. It must be a function reference, and will be passed: a reference to the instance of the http request making the callback, a reference to the current block of data about to be added to the body, and the $cbargs parameter (which may be anything). It must return either a reference to the data to add to the body of the document, or undef.

~~However, looking at the source, there seems to be a bug in sub request in that it seems to ignore the passed callback.~~ It seems safer to use set_callback:

#!/usr/bin/perl

use strict;
use warnings;

use HTTP::Lite;

my $http = HTTP::Lite->new;
$http->set_callback(\&process_http_stream);
$http->http11_mode(1);

$http->request('http://www.example.com/');

sub process_http_stream {
    my ($self, $phase, $dataref, $cbargs) = @_;
    warn $phase, "\n";
    return;
}

Output:

C:\Temp> ht
connect
content-length
done-headers
content
content-done
data
done

It looks like a callback passed to the request method is treated differently:

#!/usr/bin/perl

use strict;
use warnings;

use HTTP::Lite;

my $http = HTTP::Lite->new;
$http->http11_mode(1);

my $count = 0;
$http->request('http://www.example.com/',
    \&process_http_stream,
    \$count,
);

sub process_http_stream {
    my ($self, $data, $times) = @_;
    ++$$times;
    print "$$times====\n$$data\n===\n";
    return;    # undef: nothing is added to the stored body
}

score 3 · Accepted Answer

Wait, I don't understand. Why are you ruling out a separate process? This:

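# List form: no shell is involved, so characters such as & in $url are safe
open(my $stream, "-|", "curl", $url) or die "cannot run curl: $!";
while (<$stream>) { ... }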

sure looks like the "easiest way" to me. It's certainly easier than the other suggestions here...
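
If the stream is not line-oriented, reading from the pipe in blocks may fit better than `<$stream>`. A rough sketch, assuming a Unix-like system (or a recent Perl on Windows) where the list form of a piped `open` is available: `stream_command` is a hypothetical helper, and the 8192-byte block size is arbitrary.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command without a shell and pass its output
# to $on_chunk in blocks of at most $size bytes, as soon as data arrives.
sub stream_command {
    my ($on_chunk, $size, @cmd) = @_;
    open(my $stream, '-|', @cmd) or die "cannot run $cmd[0]: $!";
    binmode $stream;
    while (1) {
        # sysread returns whatever is available, up to $size bytes
        my $n = sysread($stream, my $chunk, $size);
        die "read failed: $!" unless defined $n;
        last if $n == 0;    # end of stream
        $on_chunk->($chunk);
    }
    close($stream) or warn "$cmd[0] did not exit cleanly\n";
    return;
}

# Usage: perl stream.pl URL
if (my $url = shift @ARGV) {
    # -s silences the progress meter, -N turns off curl's output buffering
    stream_command(sub { print length($_[0]), " bytes\n" },
                   8192, 'curl', '-s', '-N', $url);
}
```

`sysread` is used instead of `read` here because `read` waits until it has the full block, which can stall on a slow stream.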

score 2 · Accepted Answer

Event::Lib provides a simple interface to the fastest asynchronous IO method available on your platform.

IO::Lambda is also very well suited to writing fast, responsive IO applications.
