
I'm writing a small node.js application that receives a multipart POST from an HTML form and pipes the incoming data to Amazon S3. The formidable module provides the multipart parsing, exposing each part as a node Stream. The knox module handles the PUT to s3.

var formidable = require('formidable'),
    knox = require('knox');

var form = new formidable.IncomingForm(),
    s3 = knox.createClient(conf);

form.onPart = function(part) {
    // stream each incoming part straight through to S3
    var put = s3.putStream(part, filename, headers, handleResponse);
    put.on('progress', handleProgress);
};

form.parse(req);

I'm reporting the upload progress to the browser client via socket.io, but am having difficulty getting these numbers to reflect the real progress of the node to s3 upload.
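For context, handleProgress forwards the numbers to the browser along these lines. This is only a sketch: `socket` and the 'upload-progress' event name are placeholders, and it assumes knox's 'progress' event passes an object with written/total/percent fields.

function handleProgress(progress) {
    // Sketch only: `socket` and 'upload-progress' are assumptions, and the shape
    // of `progress` ({ written, total, percent }) is assumed, not verified.
    socket.emit('upload-progress', {
        written: progress.written,
        total: progress.total,
        percent: progress.percent
    });
}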

When the browser-to-node upload happens near instantaneously, as it does when the node process is running on the local network, the progress indicator reaches 100% almost immediately. If the file is large, e.g. 300MB, the progress indicator rises more slowly, but still faster than our upstream bandwidth would allow. After hitting 100%, the client then hangs, presumably waiting for the s3 upload to finish.

I know putStream uses Node's stream.pipe method internally, but I don't understand the details of how this really works. My assumption is that node gobbles up the incoming data as fast as it can, buffering it in memory. If the write stream can consume the data fast enough, little of it is held in memory at once, since it can be written and then discarded. If the write stream is slow, though, as it is here, we presumably have to keep all that incoming data in memory until it can be written. Since we're listening for data events on the read stream in order to emit progress, we end up reporting the upload as going faster than it really is.
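My mental model of what pipe does under the hood is roughly the following. This is a hand-rolled sketch of the classic pause/drain handshake, not knox's or node's actual source; readable and writable stand in for the part and the s3 request.

// Rough sketch of the backpressure loop stream.pipe implements (node 0.8-era streams).
readable.on('data', function(chunk) {
    var flushed = writable.write(chunk);
    if (!flushed) {
        // the destination's write buffer is full: stop reading until it drains
        readable.pause();
    }
});

writable.on('drain', function() {
    // buffered data has been flushed to the slow destination: safe to read again
    readable.resume();
});

readable.on('end', function() {
    writable.end();
});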

Is my understanding of this problem anywhere close to the mark? How might I go about fixing it? Do I need to get down and dirty with write, drain and pause?


1 Answer


Your problem is that stream.pause isn't implemented on part, which is a very simple readable stream of the output from the multipart form parser.

Knox tells the s3 request to emit "progress" events whenever the part emits "data". But because the part stream ignores pause, the progress events are emitted as fast as the form data is uploaded and parsed.

The formidable form, however, does know how to both pause and resume (it proxies the calls to the request being parsed).

Something like this should solve your problem:

form.onPart = function(part) {

    // once pause is implemented, the part will be able to throttle the speed
    // of the incoming request
    part.pause = function() {
      form.pause();
    };

    // resume is the counterpart to pause, and will fire after the `put` emits
    // "drain", letting us know that it's ok to start emitting "data" again
    part.resume = function() {
      form.resume();
    };

    var put = s3.putStream(part, filename, headers, handleResponse);
    put.on('progress', handleProgress);
};
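With pause and resume proxied through to the form (and ultimately to the request being parsed), the incoming upload is throttled whenever the write side can't keep up, so the part only emits "data", and knox only emits "progress", roughly as fast as bytes are actually being sent to S3.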
answered 2012-11-13T00:58:01.623