python - How do I download a large file in binary mode in Python?

Question

I am coding a download feature in Python. The files are larger than 1 GB. The server runs Linux and the HTTP server is Karrigell. The client is a browser, Firefox or IE. I have run into a serious problem.

First, I used sys.stdout to send the file content.

file = open(path, 'rb')
size = os.path.getsize(path)

RESPONSE['Pragma'] = 'public'
RESPONSE['Expires'] = '0'
RESPONSE['Cache-Control'] = 'must-revalidate, pre-check=0'
RESPONSE['Content-Disposition'] = 'attachment; filename="' + os.path.basename(path) + '"'
RESPONSE['Content-type'] = "application/octet-stream"
RESPONSE['Content-Transfer-Encoding'] = 'binary'
RESPONSE['Content-length'] = str(os.path.getsize(path))

sys.stdout.flush()
chunk_size = 10000
handle = open(path, "rb")
while True:
    buffer = handle.read(chunk_size)
    if buffer:
        STDOUT(buffer)
    else:
        break
sys.stdout.flush()

The problem is that the server runs out of memory! As far as I know, stdout first writes the content to memory, and then that memory is sent to the socket.
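The read loop above already works in fixed-size chunks, so the memory growth described here would come from whatever buffers the writes, not from reading the file. A minimal sketch of the same loop in Python 3, flushing after every chunk, is shown below. The function name `stream_file` and the `out` parameter are hypothetical, and the sketch assumes `out` is a binary stream that actually hands data to the socket when flushed; Karrigell's `STDOUT` may not behave that way.

```python
def stream_file(path, out, chunk_size=64 * 1024):
    """Copy the file at `path` to the binary stream `out` in chunks.

    Returns the number of bytes written. At most `chunk_size` bytes
    are held by this function at any time.
    """
    sent = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break  # end of file
            out.write(chunk)
            # Assumption: flush() pushes the data onward instead of
            # letting it accumulate in an in-memory buffer.
            out.flush()
            sent += len(chunk)
    return sent
```

If the stream keeps everything in memory regardless of `flush()`, this loop cannot help, which is the situation that motivates writing to the socket directly.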

So I modified the function to send the content directly to the socket, using the py-sendfile module: http://code.google.com/p/py-sendfile/

file = open(path, 'rb')
size = os.path.getsize(path)

sock = REQUEST_HANDLER.sock
sock.sendall("""HTTP/1.1 200 OK\r\nPragma: no-cache\r\nExpires: 0\r\nCache-Control: no-cache, no-store\r\nContent-Disposition: attachment; filename="%s"\r\nContent-Type: application/octet-stream\r\nContent-Length: %u\r\nContent-Range: bytes 0-4096/%u\r\nLocation: "%s"\r\n\r\n""" % (os.path.basename(path), size, size, os.path.basename(path)))

offset = 0
nbytes = 4096
while 1:
    try:
        sent = sendfile.sendfile(sock.fileno(), file.fileno(), offset, nbytes)
    except OSError, err:
        if err.errno in (errno.EAGAIN, errno.EBUSY):  # retry
            continue
        raise
    else:
        if sent == 0:
            break    # done
        offset += sent

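This time the server's memory is fine, but the browser dies! The browser's memory usage rises rapidly and is not released until the socket has received the entire file content.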

I don't know how to deal with these problems. I think the second idea is right: send the content directly to the socket. But why can't the browser release memory while it is receiving data?
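One detail worth checking in the hand-written header block: it sends `Content-Range: bytes 0-4096/size` and a `Location` header together with a `200 OK` status, although `Content-Range` belongs to `206 Partial Content` responses. Whether that explains the browser's memory growth is not established in this thread. The sketch below only shows what a plain full-file response could look like in Python 3, using the standard library's `socket.sendfile()` in place of py-sendfile. The helper names `build_headers` and `send_file_response` are hypothetical, and the filename is assumed to be ASCII.

```python
import os
import socket


def build_headers(filename, size):
    """Return a header block for a plain 200 response carrying a whole file."""
    lines = [
        "HTTP/1.1 200 OK",
        "Content-Type: application/octet-stream",
        'Content-Disposition: attachment; filename="%s"' % filename,
        "Content-Length: %d" % size,
        "Connection: close",
        "",  # the two empty strings produce the blank line
        "",  # that terminates the header block
    ]
    # Assumption: the filename contains only ASCII characters.
    return "\r\n".join(lines).encode("ascii")


def send_file_response(sock, path):
    """Send the headers, then the file body; returns the body bytes sent."""
    size = os.path.getsize(path)
    sock.sendall(build_headers(os.path.basename(path), size))
    with open(path, "rb") as f:
        # socket.sendfile() uses os.sendfile() where it is available and
        # falls back to plain send() calls otherwise. It requires a
        # blocking SOCK_STREAM socket.
        return sock.sendfile(f)
```

Here `sock` would be the connected client socket, such as `REQUEST_HANDLER.sock` in the code above.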

score 1 · Accepted Answer

Try downloading the file in chunks. Here is a working example that uses urllib2.

import os
import urllib2
import math

def downloadChunks(url):
    """Download a large file in chunks.

    The only argument is a URL. The file is written to a temporary
    directory chunk by chunk, and the percentage downloaded so far
    is printed after each chunk.
    """

    baseFile = os.path.basename(url)

    # save the file under a temporary directory
    os.umask(0002)
    temp_path = "/tmp/"
    try:
        file = os.path.join(temp_path,baseFile)

        req = urllib2.urlopen(url)
        total_size = int(req.info().getheader('Content-Length').strip())
        downloaded = 0
        CHUNK = 256 * 10240
        with open(file, 'wb') as fp:
            while True:
                chunk = req.read(CHUNK)
                if not chunk: break
                fp.write(chunk)
                downloaded += len(chunk)
                # 100.0 forces float division; in Python 2, int / int truncates to 0
                print math.floor(100.0 * downloaded / total_size)
    except urllib2.HTTPError, e:
        print "HTTP Error:",e.code , url
        return False
    except urllib2.URLError, e:
        print "URL Error:",e.reason , url
        return False

    return file
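The example above targets Python 2 (`urllib2`, `print` statements). For readers on Python 3, a rough equivalent using `urllib.request` might look like the sketch below. The function name `download_chunks`, the `dest_dir` and `chunk_size` parameters, and the fallback filename `download.bin` are hypothetical choices for illustration, not part of the original answer.

```python
import os
import tempfile
import urllib.request
from urllib.parse import urlparse


def download_chunks(url, dest_dir=None, chunk_size=1024 * 1024):
    """Stream `url` into a file under `dest_dir` and return its path.

    Only one chunk of at most `chunk_size` bytes is held in memory
    by this function at a time.
    """
    if dest_dir is None:
        dest_dir = tempfile.gettempdir()
    # Hypothetical fallback name for URLs whose path has no basename.
    name = os.path.basename(urlparse(url).path) or "download.bin"
    target = os.path.join(dest_dir, name)

    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        # Content-Length is optional, so progress is printed only if present.
        length = response.headers.get("Content-Length")
        total = int(length) if length else None
        downloaded = 0
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break  # download complete
            out.write(chunk)
            downloaded += len(chunk)
            if total:
                print("%d%%" % (100 * downloaded // total))
    return target
```

Unlike the original, this version lets `urllib.error.HTTPError` and `urllib.error.URLError` propagate to the caller instead of returning `False`.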
