python - Optimizing PDF conversion in Django/Python

Question

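I have a web application that exports reports as PDF. When the query returns fewer than 100 values, everything works fine. When the number of records exceeds 100, the server raises a 502 Proxy Error. The report renders fine as HTML. The process that hangs the server is the HTML-to-PDF conversion. I am using xhtml2pdf (a.k.a. pisa 3.0) to generate the PDF. The algorithm is something like this: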

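def view1(request, **someargs):
    # filter() returns a queryset; get() would return a single object
    queryset = someModel.objects.filter(**someargs)
    # Context expects a dict; the key name 'queryset' is an assumption
    context = {'queryset': queryset}
    # .get() avoids a KeyError when the 'pdf' parameter is absent
    if request.GET.get('pdf'):
        return pdfWrapper('template.html', context, 'filename')
    else:
        return render_to_response('template.html', context)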

def pdfWrapper(template_src, context_dict, filename):
    ################################################
    #
    # The code commented out below is an older version.
    # I updated the code according to the comment I received.
    # The function still works for short HTML documents
    # and produces the 502 for larger ones.
    #
    ################################################

    ##import cStringIO as StringIO
    import ho.pisa as pisa
    from django.template.loader import get_template
    from django.template import Context
    from django.http import HttpResponse
    ##from cgi import escape

    template = get_template(template_src)
    context = Context(context_dict)
    html  = template.render(context)

    response = HttpResponse()
    response['Content-Type'] ='application/pdf'
    response['Content-Disposition']='attachment; filename=%s.pdf'%(filename)

    pisa.CreatePDF(
        src=html,
        dest=response,
        show_error_as_pdf=True)

    return response

    ##result = StringIO.StringIO()
    ##pdf = pisa.pisaDocument(
    ##            StringIO.StringIO(html.encode("ISO-8859-1")),
    ##            result)
    ##if not pdf.err:
    ##    response = HttpResponse(
    ##                   result.getvalue(), 
    ##                   mimetype='application/pdf')
    ##    response['Content-Disposition']='attachment; filename=%s.pdf'%(filename)
    ##    return response
    ##return HttpResponse('Hubo un error<pre>%s</pre>' % escape(html))

I have thought about creating a buffer so that the server can free memory, but I have not found anything yet. Can anyone help, please?

score 3 · Accepted Answer

I cannot tell you exactly what causes your problem. It may be caused by buffering problems in StringIO.

However, you are wrong if you assume that this code actually streams the generated PDF data: StringIO.getvalue() returns the content of the string buffer at the time the method is called, not an output stream (see http://docs.python.org/library/stringio.html#StringIO.StringIO.getvalue).
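To illustrate the point about getvalue(), here is a minimal sketch. It assumes Python 3's io.BytesIO as a stand-in for the Python 2 StringIO.StringIO used in the question, and the helper fake_pdf_writer is hypothetical: it only writes chunks the way a PDF generator might.

```python
import io


def fake_pdf_writer(dest, pages):
    """Hypothetical stand-in for a PDF generator: writes one chunk per page."""
    for page in range(pages):
        dest.write(b"page %d\n" % page)


buffer = io.BytesIO()
fake_pdf_writer(buffer, 3)

# getvalue() returns the whole accumulated content as one bytes object.
# Nothing is streamed: the full document sits in memory at this point.
snapshot = buffer.getvalue()

# Data written later does not change the snapshot taken earlier.
buffer.write(b"trailer\n")
full = buffer.getvalue()
buffer.close()
```

The snapshot is a plain copy of the buffer contents, so passing it to a response means the complete document is held in memory at least once before any byte reaches the client.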

If you want to stream the output, you can treat the HttpResponse instance as a file-like object (see http://docs.djangoproject.com/en/1.2/ref/request-response/#usage).

Secondly, I don't see any reason to use StringIO here. According to the Pisa documentation I found (which calls this function CreatePDF, by the way), the source can be a string or a unicode object.

Personally, I would try the following:

Create the HTML as a unicode string
Create and configure the HttpResponse object
Call the PDF generator with the string as input and the response as output

In outline, this could look like this:

html = template.render(context)

response = HttpResponse()
response['Content-Type'] ='application/pdf'
response['Content-Disposition']='attachment; filename=%s.pdf'%(filename)

pisa.CreatePDF(
    src=html,
    dest=response,
    show_error_as_pdf=True)

#response.flush()
return response

However, I have not tried whether this actually works. (So far I have done this kind of PDF streaming only in Java.)

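Update: I just looked at the implementation of HttpResponse. It implements the file interface by collecting the chunks of strings written to it in a list. Calling response.flush() is pointless, because it does nothing. Also, you can still set response parameters such as Content-Type even after the response has been accessed as a file object.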
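The behavior described in this update can be modeled with a small sketch. The class ChunkCollectingResponse below is hypothetical and is not Django's actual HttpResponse; it only mirrors the three properties mentioned above: write() appends chunks to a list, flush() does nothing, and headers can still be set after writing.

```python
class ChunkCollectingResponse:
    """Hypothetical, simplified model of a response used as a file.

    This is not Django's real HttpResponse; it only imitates the
    behavior described in the answer.
    """

    def __init__(self):
        self.headers = {}
        self._chunks = []

    def __setitem__(self, name, value):
        # Headers are stored separately from the body chunks.
        self.headers[name] = value

    def __getitem__(self, name):
        return self.headers[name]

    def write(self, data):
        # Each write is kept in memory; nothing is sent to the client yet.
        self._chunks.append(data)

    def flush(self):
        # Intentionally a no-op, as described in the answer.
        pass

    @property
    def content(self):
        return b"".join(self._chunks)


response = ChunkCollectingResponse()
response.write(b"%PDF-1.4\n")
response.flush()                                # changes nothing
response.write(b"%%EOF\n")
response["Content-Type"] = "application/pdf"    # still allowed after writing
```

If the real class behaves this way, writing into the response avoids the extra StringIO copy but still accumulates the whole PDF in memory before it is sent.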

Your original problem may also be related to the fact that you never closed the StringIO objects. The underlying buffer of a StringIO object is not released before close() is called.
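A minimal sketch of closing the buffer explicitly, again assuming Python 3's io.BytesIO in place of the Python 2 StringIO from the question. The html value is a placeholder, not the rendered report.

```python
import io

html = "<html><body>report</body></html>"  # placeholder content

# The with-block calls close() on exit, which discards the buffer.
with io.BytesIO() as result:
    result.write(html.encode("utf-8"))
    pdf_bytes = result.getvalue()  # copy the data out before closing

# After close(), the buffer is gone and getvalue() raises ValueError.
try:
    result.getvalue()
    closed_buffer_readable = True
except ValueError:
    closed_buffer_readable = False
```

Using a with-block keeps the lifetime of the buffer short and makes it hard to forget the close() call when the view returns early on an error.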
