python - Encoding text into HTML entities (not tags)

Question

I have searched a lot for this without any luck, so I suspect I am either missing some concept or don't really understand what I need. Here is the problem:

I am using pisa to create a PDF, and this is the code I use:

def write_to_pdf(template_data, context_dict, filename):
    template = Template(template_data)
    context = Context(context_dict)
    html = template.render(context)
    result = StringIO.StringIO()
    pdf = pisa.pisaDocument(StringIO.StringIO(html.encode("UTF-8")), result, link_callback=fetch_resources)

    if not pdf.err:
        response = http.HttpResponse(mimetype='application/pdf')
        response['Content-Disposition'] = 'attachment; filename=%s.pdf' % filename
        response.write(result.getvalue())
        return response

    return http.HttpResponse('Problem creating PDF: %s' % cgi.escape(html))

So when I try to turn this string into a PDF:

template_data = 'tésting á'

it comes out like this (imagine each # is a black spot rather than a character):

t##sting á

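I tried cgi.escape without any luck: the black spots are still there, and it ends up printing the HTML tags as text. This is Python 2.7, so I can't use html.escape either.

So I need something that converts regular text into HTML entities without affecting the HTML tags that are already there. Any clues?

Oh, and if I change this line: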

pdf = pisa.pisaDocument(StringIO.StringIO(html.encode("UTF-8")), result, link_callback=fetch_resources)

to

pdf = pisa.pisaDocument(html, result, link_callback=fetch_resources)

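it works, but it doesn't create the HTML entities I need. I don't know exactly what kinds of characters will end up in the text, and some of them may not be supported by pisa.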

score 2 · Accepted Answer

Encoding named HTML entities in Python:

http://beckism.com/2009/03/named_entities_python/

There is also a django app that does both decoding and encoding:

https://github.com/cobrateam/python-htmlentities

For Python 2.x (in Python 3.x, change the import to html.entities.codepoint2name):

'''
Registers a special handler for named HTML entities

Usage:
import named_entities
text = u'Some string with Unicode characters'
text = text.encode('ascii', 'named_entities')
'''

import codecs
from htmlentitydefs import codepoint2name

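def named_entities(text):
    # `text` is the exception raised by the codec, not the string being encoded.
    if isinstance(text, (UnicodeEncodeError, UnicodeTranslateError)):
        s = []
        # Replace only the slice that could not be encoded.
        for c in text.object[text.start:text.end]:
            if ord(c) in codepoint2name:
                # Named entity, e.g. &eacute;
                s.append(u'&%s;' % codepoint2name[ord(c)])
            else:
                # No HTML name: fall back to a numeric character reference.
                s.append(u'&#%s;' % ord(c))
        # Return the replacement and the position at which to resume encoding.
        return u''.join(s), text.end
    else:
        # Exception instances have no __name__; report the class name instead.
        raise TypeError("Can't handle %s" % type(text).__name__)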

codecs.register_error('named_entities', named_entities)
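For readers on Python 3, here is a minimal sketch of the same handler using the html.entities import mentioned above. The helper name encode_entities and the sample string are assumptions made for illustration, not part of the original answer. Because only non-ASCII characters reach the error handler, existing tags (<, >, &) pass through untouched. The last line shows the built-in 'xmlcharrefreplace' handler, which also exists in Python 2.7 and produces numeric references without any custom code.

```python
import codecs
from html.entities import codepoint2name  # htmlentitydefs.codepoint2name in Python 2


def named_entities(error):
    """Codec error handler: replace unencodable characters with HTML entities."""
    if not isinstance(error, (UnicodeEncodeError, UnicodeTranslateError)):
        raise TypeError("Can't handle %s" % type(error).__name__)
    pieces = []
    for char in error.object[error.start:error.end]:
        code = ord(char)
        if code in codepoint2name:
            # Named entity, e.g. &eacute;
            pieces.append('&%s;' % codepoint2name[code])
        else:
            # No HTML 4 name: fall back to a numeric character reference.
            pieces.append('&#%d;' % code)
    # Tell the codec what to emit and where to resume encoding.
    return ''.join(pieces), error.end


codecs.register_error('named_entities', named_entities)


def encode_entities(html_text):
    """Hypothetical helper: return ASCII-only text with entities, tags intact."""
    return html_text.encode('ascii', 'named_entities').decode('ascii')


sample = '<p>tésting á 中</p>'  # illustrative input, not from the original post
print(encode_entities(sample))
# -> <p>t&eacute;sting &aacute; &#20013;</p>

# Built-in alternative: numeric references only, no handler registration needed.
print(sample.encode('ascii', 'xmlcharrefreplace').decode('ascii'))
# -> <p>t&#233;sting &#225; &#20013;</p>
```

Under these assumptions, the rendered HTML could be passed through encode_entities (or encoded with 'xmlcharrefreplace') before being handed to pisa, so that the document contains only ASCII. Whether pisa renders every resulting entity is not something this sketch verifies.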
