python - Display non-printable characters in a string

Question

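Is it possible to visualize non-printable characters in a Python string using their hex values?

For example, if I have a string with a newline inside it, I would like to replace the newline with \x0a.

I know there is repr(), which will give me \n, but I'm looking for the hex version.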

score 17 · Accepted Answer

I don't know of a built-in method, but it's fairly easy to do with a comprehension:

import string
printable = string.ascii_letters + string.digits + string.punctuation + ' '
def hex_escape(s):
    return ''.join(c if c in printable else r'\x{0:02x}'.format(ord(c)) for c in s)
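
As a quick check of how this behaves, here is a small sketch that repeats the definition above so it runs on its own. The sample strings are illustrative assumptions, not from the original answer. Note that for code points above 0xff the `02x` format simply yields more than two hex digits, so the result is no longer a valid `\x` escape.

```python
import string

# Same definition as in the answer, repeated so this snippet is self-contained.
printable = string.ascii_letters + string.digits + string.punctuation + ' '

def hex_escape(s):
    return ''.join(c if c in printable else r'\x{0:02x}'.format(ord(c)) for c in s)

# Illustrative inputs (assumptions for this sketch).
print(hex_escape('line one\nline two'))  # the newline becomes \x0a
print(hex_escape('caf\xe9'))             # non-ASCII Latin-1 char becomes \xe9
print(hex_escape('\u20ac'))              # code point above 0xff: four hex digits follow \x
```

If characters beyond Latin-1 matter, the escaping has to switch to `\u` or `\U` forms for larger code points.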

score 10 · Accepted Answer

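You'll have to do the translation manually: go through the string with a regular expression, for example, and replace each occurrence with its hex equivalent.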

import re

replchars = re.compile(r'[\n\r]')
def replchars_to_hex(match):
    return r'\x{0:02x}'.format(ord(match.group()))

replchars.sub(replchars_to_hex, inputtext)

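The example above only matches newlines and carriage returns, but you can expand which characters are matched, including by using \x escape codes and ranges.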

>>> inputtext = 'Some example containing a newline.\nRight there.\n'
>>> replchars.sub(replchars_to_hex, inputtext)
'Some example containing a newline.\\x0aRight there.\\x0a'
>>> print(replchars.sub(replchars_to_hex, inputtext))
Some example containing a newline.\x0aRight there.\x0a
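
One way the range-based extension mentioned above might look is sketched below. It assumes that "non-printable" means the ASCII control characters 0x00-0x1f plus DEL (0x7f); the names `control_chars`, `control_to_hex`, and `escape_controls` are hypothetical and chosen only for this illustration.

```python
import re

# Assumption: treat ASCII control characters (0x00-0x1f) and DEL (0x7f)
# as the set to escape. Widen or narrow the character class as needed.
control_chars = re.compile(r'[\x00-\x1f\x7f]')

def control_to_hex(match):
    # Replace the single matched character with its \xNN form.
    return r'\x{0:02x}'.format(ord(match.group()))

def escape_controls(text):
    return control_chars.sub(control_to_hex, text)

print(escape_controls('tab\there\r\n'))
```

The character class is the only part that needs to change to cover a different set of characters.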

score 2 · Accepted Answer

Modifying ecatmur's solution to handle non-printable non-ASCII characters makes it less trivial and more obnoxious:

def escape(c):
    # str has isprintable(), not printable()
    if c.isprintable():
        return c
    c = ord(c)
    if c <= 0xff:
        return r'\x{0:02x}'.format(c)
    # c is an int at this point, so compare against an int
    elif c <= 0xffff:
        return r'\u{0:04x}'.format(c)
    else:
        return r'\U{0:08x}'.format(c)

def hex_escape(s):
    return ''.join(escape(c) for c in s)

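Of course, if str.isprintable isn't exactly the definition you want, you can write a different function. (Note that it's a very different set from what's in string.printable: besides handling non-ASCII printable and non-printable characters, it also considers \n, \r, \t, \x0b, and \x0c to be non-printable.)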
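
The difference between the two definitions can be seen with a short comparison. This is only a sketch; the sample characters are chosen for illustration.

```python
import string

# Characters where string.printable and str.isprintable disagree.
samples = ['\n', '\r', '\t', '\x0b', '\x0c', 'é']

for ch in samples:
    # Columns: the character, membership in string.printable, str.isprintable()
    print(repr(ch), ch in string.printable, ch.isprintable())
```

The whitespace control characters are in string.printable but fail isprintable(), while 'é' is the other way around.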

You can make this more compact; it's written out explicitly only to show all the steps involved in handling Unicode strings. For example:

def escape(c):
    if c.isprintable():
        return c
    elif c <= '\xff':
        return r'\x{0:02x}'.format(ord(c))
    else:
        return c.encode('unicode_escape').decode('ascii')

Really, no matter what you do, you're going to have to handle \r, \n, and \t explicitly, because all of the built-in and stdlib functions I know of escape them via those special sequences rather than their hex versions.
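
To illustrate the point, the sketch below shows what the `unicode_escape` codec produces for a few characters. The helper name `builtin_escape` is hypothetical and exists only for this demonstration.

```python
def builtin_escape(ch):
    # Hypothetical helper: escape one character with the stdlib codec only.
    return ch.encode('unicode_escape').decode('ascii')

for ch in ['\n', '\r', '\t', '\x01', '\u200b']:
    print(repr(ch), '->', builtin_escape(ch))
```

The codec emits the short forms \n, \r, and \t for those three characters, and hex or \u escapes for the others, which is why they need a separate branch if a uniform \xNN form is wanted.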

score 0 · Accepted Answer

I did something similar once by deriving a str subclass with a custom __repr__() method that did what I wanted. It's not exactly what you're looking for, but it may give you some ideas.

# -*- coding: iso-8859-1 -*-

# special string subclass to override the default
# representation method. main purpose is to
# prefer using double quotes and avoid hex
# representation on chars with an ord > 128
class MsgStr(str):
    def __repr__(self):
        # use double quotes unless there are more of them within the string than
        # single quotes
        if self.count("'") >= self.count('"'):
            quotechar = '"'
        else:
            quotechar = "'"

        rep = [quotechar]
        for ch in self:
            # control char?
            if ord(ch) < ord(' '):
                # remove the single quotes around the escaped representation
                rep += repr(str(ch)).strip("'")
            # embedded quote matching quotechar being used?
            elif ch == quotechar:
                rep += "\\"
                rep += ch
            # else just use others as they are
            else:
                rep += ch
        rep += quotechar

        return "".join(rep)

if __name__ == "__main__":
    s1 = '\tWürttemberg'
    s2 = MsgStr(s1)
    print "str    s1:", s1
    print "MsgStr s2:", s2
    print "--only the next two should differ--"
    print "repr(s1):", repr(s1), "# uses built-in string 'repr'"
    print "repr(s2):", repr(s2), "# uses custom MsgStr 'repr'"
    print "str(s1):", str(s1)
    print "str(s2):", str(s2)
    print "repr(str(s1)):", repr(str(s1))
    print "repr(str(s2)):", repr(str(s2))
    print "MsgStr(repr(MsgStr('\tWürttemberg'))):", MsgStr(repr(MsgStr('\tWürttemberg')))
