python - Compressing a very large number (in Python)

Question

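I need to compress a very, very large number (143 million digits). I'm looking for a lossless solution that shrinks it by at least 10%. I've tried zlib, zipfile, gzip and others, but none of them really compresses this number. Here is an idea I had; the problem is that I don't know how to implement it.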

First, I have my number:

234512

Then I need to split it into blocks whose values are smaller than 256:

234,51,2

If the block size were fixed (say, always 3 digits) I could do the split, but each block can have 1, 2 or 3 digits, and that's where I'm stuck.

Once I have the blocks of numbers smaller than 256, I convert them to chars and write them to a file.

Edit: that approach loses leading zeros, so I've written an algorithm that compresses the number to about 50% of its size.
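To make the leading-zero problem concrete, here is a minimal sketch of the greedy "blocks below 256" split (the same rule the accepted answer uses; the name `split_below_256` is hypothetical). Two different digit strings end up as the same list of bytes, because a block such as `005` is stored as the value 5.

```python
def split_below_256(digits):
    """Greedily cut a decimal string into blocks whose values are < 256 (hypothetical helper)."""
    blocks, buff = [], ''
    for ch in digits:
        if int(buff + ch) < 256:
            buff += ch                # the block can still take this digit
        else:
            blocks.append(int(buff))  # close the block; int() drops any leading zeros
            buff = ch
    if buff:
        blocks.append(int(buff))
    return blocks

print(split_below_256('234512'))  # [234, 51, 2]
print(split_below_256('1005'))    # [100, 5]
print(split_below_256('10005'))   # [100, 5]  (same bytes, different number)
```

Since both inputs produce `[100, 5]`, no decoder can tell them apart, which is why this idea is lossy.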

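Since the number only uses the digits 0 to 9, I can pretend it is hexadecimal (even though it isn't) and convert it to base 10 to reduce its size. Edit 2: skip this step; doing it actually only increases the size.
Taking the number (which still only contains the digits 0 to 9), I can again treat it as hexadecimal. So I use unhexlify to turn it into bytes, which is half its size! (If the length is odd, I append an "a" to the end of the number.)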
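A quick check of why that step backfires (the sample string is just for illustration): reading a decimal digit string as base 16 and writing the same value back in base 10 makes it longer, because each hex digit is worth log10(16) ≈ 1.2 decimal digits.

```python
s = '234512' * 5               # 30 decimal digits, read as if they were hex
as_base10 = str(int(s, 16))    # the same value, written out in base 10
print(len(s), len(as_base10))  # 30 36
```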

Code:

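from binascii import unhexlify
def digits_to_bytes(o):  # hypothetical wrapper name; o is the decimal digit string
    if len(o) % 2: o += 'a'  # avoid odd length ('a' never occurs in a decimal string)
    return unhexlify(o)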

The returned data can also be compressed with zlib. In total: 45% compression.
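A minimal round-trip sketch of this scheme (unhexlify, then zlib, then the reverse), assuming the input is a string of decimal digits; `pack` and `unpack` are hypothetical names. Because `a` never occurs in a decimal string, a trailing `a` after decoding unambiguously marks the padding, and unlike the block-splitting idea, leading zeros survive.

```python
import zlib
from binascii import hexlify, unhexlify

def pack(digits):
    """Decimal string -> bytes: two digits per byte via unhexlify, then zlib."""
    if len(digits) % 2:
        digits += 'a'                  # pad odd length; 'a' is never a decimal digit
    return zlib.compress(unhexlify(digits))

def unpack(blob):
    """Reverse of pack(): zlib-decompress, back to hex text, drop the padding nibble."""
    digits = hexlify(zlib.decompress(blob)).decode('ascii')
    return digits[:-1] if digits.endswith('a') else digits

s = '0031415926535897932384626'       # odd length, with leading zeros
assert unpack(pack(s)) == s
```

Whether the zlib step actually helps depends on how repetitive the digits are; on digits that look random it may even add a few bytes of overhead.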

score 1 · Accepted Answer

Here you go:

#! /usr/bin/python

n = 313105074639950943116  # just an example

# your algorithm: greedily cut the decimal string into blocks < 256
chars = []
buff = ''
s = str(n)
while s:
    if int(buff + s[0]) < 256:
        buff += s[0]             # the block can still take this digit
        s = s[1:]
    else:
        chars.append(int(buff))  # close the block; int() drops any leading zeros
        buff = ''
if buff:
    chars.append(int(buff))

print('You need to write these numbers converted to chars: {}'.format(chars))
print('These are {} bytes of data.'.format(len(chars)))
print('But you cannot decompress it, because you lose leading zeros.')

# plain binary: the number's base-256 digits, least significant first
chars = []
while n:
    chars.append(n & 0xff)
    n >>= 8

print('Now if you just write the number to a file without your algorithm:')
print('You need to write these numbers converted to chars: {}'.format(chars))
print('These are {} bytes of data.'.format(len(chars)))
print('And you can actually read it again.')
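The second loop above just writes the number's little-endian base-256 digits. A sketch of the same thing with the built-in `int.to_bytes` / `int.from_bytes` (Python 3), including the read-back step: binary needs about log(10)/log(256) ≈ 0.415 bytes per decimal digit, so roughly 59.4 million bytes for 143 million digits, versus 0.5 bytes per digit for the unhexlify approach in the question. One caveat worth checking: on Python 3.11+, converting between `int` and a decimal `str` longer than 4300 digits raises `ValueError` unless you call `sys.set_int_max_str_digits(0)` first.

```python
n = 313105074639950943116   # the answer's example number

# minimal number of bytes: ceil(bit_length / 8)
size = (n.bit_length() + 7) // 8
data = n.to_bytes(size, 'little')   # same byte order as the n & 0xff loop

restored = int.from_bytes(data, 'little')
assert restored == n
print(size, 'bytes for', len(str(n)), 'digits')  # 9 bytes for 21 digits
```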

Edit: If the decimal representation of your number contains many runs of 6s and 8s, try RLE on the decimal representation, possibly combined with a Huffman tree.
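A sketch of the Huffman part of that suggestion, assuming the input is the decimal string (or the output of an RLE pass). It builds a prefix code from symbol frequencies with `heapq`, so frequent digits such as 6 and 8 get short codes. The bits are kept as a `'0'`/`'1'` string for readability; a real implementation would pack them into bytes and also store the code table. All function names are hypothetical.

```python
import heapq
from collections import Counter

def huffman_code(text):
    """Return {symbol: bitstring} for a Huffman code built from symbol counts in text."""
    freq = Counter(text)
    if len(freq) == 1:                       # a single symbol still needs one bit
        return {next(iter(freq)): '0'}
    # heap entries: (weight, unique tie-breaker, {symbol: code so far})
    heap = [(w, i, {sym: ''}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)      # two lightest subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: '0' + c for s, c in c1.items()}
        merged.update({s: '1' + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

def encode(text, code):
    return ''.join(code[ch] for ch in text)

def decode(bits, code):
    inverse = {c: s for s, c in code.items()}
    out, cur = [], ''
    for b in bits:
        cur += b
        if cur in inverse:                   # prefix-free: first match is the symbol
            out.append(inverse[cur])
            cur = ''
    return ''.join(out)

text = '6' * 50 + '8' * 30 + '123'
code = huffman_code(text)
bits = encode(text, code)
assert decode(bits, code) == text
print(len(bits), 'bits instead of', 4 * len(text), 'for 4 bits per digit')
```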

EDIT 2: Given (a) the long runs of 6s and 8s and (b) the fact that you don't want to use established algorithms, you could use a very crude RLE like this:

#! /usr/bin/python

n = 313666666666666688888888888888888866666666666666666666666666666610507466666666666666666666666666399509431888888888888888888888888888888888888888888881666666666666

s = str(n)
print(s)
comp = ''
count = None  # current run as (digit, length), or None
for ch in s:
    if ch in '01234579':
        # any other digit ends the current run: write the run, then the digit itself
        if count:
            comp += ('<{}>' if count[0] == 6 else '[{}]').format(count[1])
            count = None
        comp += ch
    elif ch == '6':
        # runs of 6s are written as <length>
        if count and count[0] == 6:
            count = (6, count[1] + 1)
        else:
            if count:
                comp += '[{}]'.format(count[1])  # close a pending run of 8s
            count = (6, 1)
    elif ch == '8':
        # runs of 8s are written as [length]
        if count and count[0] == 8:
            count = (8, count[1] + 1)
        else:
            if count:
                comp += '<{}>'.format(count[1])  # close a pending run of 6s
            count = (8, 1)

# flush the last run, if any
if count:
    comp += ('<{}>' if count[0] == 6 else '[{}]').format(count[1])

print(comp)
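The crude RLE above writes a run of 6s as `<count>` and a run of 8s as `[count]`, leaving every other digit as-is. Since `<`, `>`, `[` and `]` never occur in a decimal string, the output can be decoded unambiguously. A minimal decoder sketch using `re.sub` (the name `rle_decode` is hypothetical):

```python
import re

def rle_decode(comp):
    """Expand <k> into k sixes and [k] into k eights; other characters are literal digits."""
    def expand(m):
        if m.group(1) is not None:
            return '6' * int(m.group(1))
        return '8' * int(m.group(2))
    return re.sub(r'<(\d+)>|\[(\d+)\]', expand, comp)

print(rle_decode('3[2]<3>1'))  # 3886661
```

Note that a lone 6 or 8 becomes `<1>` or `[1]`, which is longer than the digit itself, so this only pays off when the runs really are long.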
