python - What is an efficient way to encode a (very long) string from a dictionary? (Python)

Question

I am building a dictionary of the form {'character to encode': 'corresponding binary code', ...}. I am encoding like this:

def encode(self, text):
    def generator():
        for ch in text:
            yield self.codes[ch]  # Get the encoded representation from the dictionary
    # Join once at the end instead of building the string piece by piece
    return ''.join(generator())

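This works fine for short strings, but for novel-length strings it is so slow as to be unusable. What is a faster way to encode such a string? Or do I need to completely rethink how I store and manipulate the data?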

More code:

The call is print c.encode(f), where f is a string (I have just checked this) and c is the encoder object. This works for short files; I have tested up to 3000 characters. Thanks to thg435, my encode function is now

def encode(self, text):
    return ''.join(map(self.codes.get, text))

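self.codes is a dictionary of mappings. When the string 'hello' is input, it is set to {'h': '01', 'e': '00', 'l': '10', 'o': '11'}. I feel like I should post more code, but I have tested the argument ('text') and the dictionary, and since those seem to be the only things that could affect this function's running time, I am not sure what else would be relevant. The functions called before encoding work fine in terms of speed: I check their output with print statements, and it always appears within a few seconds of starting the run.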
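To make the setup above concrete, here is a minimal self-contained sketch, assuming Python 3. The class name Encoder, its constructor, and the variable names in the usage part are hypothetical; only encode, self.codes, and the 'hello' mapping come from the description above. It also times encode on its own, which may be a more direct check than watching when print output appears.

```python
import time


class Encoder:
    """Hypothetical stand-in for the encoder object described above."""

    def __init__(self, codes):
        # Mapping from each character to its binary code string.
        self.codes = codes

    def encode(self, text):
        # One dictionary lookup per character, joined once at the end.
        return ''.join(map(self.codes.get, text))


c = Encoder({'h': '01', 'e': '00', 'l': '10', 'o': '11'})
encoded = c.encode('hello')
print(encoded)  # 0100101011

# Time encode() in isolation on a long input (one million characters).
long_text = 'hello' * 200000
start = time.perf_counter()
long_encoded = c.encode(long_text)
elapsed = time.perf_counter() - start
print(len(long_encoded), 'characters encoded in', elapsed, 'seconds')
```

If this isolated call finishes quickly on the real input, the slowdown is likely somewhere other than encode itself, for example in printing the very long result.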

score 3 · Accepted Answer

This seems to be the fastest:

''.join(map(codes.get, text))

Timings:

codes = {chr(n): '[%d]' % n for n in range(255)}


def encode1(text): 
    return ''.join(codes[c] for c in text)

def encode2(text): 
    import re
    return re.sub(r'.', lambda m: codes[m.group()], text)

def encode3(text): 
    return ''.join(map(codes.get, text))


import timeit

a = 'foobarbaz' * 1000

# print(...) with a single argument behaves the same on Python 2 and Python 3
print(timeit.timeit(lambda: encode1(a), number=100))
print(timeit.timeit(lambda: encode2(a), number=100))
print(timeit.timeit(lambda: encode3(a), number=100))


# 0.113456964493
# 0.445501089096
# 0.0811159610748
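One further variant that may be worth timing alongside the three above, assuming Python 3, is str.translate with a table built by str.maketrans. This is a different technique from the ones measured in this answer, and the name encode4 is hypothetical. No timing is claimed here; the sketch only checks that the output matches encode3.

```python
import timeit

codes = {chr(n): '[%d]' % n for n in range(255)}

# With a single dict argument, str.maketrans accepts single-character keys
# and maps them to replacement strings of any length.
table = str.maketrans(codes)


def encode3(text):
    # The join/map version from the answer, repeated for comparison.
    return ''.join(map(codes.get, text))


def encode4(text):
    # Hypothetical alternative: let str.translate do the per-character lookup.
    return text.translate(table)


a = 'foobarbaz' * 1000

# Both versions should produce the same encoded string.
assert encode4(a) == encode3(a)

print(timeit.timeit(lambda: encode3(a), number=100))
print(timeit.timeit(lambda: encode4(a), number=100))
```

One behavioral difference to keep in mind: str.translate leaves characters that are missing from the table unchanged, whereas codes[c] raises KeyError and codes.get returns None, which makes join fail.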
