python - How can I make Python use the same memory for all identical strings?

Question

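Possible duplicate:
What does Python intern do, and when should it be used?

I am working with a Python program that has to correlate millions of string objects held in an array. I've discovered that if they all come from the same quoted string literal, each additional "string" is just a reference to the first, master string. However, if the strings are read from a file, each one requires a new memory allocation, even when they are all equal.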

That is, this takes about 14 MB of storage:

a = ["foo" for a in range(0,1000000)]

while this takes more than 65 MB of storage:

a = ["foo".replace("o","1") for a in range(0,1000000)]

I can make the memory footprint much smaller with this:

s = {"f11":"f11"}
a = [s["foo".replace("o","1")] for a in range(0,1000000)]

But that seems silly. Is there an easier way to do this?
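One way to check the observation above without watching process memory is to count distinct objects and traced allocations directly. The following is a minimal sketch, assuming Python 3 (for `tracemalloc`) and CPython's behaviour of reusing a single object for a string literal. The helper name `traced_bytes` and the list size `N` are made up for illustration, and the byte counts will vary by interpreter version.

```python
import tracemalloc

def traced_bytes(build):
    """Call build() and return (result, bytes still allocated afterwards)."""
    tracemalloc.start()
    try:
        before, _peak = tracemalloc.get_traced_memory()
        result = build()
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, after - before

N = 100000  # smaller than the question's one million, to keep the run short

# Every element refers to the same literal "foo".
literal_list, literal_bytes = traced_bytes(
    lambda: ["foo" for _ in range(N)])

# str.replace builds a new string object at runtime on each call.
runtime_list, runtime_bytes = traced_bytes(
    lambda: ["foo".replace("o", "1") for _ in range(N)])

print("literal:", literal_bytes, "bytes,",
      len({id(s) for s in literal_list}), "distinct object(s)")
print("runtime:", runtime_bytes, "bytes,",
      len({id(s) for s in runtime_list}), "distinct object(s)")
```

The second list should report many more distinct objects and noticeably more traced memory, which matches the difference described in the question.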

score 14 · Accepted Answer

Just call intern(), which tells Python to keep a single shared copy of the string and hand back that copy each time:

a = [intern("foo".replace("o","1")) for a in range(0,1000000)]

This also comes to around 18MB, the same as the first example.

Also note the comments if you use Python 3, where intern() lives in the sys module as sys.intern(). Thx @Abe Karplus
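For Python 3, the same idea might look like the sketch below. It assumes only the standard library's `sys.intern()`; the list size `N` is an arbitrary illustrative value.

```python
import sys

N = 100000  # illustrative size only

# Each replace() call creates a new "f11" object. sys.intern() returns the
# one interned copy, so the list holds N references to a single object.
interned = [sys.intern("foo".replace("o", "1")) for _ in range(N)]

distinct_objects = len({id(s) for s in interned})
print(distinct_objects)  # 1
```

The temporary strings created by replace() become garbage as soon as intern() hands back the shared copy, so only one "f11" stays alive.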

score 0 · Answer

You can try something like this:

strs = ["this is string1", "this is string2", "this is string1", "this is string2",
        "this is string3", "this is string4", "this is string5", "this is string1",
        "this is string5"]
new_strs = []
for x in strs:
    if x in new_strs:
        # An equal string is already stored: find its index and append a
        # reference to that object instead of the string itself.
        new_strs.append(new_strs[new_strs.index(x)])
    else:
        new_strs.append(x)

print([id(y) for y in new_strs])

Identical strings will now have the same id().

Output:

[18632400, 18632160, 18632400, 18632160, 18651400, 18651440, 18651360, 18632400, 18651360]

score -1 · Answer

Keeping a dictionary of the strings seen so far should work:

new_strs = []
str_record = {}
for x in strs:
    if x not in str_record:
        str_record[x] = x
    new_strs.append(str_record[x])

(Not tested.)
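The dictionary idea above can be wrapped in a small helper. This is only a sketch: the function name `share_equal` is made up here, and `dict.setdefault` stands in for the explicit `if x not in str_record` check. Unlike intern(), which accepts only str, a dictionary like this should also work for other hashable values such as bytes or tuples.

```python
def share_equal(items):
    """Return a list in which equal items all refer to one shared object."""
    canonical = {}  # maps each distinct value to the first object seen with it
    # setdefault stores the item the first time a value appears; afterwards it
    # returns the object stored earlier, so duplicates reuse that object.
    return [canonical.setdefault(item, item) for item in items]

# Build equal-but-separate strings at runtime, as if they were read from a file.
raw = ["this is string{}".format(i % 3) for i in range(9)]
shared = share_equal(raw)

print(len({id(s) for s in shared}))  # number of distinct objects kept alive
```

Once `raw` is discarded, only one object per distinct value remains referenced.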
