python - How do I clear a StringIO object?

Question

I have created a StringIO object and it has some text in it. I'd like to clear its existing contents and reuse it instead of recreating it. Is there any way to do this?

score 117 · Accepted Answer

TL;DR

Don't bother clearing it. Just create a new one; it's faster.

How to

Python 2

Here's how I would find such things out:

>>> from StringIO import StringIO
>>> dir(StringIO)
['__doc__', '__init__', '__iter__', '__module__', 'close', 'flush', 'getvalue', 'isatty', 'next', 'read', 'readline', 'readlines', 'seek', 'tell', 'truncate', 'write', 'writelines']
>>> help(StringIO.truncate)
Help on method truncate in module StringIO:

truncate(self, size=None) unbound StringIO.StringIO method
    Truncate the file's size.

    If the optional size argument is present, the file is truncated to
    (at most) that size. The size defaults to the current position.
    The current file position is not changed unless the position
    is beyond the new file size.

    If the specified size exceeds the file's current size, the
    file remains unchanged.

So, you want .truncate(0). But it is probably cheaper (and easier) to initialise a new StringIO. See below for benchmarks.

Python 3

(Thanks to tstone2077 for pointing out the difference.)

>>> from io import StringIO
>>> dir(StringIO)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__lt__', '__ne__', '__new__', '__next__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__', '_checkClosed', '_checkReadable', '_checkSeekable', '_checkWritable', 'close', 'closed', 'detach', 'encoding', 'errors', 'fileno', 'flush', 'getvalue', 'isatty', 'line_buffering', 'newlines', 'read', 'readable', 'readline', 'readlines', 'seek', 'seekable', 'tell', 'truncate', 'writable', 'write', 'writelines']
>>> help(StringIO.truncate)
Help on method_descriptor:

truncate(...)
    Truncate size to pos.

    The pos argument defaults to the current file position, as
    returned by tell().  The current file position is unchanged.
    Returns the new absolute position.

It is important to note that the current file position is now unchanged, whereas in the Python 2 variant truncating to size zero would reset the position.

Thus, for Python 2, you only need


>>> from cStringIO import StringIO
>>> s = StringIO()
>>> s.write('foo')
>>> s.getvalue()
'foo'
>>> s.truncate(0)
>>> s.getvalue()
''
>>> s.write('bar')
>>> s.getvalue()
'bar'

If you do this in Python 3, you won't get the result you expected:

>>> from io import StringIO
>>> s = StringIO()
>>> s.write('foo')
3
>>> s.getvalue()
'foo'
>>> s.truncate(0)
0
>>> s.getvalue()
''
>>> s.write('bar')
3
>>> s.getvalue()
'\x00\x00\x00bar'
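
The padding in the output above follows from the position being left where it was. As a minimal sketch (the variable names `position_after` and `padded` are only illustrative), checking `tell()` right after the truncate makes this visible:

```python
from io import StringIO

s = StringIO()
s.write('foo')
s.truncate(0)               # the contents are gone...
position_after = s.tell()   # ...but the position has not moved
s.write('bar')              # written at the old offset, so the gap is padded
padded = s.getvalue()
print(position_after, repr(padded))
```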

So in Python 3 you also need to reset the position:

>>> from io import StringIO
>>> s = StringIO()
>>> s.write('foo')
3
>>> s.getvalue()
'foo'
>>> s.truncate(0)
0
>>> s.seek(0)
0
>>> s.getvalue()
''
>>> s.write('bar')
3
>>> s.getvalue()
'bar'

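If you use the truncate method in Python 2 code, it is safer to call seek(0) at the same time (before or after, it doesn't matter) so that the code won't break when you inevitably port it to Python 3. That is one more reason to simply create a new StringIO object.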
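
If reuse really is needed, the two calls can be wrapped in a small helper. This is only a sketch: `clear` is a hypothetical name, not part of the `io` module, and the `fresh` object is there to show the simpler alternative of creating a new buffer.

```python
from io import StringIO

def clear(sio):
    """Empty sio in place and rewind it so the next write starts at 0."""
    sio.seek(0)
    sio.truncate(0)
    return sio

# Reusing an existing object
s = StringIO()
s.write('foo')
clear(s)
s.write('bar')
reused = s.getvalue()

# The simpler alternative: just make a new one
fresh = StringIO()
fresh.write('bar')
```

Both objects end up holding the same text; the second route needs no helper at all.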

Timings

Python 2

>>> from timeit import timeit
>>> def truncate(sio):
...     sio.truncate(0)
...     return sio
... 
>>> def new(sio):
...     return StringIO()
...

When empty, with StringIO:

>>> from StringIO import StringIO
>>> timeit(lambda: truncate(StringIO()))
3.5194039344787598
>>> timeit(lambda: new(StringIO()))
3.6533868312835693

With 3KB of data in, with StringIO:

>>> timeit(lambda: truncate(StringIO('abc' * 1000)))
4.3437709808349609
>>> timeit(lambda: new(StringIO('abc' * 1000)))
4.7179079055786133

And the same with cStringIO:

>>> from cStringIO import StringIO
>>> timeit(lambda: truncate(StringIO()))
0.55461597442626953
>>> timeit(lambda: new(StringIO()))
0.51241087913513184
>>> timeit(lambda: truncate(StringIO('abc' * 1000)))
1.0958449840545654
>>> timeit(lambda: new(StringIO('abc' * 1000)))
0.98760509490966797

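So, ignoring potential memory concerns (del oldstringio), it is faster to truncate a StringIO.StringIO (3% faster when empty, 8% faster with 3KB of data), but faster to create a new cStringIO.StringIO (8% faster when empty, 10% faster with 3KB of data). So I'd recommend just using the easiest option: presuming you are working with CPython, use cStringIO and create new ones.

Python 3

The same code, just with seek(0) added: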

>>> def truncate(sio):
...     sio.truncate(0)
...     sio.seek(0)
...     return sio
... 
>>> def new(sio):
...     return StringIO()
...

When empty:

>>> from io import StringIO
>>> timeit(lambda: truncate(StringIO()))
0.9706327870007954
>>> timeit(lambda: new(StringIO()))
0.8734330690022034

With 3KB of data in:

>>> timeit(lambda: truncate(StringIO('abc' * 1000)))
3.5271066290006274
>>> timeit(lambda: new(StringIO('abc' * 1000)))
3.3496507499985455

So in Python 3, creating a new one instead of reusing an empty one is 11% faster, and creating a new one instead of reusing a 3KB one is 5% faster. Again, create a new StringIO rather than truncating and seeking.
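
To repeat the Python 3 comparison on your own machine, something like the following script should work. It is a sketch rather than the exact session used above: the `compare` helper and the reduced `number` of iterations are assumptions made to keep the run short, and the absolute figures will differ by interpreter version and hardware.

```python
from io import StringIO
from timeit import timeit

def truncate(sio):
    # Reuse: empty the buffer and rewind it
    sio.truncate(0)
    sio.seek(0)
    return sio

def new(sio):
    # Replace: ignore the old buffer and return a fresh one
    return StringIO()

def compare(initial='', number=100_000):
    """Return (truncate_seconds, new_seconds) for the given initial contents."""
    t_truncate = timeit(lambda: truncate(StringIO(initial)), number=number)
    t_new = timeit(lambda: new(StringIO(initial)), number=number)
    return t_truncate, t_new

for label, data in [('empty', ''), ('3KB', 'abc' * 1000)]:
    t_truncate, t_new = compare(data)
    print(f'{label}: truncate+seek {t_truncate:.3f}s, new {t_new:.3f}s')
```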

score 2 · Accepted Answer

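The way I managed to optimise my processing of many files in sequence (read in chunks, process each chunk, write the processed stream out to a file) is to reuse the same cStringIO.StringIO instance, but always reset() it after use, then write to it, and then truncate(). By doing this, I only truncate the part at the end that I don't need for the current file. This seems to have given me a ~3% performance increase. Anybody with more expertise on this could confirm whether it actually optimises memory allocation.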

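import cStringIO  # Python 2 only

sio = cStringIO.StringIO()
for file in files:
    # Overwrite the buffer from position 0 with this file's processed data
    read_file_chunks_and_write_to_sio(file, sio)
    # Cut off anything left over from a longer previous file
    sio.truncate()
    with open('out.bla', 'w') as f:
        f.write(sio.getvalue())
    # Rewind to the start so the next file overwrites the buffer
    sio.reset()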
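
The snippet above relies on cStringIO and its reset() method, neither of which exists in Python 3. A rough Python 3 counterpart of the same reuse pattern, assuming io.StringIO with seek(0) standing in for reset(), might look like this. The `process_all` function and the upper-casing step are placeholders for real processing, not part of the original answer.

```python
from io import StringIO

def process_all(chunks_per_file):
    """Reuse one buffer for every input; return the text produced for each."""
    sio = StringIO()
    results = []
    for chunks in chunks_per_file:
        for chunk in chunks:
            sio.write(chunk.upper())   # stand-in for real processing
        sio.truncate()                 # drop leftovers beyond the current position
        results.append(sio.getvalue())
        sio.seek(0)                    # rewind so the next input overwrites from 0
    return results

outputs = process_all([["hello ", "world"], ["hi"]])
print(outputs)
```

The second input is shorter than the first, so the truncate() call is what removes the tail left behind by the earlier write.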
