python - Storing numpy sparse matrix in HDF5 (PyTables)

Question

I'm having trouble storing a numpy csr_matrix with PyTables. I get the following error:

TypeError: objects of type ``csr_matrix`` are not supported in this context, sorry; supported objects are: NumPy array, record or scalar; homogeneous list or tuple, integer, float, complex or string

My code:

f = tables.openFile(path,'w')

atom = tables.Atom.from_dtype(self.count_vector.dtype)
ds = f.createCArray(f.root, 'count', atom, self.count_vector.shape)
ds[:] = self.count_vector
f.close()

Any ideas?

Thanks

score 35 · Accepted Answer

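The answer by DaveP is almost right, but it can cause problems with very sparse matrices: if the last column(s) or row(s) are empty, they are dropped. So, to be sure that everything works, the "shape" attribute must be stored too.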
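To make the problem concrete, here is a minimal sketch that needs only numpy and scipy (no HDF5 file involved). The example matrix is made up for illustration. For a CSR matrix the row count follows from the length of indptr, but the column count has to be guessed from the largest column index when no shape is given, so trailing empty columns disappear.

```python
import numpy as np
from scipy import sparse

# Illustrative 3x5 matrix whose last two columns are empty
dense = np.array([[1, 0, 2, 0, 0],
                  [0, 0, 0, 0, 0],
                  [0, 3, 0, 0, 0]])
m = sparse.csr_matrix(dense)

# Rebuilt from the three arrays only: the column count is inferred
# from the largest entry in `indices`
guessed = sparse.csr_matrix((m.data, m.indices, m.indptr))

# Rebuilt with the stored shape: the empty columns survive
exact = sparse.csr_matrix((m.data, m.indices, m.indptr), shape=m.shape)

print(m.shape, guessed.shape, exact.shape)
```

With this input, `guessed` should come back with three columns while `exact` keeps all five.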

This is the code I regularly use:

import tables as tb
from numpy import array
from scipy import sparse

def store_sparse_mat(m, name, store='store.h5'):
    """Store the csr matrix m as four arrays named <name>_<attribute>."""
    msg = "This code only works for csr matrices"
    assert isinstance(m, sparse.csr_matrix), msg
    # PyTables 3.x spelling; older releases used openFile / createCArray
    with tb.open_file(store, 'a') as f:
        for par in ('data', 'indices', 'indptr', 'shape'):
            full_name = '%s_%s' % (name, par)
            # Remove a node left over from an earlier call with the same name
            try:
                n = getattr(f.root, full_name)
                n._f_remove()
            except AttributeError:
                pass

            arr = array(getattr(m, par))
            atom = tb.Atom.from_dtype(arr.dtype)
            ds = f.create_carray(f.root, full_name, atom, arr.shape)
            ds[:] = arr

def load_sparse_mat(name, store='store.h5'):
    """Rebuild the csr matrix stored under `name` by store_sparse_mat."""
    with tb.open_file(store) as f:
        pars = []
        for par in ('data', 'indices', 'indptr', 'shape'):
            pars.append(getattr(f.root, '%s_%s' % (name, par)).read())
    # Pass the shape explicitly so trailing empty columns are not lost
    m = sparse.csr_matrix(tuple(pars[:3]), shape=tuple(pars[3]))
    return m

It is trivial to adapt it to csc matrices.
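One way to see why the adaptation is small: a csc_matrix is described by the same four attributes (data, indices, indptr, shape), so only the constructor changes. The sketch below leaves PyTables out and keeps the pieces in a plain dict. The helper names `to_parts` and `from_parts` and the dict layout are hypothetical, chosen for illustration, and are not part of the code above.

```python
import numpy as np
from scipy import sparse

# Hypothetical helpers: names and dict layout are illustrative only.
CONSTRUCTORS = {"csr": sparse.csr_matrix, "csc": sparse.csc_matrix}

def to_parts(m):
    """Split a CSR or CSC matrix into plain arrays plus a format tag."""
    fmt = m.format  # "csr" or "csc" for the two supported classes
    if fmt not in CONSTRUCTORS:
        raise TypeError("only csr and csc matrices are handled")
    return {
        "format": fmt,
        "data": np.asarray(m.data),
        "indices": np.asarray(m.indices),
        "indptr": np.asarray(m.indptr),
        "shape": np.asarray(m.shape),
    }

def from_parts(parts):
    """Rebuild the matrix with the constructor matching the format tag."""
    build = CONSTRUCTORS[parts["format"]]
    shape = tuple(int(n) for n in parts["shape"])
    return build((parts["data"], parts["indices"], parts["indptr"]), shape=shape)

# Round trip a small CSC matrix whose first row is empty
original = sparse.csc_matrix(np.array([[0, 0, 0],
                                       [4, 0, 0]]))
restored = from_parts(to_parts(original))
```

In a PyTables version, each array in the dict would become its own node, and the format tag would have to be stored alongside them (or two separate pairs of functions kept, one per format).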

score 23 · Accepted Answer

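A CSR matrix can be fully reconstructed from its data, indices and indptr attributes. These are just regular numpy arrays, so there should be no problem storing them as three separate arrays in pytables and then passing them back to the constructor of csr_matrix. See the scipy docs.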

Edit: Pietro's answer points out that the shape member should also be stored.
