python - Combine hdf5 files into a single dataset

Question

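I have many hdf5 files, each containing a single dataset. I want to combine them into one dataset in which all of the data sits in the same volume (each file is an image, and I want one large time-lapse image).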

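I wrote a Python script that extracts the data as numpy arrays, keeps them, and then tries to write them to a new h5 file. However, this approach doesn't work because the combined data needs more than the 32 GB of RAM I have.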

I also tried the command-line tool h5copy:

h5copy -i file1.h5 -o combined.h5 -s '/dataset' -d '/new_data/t1'
h5copy -i file2.h5 -o combined.h5 -s '/dataset' -d '/new_data/t2'

This works, but it creates many separate datasets in the new file instead of producing one contiguous dataset.

score 2 · Accepted Answer

You can't explicitly append rows to an hdf5 dataset, but you can use the maxshape keyword to your advantage when you create the dataset, so that it can later be "resized" to accommodate new data. (See http://docs.h5py.org/en/latest/faq.html#appending-data-to-a-dataset)
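A minimal sketch of the maxshape/resize pattern on its own, using a hypothetical file name `demo.h5` and dataset name `data` chosen only for illustration: the dataset starts with 2 rows, is grown along axis 0, and only the new slice is written.

```python
import h5py
import numpy as np

# Hypothetical output file and dataset names, for illustration only
with h5py.File("demo.h5", "w") as f:
    # maxshape=(None, 3): unlimited rows, fixed 3 columns.
    # Resizable datasets must be chunked; h5py picks a chunk shape
    # automatically when maxshape is given.
    dset = f.create_dataset("data", shape=(2, 3), maxshape=(None, 3), dtype="f8")
    dset[:] = np.ones((2, 3))

    new_rows = np.zeros((4, 3))
    old_len = dset.shape[0]
    dset.resize(old_len + new_rows.shape[0], axis=0)  # grow along axis 0
    dset[old_len:] = new_rows                          # write only the new rows

    print(dset.shape)  # (6, 3)
```

Only `new_rows` has to be in memory at the time of the write; the rows already on disk are not reloaded.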

Assuming your datasets always have the same number of columns, the code would look something like this:

import h5py

# Hypothetical list of input files; each is assumed to hold one 2-D dataset at '/dataset'
file_list = ['file1.h5', 'file2.h5']

output_file = h5py.File('your_output_file.h5', 'w')

# keep track of the total number of rows written so far
total_rows = 0

for n, f in enumerate(file_list):
  # load only the current file's data into memory
  with h5py.File(f, 'r') as src:
    your_data = src['dataset'][:]
  total_rows = total_rows + your_data.shape[0]
  total_columns = your_data.shape[1]

  if n == 0:
    # first file: create the dataset with an unlimited maxshape so it can grow later
    dset = output_file.create_dataset("Name", (total_rows, total_columns), maxshape=(None, None), dtype=your_data.dtype)
    # fill the first section of the dataset
    dset[:, :] = your_data
    where_to_start_appending = total_rows

  else:
    # resize the dataset to accommodate the new data
    dset.resize(total_rows, axis=0)
    dset[where_to_start_appending:total_rows, :] = your_data
    where_to_start_appending = total_rows

output_file.close()
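Since the question asks for a time-lapse (one image per file), a variant is to stack each image as a frame along a new first axis instead of concatenating rows. A minimal sketch, assuming every input file stores one image of identical shape and dtype at `/dataset`; the function name `stack_frames`, the output dataset name `timelapse` and the `frame_*.h5` files are hypothetical. Because the number of files is known up front, the output can be preallocated, so no resize is needed, and only one frame is held in memory at a time.

```python
import h5py
import numpy as np


def stack_frames(file_list, out_path, src_name="dataset", dst_name="timelapse"):
    """Copy one image per input file into a single (t, ...) dataset.

    Assumes every input file stores one image of identical shape and
    dtype at `src_name`. Only one frame is held in memory at a time.
    """
    # Read shape and dtype from the first file without loading its data
    with h5py.File(file_list[0], "r") as src:
        frame_shape = src[src_name].shape
        dtype = src[src_name].dtype

    with h5py.File(out_path, "w") as out:
        # Preallocate: first axis is time, remaining axes are the image
        dset = out.create_dataset(
            dst_name, shape=(len(file_list),) + frame_shape, dtype=dtype
        )
        for t, path in enumerate(file_list):
            with h5py.File(path, "r") as src:
                frame = src[src_name][...]
            if frame.shape != frame_shape:
                raise ValueError(f"{path}: shape {frame.shape} != {frame_shape}")
            dset[t] = frame  # write frame t along the new first axis


# Usage with small hypothetical input files
paths = []
for t in range(3):
    p = f"frame_{t}.h5"
    with h5py.File(p, "w") as f:
        f.create_dataset("dataset", data=np.full((4, 5), t, dtype="uint16"))
    paths.append(p)

stack_frames(paths, "combined.h5")
```

If the number of files is not known in advance, the same loop can instead create the dataset with `maxshape=(None,) + frame_shape` and call `resize` before each write, as in the accepted answer.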
