python - Writing to a numpy array from a dictionary

Question

I have a dictionary of file header values (time, number of frames, year, month, etc.) that I want to write into a numpy array. The code I currently have is:

    arr=np.array([(k,)+v for k,v in fileheader.iteritems()],dtype=["a3,a,i4,i4,i4,i4,f8,i4,i4,i4,i4,i4,i4,a10,a26,a33,a235,i4,i4,i4,i4,i4,i4"])

But I get the error: can only concatenate tuple (not "int") to tuple.

Basically, the end result needs to be arrays storing the overall file header info (512 bytes) and each frame's data (header and data, 49408 bytes per frame). Is there an easy way to do this?

Edit: To clarify (for myself as well), I need to write the data from each frame of the file into an array. I was given MATLAB code as a base. Here is a rough idea of the code I was given:

data.frame=zeros([512 96])
frame=uint8(fread(fid,[data.numbeams,512],'uint8'))
data.frame=frame

How do I translate the "frame" into Python?

score 4 · Accepted Answer

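It's probably better to keep the header data in a dict. Do you really need it as an array? (If so, why? There are some advantages to having the header in a numpy array, but it's more complex than a simple dict and less flexible.)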

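One drawback of a dict is that its keys have no predictable order. If you need to write the header back to disk in a regular order (similar to a C struct), you need to store the order of the fields separately from their values. In that case, consider an ordered dict (collections.OrderedDict), or put together a simple class to hold the header data and store the order there.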
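As a minimal sketch of the ordered-dict idea, the snippet below builds a header whose iteration order matches the order the fields were inserted. The field names and values are hypothetical and are not taken from the question's file format.

```python
from collections import OrderedDict

# Hypothetical header fields, listed in the order they appear on disk.
header = OrderedDict([
    ('version', 'abc'),
    ('num_frames', 456),
    ('sample_rate', 3.45),
])

# Iteration follows insertion order, so names and values stay aligned
# with the on-disk layout.
field_names = list(header.keys())
field_values = tuple(header.values())
print(field_names)
print(field_values)
```

On Python 3.7 and later a plain dict also preserves insertion order, so OrderedDict mainly matters for older interpreters or when you want to make the ordering requirement explicit.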

Unless there's a good reason to put it into a numpy array, you may not want to.

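That said, a structured array preserves the order of the header fields and makes it easier to write a binary representation of the header to disk, but it is inflexible in other ways.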

If you do want the header as an array, you'd do something like this:

import numpy as np

# Lists can be modified, but preserve order. That's important in this case.
names = ['Name1', 'Name2', 'Name3']
# It's "S3" instead of "a3" for a string field in numpy, by the way
formats = ['S3', 'i4', 'f8'] 

# It's often cleaner to specify the dtype this way instead of as a giant string
dtype = dict(names=names, formats=formats)

# This won't preserve the order we're specifying things in!!
# If we iterate through it, things may be in any order.
header = dict(Name1='abc', Name2=456, Name3=3.45)

# Therefore, we'll be sure to pass things in in order...
# Also, np.array will expect a tuple instead of a list for a structured array...
values = tuple(header[name] for name in names)
header_array = np.array(values, dtype=dtype)

# We can access field in the array like this...
print(header_array['Name2'])

# And dump it to disk (similar to a C struct) with
header_array.tofile('test.dat')
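To check that the dump really behaves like a packed C struct, here is a small round-trip sketch. It reuses the three illustrative fields from the example, writes to a temporary file (the path is made up for the demonstration), and wraps the record in a one-element array so it can be compared directly with what np.fromfile returns.

```python
import os
import tempfile

import numpy as np

# Same illustrative three-field layout as in the example above.
dtype = np.dtype(dict(names=['Name1', 'Name2', 'Name3'],
                      formats=['S3', 'i4', 'f8']))

# One record, stored as a one-element structured array.
header_array = np.array([(b'abc', 456, 3.45)], dtype=dtype)

# Temporary location chosen only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), 'header.dat')
header_array.tofile(path)

# The fields are written back to back: 3 + 4 + 8 = 15 bytes, no padding.
print(os.path.getsize(path), dtype.itemsize)

# Reading with the same dtype recovers the record.
restored = np.fromfile(path, dtype=dtype, count=1)
print(restored['Name2'][0])
```

The file contains no field names or type information, so the reader has to know the dtype in advance, just as with a C struct.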

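On the other hand, if you just want access to the values in the header, keep it as a dict. It's simpler that way.

Based on what it sounds like you're doing, I'd do something like this. I'm using numpy arrays to read in the header, but the header values are actually stored as class attributes (as well as in the header array).

This looks more complicated than it actually is.

I'm just defining two new classes, one for the parent file and one for a frame. You could do the same thing with a bit less code, but this gives you a foundation for more complex things.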

import numpy as np

class SonarFile(object):
    # These define the format of the file header
    header_fields = ('num_frames', 'name1', 'name2', 'name3')
    header_formats = ('i4', 'f4', 'S10', '>u4')

    def __init__(self, filename):
        self.infile = open(filename, 'rb')
        dtype = dict(names=self.header_fields, formats=self.header_formats)

        # Read in the header as a numpy array (count=1 is important here!)
        self.header = np.fromfile(self.infile, dtype=dtype, count=1)

        # Store the position so we can "rewind" to the end of the header
        self.header_length = self.infile.tell()

        # You may or may not want to do this (if the field names can have
        # spaces, it's a bad idea). It will allow you to access things with
        # sonar_file.name1 instead of sonar_file.header['name1'], though.
        # The [0] pulls the scalar out of the one-record header array.
        for field in self.header_fields:
            setattr(self, field, self.header[field][0])

    # __iter__ is a special function that defines what should happen when we  
    # try to iterate through an instance of this class.
    def __iter__(self):
        """Iterate through each frame in the dataset."""
        # Rewind to the end of the file header
        self.infile.seek(self.header_length)

        # Iterate through frames...
        for _ in range(self.num_frames):
            yield Frame(self.infile)

    def close(self):
        self.infile.close()

class Frame(object):
    header_fields = ('width', 'height', 'name')
    header_formats = ('i4', 'i4', 'S20')
    data_format = 'f4'

    def __init__(self, infile):
        dtype = dict(names=self.header_fields, formats=self.header_formats)
        self.header = np.fromfile(infile, dtype=dtype, count=1)

        # See discussion above...
        for field in self.header_fields:
            setattr(self, field, self.header[field][0])

        # I'm assuming that the size of the frame is in the frame header...
        ncols, nrows = self.width, self.height

        # Read the data in
        self.data = np.fromfile(infile, self.data_format, count=ncols * nrows)

        # And reshape it into a 2d array.
        # I'm assuming C-order, instead of Fortran order.
        # If it's fortran order, just do "data.reshape((ncols, nrows)).T"
        self.data = self.data.reshape((nrows, ncols))

You'd use it like this:

dataset = SonarFile('input.dat')

for frame in dataset:
    im = frame.data
    # Do something...
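For the MATLAB snippet in the question, a rough NumPy counterpart of fread(fid, [numbeams, 512], 'uint8') might look like the sketch below. The helper name read_frame is hypothetical, and the default numbeams=96 is only a guess based on the zeros([512 96]) line. MATLAB fills the output column by column, which is why the reshape uses Fortran order.

```python
import io

import numpy as np

def read_frame(fid, numbeams=96, nsamples=512):
    """Read one frame of uint8 samples, filling columns first like MATLAB's fread."""
    raw = fid.read(numbeams * nsamples)
    flat = np.frombuffer(raw, dtype=np.uint8)
    if flat.size != numbeams * nsamples:
        raise ValueError('incomplete frame')
    # order='F' reproduces MATLAB's column-major fill.
    return flat.reshape((numbeams, nsamples), order='F')

# Tiny in-memory stand-in for a file: six bytes with values 0..5.
fake_file = io.BytesIO(bytes(range(6)))
frame = read_frame(fake_file, numbeams=2, nsamples=3)
print(frame)
```

With a real file opened in 'rb' mode the same function applies unchanged. Note that np.frombuffer returns a read-only view of the bytes, so call .copy() on the result if the frame needs to be modified.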

score 1 · Accepted Answer

The problem seems to be that v is an int, not a tuple. Try:

arr=np.array([(k,v) for k,v in fileheader.iteritems()],dtype=["a3,a,i4,i4,i4,i4,f8,i4,i4,i4,i4,i4,i4,a10,a26,a33,a235,i4,i4,i4,i4,i4,i4"])
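The error itself is easy to reproduce without numpy. The sketch below uses a made-up two-entry dictionary (the keys and values are hypothetical) and Python 3's items() in place of iteritems().

```python
# Hypothetical header values; the real dictionary has many more entries.
fileheader = {'hour': 12, 'frames': 96}

message = None
try:
    # v is an int here, so tuple + int fails.
    rows = [(k,) + v for k, v in fileheader.items()]
except TypeError as exc:
    message = str(exc)
    print(message)

# Pairing the key and value in one tuple avoids the concatenation.
rows = [(k, v) for k, v in fileheader.items()]
print(rows)
```

Each element of rows is a two-field record, so the dtype passed to np.array would presumably need to describe two fields per row rather than the full list of header fields; that reading is an assumption, not something stated in the question.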
