python - Iterating over a numpy array and selectively picking one or two values based on a criterion

Question

Given a numpy array like this, containing arbitrary data:

>>> data
array([  1,   172,   32, ..., 42, 189, 29], dtype=int8) # SIGNED int8

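...I need to build a numpy array 'result' as follows.

(Please forgive the pseudocode implementation. If I knew how to do it, I wouldn't be asking. If I had a working numpy implementation, I would post my question on CodeReview instead.)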

for value in data, check:
    if value & 0x01:
        result.append((value >> 1 << 8) + next(value).astype(numpy.uint8))
        # that is: take TWO values from 'data', one signed, the next un-signed, glue them together, appending ONE int16 to result
    else:
        result.append(value >> 1)
        # that is: take ONE value from 'data', appending ONE int8 to result
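
To make the arithmetic in the pseudocode concrete, here is a minimal sketch of the two-byte case on plain Python ints. The helper name `glue` is made up for illustration and is not part of the original code; it assumes the first byte arrives as 0..255 and has to be reinterpreted as signed.

```python
def glue(first_byte, second_byte):
    """Combine a signed first byte and an unsigned second byte into one value."""
    # Reinterpret 0..255 as a signed int8 in -128..127 (two's complement)
    signed = first_byte - 256 if first_byte >= 128 else first_byte
    # Drop the flag bit, shift into the high byte, then add the low byte
    return ((signed >> 1) << 8) + second_byte

print(glue(0x03, 0x10))  # 272
print(glue(0xFF, 0x10))  # -240
```

The second call shows why the signedness of the first byte matters: 0xFF is -1 as an int8, so the glued value comes out negative.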

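I have already implemented this in 'plain' Python. It works fine, but I hope it can be optimized using numpy and its very efficient array operations. I'd like to get rid of the list and the appends. Sadly, I have no idea how to accomplish that: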

# data is a string of 'bytes' received from a device
def unpack(data):
    l = len(data)
    p = 0
    result = []

    while p < l:
        i1 = (((ord(data[p]) + 128) % 256) - 128)
        p += 1
        if i1 & 0x01:
            # read next 'char' as an uint8
            #
            # due to the nature of the protocol,
            # we will always have sufficient data
            # available to avoid reading past the end
            i2 = ord(data[p])
            p += 1
            result.append((i1 >> 1 << 8) + i2)
        else:
            result.append(i1 >> 1)

    return result
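
The function above indexes a Python 2 string and calls ord() on each character. As a rough sketch, assuming the device data arrives as a Python 3 `bytes` object (where indexing already yields ints), the same loop could look like this. The name `unpack_py3` is only illustrative.

```python
def unpack_py3(data: bytes) -> list:
    """Same logic as unpack(), but for a Python 3 bytes object."""
    result = []
    p = 0
    n = len(data)
    while p < n:
        # bytes indexing yields 0..255; reinterpret it as a signed int8
        i1 = ((data[p] + 128) % 256) - 128
        p += 1
        if i1 & 0x01:
            # flag bit set: the next byte is the unsigned low byte
            i2 = data[p]
            p += 1
            result.append((i1 >> 1 << 8) + i2)
        else:
            result.append(i1 >> 1)
    return result

print(unpack_py3(bytes([0x03, 0x10, 0x04, 0xFF, 0x10])))  # [272, 2, -240]
```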

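Update: Thanks to @Jaime, I managed to implement an efficient unpack function. It is very similar to his, though a bit faster. The while loop is of course the critical part. I'm posting it here in case anyone is interested: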

def new_np_unpack(data):
    mask = (data & 0x01).astype(bool)  # plain bool works on every NumPy version; numpy.bool is missing from some releases

    true_positives = None

    while True:
        # check for 'true positives' in the tentative mask
        # the next item must by definition be a false one
        true_positives = numpy.nonzero(numpy.logical_and(mask, numpy.invert(numpy.concatenate(([False], mask[:-1])))))[0]

        # loop until no more 'false positives'
        if not numpy.any(mask[true_positives+1]):
            break

        mask[true_positives+1] = False

    result = numpy.empty(data.shape, dtype='int16')
    result[:] = data.astype('int8') >> 1
    result[true_positives] = (result[true_positives] << 8) + data[true_positives + 1]
    mask = numpy.ones(data.shape, dtype=bool)
    mask[true_positives + 1] = False
    return result[mask]

score 1 · Accepted Answer

I got a vectorized version working. For comparison, I removed the ord(...) calls from your code and fed it data like this:

data = np.random.randint(256, size=(1000000,)).astype('uint8')
data[-1] = 0 # to avoid errors with last element
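
If the input is the raw byte string from the device rather than random test data, one possible way to get it into a uint8 array without any ord() calls is `np.frombuffer`. This is a sketch under that assumption; the `raw` bytes here are just a stand-in for real device data.

```python
import numpy as np

# Stand-in for the bytes received from the device
raw = bytes([0x03, 0x10, 0x04, 0xFF, 0x10])

# Interpret the buffer as unsigned bytes without copying (read-only array)
data = np.frombuffer(raw, dtype=np.uint8)

# The same memory reinterpreted as signed int8
signed = data.view(np.int8)

print(data.tolist())    # [3, 16, 4, 255, 16]
print(signed.tolist())  # [3, 16, 4, -1, 16]
```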

My version of your function:

def np_unpack(data) :
    # find where condition is met
    mask = (data & 0x01).astype(bool)
    # Keep only 1st, 3rd, 5th... consecutive occurrences of True in mask
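    new_mask = mask.copy()
    mult = -1
    while new_mask.sum() :
        new_mask = np.logical_and(new_mask,
                                  np.concatenate(([False], new_mask[:-1])))
        # in-place `mask += ...` fails on recent NumPy (the integer result
        # cannot be cast back to bool), so build the sum and convert explicitly
        mask = (mask + new_mask * mult).astype(bool)
        mult *= -1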
    del new_mask
    cond = np.nonzero(mask)[0]
    result = np.empty(data.shape, dtype='int16')
    result[:] = data.astype('int8') >> 1
    result[cond] <<= 8
    result[cond] += data[cond + 1]
    mask = np.ones(data.shape, dtype=bool)
    mask[cond + 1] = False
    return result[mask]
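
The "keep only the 1st, 3rd, 5th... occurrence in each run of True" step can probably also be done without a while loop. This is not the approach used in the function above, just a sketch of an alternative: it computes each element's position inside its run of odd bytes with `np.maximum.accumulate` and keeps the even positions. The name `pair_start_mask` is made up for illustration.

```python
import numpy as np

def pair_start_mask(data):
    """True where a byte starts a two-byte value (is odd and not consumed)."""
    odd = (data & 0x01).astype(bool)
    idx = np.arange(odd.size)
    # Index of the most recent even byte at or before each position (-1 if none)
    last_even = np.maximum.accumulate(np.where(odd, -1, idx))
    # 0-based position inside the current run of odd bytes (-1 for even bytes)
    pos_in_run = idx - last_even - 1
    # Only the 1st, 3rd, 5th... odd byte of a run starts a pair
    return odd & (pos_in_run % 2 == 0)

data = np.array([3, 5, 7, 4, 9, 2], dtype=np.uint8)
print(pair_start_mask(data).tolist())  # [True, False, True, False, True, False]
```

`np.nonzero(pair_start_mask(data))[0]` would then play the role of `cond` in the function above. Whether this is actually faster than the loop depends on how long the runs of odd bytes are, so it is worth timing on real data.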

Some tests with a 1M-element input:

In [4]: np.all(unpack(data) == np_unpack(data))
Out[4]: True

In [5]: %timeit unpack(data)
1 loops, best of 3: 7.11 s per loop

In [6]: %timeit np_unpack(data)
1 loops, best of 3: 294 ms per loop
