matlab - Matlab:スパース構造を使用して2進数で1を合計する高速な方法は?

Question

ほとんどの回答は、ハミングの重みに関する既に回答された質問にのみ対処していますがfind、スパース性に関するポイントと対処は無視しています。どうやらここでのShaiの回答は、検索に関するポイントに対処していますが、まだ確認できていません。ここでの私の答えは、ビットシフトなどの他の答えの創意工夫を利用していませんが、十分な例の答えです。

入力

>> mlf=sparse([],[],[],2^31+1,1);mlf(1)=10;mlf(10)=111;mlf(77)=1010;  
>> transpose(dec2bin(find(mlf)))

ans =

001
000
000
011
001
010
101

ゴール

スパース構造を持つ 2 進数の 1 の量を高速に計算しますか?

score 6 · Accepted Answer

これはさまざまな方法で行うことができます。私が思う最も簡単なのは

% Example data
F = [268469248 285213696 536904704 553649152];

% Solution 1
sum(dec2bin(F)-'0',2)

そして最速（ここにあるように）：

% Solution 2
w = uint32(F');

p1 = uint32(1431655765);
p2 = uint32(858993459);
p3 = uint32(252645135);
p4 = uint32(16711935);
p5 = uint32(65535);

w = bitand(bitshift(w, -1), p1) + bitand(w, p1);
w = bitand(bitshift(w, -2), p2) + bitand(w, p2);
w = bitand(bitshift(w, -4), p3) + bitand(w, p3);
w = bitand(bitshift(w, -8), p4) + bitand(w, p4);
w = bitand(bitshift(w,-16), p5) + bitand(w, p5);

score 5 · Accepted Answer

あなたのコメントによると、を使用して数値のベクトルをバイナリ文字列表現に変換しますdec2bin。次に、次のように目的を達成できます。ここでは、ベクトル[10 11 12]を例として使用しています。

>> sum(dec2bin([10 11 12])=='1',2)

ans =

     2
     3
     2

または同等に、

>> sum(dec2bin([10 11 12])-'0',2)

速度のために、次のように避けることができますdec2bin（コードに触発されたモジュロ2演算を使用しdec2binます）：

>> sum(rem(floor(bsxfun(@times, [10 11 12].', pow2(1-N:0))),2),2)

ans =

     2
     3
     2

ここNで、期待する 2 進数の最大数です。

score 4 · Accepted Answer

本当に速くしたい場合は、ルックアップテーブルが便利だと思います。0..255 の数を簡単にマッピングできます。これを1回実行すると、intをそのバイトに分解するだけで、テーブルの合計を調べて結果を追加するだけです-文字列に移動する必要はありません...

例：

>> LUT = sum(dec2bin(0:255)-'0',2); % construct the look up table (only once)
>> ii = uint32( find( mlf ) ); % get the numbers
>> vals = LUT( mod( ii, 256 ) + 1 ) + ... % lower bytes
          LUT( mod( ii/256, 256 ) + 1 ) + ...
          LUT( mod( ii/65536, 256 ) + 1 ) + ...
          LUT( mod( ii/16777216, 256 ) + 1 );

使用（ Amrotypecastの提案による）：

>> vals = sum( reshape(LUT(double(typecast(ii,'uint8'))+1), 4, [] ), 1 )';

実行時間の比較

>> ii = uint32(randi(intmax('uint32'),100000,1));
>> tic; vals1 = sum( reshape(LUT(typecast(ii,'uint8')+1), 4, [] ), 1 )'; toc, %//'
>> tic; vals2 = sum(dec2bin(ii)-'0',2); toc
>> dii = double(ii); % type issues
>> tic; vals3 = sum(rem(floor(bsxfun(@times, dii, pow2(1-32:0))),2),2); toc

結果：

Elapsed time is 0.006144 seconds.  <-- this answer
Elapsed time is 0.120216 seconds.  <-- using dec2bin
Elapsed time is 0.118009 seconds.  <-- using rem and bsxfun

score 3 · Accepted Answer

@Shaiのルックアップテーブルの使用方法を示す例を次に示します。

% build lookup table for 8-bit integers
lut = sum(dec2bin(0:255)-'0', 2);

% get indices
idx = find(mlf);

% break indices into 8-bit integers and apply LUT
nbits = lut(double(typecast(uint32(idx),'uint8')) + 1);

% sum number of bits in each
s = sum(reshape(nbits,4,[]))

uint6432 ビット範囲外の大きなインデックスを持つ非常に大きなスパース配列がある場合は、代わりに切り替える必要があるかもしれません..

編集：

Java を使用する別のソリューションを次に示します。

idx = find(mlf);
s = arrayfun(@java.lang.Integer.bitCount, idx);

編集＃2：

これは、C++ MEX 関数として実装されたさらに別のソリューションです。それは依存していstd::bitset::countます：

bitset_count.cpp

#include "mex.h"
#include <bitset>

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    // validate input/output arguments
    if (nrhs != 1) {
        mexErrMsgTxt("One input argument required.");
    }
    if (!mxIsUint32(prhs[0]) || mxIsComplex(prhs[0]) || mxIsSparse(prhs[0])) {
        mexErrMsgTxt("Input must be a 32-bit integer dense matrix.");
    }
    if (nlhs > 1) {
        mexErrMsgTxt("Too many output arguments.");
    }

    // create output array
    mwSize N = mxGetNumberOfElements(prhs[0]);
    plhs[0] = mxCreateDoubleMatrix(N, 1, mxREAL);

    // get pointers to data
    double *counts = mxGetPr(plhs[0]);
    uint32_T *idx = reinterpret_cast<uint32_T*>(mxGetData(prhs[0]));

    // count bits set for each 32-bit integer number
    for(mwSize i=0; i<N; i++) {
        std::bitset<32> bs(idx[i]);
        counts[i] = bs.count();
    }
}

上記の関数をとしてコンパイルし、mex -largeArrayDims bitset_count.cpp通常どおり実行します。

idx = find(mlf);
s = bitset_count(uint32(idx))

これまでに述べたすべてのソリューションを比較することにしました。

function [t,v] = testBitsetCount()
    % random data (uint32 vector)
    x = randi(intmax('uint32'), [1e5,1], 'uint32');

    % build lookup table (done once)
    LUT = sum(dec2bin(0:255,8)-'0', 2);

    % functions to compare
    f = {
        @() bit_twiddling(x)      % bit twiddling method
        @() lookup_table(x,LUT);  % lookup table method
        @() bitset_count(x);      % MEX-function (std::bitset::count)
        @() dec_to_bin(x);        % dec2bin
        @() java_bitcount(x);     % Java Integer.bitCount
    };

    % compare timings and check results are valid
    t = cellfun(@timeit, f, 'UniformOutput',true);
    v = cellfun(@feval, f, 'UniformOutput',false);
    assert(isequal(v{:}));
end

function s = lookup_table(x,LUT)
    s = sum(reshape(LUT(double(typecast(x,'uint8'))+1),4,[]))';
end

function s = dec_to_bin(x)
    s = sum(dec2bin(x,32)-'0', 2);
end

function s = java_bitcount(x)
    s = arrayfun(@java.lang.Integer.bitCount, x);
end

function s = bit_twiddling(x)
    p1 = uint32(1431655765);
    p2 = uint32(858993459);
    p3 = uint32(252645135);
    p4 = uint32(16711935);
    p5 = uint32(65535);

    s = x;
    s = bitand(bitshift(s, -1), p1) + bitand(s, p1);
    s = bitand(bitshift(s, -2), p2) + bitand(s, p2);
    s = bitand(bitshift(s, -4), p3) + bitand(s, p3);
    s = bitand(bitshift(s, -8), p4) + bitand(s, p4);
    s = bitand(bitshift(s,-16), p5) + bitand(s, p5);
end

秒単位の経過時間:

t = 
    0.0009    % bit twiddling method
    0.0087    % lookup table method
    0.0134    % C++ std::bitset::count
    0.1946    % MATLAB dec2bin
    0.2343    % Java Integer.bitCount

score 0 · Accepted Answer

ビットカウントFEX コントリビューションは、ルックアップテーブルアプローチに基づくソリューションを提供しますが、より最適化されています。私の古いラップトップで R2015a を使用すると、100 万個の uint32 ベクトルで、ビット操作法 (Amro によって報告された最速の純粋な MATLAB 法) の 2 倍以上の速度で実行されます。

score 0 · Accepted Answer

メク！

このコードを次の名前で保存しますcountTransBits.cpp。

#include "mex.h"

void mexFunction( int nout, mxArray* pout[], int nin, mxArray* pin[] ) {
mxAssert( nin == 1 && mxIsSparse(pin[0]) && mxGetN( pin[0] ) == 1,
            "expecting single sparse column vector input" );
    mxAssert( nout == 1, "expecting single output" );

    // set output, assuming 32 bits, set to 64 if needed
    pout[0] = mxCreateNumericMatrix( 32, 1, mxUINT32_CLASS, mxREAL );
    unsigned int* counter = (unsigned int*)mxGetData( pout[0] );
    for ( int i = 0; i < 32; i++ ) {
        counter[i] = 0;
    }

    // start working
    mwIndex *pIr = mxGetIr( pin[0] );
    mwIndex* pJc = mxGetJc( pin[0] );
    double*  pr = mxGetPr( pin[0] );
    for ( mwSize i = pJc[0]; i < pJc[1]; i++ ) {
        if ( pr[i] != 0 ) {// make sure entry is non-zero
            unsigned int entry = pIr[i] + 1; // cast to unsigned int and add 1 for 1-based indexing in Matlab
            int bit = 0;
            while ( entry != 0 && bit < 32 ) {
                counter[bit] += ( entry & 0x1 ); // count the lsb
                bit++;
                entry >>= 1; // shift right
            }
        }
    }
}

Matlabでコンパイルする

>> mex -largeArrayDims -O countTransBits.cpp

コードを実行する

>> countTransBits( mlf )

lsb から msb までの 32 個のビンの出力カウントに注意してください。

score 0 · Accepted Answer

これにより、スパース構造からの 2 進数の行合計が得られます。

>> mlf=sparse([],[],[],2^31+1,1);mlf(1)=10;mlf(10)=111;mlf(77)=1010;  
>> transpose(dec2bin(find(mlf)))

ans =

001
000
000
011
001
010
101

>> sum(ismember(transpose(dec2bin(find(mlf))),'1'),2)

ans =

     1
     0
     0
     2
     1
     1
     2

誰かがより高速なrowsummationを見つけられることを願っています!

matlab - Matlab:スパース構造を使用して2進数で1を合計する高速な方法は?

7 に答える 7

編集：

編集＃2：

bitset_count.cpp

Related

Reference