matlab - MATLAB での反復推定 (現在はループ) の最適化

Question

画像内のすべてのピクセルに対して最小二乗 (または同様の行列ベースの演算) を実行する必要があることに気付きました。すべてのピクセルには一連の数値が関連付けられているため、3D マトリックスとして配置できます。

(この次のビットはスキップできます)

最小二乗推定の意味の簡単な説明:

Y = Ax^2 + Bx + C でモデル化された二次システムがあり、それらの A、B、C 係数を探しているとします。X と対応する Y のいくつかのサンプル (少なくとも 3 つ) を使用して、次のように推定できます。

(10 としましょう) X サンプルを次のような行列に配置します。X = [x(:).^2 x(:) ones(10,1)];
Y サンプルを同様の行列に配置します。Y = y(:);
次を解くことにより、係数 A、B、C を推定します。coeffs = (X'*X)^(-1)*X'*Y;

必要に応じて、これを自分で試してください。

A = 5; B = 2; C = 1;
x = 1:10;
y = A*x(:).^2 + B*x(:) + C + .25*randn(10,1); % added some noise here
X = [x(:).^2 x(:) ones(10,1)];
Y = y(:);
coeffs = (X'*X)^-1*X'*Y

coeffs =

  5.0040
  1.9818
  0.9241

私があなたをそこに失ったら、もう一度注意を払い始めてください

*大幅な書き直し* 私が抱えている実際の問題にできるだけ近づけるために修正しましたが、それでも最小限の作業例にしています。

問題設定

%// Setup
xdim = 500; 
ydim = 500; 
ncoils = 8; 
nshots = 4; 
%// matrix size for each pixel is ncoils x nshots (an overdetermined system)

%// each pixel has a matrix stored in the 3rd and 4rth dimensions
regressor = randn(xdim,ydim, ncoils,nshots); 
regressand = randn(xdim, ydim,ncoils);

したがって、私の問題は、画像内のすべてのピクセルに対して (X'*X)^-1*X'*Y (最小二乗または同様の) 操作を実行する必要があることです。それ自体はベクトル化/マトリックス化されていますが、すべてのピクセルに対してそれを行う必要がある唯一の方法は、次のような for ループです。

オリジナルコードスタイル

%// Actual work
tic 
estimate = zeros(xdim,ydim);
for col=1:size(regressor,2)
    for row=1:size(regressor,1)

        X = squeeze(regressor(row,col,:,:));
        Y = squeeze(regressand(row,col,:));

        B = X\Y; 
        % B = (X'*X)^(-1)*X'*Y; %// equivalently

        estimate(row,col) = B(1);
    end
end
toc

Elapsed time = 27.6 seconds

コメントやその他のアイデア
に応じた編集いくつか試してみました:
1. 長いベクトルに形状を変更し、二重forループを削除しました。これにより、時間を節約できました。
2.事前に画像squeezeを -ing して (およびインライン転置) を削除しましpermuteた。これにより、時間を大幅に節約できます。

現在の例:

%// Actual work
tic 
estimate2 = zeros(xdim*ydim,1);
regressor_mod = permute(regressor,[3 4 1 2]);
regressor_mod = reshape(regressor_mod,[ncoils,nshots,xdim*ydim]);
regressand_mod = permute(regressand,[3 1 2]);
regressand_mod = reshape(regressand_mod,[ncoils,xdim*ydim]);

for ind=1:size(regressor_mod,3) % for every pixel

    X = regressor_mod(:,:,ind);
    Y = regressand_mod(:,ind);

    B = X\Y;

    estimate2(ind) = B(1);

end
estimate2 = reshape(estimate2,[xdim,ydim]);
toc

Elapsed time = 2.30 seconds (avg of 10)
isequal(estimate2,estimate) == 1;

ロディ・オルデンハウスのやり方

N  = xdim*ydim*ncoils;  %// number of columns
M  = xdim*ydim*nshots;    %// number of rows

ii = repmat(reshape(1:N,[ncoils,xdim*ydim]),[nshots 1]); %//column indicies
jj = repmat(1:M,[ncoils 1]); %//row indicies

X = sparse(ii(:),jj(:),regressor_mod(:));
Y = regressand_mod(:);

B = X\Y;

B = reshape(B(1:nshots:end),[xdim ydim]);

Elapsed time = 2.26 seconds (avg of 10) 
            or 2.18 seconds (if you don't include the definition of N,M,ii,jj)

質問は次のとおりです:
(さらに) 高速な方法はありますか?

（私はそうは思わない。）

score 6 · Accepted Answer

X の転置を事前に計算することにより、~2 倍の速度向上を実現できます。

for x=1:size(picture,2) % second dimension b/c already transposed

    X = picture(:,x);
    XX = X';
    Y = randn(n_timepoints,1);
    %B = (X'*X)^-1*X'*Y; ;
    B = (XX*X)^-1*XX*Y; 
    est(x) = B(1);

end

Before: Elapsed time is 2.520944 seconds.
After: Elapsed time is 1.134081 seconds.

EDIT：最新の編集でのコードは、次のように置き換えることができます

tic
xdim = 500; 
ydim = 500; 
n_timepoints = 10; % for example

% Actual work
picture = randn(xdim,ydim,n_timepoints);
picture = reshape(picture, [xdim*ydim,n_timepoints])'; % note transpose
YR = randn(n_timepoints,size(picture,2));

% (XX*X).^-1 = sum(picture.*picture).^-1;
% XX*Y = sum(picture.*YR);
est = sum(picture.*picture).^-1 .* sum(picture.*YR);

est = reshape(est,[xdim,ydim]);
toc

Elapsed time is 0.127014 seconds.

これは、最新の編集で桁違いに高速化されており、結果は以前の方法とほぼ同じです。

EDIT2：

X がベクトルではなく行列の場合、状況はもう少し複雑になります。基本的に、コストを抑えるために、for ループの外で可能な限り事前計算を行いたいと考えています。また、手動で計算することで大幅な高速化を実現できますXT*X。結果は常に対称行列になるため、いくつかのコーナーをカットして高速化することができます。まず、対称乗算関数:

function XTX = sym_mult(X) % X is a 3-d matrix

n = size(X,2);
XTX = zeros(n,n,size(X,3));
for i=1:n
    for j=i:n
        XTX(i,j,:) = sum(X(:,i,:).*X(:,j,:));
        if i~=j
            XTX(j,i,:) = XTX(i,j,:);
        end
    end
end

実際の計算スクリプト

xdim = 500; 
ydim = 500; 
n_timepoints = 10; % for example

Y = randn(10,xdim*ydim);
picture = randn(xdim,ydim,n_timepoints); % 500x500x10

% Actual work  
tic  % start timing

picture = reshape(picture, [xdim*ydim,n_timepoints])';

% Here we precompute the (XT*Y) calculation to speed things up later
picture_y = [sum(Y);sum(Y.*picture)]; 

% initialize
est = zeros(size(picture,2),1); 

picture = permute(picture,[1,3,2]);
XTX = cat(2,ones(n_timepoints,1,size(picture,3)),picture);
XTX = sym_mult(XTX); % precompute (XT*X) for speed

X = zeros(2,2); % preallocate for speed
XY = zeros(2,1);

for x=1:size(picture,2) % second dimension b/c already transposed

    %For some reason this is a lot faster than X = XTX(:,:,x);
    X(1,1) = XTX(1,1,x);
    X(2,1) = XTX(2,1,x);
    X(1,2) = XTX(1,2,x);
    X(2,2) = XTX(2,2,x);
    XY(1) = picture_y(1,x);
    XY(2) = picture_y(2,x);

    % Here we utilise the fact that A\B is faster than inv(A)*B
    % We also use the fact that (A*B)*C = A*(B*C) to speed things up
    B = X\XY;
    est(x) = B(1); 
end
est = reshape(est,[xdim,ydim]); 
toc % end timing

Before: Elapsed time is 4.56 seconds.
After: Elapsed time is 2.24 seconds.

これは、約 2 倍の速度向上です。このコードは、X が必要な任意の次元に拡張できる必要があります。たとえば、X = [1 xx^2] の場合picture_y、次のように変更します。

picture_y = [sum(Y);sum(Y.*picture);sum(Y.*picture.^2)];

XTXそしてに変更

XTX = cat(2,ones(n_timepoints,1,size(picture,3)),picture,picture.^2);

また、コード内の多くの 2 を 3 に変更XY(3) = picture_y(3,x)し、ループに追加します。それはかなり簡単なはずです、と私は信じています。

score 3 · Accepted Answer

私はアイデアをいじってみましたが、それは私の他のアイデアとはまったく異なるアプローチであるため、別の答えとして固執することにしました。これがこれまでのところ最速のアプローチだと思います：

元の(最適化されていない): 13.507176 秒。
高速コレスキー分解法: 0.424464 秒

まず、X'*X乗算をすばやく実行する関数があります。結果は常に対称になるため、ここで処理を高速化できます。

function XX = sym_mult(X)

n = size(X,2);
XX = zeros(n,n,size(X,3));
for i=1:n
    for j=i:n
        XX(i,j,:) = sum(X(:,i,:).*X(:,j,:));
        if i~=j
            XX(j,i,:) = XX(i,j,:);
        end
    end
end

3D行列のLDLコレスキー分解を行う関数があり（(X'*X)行列は常に対称であるため、これを行うことができます）、前方および後方置換を行ってLDL反転方程式を解きます

function Y = fast_chol(X,XY)

n=size(X,2);
L = zeros(n,n,size(X,3));
D = zeros(n,n,size(X,3));
B = zeros(n,1,size(X,3));
Y = zeros(n,1,size(X,3));
% These loops compute the LDL decomposition of the 3D matrix
for i=1:n
    D(i,i,:) = X(i,i,:);
    L(i,i,:) = 1;
    for j=1:i-1
        L(i,j,:) = X(i,j,:);
        for k=1:(j-1)
            L(i,j,:) = L(i,j,:) - L(i,k,:).*L(j,k,:).*D(k,k,:);
        end
        D(i,j,:) = L(i,j,:);
        L(i,j,:) = L(i,j,:)./D(j,j,:);
        if i~=j
            D(i,i,:) = D(i,i,:) - L(i,j,:).^2.*D(j,j,:);
        end
    end
end

for i=1:n
    B(i,1,:) = XY(i,:);
    for j=1:(i-1)
        B(i,1,:) = B(i,1,:)-D(i,j,:).*B(j,1,:);
    end
    B(i,1,:) = B(i,1,:)./D(i,i,:);
end

for i=n:-1:1
    Y(i,1,:) = B(i,1,:);
    for j=n:-1:(i+1)
        Y(i,1,:) = Y(i,1,:)-L(j,i,:).*Y(j,1,:);
    end
end

最後に、これらすべてを呼び出すメインスクリプトがあります。

xdim = 500; 
ydim = 500; 
n_timepoints = 10; % for example

Y = randn(10,xdim*ydim);
picture = randn(xdim,ydim,n_timepoints); % 500x500x10

tic  % start timing

picture = reshape(pr, [xdim*ydim,n_timepoints])';
% Here we precompute the (XT*Y) calculation
picture_y = [sum(Y);sum(Y.*picture)];

% initialize
est2 = zeros(size(picture,2),1); 

picture = permute(picture,[1,3,2]);
% Now we calculate the X'*X matrix
XTX = cat(2,ones(n_timepoints,1,size(picture,3)),picture);
XTX = sym_mult(XTX);

% Call our fast Cholesky decomposition routine
B = fast_chol(XTX,picture_y);
est2 = B(1,:);

est2 = reshape(est2,[xdim,ydim]); 
toc

繰り返しますが、これは Nx3 X 行列に対しても同様にうまく機能するはずです。

score 0 · Accepted Answer

私はオクターブを使用しているため、Matlab での結果のパフォーマンスについては何も言えませんが、このコードはわずかに高速になると予想されます。

pictureT=picture'
est=arrayfun(@(x)( (pictureT(x,:)*picture(:,x))^-1*pictureT(x,:)*randn(n_ti
mepoints,1)),1:size(picture,2));

matlab - MATLAB での反復推定 (現在はループ) の最適化

4 に答える 4

Related

Reference