2

I really wish the documentation was more clear on the use of accumarray for N-D size matrices in Matlab. Regardless, I'm totally confused here and looking for some advice. Here's the challenge:

I have a 3D matrix of data.

  • ROWS are individual rivers
  • COLUMNS are dates of observations
  • PAGES are the time intervals of data collection

For this example, let's assume each observation is volume of water flowing through the meter in a 5 minute interval. However, I now wish to resample the data to intervals of N minutes (obviously multiples of 5). Let's choose NMINS = 15.

So, what I want to do is find the sum or mean of the volume of water over NMINS minute intervals. That is, the ROWS and COLUMNS will not change, but the dimensions and values for the PAGES will be decimated/aggregated.

I can get the grouping values for the pages. That is, the values I need to group by. If it was a single river for a single day, no problem. But, I have hundreds of days and dozens of rivers.

I've gotten this far:

CREATE TEST TIME VECTOR

NMINS   = 15; % Bucket by every 15 mins or 20, etc.
N5MINS  = 5 * 12 * 24 * 2; % Keep small - Two days of 5 min datenums
dnums   = datenum(2016,3,20,0,0:5:N5MINS,0);
% Trim dnums to start at random time for edge case and keep only mins
mins    = rem(dnums(25:end-30),1)';    % Create column vector

CREATE RANDOM MATRIX FOR TESTING

rng(123); % Set seed for reproducibility
X       = randi(100,12,9,length(mins)); % Test matrix

FIND TIMES IN TERMS OF MINUTES

[~,~,~,H,M] = datevec( mins );
H       = 60 .* (H - H(1));

NOW FIND ALL TIMES THAT CORRESPOND TO OUR BUCKET

idxMIN  = mod( M+H, NMINS )==0;
idxNewP = idxMIN;          % This is used to align the new river matrix
[R,C,P] = size(X);         % We'll drop P and use newP
newP    = sum(idxNewP); % Number of PAGES in final matrix (new)
% Final newX will have dimensions [R,C,newP]

RESET THE GROUPING INDICES

% Must shift one all as minute intervals represent data UP to that point
% for the actual grouping of data. Test if first bucket is a 
% match and set accordingly
if idxMIN(1)
    idxMIN = [1; idxMIN(1:end-1)]; 
    subs = cumsum(idxMIN);
else 
    idxMIN = [0; idxMIN(1:end-1)];
    subs = cumsum(idxMIN) + 1 ;
end

ADDITION: The group size will not be consistent and I cannot make that assumption (sadly). Consider the following after running the above.

tsttbl = table();
tsttbl.dnumstr = datestr(mins(1:5));
tsttbl.Mins    = M(1:5);
tsttbl.subs    = subs(1:5);
tsttbl

Output shows first group is a size of 1:

tsttbl = 

    dnumstr     Mins    subs
    ________    ____    ____

     2:00 AM     0      1   
     2:05 AM     5      2   
     2:10 AM    10      2   
     2:15 AM    15      2   
     2:20 AM    20      3  

Last, ultimately, I'll need to pass custom functions. The above is a toy example to illustrate the problem quickly. My apologies for not being more clear.

END ADDITION

And this is where I stumble...

How do I set the subs values to apply along each page to use accumarray? I'm totally confused by the documentation. Or is this actually the wrong approach? For what it's worth I'm using Matlab 2015b.

Honestly, any help would be greatly appreciated.

ALTERNATE SOLUTION This hit me on the way home. Meshgrid is my friend...

Once the cells above have been run (note I changed the size of the matrix X), we create the indices for the entire matrix where the "indices" for the pages (i.e., times) are given by the values in "subs". To do this, I use meshgrid.

[days,rivers,pages] = meshgrid(1:C,1:R,subs);
grpvals = [rivers(:) days(:) pages(:)];
tst     = accumarray(grpvals,X(:),[R C newP],@sum);

Probably not the most memory efficient as I have to create essentially the days, rivers, and pages matrices, then wind up creating a new grpvals array of those. But, it has the advantage that now I can use accumarray and pass anonymous functions, @std, etc.

Hope this helps others!

Huge thanks to Luis.

4

1 に答える 1

4

すべてのグループが同じサイズの場合

集計は次のように実行できます。

  1. reshape4 番目の次元に沿って、集計するグループを構築します。3 次元は各グループの要素を参照し、4 次元はグループを参照します。
  2. sum3 次元 (各グループ) に沿って。
  3. squeeze現在シングルトンの 3 次元を取り出して、3D 配列を復元します。

コード:

X = randi(9,2,3,6); %// example data. 3D array.
G = 2; %// group size along 3rd dim. Divides size(X,3)
result = squeeze(sum(reshape(X, size(X,1), size(X,2), G, []), 3));

たとえばG = 2

X(:,:,1) =
     2     3     9
     4     5     9
X(:,:,2) =
     3     8     2
     6     9     8
X(:,:,3) =
     4     4     4
     1     1     7
X(:,:,4) =
     9     9     8
     2     4     1
X(:,:,5) =
     9     5     9
     3     5     8
X(:,:,6) =
     9     1     3
     5     3     1

与える

result(:,:,1) =
     5    11    11
    10    14    17
result(:,:,2) =
    13    13    12
     3     5     8
result(:,:,3) =
    18     6    12
     8     8     9

一般的なケース: 異なるサイズのグループ

accumarray2 番目の入力として多次元配列 (または行列) では機能しないため、この回答の行に沿って行列乗算を使用できます。そのためには、3D 配列の最初の 2 つの次元を 1 つの次元 (最後にアンパックされます) にパックする必要があり、グループ インデックスから、行列の乗算によって目的の結果が得られる 0-1 の行列を構築します。

X = randi(9,2,3,5); %// example data. 3D array.
subs = [1 2 2 1 1]; %// indices of groups. Groups may differ in size, and indices
                    %// need not be sorted
Y = reshape(X, [], size(X,3)); %// reshape into a matrix. Groups are along rows
M = full(sparse(1:numel(subs), subs, 1)); %// indicator matrix from group indices
result = reshape(Y*M, size(X,1), size(X,2), []); %// compute result and reshape 

例えば、

X(:,:,1) =
     9     3     8
     6     8     8
X(:,:,2) =
     3     8     3
     7     2     2
X(:,:,3) =
     7     3     6
     2     8     5
X(:,:,4) =
     7     4     5
     8     8     6
X(:,:,5) =
     2     3     2
     2     8     8

subs =
     1     2     2     1     1

与える

result(:,:,1) =
    18    10    15
    16    24    22
result(:,:,2) =
    10    11     9
     9    10     7
于 2016-03-20T17:55:53.580 に答える