Hi, I have a cell array in which the second column holds the number of times each 'XX->XX' tag occurs. For example:

'AA->AA'    [21]    [4.2084]
'AA->AC'    [15]    [3.0060]
'AA->AG'    [ 9]    [1.8036]
'AA->AT'    [12]    [2.4048]
'AC->CA'    [14]    [2.8056]
'AC->CC'    [16]    [3.2064]
'AC->CG'    [ 5]    [1.0020]
'AC->CT'    [ 3]    [0.6012]
'AG->GA'    [11]    [2.2044]
'AG->GC'    [ 5]    [1.0020]
'AG->GG'    [ 8]    [1.6032]
'AG->GT'    [13]    [2.6052]
'AT->TA'    [10]    [2.0040]
'AT->TC'    [ 8]    [1.6032]
'AT->TG'    [ 2]    [0.4008]
'AT->TT'    [11]    [2.2044]
'CA->AA'    [17]    [3.4068]
'CA->AC'    [ 7]    [1.4028]
'CA->AG'    [ 9]    [1.8036]
'CA->AT'    [11]    [2.2044]
'CC->CA'    [15]    [3.0060]
'CC->CC'    [ 5]    [1.0020]
'CC->CG'    [ 4]    [0.8016]
'CC->CT'    [17]    [3.4068]
'CG->GA'    [ 1]    [0.2004]
'CG->GC'    [ 2]    [0.4008]
'CG->GG'    [ 9]    [1.8036]
'CG->GT'    [ 3]    [0.6012]
'CT->TA'    [ 7]    [1.4028]
'CT->TC'    [ 9]    [1.8036]
'CT->TG'    [ 9]    [1.8036]
'CT->TT'    [ 2]    [0.4008]
'GA->AA'    [10]    [2.0040]
'GA->AC'    [ 4]    [0.8016]
'GA->AG'    [10]    [2.0040]
'GA->AT'    [ 2]    [0.4008]
'GC->CA'    [ 2]    [0.4008]
'GC->CC'    [ 7]    [1.4028]
'GC->CG'    [ 6]    [1.2024]
'GC->CT'    [ 3]    [0.6012]
'GG->GA'    [ 6]    [1.2024]
'GG->GC'    [ 6]    [1.2024]
'GG->GG'    [ 4]    [0.8016]
'GG->GT'    [ 8]    [1.6032]
'GT->TA'    [ 6]    [1.2024]
'GT->TC'    [11]    [2.2044]
'GT->TG'    [ 8]    [1.6032]
'GT->TT'    [ 5]    [1.0020]
'TA->AA'    [ 8]    [1.6032]
'TA->AC'    [13]    [2.6052]
'TA->AG'    [ 9]    [1.8036]
'TA->AT'    [ 6]    [1.2024]
'TC->CA'    [13]    [2.6052]
'TC->CC'    [13]    [2.6052]
'TC->CT'    [ 4]    [0.8016]
'TG->GA'    [ 8]    [1.6032]
'TG->GC'    [ 5]    [1.0020]
'TG->GG'    [ 3]    [0.6012]
'TG->GT'    [ 6]    [1.2024]
'TT->TA'    [13]    [2.6052]
'TT->TC'    [ 2]    [0.4008]
'TT->TG'    [ 3]    [0.6012]
'TT->TT'    [ 5]    [1.0020]

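Now I am trying to calculate the probabilities: P('AA->AA') = TIMES('AA->AA') / SUM('AA->AA','AA->AC','AA->AG','AA->AT'), that is, P('AA->AA') = TIMES('AA->AA') / SUM('AA->Anyone'). The same goes for every other tag. I want to use a loop to do this, but there are edge cases such as: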

'TC->CA'    [13]    [2.6052]
'TC->CC'    [13]    [2.6052]
'TC->CT'    [ 4]    [0.8016]

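Here 'TC->CG' clearly occurs 0 times, and although we already know its probability is 0, it still has to be taken into account. Of course, the missing entry varies: sometimes 'TT->TT' is missing, sometimes 'TC->CT' is missing. Does anyone know how to do this? Thanks.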

1 Answer

Try this:

% Get the cell data into data1
data1 = INPUT_DATA;

% Pull each column out of the cell array
col1 = data1(:,1);
tag_data = vertcat(col1{:});   % N-by-6 char array (all tags have equal length)

col2 = data1(:,2);
times_data = vertcat(col2{:});

col3 = data1(:,3);
col3_data = vertcat(col3{:});

% Build every possible tag: the triple XYZ stands for 'XY->YZ' (64 rows)
char_array = ['A' 'C' 'G' 'T'];
full_tag_data = char_array(combinator(4,3,'p','r'));
full_tag_data = [full_tag_data(:,1:2) repmat('->',[size(full_tag_data,1) 1]) full_tag_data(:,2:3)];

% Fill in the observed values; tags missing from the input keep 0.
% This assumes the input rows appear in the same order as full_tag_data.
present_rows = ismember(full_tag_data,tag_data,'rows');
full_times_data = double(present_rows);
full_times_data(present_rows) = times_data;

full_col3_data = double(present_rows);
full_col3_data(present_rows) = col3_data;

% Sum the times within each group of four tags that share the same left-hand side
full_times_data_summed = sum(reshape(full_times_data,4,[]),1);
full_times_data_summed = reshape(repmat(full_times_data_summed,[4 1]),[],1);

% Store the required values into a cell array out_cell1
out_cell1 = cell(size(present_rows,1),4);
out_cell1(:,1) = cellstr(full_tag_data);
out_cell1(:,2) = num2cell(full_times_data);
out_cell1(:,3) = num2cell(full_col3_data);

% The probabilities TIMES/SUM(TIMES) are added into the cell array as the fourth column
out_cell1(:,4) = num2cell(full_times_data./full_times_data_summed);

Note: the code above uses the combinator function, which can be obtained here.
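
If installing combinator is not an option, the same idea can be sketched with built-in functions only. This is a minimal sketch rather than a drop-in replacement: the variable names (full_tags, full_times, probs) and the three-row sample data1 are illustrative assumptions, and the lookup goes through ismember positions so the input rows do not have to be sorted.

```matlab
% Hypothetical sample input: the 'TC->..' group from the question, with
% 'TC->CG' missing. Replace it with the full N-by-3 cell array.
data1 = {'TC->CA', 13, 2.6052;
         'TC->CC', 13, 2.6052;
         'TC->CT',  4, 0.8016};

bases = ['A'; 'C'; 'G'; 'T'];

% Enumerate all 64 base triples XYZ; each one stands for the tag 'XY->YZ'.
% The first ndgrid output varies fastest, so the third base c changes
% fastest and every left-hand side XY occupies four consecutive rows.
[c, b, a] = ndgrid(1:4, 1:4, 1:4);
a = a(:); b = b(:); c = c(:);
arrow = repmat('->', 64, 1);
full_tags = cellstr([bases(a) bases(b) arrow bases(b) bases(c)]);

% Find where each observed tag sits in the full list and scatter the
% observed counts there; tags that never occurred keep a count of 0.
[found, pos] = ismember(data1(:,1), full_tags);
assert(all(found), 'Unexpected tag in input data');
full_times = zeros(64, 1);
full_times(pos) = cell2mat(data1(:,2));

% Sum the counts inside each block of four rows, then divide each count
% by its block total. Blocks with no observations at all give 0/0 = NaN.
counts = reshape(full_times, 4, 16);      % one column per left-hand side
totals = sum(counts, 1);                  % 1-by-16 row of block totals
probs  = counts ./ repmat(totals, 4, 1);
probs  = probs(:);

% Columns: tag, times, probability
out = [full_tags, num2cell(full_times), num2cell(probs)];
```

With the sample input, out(53:56,:) holds the four 'TC->..' rows with probabilities 13/30, 13/30, 0 and 4/30, so the missing 'TC->CG' shows up explicitly with a count and probability of 0. Rows whose left-hand side never occurs are left as NaN here; setting them to 0 instead is a design choice the question does not settle.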

answered 2014-03-26T11:10:29.240