python - How to get the frequency of unique occurrences of letter pairs for every possible pair of columns from a numpy matrix in Python

Question

I have a matrix like the following, stored as a numpy matrix:

>>> print(matrix)
[['L' 'G' 'T' 'G' 'A' 'P' 'V' 'I']
 ['A' 'A' 'S' 'G' 'P' 'S' 'S' 'G']
 ['A' 'A' 'S' 'G' 'P' 'S' 'S' 'G']
 ['G' 'L' 'T' 'G' 'A' 'P' 'V' 'I']]

What I want is, for every possible pair of columns, the frequency of each unique pair of letters taken row by row from those two columns.

For example, the first pair of columns is:

[['L' 'G']
 ['A' 'A']
 ['A' 'A']
 ['G' 'L']]

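I would like to get the frequency of every letter pair within these columns (note: the order of the letters matters):

Frequency of ['L' 'G'] = 1/4

Frequency of ['A' 'A'] = 2/4

Frequency of ['G' 'L'] = 1/4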

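Once these frequencies have been calculated for the first pair of columns, I want to do the same for every other possible pair of columns.

I think something from itertools could help solve this, but I don't know how... any help would be greatly appreciated.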

score 6 · Accepted Answer

I would use itertools.combinations and collections.Counter:

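import collections
import itertools

# s is the 2-D character array from the question (called matrix there)
for i, j in itertools.combinations(range(len(s.T)), 2):
    c = s[:, [i, j]]  # the two selected columns, one letter pair per row
    counts = collections.Counter(map(tuple, c.tolist()))
    print('columns {} and {}'.format(i, j))
    for k in sorted(counts):
        print('Frequency of {} = {}/{}'.format(k, counts[k], len(c)))
    print()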

which produces

columns 0 and 1
Frequency of ('A', 'A') = 2/4
Frequency of ('G', 'L') = 1/4
Frequency of ('L', 'G') = 1/4

columns 0 and 2
Frequency of ('A', 'S') = 2/4
Frequency of ('G', 'T') = 1/4
Frequency of ('L', 'T') = 1/4

[...]

(If you want both orders, i.e. columns 0 1 and also 1 0, this is easy to modify. I am also assuming that by "every possible pair of columns" you don't mean "every pair of adjacent columns".)
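As a rough sketch of the "both orders" variant mentioned above: swapping itertools.combinations for itertools.permutations visits (0, 1) as well as (1, 0). The function name pair_frequencies and the both_orders flag are made up for this illustration and are not part of the original answer. The matrix is rebuilt from the question so the snippet runs on its own under Python 3.

```python
import collections
import itertools

import numpy as np


def pair_frequencies(matrix, both_orders=False):
    """Return {(i, j): {(a, b): frequency}} for pairs of columns.

    pair_frequencies and both_orders are hypothetical names
    used only for this sketch.
    """
    n_rows, n_cols = matrix.shape
    # permutations yields (0, 1) and (1, 0); combinations only (0, 1).
    pairs = itertools.permutations if both_orders else itertools.combinations
    result = {}
    for i, j in pairs(range(n_cols), 2):
        # tolist() turns NumPy scalars into plain Python strings.
        rows = map(tuple, matrix[:, [i, j]].tolist())
        counts = collections.Counter(rows)
        result[(i, j)] = {k: v / n_rows for k, v in counts.items()}
    return result


# The matrix from the question.
matrix = np.array([
    list("LGTGAPVI"),
    list("AASGPSSG"),
    list("AASGPSSG"),
    list("GLTGAPVI"),
])

freqs = pair_frequencies(matrix, both_orders=True)
print(freqs[(0, 1)])
print(freqs[(1, 0)])
```

Returning a dictionary instead of printing makes it easier to reuse the frequencies later; the values are plain floats here, so use counts and n_rows directly if exact fractions are needed.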

score 0 · Accepted Answer

If you have memory to spare, and depending on the size of your array (I am guessing it has few columns and many rows), you could do this:

>>> import numpy as np
>>> rows, cols = matrix.shape
>>> matches = np.empty((rows, cols, cols, 2), dtype='U1')
>>> matches[..., 0] = matrix[:, :, None]  # letter from column i
>>> matches[..., 1] = matrix[:, None, :]  # letter from column j
>>> matches = matches.view('U2')  # fuse the two letters into one string
>>> matches = matches.reshape((rows, cols, cols))

Now matches[:, i, j] holds the letter pairs for columns i and j, so you can do:

>>> unique, idx = np.unique(matches[:, 0, 1], return_inverse=True)
>>> counts = np.bincount(idx)
>>> unique
array(['AA', 'GL', 'LG'], dtype='<U2')
>>> counts
array([2, 1, 1])
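To cover every pair of columns rather than just columns 0 and 1, one possible sketch is to loop over the column pairs and count each slice. Note that this is a variation, not the code above: it builds the two-letter strings with np.char.add instead of the view trick, and it uses the return_counts option of np.unique instead of np.bincount. The names pairs and frequencies are chosen only for this illustration.

```python
import itertools

import numpy as np

# The matrix from the question.
matrix = np.array([
    list("LGTGAPVI"),
    list("AASGPSSG"),
    list("AASGPSSG"),
    list("GLTGAPVI"),
])
rows, cols = matrix.shape

# Broadcasting (rows, cols, 1) against (rows, 1, cols) gives
# pairs[r, i, j] == matrix[r, i] + matrix[r, j].
pairs = np.char.add(matrix[:, :, None], matrix[:, None, :])

frequencies = {}
for i, j in itertools.combinations(range(cols), 2):
    unique, counts = np.unique(pairs[:, i, j], return_counts=True)
    # Map each two-letter string to its relative frequency.
    frequencies[(i, j)] = dict(zip(unique.tolist(), (counts / rows).tolist()))

print(frequencies[(0, 1)])
```

The pairs array has rows * cols * cols entries, so the same memory caveat applies: this suits a matrix with few columns and many rows.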
