I have a list of key/value pairs and would like to convert it into a 2D matrix whose cells hold the count for each key/value combination. Here is a sample data frame:

doc_id,link
1,http://example.com
1,http://example.com
2,http://test1.net
2,http://test2.net
2,http://test5.net
3,http://test1.net
3,http://example.com
4,http://test5.net

At the moment, I am using R's plyr package and the following command for that kind of transformation:

link_matrix <- daply(link_list, .(doc_id, link), summarise, nrow(piece))

Here is the result matrix object:

doc_id http://example.com http://test1.net http://test2.net http://test5.net
     1 List,1             NULL             NULL             NULL            
     2 NULL               List,1           List,1           List,1          
     3 List,1             List,1           NULL             NULL            
     4 NULL               NULL             NULL             List,1 

The array entries are fine in that they hold the key/value counts, but each cell is a list rather than a number. What I actually need are numeric values in the result matrix. It should look like this:

doc_id http://example.com http://test1.net http://test2.net http://test5.net
     1 2                  0                0                0            
     2 0                  1                1                1          
     3 1                  1                0                0            
     4 0                  0                0                1

I could do this by iterating over the matrix elements and converting each one, but I am fairly sure there is a better solution that does this directly in the daply call. I just haven't figured out how, and I would appreciate help with this.

2 Answers

You can do this by simplifying your code as follows (that is, by removing summarise):

library(plyr)
daply(link_data, .(doc_id, link), nrow)

doc_id http://example.com http://test1.net http://test2.net http://test5.net
     1                  2               NA               NA               NA
     2                 NA                1                1                1
     3                  1                1               NA               NA
     4                 NA               NA               NA                1

Then, if it is important to get rid of the NA values, use array subsetting:

aa <- daply(link_data, .(doc_id, link), nrow)
aa[is.na(aa)] <- 0
aa

      link
doc_id http://example.com http://test1.net http://test2.net http://test5.net
     1                  2                0                0                0
     2                  0                1                1                1
     3                  1                1                0                0
     4                  0                0                0                1
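
As a cross-check that does not depend on plyr, base R's table() gives the same counts and already reports 0 for combinations that never occur. This is a sketch of an alternative technique, not the daply approach shown above. The data frame is rebuilt here from the question's sample, and the names counts and link_matrix are chosen only for illustration.

```r
# Sample data from the question; the name link_data follows the answer above.
link_data <- data.frame(
  doc_id = c(1, 1, 2, 2, 2, 3, 3, 4),
  link = c("http://example.com", "http://example.com",
           "http://test1.net", "http://test2.net", "http://test5.net",
           "http://test1.net", "http://example.com", "http://test5.net"),
  stringsAsFactors = FALSE
)

# table() cross-tabulates the two columns; combinations that never occur count as 0.
counts <- table(doc_id = link_data$doc_id, link = link_data$link)

# Drop the "table" class to get a plain integer matrix with the same dimnames.
link_matrix <- unclass(counts)
link_matrix
```

Rows and columns are indexed by name as character strings, for example link_matrix["1", "http://example.com"].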
answered 2011-08-10T19:20:21.533

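Use the cast function from the reshape package:

library(reshape)
# mydf is the data frame with doc_id and link; the constant value column gives cast something to aggregate per cell
cast(transform(mydf, value = 1), doc_id ~ link)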
answered 2011-08-10T19:04:29.903