r - Optimizing a for loop over a data.table

Question

I am using the data.table solution from this question: Pooling duplicate entries while averaging the values in adjacent columns

dt.out <- dt[, lapply(.SD, function(x) paste(x, collapse=",")), 
          by=c("ID2", "chrom", "strand", "txStart", "txEnd")]

dt.out <- dt.out[ ,list(ID=paste(ID, collapse=","), ID2=paste(ID2, collapse=","), 
                       txStart=min(txStart), txEnd=max(txEnd)), 
                       by=c("probe", "chrom", "strand", "newCol")]

Dataset:

ID      ID2         probe       chrom   strand txStart  txEnd  newCol
Rest_3  uc001aah.4  8044649     chr1    0      14361    29370  1.02
Rest_4  uc001aah.4  7911309     chr1    0      14361    29370  1.30  
Rest_5  uc001aah.4  8171066     chr1    0      14361    29370  2.80         
Rest_6  uc001aah.4  8159790     chr1    0      14361    29370  4.12 

Rest_17 uc001abw.1  7896761     chr1    0      861120   879961 1.11
Rest_18 uc001abx.1  7896761     chr1    0      871151   879961 3.12

I added this for loop to average the collapsed newCol values that end up in a single cell of the (first) dt.out. However, the loop takes a long time to run. Is there a faster way to do this?

for(i in 1:NROW(dt.out)){
  con <- textConnection(dt.out[i,grep("newCol", colnames(dt.out))])
  data <- read.csv(con, sep=",", header=FALSE)
  close(con)
  dt.out[i,grep("newCol", colnames(dt.out))]<- as.numeric(rowMeans(data)) 

}

score 2 · Accepted Answer

newCol appears to be an extra column compared with the data in the other question. After you get the first dt.out, you want to take the mean of the collapsed values in newCol?

You can do that by replacing newCol directly using sapply(strsplit(.)). Basically, after getting the first dt.out, do this:

dt.out[ , newCol := sapply(strsplit(newCol, ","), function(x) mean(as.numeric(x)))]
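
To see the whole thing end to end, here is a minimal sketch that rebuilds the question's six sample rows, runs the first pooling step, and then applies the one-liner above. The column types (numeric probe, strand, txStart, txEnd) are an assumption, since the question only shows the printed table.

```r
library(data.table)

# Sample rows copied from the question's dataset.
# Column types are assumed; the question only shows printed values.
dt <- data.table(
  ID      = c("Rest_3", "Rest_4", "Rest_5", "Rest_6", "Rest_17", "Rest_18"),
  ID2     = c("uc001aah.4", "uc001aah.4", "uc001aah.4", "uc001aah.4",
              "uc001abw.1", "uc001abx.1"),
  probe   = c(8044649, 7911309, 8171066, 8159790, 7896761, 7896761),
  chrom   = "chr1",
  strand  = 0,
  txStart = c(14361, 14361, 14361, 14361, 861120, 871151),
  txEnd   = c(29370, 29370, 29370, 29370, 879961, 879961),
  newCol  = c(1.02, 1.30, 2.80, 4.12, 1.11, 3.12)
)

# First pooling step from the question: every non-grouping column is
# collapsed into one comma-separated string per group.
dt.out <- dt[, lapply(.SD, function(x) paste(x, collapse = ",")),
             by = c("ID2", "chrom", "strand", "txStart", "txEnd")]

# Replacement for the row-by-row loop: split each string on commas,
# convert the pieces to numbers, and take the mean of each cell.
dt.out[, newCol := sapply(strsplit(newCol, ","),
                          function(x) mean(as.numeric(x)))]

print(dt.out[, .(ID2, newCol)])
```

For the four uc001aah.4 rows, the cell "1.02,1.3,2.8,4.12" becomes (1.02 + 1.30 + 2.80 + 4.12) / 4 = 2.31, and the two single-row groups keep their original values. Because the whole column is assigned at once, newCol also changes from character to numeric.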
