r - Identifying variable-length core names

Question

I have a dataset with the following row-naming scheme:

a.X.V
where:
a is a fixed-length core ID
X is a variable-length string that subsets a, which means I should keep X
V is a variable-length ID which specifies the individual elements of a.X to be averaged
. is one of {-,_}

What I'm trying to do is take the column means across all rows that share the same a.X. Sample:

sampleList <- list("a.12.1"=c(1,2,3,4,5), "b.1.23"=c(3,4,1,4,5), "a.12.21"=c(5,7,2,8,9), "b.1.555"=c(6,8,9,0,6))
sampleList
$a.12.1
[1] 1 2 3 4 5

$b.1.23
[1] 3 4 1 4 5

$a.12.21
[1] 5 7 2 8 9

$b.1.555
[1] 6 8 9 0 6

Currently I'm manually gsubbing out the .V parts to get a general list:

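sampleList <- t(as.data.frame(sampleList))
# rownames() is all lower case; rowNames() does not exist in base R
y <- rownames(sampleList)
# Keep "a.X" and drop the trailing ".V" (note the escaped \\d in the last group)
y <- gsub("(\\w\\.\\d+)\\.\\d+", "\\1", y)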

Is there a faster way to do this?

This is one half of two problems that came up in my workflow. The other half was answered here.

score 2 · Accepted Answer

Perhaps you could consider restructuring your data so that some standard tools are easier to apply:

sampleList <- list("a.12.1"=c(1,2,3,4,5), 
  "b.1.23"=c(3,4,1,4,5), "a.12.21"=c(5,7,2,8,9), 
   "b.1.555"=c(6,8,9,0,6))
library(reshape2)
m1 <- melt(do.call(cbind,sampleList))
m2 <- cbind(m1,colsplit(m1$Var2,"\\.",c("coreID","val1","val2")))

The result looks like this:

head(m2)
  Var1    Var2 value coreID val1 val2
1     1  a.12.1     1      a   12    1
2     2  a.12.1     2      a   12    1
3     3  a.12.1     3      a   12    1

Then you can more easily do things like:

aggregate(value~val1,mean,data=subset(m2,coreID=="a"))
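Note that grouping by val1 alone gives one overall mean per group rather than one mean per position. If per-position (column) means are wanted, the position index can be added to the grouping. The following is a minimal sketch in base R, so that it runs without reshape2; the names long, parts and posMeans are made up for illustration, and it assumes every name splits on "." into exactly three parts.

```r
sampleList <- list("a.12.1"=c(1,2,3,4,5), "b.1.23"=c(3,4,1,4,5),
                   "a.12.21"=c(5,7,2,8,9), "b.1.555"=c(6,8,9,0,6))

# Long format: one row per (list element, position) pair
n <- lengths(sampleList)
long <- data.frame(
  name  = rep(names(sampleList), times = n),
  pos   = unlist(lapply(n, seq_len), use.names = FALSE),
  value = unlist(sampleList, use.names = FALSE),
  stringsAsFactors = FALSE
)

# Split each name into its parts (assumption: "." separator, three parts)
parts <- do.call(rbind, strsplit(long$name, ".", fixed = TRUE))
long$coreID <- parts[, 1]
long$val1   <- parts[, 2]

# Mean per core ID, subset and position, i.e. the column means of each a.X
posMeans <- aggregate(value ~ coreID + val1 + pos, data = long, FUN = mean)
```

posMeans then holds one row per group and position, which can be filtered with subset() in the same way as m2.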

score 2 · Accepted Answer

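You can use a vector of patterns to find the positions of the columns you want to group. To show that the solution is robust to that situation, I have included patterns that I know won't match anything.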

# A *named* vector of patterns you want to group by
patterns <- c(a.12="^a.12",b.12="^b.12",c.12="^c.12")
# Find the locations of those patterns in your list
inds <- lapply(patterns, grep, x=names(sampleList))
# Calculate the mean of each list element that matches the pattern
out <- lapply(inds, function(i) 
  if(l <- length(i)) Reduce("+",sampleList[i])/l else NULL)
# Set the names of the output
names(out) <- names(patterns)
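If writing the patterns out by hand is not practical, one possible variation is to derive the group keys from the names themselves. This is only a sketch under assumptions: the separator is one of ".", "-" or "_", the trailing ID (V) contains none of those characters, and all vectors in a group have the same length. The names groups and groupMeans are hypothetical.

```r
sampleList <- list("a.12.1"=c(1,2,3,4,5), "b.1.23"=c(3,4,1,4,5),
                   "a.12.21"=c(5,7,2,8,9), "b.1.555"=c(6,8,9,0,6))

# Strip the last separator and everything after it, leaving "a.X"
groups <- sub("[._-][^._-]+$", "", names(sampleList))

# Split the list by group key, then average element-wise within each group
groupMeans <- lapply(split(sampleList, groups),
                     function(g) Reduce(`+`, g) / length(g))
```

Because the keys come from the data, there are no empty groups to guard against, at the cost of not reporting expected groups that are missing.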

score 1 · Accepted Answer

R is well equipped to do this kind of thing if you just move to data.frames instead of lists. Put "a", "X", and "V" into their own columns. Then you can use ave, by, aggregate, subset, etc.

data.frame(do.call(rbind, sampleList), 
           do.call(rbind, strsplit(names(sampleList), '\\.')))

#         X1 X2 X3 X4 X5 X1.1 X2.1 X3.1
# a.12.1   1  2  3  4  5    a   12    1
# b.1.23   3  4  1  4  5    b    1   23
# a.12.21  5  7  2  8  9    a   12   21
# b.1.555  6  8  9  0  6    b    1  555
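As a possible next step, the columns can be given explicit names and the means computed with aggregate. This is a sketch rather than part of the original answer: the column names coreID, subsetID, elemID and v1 to v5 are invented for illustration, and it assumes each name splits into exactly three parts on ".", "-" or "_".

```r
sampleList <- list("a.12.1"=c(1,2,3,4,5), "b.1.23"=c(3,4,1,4,5),
                   "a.12.21"=c(5,7,2,8,9), "b.1.555"=c(6,8,9,0,6))

# One row per list element, one column per position
vals <- do.call(rbind, sampleList)
colnames(vals) <- paste0("v", seq_len(ncol(vals)))

# Split the names into "a", "X" and "V"
parts <- do.call(rbind, strsplit(names(sampleList), "[._-]"))

df <- data.frame(coreID = parts[, 1], subsetID = parts[, 2],
                 elemID = parts[, 3], vals, stringsAsFactors = FALSE)

# Column means for every coreID/subsetID combination
means <- aggregate(df[colnames(vals)],
                   by = df[c("coreID", "subsetID")], FUN = mean)
```

The elemID column is kept in df so that individual elements can still be picked out with subset() before averaging.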
