r - Using sapply to aggregate comma-separated values in a column

Question

In dA, I have a data table of this kind:

id   group    startPoints        endPoints
1    A        4, 20, 50, 63,     8, 25, 60, 78
1    A        120, 300,          231, 332
1    B        500,               550
1    B        650, 800           700, 820
1    C        830, 900, 950      850, 920, 970

What I'm trying to achieve is to get the SUM/MEAN/etc. of the lengths (EndPoint - StartPoint) for a given group, but I can't get this to work with sapply.

My goal is to get a result of the form:

Group    SUM 
A        177
B        120
C        60

I'm trying to combine two things:

lengths <- strsplit(as.character(table$endPoints), ",", fixed=TRUE)

and

y <- factor(table$group)
tapply(lengths, y, sum)

But I'm stuck and can't get it to work.

Adding sample data:

structure(list(id = c(1L, 1L, 1L, 1L, 1L), group = structure(c(1L, 
1L, 2L, 2L, 3L), .Label = c("A", "B", "C"), class = "factor"), 
startPoints = structure(c(2L, 1L, 3L, 4L, 5L), .Label = c("120,300,", 
"4,20,50,63,", "500,", "650,800,", "830,900,950,"), class = "factor"), 
endPoints = structure(c(4L, 1L, 2L, 3L, 5L), .Label = c("231,332,", 
"550,", "700,820,", "8,25,60,78", "850,920,970,"), class = "factor")), 
.Names = c("id", "group", "startPoints", "endPoints"), class = "data.frame", 
row.names = c(NA, -5L))

score 3 · Accepted Answer

This has nothing to do with sapply, which is what you asked about, but here's one approach using concat.split.multiple from my "splitstackshape" package.

First, split the data into a semi-long format:

library(splitstackshape)
mydf2 <- concat.split.multiple(mydf, split.cols = c("startPoints", "endPoints"), 
                               seps = ",", direction = "long")

Calculate the difference between your "endPoints" and "startPoints":

mydf2$diffs <- mydf2$endPoints - mydf2$startPoints
head(mydf2)
#   id group .id time startPoints endPoints diffs
# 1  1     A   1    1           4         8     4
# 2  1     A   2    1         120       231   111
# 3  1     B   1    1         500       550    50
# 4  1     B   2    1         650       700    50
# 5  1     C   1    1         830       850    20
# 6  1     A   1    2          20        25     5

Use aggregate (or data.table, or tapply, or your favorite aggregation function) to calculate whatever you want to.

aggregate(diffs ~ group, mydf2, sum)
#   group diffs
# 1     A   177
# 2     B   120
# 3     C    60
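
For comparison, here is a minimal base R sketch that stays closer to the strsplit/tapply idea in the question. The attempt there stalls because strsplit returns a list of character vectors, which sum cannot handle directly; the values have to be converted to numbers and flattened first. The data frame name mydf and the helper parse_points are my own naming for illustration (they are not from any package), and the sketch assumes every row has the same number of start and end points.

```r
# Sample data from the question, rebuilt with character columns
# (mydf is an assumed name; the question calls its object "table")
mydf <- data.frame(
  id = c(1L, 1L, 1L, 1L, 1L),
  group = c("A", "A", "B", "B", "C"),
  startPoints = c("4,20,50,63,", "120,300,", "500,", "650,800,", "830,900,950,"),
  endPoints = c("8,25,60,78", "231,332,", "550,", "700,820,", "850,920,970,"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn "4,20,50,63," into c(4, 20, 50, 63).
# strsplit drops the empty piece after a trailing separator.
parse_points <- function(x) {
  lapply(strsplit(as.character(x), ",", fixed = TRUE), as.numeric)
}

starts <- parse_points(mydf$startPoints)
ends <- parse_points(mydf$endPoints)

# One numeric vector of segment lengths (end - start) per row
seg_lengths <- Map(`-`, ends, starts)

# Repeat each row's group label once per segment, then flatten
grp <- rep(mydf$group, sapply(seg_lengths, length))
diffs <- unlist(seg_lengths)

tapply(diffs, grp, sum)
#   A   B   C 
# 177 120  60 

tapply(diffs, grp, mean)
#    A    B    C 
# 29.5 40.0 20.0 
```

Flattening to one value per segment, rather than summing per row first, is what lets the same diffs/grp pair feed mean, median, or any other summary function.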
