さまざまな操作を組み合わせた関数を作成します。これがそのような関数の1つであり、コメントが多いです。
process <- function(x) {
## this adds a vector with the group sum score
x <- within(x, sumScore <- ave(score, group, FUN = sum))
## drop the group with sumScore == 0
x <- x[-which(x$sumScore == 0L), , drop = FALSE]
## choose groups with sumScore > 1
## sample sumScore - 1 of the rows where score == 1L
foo <- function(x) {
scr <- unique(x$sumScore) ## sanity & take only 1 of the sumScore
## which of the grups observations have score = 1L
want <- which(x$score == 1L)
## want to sample all bar one of these
want <- sample(want, scr-1)
## remove the selected rows & retun
x[-want, , drop = FALSE]
}
## which rows are samples with group sumScore > 1
want <- which(x$sumScore > 1L)
## select only those samples, split up those samples by group, lapplying foo
## to each group, then rbind the resulting data frames together
newX <- do.call(rbind,
lapply(split(x[want, , drop = FALSE], x[want, "group"]),
FUN = foo))
## bind the sampled sumScore > 1L on to x (without sumScore > 1L)
newX <- rbind(x[-want, , drop = FALSE], newX)
## remove row labels
rownames(newX) <- NULL
## return the data without the sumScore column
newX[, 1:3]
}
あなたのデータでそれ:
dat <- data.frame(group = c(1,1,1,2,2,3,3,3,4,4,4,4,4),
member = c(1,2,3,1,2,1,2,3,1,2,3,4,5),
score = c(0,1,0,0,0,1,0,1,0,1,1,1,0))
与える:
> set.seed(42)
> process(dat)
group member score
1 1 1 0
2 1 2 1
3 1 3 0
4 3 1 1
5 3 2 0
6 4 1 0
7 4 3 1
8 4 5 0
何が欲しかったのかと思います。
更新:上記ではprocess()
、内部関数foo()
を書き直して1行のみをサンプリングし、他の行を削除することができました。foo()
つまり、以下のものに置き換えます。
foo <- function(x) {
scr <- unique(x$sumScore) ## sanity & take only 1 of the sumScore
## which of the grups observations have score = 1L
want <- which(x$score == 1L)
## want to sample just one of these
want <- sample(want, 1)
## return the selected row & retun
x[want, , drop = FALSE]
}
これらは基本的に同じ操作ですがfoo()
、1行だけを選択すると、意図した動作が明示的になります。scr-1
サンプル値ではなく、スコア==1Lの行からランダムに1行を選択します。