
I am trying to calculate the number of unique players in an experiment where each player is allowed to re-enter the game. Here is what the data look like:

x <- read.table(header=TRUE, text="group timepast Name NoOfUniquePlayer
1 0.02703 A 1
1 0.02827 B 2
1 0.02874 A 2
1 0.02875 A 2
1 0.02875 D 3
2 0.03255 M 1
2 0.03417 K 2
2 0.10029 T 3
2 0.10394 T 3
2 0.10605 K 3
2 0.16522 T 3
3 0.11938 E 1
3 0.12607 F 2
3 0.13858 E 2
3 0.16084 G 3
3 0.19830 G 3
3 0.24563 V 4")

The original experiment data contain the first 3 columns. The first is the group number of each experiment (3 groups here); the second is the normalized time at which each player joined the experiment (I've sorted this column from smallest to largest within each group); the third is the name of each player (each player joins only one group).

What I want to generate is the last column, NoOfUniquePlayer: the number of unique players seen so far in the group. For example, group 1 has five entries (A B A A D) but only 3 unique players (A B D). Player A started the game (1st row), player B played (2nd row), player A re-joined (3rd row) and then joined again (4th row), and finally player D entered and finished the whole game.

Can anyone help me figure out how to do this in R?

I think this will give you what you want. (I think there is an error in your example for group 2: rows 10 and 11 are re-entries by K and T, so the count should stay at 3 rather than go to 4.)

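# For each group, number the players in order of first appearance, then
# take the running maximum: the number of distinct players seen so far.
# unlist(tapply(...)) concatenates the results in group order, so this
# assumes x is already sorted by group (as it is here).
x$uniquenum <- unlist(
  tapply(
    x$Name,
    x$group,
    function(y)
      cummax(as.numeric(factor(y, levels = y[!duplicated(y)])))
  )
)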

   group timepast Name NoOfUniquePlayer uniquenum
1      1  0.02703    A                1         1
2      1  0.02827    B                2         2
3      1  0.02874    A                2         2
4      1  0.02875    A                2         2
5      1  0.02875    D                3         3
6      2  0.03255    M                1         1
7      2  0.03417    K                2         2
8      2  0.10029    T                3         3
9      2  0.10394    T                3         3
10     2  0.10605    K                4         3
11     2  0.16522    T                4         3
12     3  0.11938    E                1         1
13     3  0.12607    F                2         2
14     3  0.13858    E                2         2
15     3  0.16084    G                3         3
16     3  0.19830    G                3         3
17     3  0.24563    V                4         4
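If the rows might not be grouped together, a minimal base-R sketch that writes each group's result back into that group's own rows uses `ave()` instead of `unlist(tapply(...))`. It also swaps in a different counting idiom, `cumsum(!duplicated(...))`, which adds one each time a name appears for the first time within its group. This is an alternative formulation, not the code above; the helper `running_unique()` and the column `uniquenum2` are illustrative names.

```r
# Question's data (NoOfUniquePlayer is the expected result)
x <- read.table(header = TRUE, text = "group timepast Name NoOfUniquePlayer
1 0.02703 A 1
1 0.02827 B 2
1 0.02874 A 2
1 0.02875 A 2
1 0.02875 D 3
2 0.03255 M 1
2 0.03417 K 2
2 0.10029 T 3
2 0.10394 T 3
2 0.10605 K 3
2 0.16522 T 3
3 0.11938 E 1
3 0.12607 F 2
3 0.13858 E 2
3 0.16084 G 3
3 0.19830 G 3
3 0.24563 V 4")

# Hypothetical helper: running count of distinct players within each group.
# !duplicated() is TRUE only on a name's first appearance, so its cumsum
# grows by one per new player and stays flat on re-entries.
# ave() puts each group's values back into that group's rows.
running_unique <- function(df) {
  ave(seq_len(nrow(df)), df$group,
      FUN = function(i) cumsum(!duplicated(df$Name[i])))
}

x$uniquenum2 <- running_unique(x)

# Interleave the groups (keeping time order within each group) and recompute
mixed <- x[order(ave(seq_len(nrow(x)), x$group, FUN = seq_along)), ]
mixed$uniquenum2 <- running_unique(mixed)
```

The within-group row order still has to be the time order, since "first appearance" is read off the row sequence.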
answered 2013-02-21T05:48:59.690

Slightly more compactly, using data.table:

library(data.table)

DT <- data.table(x)

DT[, uniqueNum := cummax(match(Name, unique(Name))), by = group]
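To see why `cummax(match(Name, unique(Name)))` counts unique players, here are the steps on group 2's names in plain base R (the vector `y` is just those names copied from the question). `match()` gives each entry the rank of that player's first appearance, and `cummax()` turns the ranks into a running count, because a rank can only become a new maximum on a player's first appearance.

```r
# Group 2's players in time order, taken from the question's data
y <- c("M", "K", "T", "T", "K", "T")

unique(y)                    # "M" "K" "T": players in order of first appearance
idx <- match(y, unique(y))   # 1 2 3 3 2 3: first-appearance rank of each entry
running <- cummax(idx)       # 1 2 3 3 3 3: distinct players seen so far
```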

If you want the total number of unique players in each group, then:

DT[, totalUnique := max(uniqueNum), by = group] 
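Without data.table, a base-R sketch of the same total counts the distinct names per group directly. `tapply()` gives one value per group, while `ave()` repeats it on every row like the `totalUnique` column; `totals` and `x$totalUnique` are illustrative names, and the data frame below keeps only the `group` and `Name` columns from the question.

```r
# group and Name columns from the question's data
x <- data.frame(
  group = rep(1:3, times = c(5, 6, 6)),
  Name  = c("A", "B", "A", "A", "D",
            "M", "K", "T", "T", "K", "T",
            "E", "F", "E", "G", "G", "V")
)

# One value per group: number of distinct players (3, 3 and 4 here)
totals <- tapply(x$Name, x$group, function(y) length(unique(y)))

# The same total repeated on every row of its group
x$totalUnique <- ave(seq_len(nrow(x)), x$group,
                     FUN = function(i) length(unique(x$Name[i])))
```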
answered 2013-02-21T06:01:26.953