r - Check that all combinations occur equally often for specified columns of a data frame

Question

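I have a data frame. How can I check whether all combinations of the values in certain columns occur equally often?

(This sometimes comes up when processing data files from an experiment with a factorial design. Each column is an independent variable, and I need to confirm that every combination of the independent variables occurs equally often.)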

score 1 · Accepted Answer

How about replications()?

tmp <- transform(ToothGrowth, dose = factor(dose))

replications( ~ supp + dose, data = tmp)
replications( ~ supp * dose, data = tmp)

> replications( ~ supp + dose, data = tmp)
supp dose 
  30   20 
> replications( ~ supp * dose, data = tmp)
     supp      dose supp:dose 
       30        20        10

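The help page (?replications) also gives a test for balance: replications() returns a plain vector when every term is balanced and a list otherwise. To check every combination, and not just each factor on its own, include the interaction term (supp * dose):

!is.list(replications(~ supp * dose, data = tmp))

> !is.list(replications(~ supp * dose, data = tmp))
[1] TRUE

The output of replications() is not quite what you might expect, but the test built on it gives the answer you need.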
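A caveat worth checking, sketched with a small made-up data frame (d, a and b are hypothetical names): a formula with only main effects, such as ~ a + b, tests whether each factor is balanced on its own, so it can report balance even when some combinations never occur. Including the interaction (~ a * b) also tests the per-combination counts, and an empty combination makes the result a list.

```r
# Hypothetical toy data: each factor is balanced on its own,
# but only two of the four a/b combinations appear.
d <- data.frame(a = factor(c("x", "x", "y", "y")),
                b = factor(c("p", "p", "q", "q")))

# Main effects only: a and b each have 2 rows per level, so a vector comes back
main_only <- !is.list(replications(~ a + b, data = d))

# With the interaction term, the empty x:q and y:p cells make the result a list
with_cells <- !is.list(replications(~ a * b, data = d))

main_only   # TRUE
with_cells  # FALSE
```

For a question about "all combinations", the interaction form is the one that matches the requirement.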

score 0 · Accepted Answer

checkAllCombosOccurEquallyOften <- function(df, colNames, dropZeros = FALSE) {
    # In data.frame df, check whether the factors named in colNames form a full
    # factorial design (all combinations of levels occur equally often).
    #
    # dropZeros is useful if one factor is nested in the others. E.g. when testing
    # different speeds for each level of something else, many combinations will
    # occur 0 times because that speed does not exist for that level.
    # But dropZeros is dangerous: it will not catch 0s that occur for the wrong
    # reason (i.e. because the design is not fully crossed).
    #
    # Returns:
    #   TRUE/FALSE, and prints an informational message
    #
    listOfCols <- as.list(df[colNames])
    tab <- table(listOfCols)  # contingency table: one cell per combination of levels

    if (dropZeros) {
        tab <- tab[tab != 0]
    }
    colNamesStr <- paste(colNames, collapse = ",")
    # If fully crossed, all entries in the table are identical
    # (all combinations occur equally often)
    if (length(unique(tab)) == 1) {
        print(paste(colNamesStr, "fully crossed- each combination occurred", unique(tab)[1], "times"))
        ans <- TRUE
    } else {
        print(paste(colNamesStr, "NOT fully crossed,", length(unique(tab)), "distinct repetition numbers."))
        ans <- FALSE
    }
    return(ans)
}

Load the dataset and call the function above:

library(datasets)
checkAllCombosOccurEquallyOften(ToothGrowth,c("supp","dose")) #specify dataframe and columns

The output confirms that the design is fully crossed:

[1] "supp,dose fully crossed- each combination occurred 10 times"
[1] TRUE
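A shorter base-R sketch of the same idea (the name all_combos_equal is hypothetical, and it has no dropZeros option): cross-tabulate the chosen columns with table() and check that every cell, including empty ones, holds the same count.

```r
# Hypothetical helper: TRUE if every combination of the values in `cols`
# occurs the same number of times in `df` (missing combinations count as 0).
all_combos_equal <- function(df, cols) {
  counts <- table(df[cols])  # one cell per combination of observed values
  length(unique(as.vector(counts))) == 1L
}

all_combos_equal(ToothGrowth, c("supp", "dose"))        # TRUE: 10 per cell
all_combos_equal(ToothGrowth[-1, ], c("supp", "dose"))  # FALSE: one cell has 9
```

Because table() keeps every level of a factor column, an unused level shows up as a zero cell and makes the check fail; apply droplevels() to the data frame first if that is not wanted.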

score 0 · Accepted Answer

Using the same ToothGrowth data:

library(datasets)
library(data.table)

dt = data.table(ToothGrowth)

setkey(dt, supp, dose)
dt[CJ(unique(supp), unique(dose)), .N, by = .EACHI] # by = .EACHI counts rows per joined supp/dose pair (current data.table no longer does this implicitly)
#   supp dose  N
#1:   OJ  0.5 10
#2:   OJ  1.0 10
#3:   OJ  2.0 10
#4:   VC  0.5 10
#5:   VC  1.0 10
#6:   VC  2.0 10

Then you can check whether all the N values are equal, or anything else you like.
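That last step might look like this: a minimal sketch, assuming a current data.table release (which needs the explicit by = .EACHI to count per joined row). It compares the N column against a single value with uniqueN().

```r
library(data.table)

dt <- data.table(ToothGrowth)
setkey(dt, supp, dose)

# One row per supp/dose combination, with N = number of matching rows
counts <- dt[CJ(unique(supp), unique(dose)), .N, by = .EACHI]

# Balanced if every combination has the same N
balanced <- uniqueN(counts$N) == 1L
balanced  # TRUE for ToothGrowth (N = 10 everywhere)
```

The CJ() cross join lists every supp/dose pair explicitly, which is what lets this approach surface combinations that are missing from the data, not just unequal counts among the ones present.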
