r - Select grouped variables n at a time and apply a function in R

Question

Here is some example data:

myd <- data.frame(matrix(sample(c("AB", "BB", "AA"), 100*100,
    replace = TRUE), ncol = 100))
variablenames <- paste(rep(paste("MR.", 1:10, sep = ""),
    each = 10), 1:100, sep = ".")
names(myd) <- variablenames

Each variable belongs to a group, and here there are 10 groups. The group index of each variable in this data frame is therefore:

group <- rep(1:10, each = 10)
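Because the group number is also encoded in each variable name (MR.<group>.<index>), the group index could be recovered from the names instead of being typed by hand. A minimal sketch, assuming every column follows that naming pattern:

```r
# Rebuild the example names: MR.1.1 ... MR.1.10, MR.2.11 ... MR.10.100
variablenames <- paste(rep(paste("MR.", 1:10, sep = ""), each = 10), 1:100, sep = ".")

# Assumption: names look like "MR.<group>.<index>"; capture the middle number
group_from_names <- as.integer(sub("^MR\\.([0-9]+)\\.[0-9]+$", "\\1", variablenames))

print(head(group_from_names, 12))
# [1] 1 1 1 1 1 1 1 1 1 1 2 2
```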

So the variable names and their groups are:

 data.frame (group, variablenames)
    group variablenames
1       1        MR.1.1
2       1        MR.1.2
3       1        MR.1.3
4       1        MR.1.4
5       1        MR.1.5
6       1        MR.1.6
7       1        MR.1.7
8       1        MR.1.8
9       1        MR.1.9
10      1       MR.1.10
11      2       MR.2.11
 <<<<<<<<<<<<<<<<<<<<<<<<
100    10     MR.10.100

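Grouping means that the following steps are applied to each group of variables separately.

My actual function is longer; below is a short example:

A function that takes two variables at a time: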

myfun <- function(x1, x2) {
  out <- paste(x1, x2, sep = ":")
  # other steps to be performed here
  return(out)
}
# group 1
myfun(myd[, 1], myd[, 2]); myfun(myd[, 3], myd[, 4]); myfun(myd[, 5], myd[, 6])
myfun(myd[, 7], myd[, 8]); myfun(myd[, 9], myd[, 10])
# group 2
myfun(myd[, 11], myd[, 12]); myfun(myd[, 13], myd[, 14])  # ... and so on up to group 10

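In this way I need to walk through variables 1:10 (the first group, performing the action above), then 11:20 (the second group), and so on. In this case the grouping causes no trouble, because the number of variables in each group (10) is divisible by the number of variables taken at a time (2).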

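However, in the next example, where three variables are taken at a time, the number of variables in each group (10) is not divisible by 3 (10/3), so one variable is left over at the end of each group.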
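The arithmetic here is integer division plus a remainder: a group of k variables taken n at a time gives k %/% n chunks, and the k %% n leftover variables are added to the last chunk. A small sketch; chunk_sizes is a hypothetical helper name, and it assumes 1 <= n <= k:

```r
# Hypothetical helper: sizes of the chunks for k variables taken n at a time,
# with the k %% n leftover variables folded into the last chunk (assumes 1 <= n <= k)
chunk_sizes <- function(k, n) {
  m <- k %/% n                     # number of chunks
  sizes <- rep(n, m)               # every chunk starts with n variables
  sizes[m] <- sizes[m] + k %% n    # leftovers go to the last chunk
  sizes
}

chunk_sizes(10, 2)  # 2 2 2 2 2
chunk_sizes(10, 3)  # 3 3 4
```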

A function that takes three variables at a time:

myfun <- function(...) {
  # accepts any number of columns, so the last chunk can absorb the leftover variable
  out <- paste(..., sep = ":")
  # other steps to be performed here
  return(out)
}
# group 1
myfun(myd[, 1], myd[, 2], myd[, 3])
myfun(myd[, 4], myd[, 5], myd[, 6])
# one variable is left over before moving on to group 2,
# so the last chunk of the group takes 1 extra variable
myfun(myd[, 7], myd[, 8], myd[, 9], myd[, 10])
# group 2
myfun(myd[, 11], myd[, 12], myd[, 13])
# ... and so on for all groups, to the last column

I want to loop over this process with a user-defined number n of variables taken at a time, where n ranges from 1 to the maximum number of variables in a group.

Edit: an illustration of the process (only groups 1 and 2 are shown, as an example):

[Image: illustration of the process for groups 1 and 2]
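One way to express the whole loop for an arbitrary n is sketched below in base R. apply_in_chunks and paste_cols are hypothetical names, not taken from either answer, and set.seed is added only to make the example reproducible. The function passed in must accept any number of columns (here via ...), because the last chunk of a group can be larger than n:

```r
# Hypothetical helper: apply fun to the columns of df, n at a time within each group;
# leftover columns join the group's last chunk
apply_in_chunks <- function(df, group, n, fun) {
  out <- list()
  for (idx in split(seq_along(group), group)) {   # column indices of one group
    k <- length(idx)
    if (n > k) stop("n is larger than the number of variables in a group")
    m <- k %/% n                                   # number of chunks in this group
    chunk <- pmin((seq_len(k) - 1) %/% n + 1, m)   # chunk id per column, capped at m
    for (cols in split(idx, chunk)) {
      key <- paste(names(df)[cols], collapse = "_")
      out[[key]] <- do.call(fun, unname(as.list(df[, cols, drop = FALSE])))
    }
  }
  out
}

# Example data built as in the question
set.seed(1)
myd <- data.frame(matrix(sample(c("AB", "BB", "AA"), 100 * 100, replace = TRUE), ncol = 100))
names(myd) <- paste(rep(paste("MR.", 1:10, sep = ""), each = 10), 1:100, sep = ".")
group <- rep(1:10, each = 10)

# Accepts any number of columns, hence ...
paste_cols <- function(...) paste(..., sep = ":")

res3 <- apply_in_chunks(myd, group, 3, paste_cols)
names(res3)[1:3]
# "MR.1.1_MR.1.2_MR.1.3" "MR.1.4_MR.1.5_MR.1.6" "MR.1.7_MR.1.8_MR.1.9_MR.1.10"

# Every n from 1 up to the group size (10)
all_n <- lapply(1:10, function(n) apply_in_chunks(myd, group, n, paste_cols))
sapply(all_n, length)
# 100 50 30 20 20 10 10 10 10 10
```

Splitting column indices rather than names keeps the chunks in column order within each group, so the results come out group by group.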

score 4 · Accepted Answer

Write a function that splits the data into a suitable list, then apply whatever function you need to that list.

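This function creates the second grouping variable. (The first grouping variable, group, is given in the question. If you change its values, you also need to change DIM in the function below.)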

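# Build a within-group chunk index of length DIM: chunks of LENGTH columns,
# with any leftover columns added to the last chunk
myfun = function(LENGTH, DIM = 10) {
  PATTERN = rep(1:(DIM %/% LENGTH), each = LENGTH)
  c(PATTERN, rep(max(PATTERN), DIM %% LENGTH))
}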
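Calling this helper shows the within-group pattern it produces; the definition is repeated so the snippet runs on its own. Note that it assumes LENGTH is at most DIM: for a larger LENGTH, 1:(DIM %/% LENGTH) becomes 1:0 and the pattern is wrong.

```r
myfun = function(LENGTH, DIM = 10) {
  PATTERN = rep(1:(DIM %/% LENGTH), each = LENGTH)
  c(PATTERN, rep(max(PATTERN), DIM %% LENGTH))
}

myfun(3)  # 1 1 1 2 2 2 3 3 3 3  (chunks of 3, 3 and 4 columns)
myfun(2)  # 1 1 2 2 3 3 4 4 5 5

# Recycled over all 100 columns, as in the answer
group2 = rep(myfun(3), length.out = 100)
```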

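The object we split is myd. In this example, myd is first split into groups of 10 columns, and each of those groups is then split into chunks of 3 columns, except that the last chunk has 4 columns (3 + 3 + 4 = 10).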

Note: to change the number of columns grouped together, for example to group two variables at a time, change group2 = rep(myfun(3), length.out=100) to group2 = rep(myfun(2), length.out=100).

group <- rep(1:10, each = 10)
# CHANGE THE ARGUMENT OF myfun() BELOW TO THE
# NUMBER OF VARIABLES YOU WANT PER CHUNK
group2 = rep(myfun(3), length.out=100)

Here is the splitting process. First split only the names, then match those names against myd to create a list of data.frames.

# Extract group names for matching purposes
temp = split(names(myd), list(group, group2))

# Match the names to myd (drop = FALSE keeps one-column chunks as data frames)
temp = lapply(seq_along(temp),
              function(x) myd[, names(myd) %in% temp[[x]], drop = FALSE])

# Extract the names from the list for future reference
NAMES = lapply(temp, function(x) paste(names(x), collapse="_"))
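To check what the splitting step produces, here is a self-contained run (set.seed is added only for reproducibility). With myfun(3) it yields 30 chunks, 3 in each of the 10 groups. Because split() combines the two factors with the first one varying fastest, the list holds the first chunk of every group, then the second chunk of every group, then the 4-column last chunks:

```r
set.seed(1)
myd <- data.frame(matrix(sample(c("AB", "BB", "AA"), 100 * 100, replace = TRUE), ncol = 100))
names(myd) <- paste(rep(paste("MR.", 1:10, sep = ""), each = 10), 1:100, sep = ".")

myfun = function(LENGTH, DIM = 10) {
  PATTERN = rep(1:(DIM %/% LENGTH), each = LENGTH)
  c(PATTERN, rep(max(PATTERN), DIM %% LENGTH))
}
group <- rep(1:10, each = 10)
group2 = rep(myfun(3), length.out = 100)

temp = split(names(myd), list(group, group2))
temp = lapply(seq_along(temp),
              function(x) myd[, names(myd) %in% temp[[x]], drop = FALSE])
NAMES = lapply(temp, function(x) paste(names(x), collapse = "_"))

length(temp)        # 30
sapply(temp, ncol)  # 3 for the first 20 chunks, 4 for the last 10
NAMES[[21]]         # "MR.1.7_MR.1.8_MR.1.9_MR.1.10"
```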

Now that you have a list, you can do lots of fun things with it. You wanted to paste the columns together, separated by colons; this is how to do that.

# Do what you want with the list
# For example, to paste the columns together:
FINAL = lapply(temp, function(x) apply(x, 1, paste, collapse=":"))
names(FINAL) = NAMES
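apply(x, 1, ...) converts each data frame to a character matrix and pastes it row by row. A possibly faster alternative, offered here as an option rather than the answer's method, passes the columns of a chunk to paste in one vectorized call via do.call. On a small made-up data frame both give the same strings:

```r
# Small example chunk (made-up data: two rows, three columns)
x <- data.frame(MR.1.1 = c("AA", "AB"), MR.1.2 = c("BB", "AA"), MR.1.3 = c("AB", "AB"),
                stringsAsFactors = FALSE)

# Row-wise, as in the answer
rowwise <- apply(x, 1, paste, collapse = ":")

# Vectorized: each column becomes one argument of paste()
vectorized <- do.call(paste, c(unname(as.list(x)), sep = ":"))

rowwise     # "AA:BB:AB" "AB:AA:AB"
vectorized  # "AA:BB:AB" "AB:AA:AB"
```

In the answer's pipeline this would read FINAL = lapply(temp, function(x) do.call(paste, c(unname(as.list(x)), sep = ":"))).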

Here is a sample of the output:

lapply(FINAL, function(x) head(x, 5))
# $MR.1.1_MR.1.2_MR.1.3
# [1] "AA:AB:AB" "AB:BB:AA" "BB:AB:AA" "BB:AA:AB" "AA:AA:AA"
# 
# $MR.2.11_MR.2.12_MR.2.13
# [1] "BB:AA:AB" "BB:AB:BB" "BB:AA:AA" "AB:BB:AA" "BB:BB:AA"
# 
# $MR.3.21_MR.3.22_MR.3.23
# [1] "AA:AB:BB" "BB:AA:AA" "AA:AB:BB" "AB:AA:AA" "AB:BB:BB"
# 
# <<<<<<<------SNIP------>>>>>>>>
#
# $MR.1.4_MR.1.5_MR.1.6
# [1] "AB:BB:AA" "BB:BB:BB" "AA:AA:AA" "BB:BB:AB" "AB:AA:AA"
# 
# $MR.2.14_MR.2.15_MR.2.16
# [1] "AA:BB:AB" "BB:BB:BB" "BB:BB:AB" "AA:BB:AB" "BB:BB:BB"
# 
# $MR.3.24_MR.3.25_MR.3.26
# [1] "AA:AB:BB" "BB:AA:BB" "BB:AB:BB" "AA:AB:AA" "AB:AA:AA"
# 
# <<<<<<<------SNIP------>>>>>>>>
#
# $MR.1.7_MR.1.8_MR.1.9_MR.1.10
# [1] "AB:AB:AA:AB" "AB:AA:BB:AA" "BB:BB:AA:AA" "AB:BB:AB:AA" "AB:BB:AB:BB"
# 
# $MR.2.17_MR.2.18_MR.2.19_MR.2.20
# [1] "AB:AB:BB:BB" "AB:AB:BB:BB" "AB:AA:BB:BB" "AA:AA:AB:AA" "AB:AB:AB:AB"
# 
# $MR.3.27_MR.3.28_MR.3.29_MR.3.30
# [1] "BB:BB:AB:BB" "BB:BB:AA:AA" "AA:BB:AB:AA" "AA:BB:AB:AA" "AA:AB:AA:BB"
# 
# $MR.4.37_MR.4.38_MR.4.39_MR.4.40
# [1] "BB:BB:AB:AA" "AA:BB:AA:BB" "AA:AA:AA:AB" "AB:AA:BB:AB" "BB:BB:BB:BB"
# 
# $MR.5.47_MR.5.48_MR.5.49_MR.5.50
# [1] "AB:AA:AA:AB" "AB:AA:BB:AA" "AB:BB:AA:AA" "AB:BB:BB:BB" "BB:AA:AB:AA"
# 
# $MR.6.57_MR.6.58_MR.6.59_MR.6.60
# [1] "BB:BB:AB:AA" "BB:AB:BB:AA" "AA:AB:AB:BB" "BB:AB:AA:AB" "AB:AA:AB:BB"
# 
# $MR.7.67_MR.7.68_MR.7.69_MR.7.70
# [1] "BB:AB:BB:AA" "BB:AB:BB:AA" "BB:AB:BB:AB" "AB:AA:AA:AA" "AA:AA:AA:AB"
# 
# $MR.8.77_MR.8.78_MR.8.79_MR.8.80
# [1] "AA:AB:AA:AB" "AB:AA:AB:BB" "BB:BB:AA:AB" "AB:BB:BB:BB" "AB:AA:BB:AB"
# 
# $MR.9.87_MR.9.88_MR.9.89_MR.9.90
# [1] "AA:BB:AB:AA" "AA:AB:BB:BB" "AA:BB:AA:BB" "AB:AB:AA:BB" "AB:AA:AB:BB"
# 
# $MR.10.97_MR.10.98_MR.10.99_MR.10.100
# [1] "AB:AA:BB:AB" "AB:AA:AB:BB" "BB:AB:AA:AA" "BB:BB:AA:AA" "AB:AB:BB:AB"

score 0 · Accepted Answer

I would suggest recoding myfun to take a matrix and using pasteCols from the plotrix package.

library(plotrix)

myfun = function(x){
    out = pasteCols(t(x), sep = ":")
    # some code
    return(out)
}

Then it is quite simple: for each group, use the modulus and integer division to compute the indices of the first and last columns to pass to myfun.
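For a single group this index arithmetic can be checked in isolation. The values below assume group 2 (columns 11 to 20) taken 3 at a time:

```r
var_index <- 11:20                          # columns of group 2
num_to_group <- 3
num_var <- length(var_index)
num_calls <- num_var %/% num_to_group       # 3 calls

first <- seq(from = var_index[1], by = num_to_group, length.out = num_calls)
last <- first + num_to_group - 1
# the leftover column(s) go to the last call
last[num_calls] <- last[num_calls] + num_var %% num_to_group

first  # 11 14 17
last   # 13 16 20
```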

rubiques_solution = function(group, myd, num_to_group){
   results = list()
   # loop over groups
   for(g in unique(group)){
      var_index = which(group == g)
      num_var = length(var_index)

      # make sure num_to_group is not larger than the number of variables
      if(num_var < num_to_group){
         stop("num_to_group > number of variables in at least one group")
      }

      # number of calls to myfun
      num_calls = num_var %/% num_to_group

      # the idea here is that we compute the first and last column
      # of interest for each call
      first = seq(from = var_index[1], by = num_to_group, length.out = num_calls)
      last = first + num_to_group - 1
      # the last call may contain more variables; adjust it here
      last[length(last)] = last[length(last)] + (num_var %% num_to_group)

      # seq_len() visits every call, not just the last one
      for(i in seq_len(num_calls)){
         # keep the return value of myfun for each chunk
         results[[length(results) + 1]] = myfun(myd[, first[i]:last[i], drop = FALSE])
      }
   }
   results
}

group = rep(1:10, each = 10) # same as yours
myd = data.frame(matrix(sample(c("AB", "BB", "AA"), 100*100, replace = TRUE), ncol = 100)) # same as yours
num_to_group = 2 # this is your first example
rubiques_solution(group, myd, num_to_group)

I hope I understood the problem correctly.
