r - Efficiency of programmatic assignment in R

Question

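To sum up, I have a script for importing lots of data stored in several txt files. Within a single file, not all the rows go into the same table (a DF, now switching to DT), so for each file I select all the rows belonging to the same DF, get that DF, and assign the rows to it.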

For instance, the first time I create a DF named table1, I do as follows:

name <- "table1" # in my code the value of name will depend on different factors
                 # and **not** known in advance
assign(name, someRows)

Then, during execution, my code can find (in other files) other rows to be put into the table1 data frame, so:

name <- "table1"
assign(name, rbind.fill(get(name), someRows))

My question is: is assign(name, anyObject) combined with get(name) the best way to do assignments programmatically? Thanks

EDIT:

Here is a simplified version of my code (each item in dataSource is the result of read.table() on one text file):

set.seed(1)
#
dataSource <- list(data.frame(fileType = rep(letters[1:2], each=4),
                              id       = rep(LETTERS[1:4], each=2),
                              var1     = as.integer(rnorm(8))),
                   data.frame(fileType = rep(letters[1:2], each=4),
                              id       = rep(LETTERS[1:4], each=2),
                              var1     = as.integer(rnorm(8))))
#
library(plyr)
#
tablesnames <- unique(unlist(lapply(dataSource,function(x) as.character(unique(x[,1])))))
for(l in tablesnames){
  temp <- lapply(dataSource, function(x) x[x[,1]==l, -1])
  if(exists(l)) assign(l, rbind.fill(get(l), rbind.fill(temp))) else assign(l, rbind.fill(temp))
}
#
#
# now two data frames a and b are created
#
#
# different method using rbindlist in place of rbind.fill (faster and, until now,
# I don't have missing columns to fill)
#
rm(a,b)
library(data.table)
#
tablesnames <- unique(unlist(lapply(dataSource,function(x) as.character(unique(x[,1])))))
for(l in tablesnames){
  temp <- lapply(dataSource, function(x) x[x[,1]==l, -1])
  if(exists(l)) assign(l, rbindlist(list(get(l), rbindlist(temp)))) else assign(l, rbindlist(temp))
}

score 4 · Accepted Answer

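I would recommend using a named list and skipping assign and get altogether. A lot of the cool R features (lapply, for example) work very well on lists and do not work with assign and get. In addition, you can easily pass a list to a function, whereas handling a group of variables through assign and get can be somewhat cumbersome.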
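As a minimal sketch of the named-list approach applied to the question's loop (base R only; the sample data, the `tables` list, and the use of `rbind` in place of `rbind.fill` are assumptions made for illustration, so all pieces must share the same columns):

```r
# Hypothetical sample data standing in for the parsed text files.
dataSource <- list(
  data.frame(fileType = rep(c("a", "b"), each = 4),
             id       = rep(LETTERS[1:4], each = 2),
             var1     = 1:8,
             stringsAsFactors = FALSE),
  data.frame(fileType = rep(c("a", "b"), each = 4),
             id       = rep(LETTERS[1:4], each = 2),
             var1     = 9:16,
             stringsAsFactors = FALSE)
)

# One named list holds every table; the element names play the role
# of the variable names previously created with assign().
tables <- list()

for (src in dataSource) {
  # Split the rows of this file by target table, dropping the fileType column.
  pieces <- split(src[, -1], src$fileType)
  for (name in names(pieces)) {
    # tables[[name]] is NULL the first time, and rbind(NULL, df) returns df,
    # so no exists() check is needed.
    tables[[name]] <- rbind(tables[[name]], pieces[[name]])
  }
}

# The whole collection can now be processed with lapply/sapply.
row_counts <- sapply(tables, nrow)
```

Here `tables[["a"]]` and `tables[["b"]]` replace the free-standing variables `a` and `b`, and the table name can still be computed at run time.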

If you want to read a set of files into one big data.frame, I'd use something like this (assuming csv-like text files):

library(plyr)
# pattern is a regular expression, not a glob: match names ending in ".csv"
list_of_files = list.files(pattern = "\\.csv$")
big_dataframe = ldply(list_of_files, read.csv)

Or, if you want to keep the result in a list:

big_list = lapply(list_of_files, read.csv)

And then probably use rbind.fill:

big_dataframe = do.call("rbind.fill", big_list)
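The same workflow can be tried end to end without plyr. The sketch below is only an illustration: it writes two made-up CSV files to a temporary directory, and it swaps `rbind.fill` for base `rbind`, which assumes every file has identical columns. The file names and the `demo_dir` variable are hypothetical.

```r
# Create a fresh temporary directory with two small, made-up CSV files.
demo_dir <- tempfile("csv_demo")
dir.create(demo_dir)
write.csv(data.frame(id = c("A", "B"), var1 = 1:2),
          file.path(demo_dir, "file1.csv"), row.names = FALSE)
write.csv(data.frame(id = c("C", "D"), var1 = 3:4),
          file.path(demo_dir, "file2.csv"), row.names = FALSE)

# Read every CSV file into a list, then name the elements after the files
# so each data frame can be looked up by name instead of via get().
list_of_files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
big_list <- lapply(list_of_files, read.csv)
names(big_list) <- basename(list_of_files)

# Combine all elements into one data frame (base rbind: columns must match).
big_dataframe <- do.call(rbind, big_list)
```

Naming the list elements keeps the link between each data frame and its source file, for example `big_list[["file1.csv"]]`.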
