r - Plotting data from several large data files with ggplot

Question

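I have several data files (numeric) with around 150000 rows and 25 columns. I used to plot the data with gnuplot (where the number of script lines is proportional to the number of plot objects), but since I now need to do some additional analysis I have moved to R and ggplot2.

How should I organize the data, though? Is one big data.frame with an additional column marking which file each row came from really the only option? Or is there some way around that?

Edit: To be a bit more precise, here is an example of the form in which I currently have the data: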

filelst = c("filea.dat", "fileb.dat", "filec.dat")
dat = list()
for (i in seq_along(filelst)) {
    dat[[i]] = read.table(filelst[i])
}

score 2 · Accepted Answer

Assuming your file names end in ".dat", here is a mock-up example of the strategies proposed by Chase:

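library(plyr)
library(ggplot2)

# list the files
lf = list.files(pattern = "\\.dat$") # the dot must be escaped in the regex
names(lf) = lf # name the entries so that ldply can record the source file
str(lf)

# 1. read the files into a data.frame
# (the .id argument needs a reasonably recent plyr; without it the column is called .id)
d = ldply(lf, read.table, header = TRUE, skip = 1, .id = "L1") # or whatever options to read
str(d) # should contain all the data, and an ID column called L1

# use the data, e.g. plot
pdf("all.pdf")
d_ply(d, "L1", plot, type = "l")
dev.off()
# or using ggplot2 (x and y stand for two of your column names)
ggplot(d, aes(x, y, colour = L1)) + geom_line()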

# 2. read the files into a list

ld = lapply(lf, read.table, header = TRUE, skip = 1) # or whatever options to read
names(ld) = sub("\\.dat$", "", lf) # strip the file extension
str(ld)

# use the data, e.g. plot
pdf("all2.pdf")
invisible(lapply(names(ld), function(ii) plot(ld[[ii]], main = ii, type = "l")))
dev.off()

# 3. is not fun
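
For readers who prefer not to depend on plyr, strategy 1 can also be sketched in base R. This is only an illustration under stated assumptions: it writes three tiny temporary files with made-up columns `x` and `y` (the real files have about 25 columns whose names are not given in the question), reads them back, and tags each row with its file of origin in a column called `source`, a name chosen here purely for the example.

```r
# Create three small example files (made-up stand-ins for filea.dat etc.)
tmpdir <- tempfile("datfiles")
dir.create(tmpdir)
stems <- c("filea", "fileb", "filec")
for (i in seq_along(stems)) {
  example <- data.frame(x = 1:5, y = i * (1:5))
  write.table(example, file.path(tmpdir, paste0(stems[i], ".dat")),
              row.names = FALSE)
}

# List the files; the pattern is a regular expression, so the dot is escaped
lf <- list.files(tmpdir, pattern = "\\.dat$", full.names = TRUE)

# Read one file and add a column recording where its rows came from
read_one <- function(path) {
  out <- read.table(path, header = TRUE)
  out$source <- sub("\\.dat$", "", basename(path))
  out
}

# Stack all files into a single data.frame
d <- do.call(rbind, lapply(lf, read_one))
str(d)

# Draw one line per file, if ggplot2 happens to be installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(d, ggplot2::aes(x, y, colour = source)) +
    ggplot2::geom_line()
  print(p)
}
```

With files of 150000 rows, repeated `rbind` calls can be slow; packages such as data.table offer faster ways to stack tables, but the structure of the result is the same.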

score 1 · Accepted Answer

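Your question is a little vague. If I have followed it correctly, I think you have three main options:

1. Do as you suggest, then use any of the "split-apply-combine" functions that exist in R to do your analysis by group. These include by, aggregate, ave, package(plyr), package(data.table), and many others.
2. Store your data objects as separate elements of a list(), then use lapply() and friends to work on them.
3. Keep everything separate in different data objects and work on them individually. Unless you have memory constraints or the like, this is probably the least efficient way to do things.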
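
The first two options can be sketched as follows. The data are simulated, and the column names (`source`, `x`, `y`) are assumptions made for illustration rather than anything taken from the question.

```r
# Simulated stand-in for the combined data: 3 "files" with 4 rows each
d <- data.frame(
  source = rep(c("filea", "fileb", "filec"), each = 4),
  x = rep(1:4, times = 3),
  y = c(1, 2, 3, 4, 10, 20, 30, 40, 5, 5, 5, 5)
)

# Option 1: one big data.frame, analysed per group (split-apply-combine)
means <- aggregate(y ~ source, data = d, FUN = mean)  # one row per file
d$y_centred <- d$y - ave(d$y, d$source)               # subtract each file's mean
print(means)

# Option 2: a list with one data.frame per file, processed with sapply/lapply
ld <- split(d[c("x", "y")], d$source)
ranges <- sapply(ld, function(piece) diff(range(piece$y)))
print(ranges)
```

The two layouts convert into each other easily: `split()` turns the combined data.frame into a list, and `do.call(rbind, ld)` stacks a list back into one data.frame, so the choice mostly depends on which functions you plan to use afterwards.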
