I have the following dataset:
> head(data)
  X    UserID NPS V3 V4 V5                                   Event              V7          Element                            ElementValue
1 1 254727216  10  0 19 10 nps.agent.14b.no other attempt was made 10/4/2014 23:59 cea.element_name nps.agent.14b.no other attempt was made
2 2 298379949   0  0 28 11 nps.agent.14b.no other attempt was made 9/30/2014 23:59 cea.element_name nps.agent.14b.no other attempt was made
3 3 254710917   0  0 20 12 nps.agent.14b.no other attempt was made 9/15/2014 23:59 cea.element_name nps.agent.14b.no other attempt was made
4 4 238919392   7  0 17  9 nps.agent.14b.no other attempt was made 9/17/2014 23:59 cea.element_name nps.agent.14b.no other attempt was made
5 5 144693025  10  0 18 10 nps.agent.14b.no other attempt was made 9/17/2014 23:59 cea.element_name nps.agent.14b.no other attempt was made
6 6 249978568   5  0 21 12 nps.agent.14b.no other attempt was made 9/18/2014 23:59 cea.element_name nps.agent.14b.no other attempt was made
I split the dataset by user as follows:
data_splitted <- split(data,data$UserID)
The problem is that when I try this on the full dataset rather than on this sample, the size grows so much that it exceeds my RAM:
> format(object.size(data),units="Mb")
[1] "0.2 Mb"
> format(object.size(data_splitted),units="Mb")
[1] "45.7 Mb"
I would appreciate any insight into why this happens and whether there is a way to work around it.
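
One thing that may be worth checking, purely as an assumption since the column types are not shown above: if text columns such as Event or V7 were read in as factors, every piece returned by split() keeps the full set of factor levels, and object.size() counts those levels again for each piece. Below is a minimal sketch with made-up data; toy, pieces, and the values in them are hypothetical and not taken from the real dataset.

```r
# Hypothetical toy data: one row per user and a factor column with many levels.
n <- 1000
toy <- data.frame(
  UserID = seq_len(n),
  Event  = factor(paste0("event_", seq_len(n)))  # n distinct levels
)

# split() subsets the data frame per user; each subset's factor column
# still carries all n levels, even though it holds a single row.
pieces <- split(toy, toy$UserID)
nlevels(pieces[[1]]$Event)

# Same split after converting the factor to plain character strings.
toy_chr <- toy
toy_chr$Event <- as.character(toy_chr$Event)
pieces_chr <- split(toy_chr, toy_chr$UserID)

# Compare the reported sizes in bytes.
size_factor <- as.numeric(object.size(pieces))
size_chr    <- as.numeric(object.size(pieces_chr))
c(factor = size_factor, character = size_chr)
```

If this turns out to be the cause, converting factor columns with as.character() before splitting, or calling droplevels() on each piece, should shrink the result. Note also that object.size() does not detect elements shared between list entries, so the figure it reports may overstate the memory actually in use.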