r - Incidence matrix from event list

Question

I have a huge list of events of the following form:

> dput(head(events))
structure(list(action = c("110:0.49,258:0.49", "110:0.49,258:0.49", 
"110:0.49,258:0.49", "114:1.0,299:1.0", "114:1.0,299:1.0", "110:0.49"
), response = c("113=5-110=266-111=30-258=248-99=18-264=15", "113=5-110=278-111=30-258=260-99=18-264=15", 
"113=5-110=284-111=30-258=266-99=18-264=15", "114=34-299=34-108=134-110=12-246=67", 
"114=34-299=34-108=134-110=18-246=67", "114=34-113=6-299=34-108=146-110=24-246=73"
)), .Names = c("action", "response"), row.names = c(NA, 6L), class = "data.frame")

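where action and response are maps from keys like 110 and 114 to values like 0.49 and 5.

What I want is a matrix whose (i,j) entry is sum(action[i] * response[j]) over all events, where action[i] is the value for key i (and similarly for response). In addition, I want the vectors sum(action[i]) and sum(response[j]).

I can do that using something like this: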
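To make the definition concrete, here is a minimal sketch of the contribution of a single event, assuming a key that is absent from an event contributes 0. The helper name parse_map is hypothetical and is not part of my code below, and the response string is shortened for illustration. The full matrix would be the sum of these per-event outer products, aligned on the keys.

```r
# Parse one "key<kv_sep>value<item_sep>..." string into a named numeric vector.
# parse_map is a hypothetical helper introduced only for this illustration.
parse_map <- function(s, item_sep, kv_sep) {
  items <- strsplit(s, item_sep, fixed = TRUE)[[1]]
  parts <- strsplit(items, kv_sep, fixed = TRUE)
  vals <- as.numeric(vapply(parts, `[[`, character(1), 2))
  names(vals) <- vapply(parts, `[[`, character(1), 1)
  vals
}

a <- parse_map("110:0.49,258:0.49", ",", ":")   # action map of one event
r <- parse_map("113=5-110=266", "-", "=")       # (shortened) response map

# m[i, j] = a[i] * r[j]: this event's contribution to the (i, j) entry
m <- outer(a, r)
```

The row names of m are the action keys and the column names are the response keys of that event.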

# split actions
l <- strsplit(events$action,",")
ll <- sapply(l,length)
l <- unlist(l)
l1 <- strsplit(l,":")
rm(l)
df1 <- data.frame(response = events$response[rep(1:nrow(events), ll)],
                  action = as.factor(sapply(l1,"[[",1)),
                  action.weight = as.numeric(sapply(l1,"[[",2)),
                  stringsAsFactors = FALSE)  # keep response as character so strsplit() works

# split responses
l <- strsplit(df1$response,"-")
ll <- sapply(l,length)
l <- unlist(l)
l1 <- strsplit(l,"=")
rm(l)
rows <- rep(1:nrow(df1), ll)
df2 <- data.frame(action = df1$action[rows],
                  action.weight = df1$action.weight[rows],
                  response = as.factor(sapply(l1,"[[",1)),
                  response.weight = as.numeric(sapply(l1,"[[",2)))
df2$weight <- df2$action.weight * df2$response.weight
df2$action.weight <- NULL
df2$response.weight <- NULL

# summarise by action/response (requires the data.table package)
library(data.table)
dt1 <- as.data.table(df2)
setkeyv(dt1,c("action","response"))
dt2 <- dt1[, sum(weight), by="action,response"]

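I think this should be more or less what I need.

However, the intermediate objects (df1, df2, l, etc.) are too big for RAM. Is there a more memory-efficient way to achieve this?

PS. In fact, the key sets of action and response are identical, but there seems to be no reason to rely on this.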
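One direction that might help, sketched here under assumptions rather than as a tested solution: expand only a block of events at a time and keep just a running table of per-pair sums, so the large intermediates never exist for the whole list at once. The names chunk_sums, incidence_sums and chunk_size are hypothetical, and the sketch assumes events is a non-empty data.frame and that the data.table package is available.

```r
library(data.table)

# The six sample events from above.
events <- data.frame(
  action = c("110:0.49,258:0.49", "110:0.49,258:0.49", "110:0.49,258:0.49",
             "114:1.0,299:1.0", "114:1.0,299:1.0", "110:0.49"),
  response = c("113=5-110=266-111=30-258=248-99=18-264=15",
               "113=5-110=278-111=30-258=260-99=18-264=15",
               "113=5-110=284-111=30-258=266-99=18-264=15",
               "114=34-299=34-108=134-110=12-246=67",
               "114=34-299=34-108=134-110=18-246=67",
               "114=34-113=6-299=34-108=146-110=24-246=73"),
  stringsAsFactors = FALSE)

# Per-pair sums for one block of events (hypothetical helper).
chunk_sums <- function(ev) {
  a <- strsplit(as.character(ev$action), ",", fixed = TRUE)
  r <- strsplit(as.character(ev$response), "-", fixed = TRUE)
  id <- seq_len(nrow(ev))
  ak <- tstrsplit(unlist(a), ":", fixed = TRUE)
  rk <- tstrsplit(unlist(r), "=", fixed = TRUE)
  A <- data.table(id = rep(id, lengths(a)),
                  action = ak[[1]], aw = as.numeric(ak[[2]]))
  R <- data.table(id = rep(id, lengths(r)),
                  response = rk[[1]], rw = as.numeric(rk[[2]]))
  # Joining on the event id pairs every action of an event with every
  # response of the same event; only this block is expanded in memory.
  AR <- merge(A, R, by = "id", allow.cartesian = TRUE)
  AR[, .(weight = sum(aw * rw)), by = .(action, response)]
}

# Walk over the events block by block, folding each block into a running total.
incidence_sums <- function(events, chunk_size = 10000L) {
  total <- NULL
  for (s in seq(1L, nrow(events), by = chunk_size)) {
    e <- min(s + chunk_size - 1L, nrow(events))
    part <- chunk_sums(events[s:e, , drop = FALSE])
    total <- rbindlist(list(total, part))[
      , .(weight = sum(weight)), by = .(action, response)]
  }
  total
}

res <- incidence_sums(events, chunk_size = 2L)
```

The running total has at most one row per distinct (action, response) pair, so its size depends on the number of keys rather than on the number of events. The block size is a trade-off between speed and peak memory.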
