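I have a vector of patterns, x, and a large vector of potential match candidates, y. For each element of x, I use agrep to get a list of approximate matches in y. The problem is that the code is very slow: it takes about 2 seconds per element of x.
Is there a way to speed it up? In this example x has only 6 elements, but in the real project x is of length 41k. Here y has approximately 103k elements, which is close to real life.
If you need to see sample output, replace 3300 with 1 in the definition of y.
Thanks in advance!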
x = c("procter & gamble; tide free & gentle", "procter & gamble; tide he turbo clean",
"procter & gamble; tide simply clean", "procter & gamble; tide simply clean & fresh",
"procter & gamble; tide simply clean & sensitive", "procter & gamble; tide total care")
y = rep(c("procter & gamble; tide", "procter & gamble; tide & downy",
"procter & gamble; tide actilift", "procter & gamble; tide basic",
"procter & gamble; tide boost", "procter & gamble; tide boost vivid white + bright",
"procter & gamble; tide buzz", "procter & gamble; tide cold water",
"procter & gamble; tide colorguard", "procter & gamble; tide compact",
"procter & gamble; tide febreze", "procter & gamble; tide febreze sport",
"procter & gamble; tide high efficiency", "procter & gamble; tide oxi",
"procter & gamble; tide plus", "procter & gamble; tide plus colorguard",
"procter & gamble; tide pods", "procter & gamble; tide pods plus febreze",
"procter & gamble; tide pure essentials", "procter & gamble; tide simple pleasures",
"procter & gamble; tide simply clean & fresh", "procter & gamble; tide simply clean & sensitive",
"procter & gamble; tide stain release", "procter & gamble; tide stain release free",
"procter & gamble; tide to go", "procter & gamble; tide total clean",
"procter & gamble; tide totalcare", "procter & gamble; tide ultra 2",
"procter & gamble; tide vivid white & bright", "procter & gamble; tide with dawn",
"procter & gamble; tidekick"),3300)
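library(microbenchmark)
# One result slot per pattern in x
mapped = matrix("", nrow = length(x))
myMap = function() {
  for (i in seq_along(x)) {
    # Collect every element of y within the allowed distance of x[i]
    mapped[i] = paste(y[agrep(x[i], y, max.distance = 2.9, fixed = TRUE, useBytes = TRUE)], collapse = "|")
  }
  return(mapped)
}
print(microbenchmark(myMap(), times = 5))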
Timings:
Unit: seconds
expr min lq mean median uq max neval
myMap() 11.57354 11.61535 11.6225 11.61919 11.64641 11.658 5
Sample output with y repeated only once:
1
2
3 procter & gamble; tide simple pleasures|procter & gamble; tide simply clean & fresh|procter & gamble; tide simply clean & sensitive
4 procter & gamble; tide simply clean & fresh
5 procter & gamble; tide simply clean & sensitive
6 procter & gamble; tide total clean|procter & gamble; tide totalcare