optimization - To memoize or not to memoize

Question

... that is the question. I have been working on an algorithm which takes an array of vectors as input, and part of the algorithm repeatedly picks pairs of vectors and evaluates a function of these two vectors, which doesn't change over time. Looking at ways to optimize the algorithm, I thought this would be a good case for memoization: instead of recomputing the same function value over and over again, cache it lazily and hit the cache.

Before jumping to code, here is the gist of my question: the benefits I get from memoization depend on the number of vectors, which I think is inversely related to number of repeated calls, and in some circumstances memoization completely degrades performance. So is my situation inadequate for memoization? Am I doing something wrong, and are there smarter ways to optimize for my situation?

Here is a simplified test script, which is fairly close to the real thing:

open System
open System.Diagnostics
open System.Collections.Generic

let size = 10 // observations
let dim = 10 // features per observation
let runs = 10000000 // number of function calls

let rng = new Random()
let clock = new Stopwatch()

let data =
    [| for i in 1 .. size ->
        [ for j in 1 .. dim -> rng.NextDouble() ] |]    
let testPairs = [| for i in 1 .. runs -> rng.Next(size), rng.Next(size) |]

let f v1 v2 = List.fold2 (fun acc x y -> acc + (x-y) * (x-y)) 0.0 v1 v2

printfn "Raw"
clock.Restart()
testPairs |> Array.averageBy (fun (i, j) -> f data.[i] data.[j]) |> printfn "Check: %f"
printfn "Raw: %i" clock.ElapsedMilliseconds

I create a list of random vectors (data), a random collection of indexes (testPairs), and run f on each of the pairs.

Here is the memoized version:

let memoized =
    let cache = new Dictionary<(int*int),float>(HashIdentity.Structural)
    fun key ->
        match cache.TryGetValue(key) with
        | true, v  -> v
        | false, _ ->
            let v = f data.[fst key] data.[snd key]
            cache.Add(key, v)
            v

printfn "Memoized"
clock.Restart()
testPairs |> Array.averageBy (fun (i, j) -> memoized (i, j)) |> printfn "Check: %f"
printfn "Memoized: %i" clock.ElapsedMilliseconds

Here is what I am observing: * when size is small (10), memoization goes about twice as fast as the raw version, * when size is large (1000), memoization take 15x more time than raw version, * when f is costly, memoization improves things

My interpretation is that when the size is small, we have more repeat computations, and the cache pays off.

What surprised me was the huge performance hit for larger sizes, and I am not certain what is causing it. I know I could improve the dictionary access a bit, with a struct key for instance - but I didn't expect the "naive" version to behave so poorly.

So - is there something obviously wrong with what I am doing? Is memoization the wrong approach for my situation, and if yes, is there a better approach?

score 7 · Accepted Answer

メモ化は便利なテクニックだと思いますが、特効薬ではありません。これは、アルゴリズムの（理論上の）複雑さを軽減する動的計画法で非常に役立ちます。最適化として、（おそらく予想されるように）さまざまな結果が得られる可能性があります。

あなたの場合、観測値の数が少ないほど（そしてf計算コストが高くなると）、キャッシュは確かに便利です。メモ化に簡単な統計を追加できます。

let stats = ref (0, 0) // Count number of cache misses & hits
let memoized =
    let cache = new Dictionary<(int*int),float>(HashIdentity.Structural)
    fun key ->
        let (mis, hit) = !stats
        match cache.TryGetValue(key) with
        | true, v  -> stats := (mis, hit + 1); v // Increment hit count
        | false, _ ->
            stats := (mis + 1, hit); // Increment miss count
            let v = f data.[fst key] data.[snd key]
            cache.Add(key, v)
            v

小さいsize場合、取得する数値は次のようになります。(100, 999900)メモ化には大きなメリットがあります。関数fは100倍に計算され、各結果は9999倍に再利用されます。
大きい場合は、何度も呼ばれるsizeようなもの(632331, 1367669)がfあり、それぞれの結果は2回だけ再利用されます。その場合、（大きな）ハッシュテーブルでの割り当てとルックアップのオーバーヘッドははるかに大きくなります。

マイナーな最適化として、を事前に割り当ててDictionary書き込むことができますがnew Dictionary<_, _>(10000,HashIdentity.Structural)、この場合はあまり役に立ちません。

この最適化を効率的にするには、メモ化された関数についてもう少し情報を知る必要があると思います。あなたの例では、入力は非常に規則的であるため、メモ化には意味がない可能性がありますが、関数が引数の値で呼び出されることが多いことがわかっている場合は、おそらくこれらの一般的な引数に対してのみメモ化できます。

score 5 · Accepted Answer

Tomas's answer is great for when you should use memoization. Here's why memoization is going so slow in your case.

It sounds like you're testing in Debug mode. Run your test again in Release and you should get a faster result for memoization. Tuples can cause a large performance hit while in Debug mode. I added a hashed version for comparison along with some micro optimizations.

Release

Raw
Check: 1.441687
Raw: 894

Memoized
Check: 1.441687
Memoized: 733

memoizedHash
Check: 1.441687
memoizedHash: 552

memoizedHashInline
Check: 1.441687
memoizedHashInline: 493

memoizedHashInline2
Check: 1.441687
memoizedHashInline2: 385

Debug

Raw
Check: 1.409310
Raw: 797

Memoized
Check: 1.409310
Memoized: 5190

memoizedHash
Check: 1.409310
memoizedHash: 593

memoizedHashInline
Check: 1.409310
memoizedHashInline: 497

memoizedHashInline2
Check: 1.409310
memoizedHashInline2: 373

Source

open System
open System.Diagnostics
open System.Collections.Generic

let size = 10 // observations
let dim = 10 // features per observation
let runs = 10000000 // number of function calls

let rng = new Random()
let clock = new Stopwatch()

let data =
    [| for i in 1 .. size ->
        [ for j in 1 .. dim -> rng.NextDouble() ] |]    
let testPairs = [| for i in 1 .. runs -> rng.Next(size), rng.Next(size) |]

let f v1 v2 = List.fold2 (fun acc x y -> acc + (x-y) * (x-y)) 0.0 v1 v2

printfn "Raw"
clock.Restart()
testPairs |> Array.averageBy (fun (i, j) -> f data.[i] data.[j]) |> printfn "Check: %f"
printfn "Raw: %i\n" clock.ElapsedMilliseconds


let memoized =
    let cache = new Dictionary<(int*int),float>(HashIdentity.Structural)
    fun key ->
        match cache.TryGetValue(key) with
        | true, v  -> v
        | false, _ ->
            let v = f data.[fst key] data.[snd key]
            cache.Add(key, v)
            v

printfn "Memoized"
clock.Restart()
testPairs |> Array.averageBy (fun (i, j) -> memoized (i, j)) |> printfn "Check: %f"
printfn "Memoized: %i\n" clock.ElapsedMilliseconds


let memoizedHash =
    let cache = new Dictionary<int,float>(HashIdentity.Structural)
    fun key ->
        match cache.TryGetValue(key) with
        | true, v  -> v
        | false, _ ->
            let i = key / size
            let j = key % size
            let v = f data.[i] data.[j]
            cache.Add(key, v)
            v

printfn "memoizedHash"
clock.Restart()
testPairs |> Array.averageBy (fun (i, j) -> memoizedHash (i * size + j)) |> printfn "Check: %f"
printfn "memoizedHash: %i\n" clock.ElapsedMilliseconds



let memoizedHashInline =
    let cache = new Dictionary<int,float>(HashIdentity.Structural)
    fun key ->
        match cache.TryGetValue(key) with
        | true, v  -> v
        | false, _ ->
            let i = key / size
            let j = key % size
            let v = f data.[i] data.[j]
            cache.Add(key, v)
            v

printfn "memoizedHashInline"
clock.Restart()
let mutable total = 0.0
for i, j in testPairs do
    total <- total + memoizedHashInline (i * size + j)
printfn "Check: %f" (total / float testPairs.Length)
printfn "memoizedHashInline: %i\n" clock.ElapsedMilliseconds


printfn "memoizedHashInline2"
clock.Restart()
let mutable total2 = 0.0
let cache = new Dictionary<int,float>(HashIdentity.Structural)
for i, j in testPairs do
    let key = (i * size + j)
    match cache.TryGetValue(key) with
    | true, v  -> total2 <- total2 + v
    | false, _ ->
        let i = key / size
        let j = key % size
        let v = f data.[i] data.[j]
        cache.Add(key, v)
        total2 <- total2 + v

printfn "Check: %f" (total2 / float testPairs.Length)
printfn "memoizedHashInline2: %i\n" clock.ElapsedMilliseconds



Console.ReadLine() |> ignore

optimization - To memoize or not to memoize

2 に答える 2

Related

Reference