haskell - Haskellでメモ化？

Question

大量の場合、Haskell で次の関数を効率的に解決する方法に関するポインタ(n > 108)

f(n) = max(n, f(n/2) + f(n/3) + f(n/4))

Haskell でフィボナッチ数を解くためのメモ化の例を見てきました。これには、必要な n までのすべてのフィボナッチ数を (遅延して) 計算することが含まれていました。しかし、この場合、与えられた n に対して、ごく少数の中間結果を計算するだけで済みます。

ありがとう

score 274 · Accepted Answer

サブリニア時間でインデックス付けできる構造を作成することで、これを非常に効率的に行うことができます。

でもまず、

{-# LANGUAGE BangPatterns #-}

import Data.Function (fix)

を定義しましょうfが、それ自体を直接呼び出すのではなく、「オープン再帰」を使用するようにします。

f :: (Int -> Int) -> Int -> Int
f mf 0 = 0
f mf n = max n $ mf (n `div` 2) +
                 mf (n `div` 3) +
                 mf (n `div` 4)

fを使用してメモ化されていないものを取得できますfix f

これにより、次のように呼び出して、fの小さな値に対して何を意味するかをテストできます。ffix f 123 = 144

次のように定義することで、これをメモ化できます。

f_list :: [Int]
f_list = map (f faster_f) [0..]

faster_f :: Int -> Int
faster_f n = f_list !! n

これはまずまずのパフォーマンスを発揮し、O(n^3)時間かかるものを、中間結果を記憶するものに置き換えます。

しかし、のメモ化された答えを見つけるためにインデックスを作成するだけでも、直線的な時間がかかりますmf。これは、次のような結果になることを意味します。

*Main Data.List> faster_f 123801
248604

許容範囲ですが、結果はそれ以上にはなりません。私たちはもっとうまくやることができます！

まず、無限ツリーを定義しましょう。

data Tree a = Tree (Tree a) a (Tree a)
instance Functor Tree where
    fmap f (Tree l m r) = Tree (fmap f l) (f m) (fmap f r)

nそして、それにインデックスを付ける方法を定義するので、代わりにO(log n)時間でインデックスを持つノードを見つけることができます:

index :: Tree a -> Int -> a
index (Tree _ m _) 0 = m
index (Tree l _ r) n = case (n - 1) `divMod` 2 of
    (q,0) -> index l q
    (q,1) -> index r q

...そして、自然数でいっぱいのツリーが便利であることがわかるかもしれないので、それらのインデックスをいじる必要はありません:

nats :: Tree Int
nats = go 0 1
    where
        go !n !s = Tree (go l s') n (go r s')
            where
                l = n + s
                r = l + s
                s' = s * 2

インデックスを作成できるので、ツリーをリストに変換するだけです。

toList :: Tree a -> [a]
toList as = map (index as) [0..]

を確認することで、これまでの作業を確認できtoList natsます。[0..]

今、

f_tree :: Tree Int
f_tree = fmap (f fastest_f) nats

fastest_f :: Int -> Int
fastest_f = index f_tree

上記のリストと同じように機能しますが、各ノードを見つけるのに線形時間をかける代わりに、対数時間で追跡できます。

結果はかなり高速です。

*Main> fastest_f 12380192300
67652175206

*Main> fastest_f 12793129379123
120695231674999

実際、はるかに高速であるため、上記を実行して置き換えIntて、Integer途方もなく大きな答えをほぼ瞬時に得ることができます

*Main> fastest_f' 1230891823091823018203123
93721573993600178112200489

*Main> fastest_f' 12308918230918230182031231231293810923
11097012733777002208302545289166620866358

ツリーベースのメモ化を実装するすぐに使えるライブラリの場合は、 MemoTrieを使用します。

$ stack repl --package MemoTrie

Prelude> import Data.MemoTrie
Prelude Data.MemoTrie> :set -XLambdaCase
Prelude Data.MemoTrie> :{
Prelude Data.MemoTrie| fastest_f' :: Integer -> Integer
Prelude Data.MemoTrie| fastest_f' = memo $ \case
Prelude Data.MemoTrie|   0 -> 0
Prelude Data.MemoTrie|   n -> max n (fastest_f'(n `div` 2) + fastest_f'(n `div` 3) + fastest_f'(n `div` 4))
Prelude Data.MemoTrie| :}
Prelude Data.MemoTrie> fastest_f' 12308918230918230182031231231293810923
11097012733777002208302545289166620866358

score 20 · Accepted Answer

エドワードの答えは非常に素晴らしい宝石であるため、私はそれを複製し、関数をオープン再帰形式でメモ化するコンビネータの実装memoListを提供しました。memoTree

{-# LANGUAGE BangPatterns #-}

import Data.Function (fix)

f :: (Integer -> Integer) -> Integer -> Integer
f mf 0 = 0
f mf n = max n $ mf (div n 2) +
                 mf (div n 3) +
                 mf (div n 4)


-- Memoizing using a list

-- The memoizing functionality depends on this being in eta reduced form!
memoList :: ((Integer -> Integer) -> Integer -> Integer) -> Integer -> Integer
memoList f = memoList_f
  where memoList_f = (memo !!) . fromInteger
        memo = map (f memoList_f) [0..]

faster_f :: Integer -> Integer
faster_f = memoList f


-- Memoizing using a tree

data Tree a = Tree (Tree a) a (Tree a)
instance Functor Tree where
    fmap f (Tree l m r) = Tree (fmap f l) (f m) (fmap f r)

index :: Tree a -> Integer -> a
index (Tree _ m _) 0 = m
index (Tree l _ r) n = case (n - 1) `divMod` 2 of
    (q,0) -> index l q
    (q,1) -> index r q

nats :: Tree Integer
nats = go 0 1
    where
        go !n !s = Tree (go l s') n (go r s')
            where
                l = n + s
                r = l + s
                s' = s * 2

toList :: Tree a -> [a]
toList as = map (index as) [0..]

-- The memoizing functionality depends on this being in eta reduced form!
memoTree :: ((Integer -> Integer) -> Integer -> Integer) -> Integer -> Integer
memoTree f = memoTree_f
  where memoTree_f = index memo
        memo = fmap (f memoTree_f) nats

fastest_f :: Integer -> Integer
fastest_f = memoTree f

score 13 · Accepted Answer

最も効率的な方法ではありませんが、メモ化します:

f = 0 : [ g n | n <- [1..] ]
    where g n = max n $ f!!(n `div` 2) + f!!(n `div` 3) + f!!(n `div` 4)

を要求すると、存在するf !! 144ことがチェックされますf !! 143が、その正確な値は計算されません。まだ未知の計算結果として設定されています。計算された正確な値は、必要なものだけです。

そのため、最初は、計算された量に関しては、プログラムは何も知りません。

f = ....

request を作成すると、f !! 12パターンマッチングが開始されます。

f = 0 : g 1 : g 2 : g 3 : g 4 : g 5 : g 6 : g 7 : g 8 : g 9 : g 10 : g 11 : g 12 : ...

これで計算が始まります

f !! 12 = g 12 = max 12 $ f!!6 + f!!4 + f!!3

これは再帰的に f に対して別の要求を行うので、計算します。

f !! 6 = g 6 = max 6 $ f !! 3 + f !! 2 + f !! 1
f !! 3 = g 3 = max 3 $ f !! 1 + f !! 1 + f !! 0
f !! 1 = g 1 = max 1 $ f !! 0 + f !! 0 + f !! 0
f !! 0 = 0

これで、少しずつバックアップできます

f !! 1 = g 1 = max 1 $ 0 + 0 + 0 = 1

つまり、プログラムは次のことを認識しています。

f = 0 : 1 : g 2 : g 3 : g 4 : g 5 : g 6 : g 7 : g 8 : g 9 : g 10 : g 11 : g 12 : ...

トリクルアップし続けます：

f !! 3 = g 3 = max 3 $ 1 + 1 + 0 = 3

つまり、プログラムは次のことを認識しています。

f = 0 : 1 : g 2 : 3 : g 4 : g 5 : g 6 : g 7 : g 8 : g 9 : g 10 : g 11 : g 12 : ...

次に、の計算を続けますf!!6。

f !! 6 = g 6 = max 6 $ 3 + f !! 2 + 1
f !! 2 = g 2 = max 2 $ f !! 1 + f !! 0 + f !! 0 = max 2 $ 1 + 0 + 0 = 2
f !! 6 = g 6 = max 6 $ 3 + 2 + 1 = 6

つまり、プログラムは次のことを認識しています。

f = 0 : 1 : 2 : 3 : g 4 : g 5 : 6 : g 7 : g 8 : g 9 : g 10 : g 11 : g 12 : ...

次に、の計算を続けますf!!12。

f !! 12 = g 12 = max 12 $ 6 + f!!4 + 3
f !! 4 = g 4 = max 4 $ f !! 2 + f !! 1 + f !! 1 = max 4 $ 2 + 1 + 1 = 4
f !! 12 = g 12 = max 12 $ 6 + 4 + 3 = 13

つまり、プログラムは次のことを認識しています。

f = 0 : 1 : 2 : 3 : 4 : g 5 : 6 : g 7 : g 8 : g 9 : g 10 : g 11 : 13 : ...

そのため、計算はかなり遅延して行われます。f !! 8プログラムは、に何らかの値が存在すること、それがに等しいことを認識していますが、何がg 8何であるかはわかりませんg 8。

score 9 · Accepted Answer

これは、Edward Kmett の優れた回答の補遺です。

彼のコードを試してみたところ、との定義がかなり不可解natsにindex思えたので、より理解しやすい代替バージョンを作成しました。

indexとnatsの観点からindex'とを定義しますnats'。

index' t nの範囲で定義されます[1..]。index t(がの範囲で定義されていることを思い出してください。) これは、ビットの文字列として扱い、ビットを逆方向に読み取ること[0..]によってツリーを検索します。nビットがの場合1、右側の分岐を取ります。ビットがの場合、0左側の分岐を取ります。最後のビット ( である必要があります1) に到達すると停止します。

index' (Tree l m r) 1 = m
index' (Tree l m r) n = case n `divMod` 2 of
                          (n', 0) -> index' l n'
                          (n', 1) -> index' r n'

natsが定義されているindexように、それindex nats n == nは常に真であり、nats'はに対して定義されていindex'ます。

nats' = Tree l 1 r
  where
    l = fmap (\n -> n*2)     nats'
    r = fmap (\n -> n*2 + 1) nats'
    nats' = Tree l 1 r

ここで、natsandindexは単にnats'andindex'ですが、値が 1 シフトされています。

index t n = index' t (n+1)
nats = fmap (\n -> n-1) nats'

score 9 · Accepted Answer

Edward Kmett の回答で述べたように、高速化するには、コストのかかる計算をキャッシュし、それらにすばやくアクセスできるようにする必要があります。

関数を非モナドに保つために、(以前の投稿で示されているように) インデックスを付ける適切な方法を使用して、無限の遅延ツリーを構築するソリューションがその目標を満たします。関数の非モナド性を放棄する場合、Haskell で利用可能な標準の連想コンテナを「状態のような」モナド (State や ST など) と組み合わせて使用できます。

主な欠点は、非モナディック関数を取得することですが、構造体を自分でインデックス化する必要がなくなり、連想コンテナーの標準実装を使用するだけで済みます。

そのためには、まず、あらゆる種類のモナドを受け入れるように関数を書き直す必要があります。

fm :: (Integral a, Monad m) => (a -> m a) -> a -> m a
fm _    0 = return 0
fm recf n = do
   recs <- mapM recf $ div n <$> [2, 3, 4]
   return $ max n (sum recs)

テスト用に、Data.Function.fix を使用してメモ化を行わない関数を定義することもできますが、これはもう少し冗長です。

noMemoF :: (Integral n) => n -> n
noMemoF = runIdentity . fix fm

State モナドを Data.Map と組み合わせて使用すると、高速化できます。

import qualified Data.Map.Strict as MS

withMemoStMap :: (Integral n) => n -> n
withMemoStMap n = evalState (fm recF n) MS.empty
   where
      recF i = do
         v <- MS.lookup i <$> get
         case v of
            Just v' -> return v' 
            Nothing -> do
               v' <- fm recF i
               modify $ MS.insert i v'
               return v'

マイナーな変更により、代わりに Data.HashMap で動作するようにコードを適応させることができます。

import qualified Data.HashMap.Strict as HMS

withMemoStHMap :: (Integral n, Hashable n) => n -> n
withMemoStHMap n = evalState (fm recF n) HMS.empty
   where
      recF i = do
         v <- HMS.lookup i <$> get
         case v of
            Just v' -> return v' 
            Nothing -> do
               v' <- fm recF i
               modify $ HMS.insert i v'
               return v'

永続的なデータ構造の代わりに、可変データ構造 (Data.HashTable など) を ST モナドと組み合わせて試すこともできます。

import qualified Data.HashTable.ST.Linear as MHM

withMemoMutMap :: (Integral n, Hashable n) => n -> n
withMemoMutMap n = runST $
   do ht <- MHM.new
      recF ht n
   where
      recF ht i = do
         k <- MHM.lookup ht i
         case k of
            Just k' -> return k'
            Nothing -> do 
               k' <- fm (recF ht) i
               MHM.insert ht i k'
               return k'

メモ化を一切行わない実装と比較して、これらの実装のいずれかを使用すると、巨大な入力に対して、数秒待たずにマイクロ秒で結果を取得できます。

Criterion をベンチマークとして使用すると、Data.HashMap を使用した実装は、タイミングが非常に似ていた Data.Map と Data.HashTable よりもわずかに優れている (約 20%) ことがわかりました。

ベンチマークの結果は少し驚くべきものでした。私が最初に感じたのは、HashTable は変更可能であるため、HashMap の実装よりも優れているということでした。この最後の実装には、パフォーマンス上の欠陥が隠されている可能性があります。

score 4 · Accepted Answer

zipWith数年後、私はこれを見て、ヘルパー関数を使用して線形時間でこれをメモする簡単な方法があることに気付きました。

dilate :: Int -> [x] -> [x]
dilate n xs = replicate n =<< xs

dilateという便利なプロパティがありdilate n xs !! i == xs !! div i nます。

したがって、f(0) が与えられたと仮定すると、これにより計算が単純化されます。

fs = f0 : zipWith max [1..] (tail $ fs#/2 .+. fs#/3 .+. fs#/4)
  where (.+.) = zipWith (+)
        infixl 6 .+.
        (#/) = flip dilate
        infixl 7 #/

元の問題の説明によく似ていて、線形解 ( sum $ take n fsO(n) かかります) を与えます。

score 2 · Accepted Answer

Edward KMETT に基づいていない、インデックスを作成しないソリューション。

共通のサブツリーを共通の親に分解します (f(n/4)はとの間で共有されf(n/2)、f(n/4)との間でf(n/6)共有されます)。それらを単一の変数として親に保存することにより、サブツリーの計算は 1 回行われます。f(2)f(3)

data Tree a =
  Node {datum :: a, child2 :: Tree a, child3 :: Tree a}

f :: Int -> Int
f n = datum root
  where root = f' n Nothing Nothing


-- Pass in the arg
  -- and this node's lifted children (if any).
f' :: Integral a => a -> Maybe (Tree a) -> Maybe (Tree a)-> a
f' 0 _ _ = leaf
    where leaf = Node 0 leaf leaf
f' n m2 m3 = Node d c2 c3
  where
    d = if n < 12 then n
            else max n (d2 + d3 + d4)
    [n2,n3,n4,n6] = map (n `div`) [2,3,4,6]
    [d2,d3,d4,d6] = map datum [c2,c3,c4,c6]
    c2 = case m2 of    -- Check for a passed-in subtree before recursing.
      Just c2' -> c2'
      Nothing -> f' n2 Nothing (Just c6)
    c3 = case m3 of
      Just c3' -> c3'
      Nothing -> f' n3 (Just c6) Nothing
    c4 = child2 c2
    c6 = f' n6 Nothing Nothing

    main =
      print (f 123801)
      -- Should print 248604.

コードは一般的なメモ化関数に簡単に拡張できません (少なくとも、その方法はわかりません)。部分問題がどのように重複するかを実際に考える必要がありますが、この戦略は、一般的な複数の非整数パラメーターに対して機能するはずです。 . (2 つの文字列パラメーターについて考えました。)

メモは計算ごとに破棄されます。(繰り返しますが、2 つの文字列パラメーターについて考えていました。)

これが他の回答よりも効率的かどうかはわかりません。各ルックアップは、技術的には 1 つまたは 2 つのステップ (「あなたの子供またはあなたの子供の子供を見てください」) だけですが、大量の余分なメモリが使用される可能性があります。

~~編集: この解決策はまだ正しくありません。共有は不完全です。~~

n/2/2/2編集：サブチャイルドを適切に共有する必要がありますが、この問題には多くの重要な共有があることに気付きましたn/3/3。同じかもしれません。この問題は、私の戦略には適していません。

haskell - Haskellでメモ化？

8 に答える 8

Related

Reference