haskell - 「実世界の Haskell」から splitWith を実装する正しい方法は何ですか?

Question

私は実世界の haskell を使って自分の道を歩み、演習をしようとしています。4.5 章の演習 2 で動作するバージョンの splitWith を実装することができました。アキュムレータを使用して新しい関数を実装する必要があるのは、非常に遠回りのようです。折り畳みなど、これを行うためのより慣用的な方法はありますか? foldl のドキュメントを見ましたが、その方法について頭を悩ませていました。

splitWith :: (a -> Bool) -> [a] -> [[a]]
splitWith _ [] = []
splitWith f a  = splitWithAcc f a []
  where 
    splitWithAcc :: (a -> Bool) -> [a] -> [[a]] -> [[a]]
    splitWithAcc f xs acc
      | null xs     = acc
      | f $ head xs = splitWithAcc f (dropWhile f xs) (acc ++ [takeWhile f xs])
      | otherwise   = splitWithAcc f (tail xs) acc

明確化

演習のテキストは次のとおりです。

単語と同様に機能する関数 splitWith を記述しますが、述語と任意の型のリストを取り、述語が False を返すすべての要素で入力リストを分割します。

score 6 · Accepted Answer

再帰はあなたの友人ですが、私はそれを少し違う方法で行います。まず、分割するときに条件を False にするのではなく True にします。次に、Data.List呼び出された便利な関数を利用しますbreak

> :t break
break :: (a -> Bool) -> [a] -> ([a], [a])
> break (== ' ') "This is a test"
("This", " is a test")

そして、それを次のように使用して関数を定義します

splitWith' :: (a -> Bool) -> [a] -> [[a]]
splitWith' cond [] = []
splitWith' cond xs = first : splitWith' cond (safeTail rest)
    where
        (first, rest) = break cond xs
        -- Need this function to handle an empty list
        safeTail [] = []
        safeTail (_:ys) = ys

または、できるだけ紛らわしく書きたい場合は

splitWith'' :: (a -> Bool) -> [a] -> [[a]]
splitWith'' _ [] = []
splitWith'' cond xs = uncurry (:) $ fmap (splitWith'' cond . safeTail) $ break cond xs
    where
        safeTail [] = []
        safeTail (_:ys) = ys

これfmapが機能するのは、タプルが 2 つを超えると関数がタプルの 2 番目の要素に適用されるためです。次に、アンカレー:して最初と残りに適用します。

アップデート

述語が false のときに分割する場合は、span代わりにを使用するかbreak、単に次のように定義できます。

splitWithWeird cond xs = splitWith' (not . cond) xs

ただし、2 番目のオーバーヘッドは明らかにわずかに小さくなります (コンパイラが最適化して除去できない場合を除きます)。

更新 2

繰り返し文字を処理したい場合は、必要に応じて簡単で迅速な修正があります。

> filter (not . null) $ splitWithWeird (/= ' ') "This  is   a    test"
["This","is","a","test"]

このような簡単な修正により、アルゴリズム自体に組み込みたくなるかもしれません。

splitWithWeird :: (a -> Bool) -> [a] -> [[a]]
splitWithWeird cond [] = []
splitWithWeird cond xs = filter (not . null) $ first : splitWithWeird cond (safeTail rest)
    where
        (first, rest) = span cond xs
        safeTail [] = []
        safeTail (_:ys) = ys

しかし、これは悪い考えです。filter (not . null)これは再帰関数であるため、各レベルでの呼び出しを追加しているため、関数内の各分割位置で。これらはすべて、戻る前にリスト全体をスキャンする必要があるため、実行する必要がある追加のチェックがあります。filter (not . null)一度だけ呼び出されるように、別の関数として定義することをお勧めします。

splitWithWeird' :: (a -> Bool) -> [a] -> [[a]]
splitWithWeird' cond xs = filter (not . null) $ splitWithWeird cond xs

または、アルゴリズムに組み込みたい場合は、次のようにします。

splitWithWeird :: (a -> Bool) -> [a] -> [[a]]
splitWithWeird cond xs = filter (not . null) $ splitWithHelper cond xs
    where
        safeTail [] = []
        safeTail (_:ys) = ys
        splitWithHelper cond [] = []
        splitWithHelper cond xs =
            let (first, rest) = span cond xs
            in first : splitWithHelper cond (safeTail rest)

これは、2 つの関数を定義するのと同じことを内部的に行っているだけです。ここで追加のlet ... in ...ステートメントを使用しなければならなかったことに注意してください (ネストするのは好きではありませ(first, rest) = span cond xsんsplitWithHelper) splitWithWeird。where 句に残すと、アルゴリズムが機能しません。

アップデート 3

理想的でない解決策だけをここに残したくないので、先に進んで、条件または要素だけでなく、サブシーケンスで分割するためのアルゴリズムを作成しました。firstの関数を使用しますControl.Arrowが、コードを大幅にコンパクトにするためだけです。

import Control.Arrow (first)

isPrefixOf :: Eq a => [a] -> [a] -> Bool
isPrefixOf [] _ = True
isPrefixOf _ [] = False
isPrefixOf (x:xs) (y:ys) = x == y && isPrefixOf xs ys

splitSubseq :: Eq a => [a] -> [a] -> [[a]]
splitSubseq sub [] = []
splitSubseq sub xs = initial : splitSubseq sub rest
    where
        lsub = length sub
        splitter [] = ([], [])
        splitter yss@(y:ys)
            | isPrefixOf sub yss = ([], drop lsub yss)
            | otherwise = first (y :) $ splitter ys
        (initial, rest) = splitter xs

これが効率的な解決策であると言っているわけではありませんが、従うのはかなり簡単なはずです。isPrefixOf最初に、2 番目のリストが最初のリストで始まる場合に True を返す関数を定義しました。

再帰 ( ) の同じパターンを維持したかったので、 orの代わりfirst : recursive restに書きました。これがその出番です。サブシーケンスがリストのプレフィックスである場合はを返し、そうでない場合はの最初の文字を格納します次に、次の要素からこの操作を繰り返します。ここでの使用は、この関数を再帰的かつ簡潔に記述できるようにするためです。の戻り値の最初の要素に適用されるだけです。から返されるタプルの 2 番目の要素は、まだ消費されていない残りの入力です。splitterspanbreakisPrefixOf([], restAfterSubsequence)first(y :)splittersplitter

興味のある方のために、このアルゴリズムのパフォーマンス統計を以下に示します ( --make -O2.i5 quad でコンパイル):

main = print $ sum $ take (10 ^ 7) $ map length $ splitSubseq " " $ cycle "Testing "

70000000
   6,840,052,808 bytes allocated in the heap
       2,032,868 bytes copied during GC
          42,900 bytes maximum residency (2 sample(s))
          22,636 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     13114 colls,     0 par    0.06s    0.07s     0.0000s    0.0001s
  Gen  1         2 colls,     0 par    0.00s    0.00s     0.0002s    0.0004s

  TASKS: 3 (1 bound, 2 peak workers (2 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    3.68s  (  3.74s elapsed)
  GC      time    0.06s  (  0.07s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    3.74s  (  3.81s elapsed)

次に、合計と長さを埋め込み解除します。

main = print $ sum $ take (10 ^ 7) $ map length $ repeat "Testing"

70000000
     240,052,572 bytes allocated in the heap
          12,812 bytes copied during GC
          42,900 bytes maximum residency (2 sample(s))
          22,636 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0       458 colls,     0 par    0.00s    0.00s     0.0000s    0.0000s
  Gen  1         2 colls,     0 par    0.00s    0.00s     0.0001s    0.0001s

  TASKS: 3 (1 bound, 2 peak workers (2 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.09s  (  0.09s elapsed)
  GC      time    0.00s  (  0.00s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    0.11s  (  0.09s elapsed)

したがって、これには約 0.1 秒の時間しかかからないことがわかります。このアルゴリズムが 1,000 万回繰り返される文字列を分割するのに約 3.64 秒かかりますが"Testing "、すべてわずかなメモリしか使用されません。-threaded唯一の欠点は、より多くのコアでコンパイルして実行すると、このアルゴリズムが実際には大幅に遅くなることです。

score 2 · Accepted Answer

foldr右から結果を構築することを想像してください。

splitWith f xs = case foldr g [[]] xs of {([]:r)-> r; r->r}
  where
    g x r@ ~(s:t) | f x = (x:s):t     -- keep `x` if `f x`
                  | null s = r        -- current word already empty
                  | otherwise = []:r  -- split

遅延パターンでは、入力として無限リストを使用できます。テスト:

Prelude> splitWith (/= ' ') "  This is a    test  "
["This","is","a","test"]
Prelude> splitWith (/= ' ') ""
[]
Prelude> take 8 $ splitWith (/= ' ') (cycle "12   12 ")
["12","12","12","12","12","12","12","12"]

haskell - 「実世界の Haskell」から splitWith を実装する正しい方法は何ですか?

5 に答える 5

Related

Reference