haskell - パフォーマンスを犠牲にすることなく、折り畳みでスタックオーバーフローを回避するHaskell

Question

次のコードでは、大きな入力に対してスタックオーバーフローが発生します。

{-# LANGUAGE DeriveDataTypeable, OverloadedStrings #-}
import qualified Data.ByteString.Lazy.Char8 as L


genTweets :: L.ByteString -> L.ByteString
genTweets text | L.null text = ""
               | otherwise = L.intercalate "\n\n" $ genTweets' $ L.words text
  where genTweets' txt = foldr p [] txt
          where p word [] = [word]
                p word words@(w:ws) | L.length word + L.length w <= 139 =
                                        (word `L.append` " " `L.append` w):ws
                                    | otherwise = word:words

述語がサンクのリストを作成していると思いますが、その理由や修正方法がわかりません。

を使用した同等のコードfoldl'は正常に実行されますが、常に追加され、大量のメモリを使用するため、永遠に時間がかかります。

import Data.List (foldl')

genTweetsStrict :: L.ByteString -> L.ByteString
genTweetsStrict text | L.null text = "" 
                     | otherwise = L.intercalate "\n\n" $ genTweetsStrict' $ L.words text
  where genTweetsStrict' txt = foldl' p [] txt
          where p [] word = [word]
                p words word | L.length word + L.length (last words) <= 139 =
                                init words ++ [last words `L.append` " " `L.append` word]
                             | otherwise = words ++ [word]

最初のスニペットがサンクを蓄積する原因は何ですか? また、それを回避することはできますか? 依存しないように2番目のスニペットを書くことは可能(++)ですか?

score 4 · Accepted Answer

L.length word + L.length (last words) <= 139

これが問題です。各反復で、アキュムレータリストをトラバースし、次に

init words ++ [last words `L.append` " " `L.append` word]

最後に追記。明らかに、これには長い時間がかかります (アキュムレータリストの長さに比例します)。より良い解決策は、出力リストを遅延して生成し、処理を入力ストリームの読み取りとインターリーブすることです (最初の 140 文字のツイートを出力するために入力全体を読み取る必要はありません)。

次のバージョンのプログラムは、O(1) スペースを使用しながら、比較的大きなファイル ( /usr/share/dict/words) を 1 秒未満で処理します。

{-# LANGUAGE OverloadedStrings, BangPatterns #-}

module Main where

import qualified Data.ByteString.Lazy.Char8 as L
import Data.Int (Int64)

genTweets :: L.ByteString -> L.ByteString
genTweets text | L.null text = ""
               | otherwise   = L.intercalate "\n\n" $ toTweets $ L.words text
  where

    -- Concatenate words into 139-character tweets.
    toTweets :: [L.ByteString] -> [L.ByteString]
    toTweets []     = []
    toTweets [w]    = [w]
    toTweets (w:ws) = go (L.length w, w) ws

    -- Main loop. Notice how the output tweet (cur_str) is generated as soon as
    -- possible, thus enabling L.writeFile to consume it before the whole
    -- input is processed.
    go :: (Int64, L.ByteString) -> [L.ByteString] -> [L.ByteString]
    go (_cur_len, !cur_str) []     = [cur_str]
    go (!cur_len, !cur_str) (w:ws)
      | lw + cur_len <= 139        = go (cur_len + lw + 1,
                                         cur_str `L.append` " " `L.append` w) ws
      | otherwise                  = cur_str : go (lw, w) ws
      where
        lw = L.length w

-- Notice the use of lazy I/O.
main :: IO ()
main = do dict <- L.readFile "/usr/share/dict/words"
          L.writeFile "tweets" (genTweets dict)

haskell - パフォーマンスを犠牲にすることなく、折り畳みでスタックオーバーフローを回避するHaskell

2 に答える 2

Related

Reference