algorithm - Haskellコードを最適化/改善して、サブシーケンスの頻度を一覧表示します

Question

リストのリストのサブシーケンス頻度をリストするために、以下のコードを記述しました（結果には、サブシーケンスと、サブシーケンスが発生するリストのインデックスが含まれます）。誰かがそれをより簡潔かつ/または効率的にする方法について何か提案がありますか？

サンプル出力：

*メイン>combFreq[[1,2,3,5,7,8]、[2,3,5,6,7]、[3,5,7,9]、[1,2,3,7、 9]、[3,5,7,10]]
[（[3,5]、[0,1,2,4]）、（[2,3]、[0,1,3]）、（[ 3,5,7]、[0,2,4]）、（[5,7]、[0,2,4]）、（[2,3,5]、[0,1]）、（[ 1,2]、[0,3]）、（[1,2,3]、[0,3]）、（[7,9]、[2,3]）]

import Data.List
import Data.Function (on)

--[[1,2,3,5,7,8],[2,3,5,6,7],[3,5,7,9],[1,2,3,7,9],[3,5,7,10]]

tupleCat x y = (fst x, sort $ nub $ snd x ++ snd y)
isInResult x result = case lookup x result of
                        Just a  -> [a]
                        Nothing -> []

sInt xs = concat $ sInt' (csubs xs) 0 (length xs) where
    csubs = map (filter (not . null) . concatMap inits . tails)
    sInt' []     _     _       = []
    sInt' (x:xs) count origLen = 
        let result = (zip (zip (replicate (length xs) count) [count+1..origLen]) 
                 $ map (\y -> intersect x y) xs)
        in concatMap (\x -> let a = fst x in map (\y -> (y,a)) (snd x))
                 result : sInt' xs (count + 1) origLen

concatResults [] result     = result 
concatResults (x:xs) result = 
    let match = isInResult (fst x) result 
        newX  = (fst x, [fst $ snd x, snd $ snd x])
    in  if not (null match)
        then let match'    = (fst x, head match)
                 newResult = deleteBy (\x -> (==match')) match' result
             in concatResults xs (tupleCat match' newX : newResult)
        else concatResults xs (newX : result)

combFreq xs =
  filter (\x -> length (fst x) > 1)
  $ reverse $ sortBy (compare `on` (length . snd)) $ concatResults (sInt xs) []

score 2 · Accepted Answer

これが私がそれを行う方法です。パフォーマンスを比較したことはありませんが、確かに単純です。各リストのすべての連続するサブシーケンスを列挙し、Map. ただし、より簡潔にするという要件を満たす必要があります。

import Data.List as L
import Data.Map (Map)
import qualified Data.Map as M

nonEmptySubs :: [a] -> [[a]]
nonEmptySubs = filter (not . null)
             . concatMap tails
             . inits

makePairs :: (a -> [a]) -> [a] -> [(a, Int)]
makePairs f xs = concat $ zipWith app xs [0 .. ]
    where app y i = zip (f y) (repeat i)

results :: (Ord a) => [[a]] -> Map [a] [Int]
results =
    let ins acc (seq, ind) = M.insertWith (++) seq [ind] acc
        -- Insert the index at the given sequence as a singleton list
    in foldl' ins M.empty . makePairs nonEmptySubs

combFreq :: (Ord a) => [[a]] -> [([a], [Int])]
combFreq = filter (not . null . drop 1 . snd) -- Keep subseqs with more than 1 match
         . filter (not . null . drop 1 . fst) -- keep subseqs longer than 1
         . M.toList
         . results

このバージョンでは定性的結果は同じですが、順序が同じではないことに注意してください。

私の最大の推奨事項は、物事をさらに分解し、いくつかの標準ライブラリから利用できるものを活用して、面倒な作業を行うことです。多くの作業を別々の段階に分割し、それらの段階を構成して最終的な機能を得ることができることに注意してください。

score 0 · Accepted Answer

すべてのリストが増加している場合 (あなたの例のように)、次のように動作するはずです (私は Haskell の初心者なので、美しさではありません。改善方法についてのコメントは大歓迎です)。

import Control.Arrow (first, second)

compFreq ls = cF [] [] ls
  where cF rs cs ls | all null ls = rs
                    | otherwise   = cF (rs++rs') (cs'' ++ c ++ cs') ls'
          where m = minimum $ map head $ filter (not . null) ls
                ls' = map (\l -> if null l || m < head l then l
                                                         else tail l) ls
                is = map snd $ filter ((==m) . head . fst) $ filter (not . null . fst) $ zip ls [0,1..]
                c = if atLeastTwo is then [([m], is)] else []
                fs = filter (\(vs, is') -> atLeastTwo $ combine is is') cs
                cs' = map (\(vs, is') -> (vs++[m], combine is is')) fs
                cs'' = map (second (filter (not . (`elem` is)))) cs
                rs' = filter ok cs'
                combine _ [] = []
                combine [] _ = []
                combine (i:is) (i':is') | i<i' = combine is (i':is')
                                        | i>i' = combine (i:is) is'
                                        | i==i' = i:combine is is'
                atLeastTwo = not . null . drop 1
                ok (js, ts) = atLeastTwo js && atLeastTwo ts

アイデアは、ls' を取得するためにすべてのリストから削除される最小値 m を常に見て、リストを処理することです。インデックスのリストは、 m が削除された場所を示しています。内部作業関数 cF には、2 つの追加パラメーターがあります。現在までの結果のリスト rs と、現在のサブシーケンスのリスト cs です。最小値は、少なくとも 2 回発生する場合、新しいサブシーケンス c を開始します。cs' は m で終わるサブシーケンスであり、cs'' は m のないサブシーケンスです。新しい結果の rs' all には、最後の要素として m が含まれています。

あなたの例の出力は

[([1,2],[0,3]),([2,3],[0,1,3]),([1,2,3],[0,3]),([3,5],[0,1,2,4]),([2,3,5],[0,1]),([5,7],[0,2,4]),([3,5,7],[0,2,4]),([7,9],[2,3])]

algorithm - Haskellコードを最適化/改善して、サブシーケンスの頻度を一覧表示します

2 に答える 2

Related

Reference