haskell - Generating a more evenly distributed random sequence

Question

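After reading this question, I came to think that the algorithm there is not optimal. For example, "f 20 100" produces [85,14,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]; as a result, the output often ends in a long tail of zeros.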

I thought this was an interesting task and decided to write my own implementation :)

I decided to split the number by a random proportion.

g 1 sum = return [sum]
g n sum = do
    prop <- randomRIO(0.0, 1.0)
    k1 <- g (round prop * n) (round( prop * sum))
    k2 <- g (n - (round prop * n)) (sum - (round prop * sum))
    return k1 ++ k2

But my code doesn't work:

   Couldn't match expected type `IO [a0]' with actual type `[a1]'
    In the expression: return k1 ++ k2
    In the expression:
      do { prop <- randomRIO (0.0, 1.0);
           k1 <- g (round prop * n) (round (prop * sum));
           k2 <- g (n - (round prop * n)) (sum - (round prop * sum));
             return k1 ++ k2 }
    In an equation for `g':
        g n sum
          = do { prop <- randomRIO (0.0, 1.0);
                 k1 <- g (round prop * n) (round (prop * sum));

                 k2 <- g (n - (round prop * n)) (sum - (round prop * sum));
                 .... }

As far as I can see, I can't concatenate IO lists. How can I fix this?

score 6 · Accepted Answer

The type error you are asking about is caused by the fact that you should have written

return (k1 ++ k2)

rather than

return k1 ++ k2

Note that return is just a function in Haskell, and that function application binds more tightly than any infix operator, so your code is parsed as

(return k1) ++ k2

Note, however, that there are further problems with your code.
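As a small illustration of that parsing rule, here is a sketch that swaps IO for monads whose results can be inspected directly. The names joinParts and parsedAs are made up for this example and are not part of the question's code. In the list monad the unparenthesised form happens to type-check, which makes the grouping visible.

```haskell
-- Hypothetical helper: Maybe stands in for IO so the result can be compared.
-- The parentheses make return receive the whole concatenated list.
joinParts :: [Int] -> [Int] -> Maybe [Int]
joinParts k1 k2 = return (k1 ++ k2)

-- Without parentheses the expression groups as (return [1, 2]) ++ [[3]].
-- In the list monad, return [1, 2] is [[1, 2]], so this is a list of lists.
parsedAs :: [[Int]]
parsedAs = return [1, 2] ++ [[3]]
```

Here joinParts [1,2] [3] is Just [1,2,3], while parsedAs is [[1,2],[3]], which is the grouping the compiler also applies to return k1 ++ k2 in the question.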

score 2 · Accepted Answer

First, let's do a few imports for later use.

import Control.Applicative
import Control.Monad
import System.Random
import Data.List hiding (partition)

Fixing the code

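Always remember that function application has higher precedence than infix operators: return k1 ++ k2 means (return k1) ++ k2, and round prop * n means (round prop) * n. You can use $ to separate a function from the expression you are applying it to, because f $ x = f x but $ has very low precedence. For example, you can write return $ k1 ++ k2.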

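You were mixing up Ints and Doubles a bit, and with (round prop * n) you were rounding the proportion before multiplying. You want to multiply first, so you need to apply fromIntegral to n. I made a separate function for this: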

(.*) :: Double -> Int -> Int
d .* i = floor $ d * fromIntegral i

So instead of (round prop * n) you can now use (prop .* n). This cleans up the code a little, and it means that if the calculation is wrong, you can fix it in one function instead of all over the place.
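To make the difference concrete, the sketch below puts the answer's (.*) next to a function that reproduces what the question's expression actually computed. The name roundFirst is invented for this comparison. The proportions 0.25 and 0.75 are chosen because they are exactly representable as Doubles.

```haskell
-- Same helper as in the answer: multiply first, then round down.
(.*) :: Double -> Int -> Int
d .* i = floor $ d * fromIntegral i

-- Hypothetical name for the question's behaviour: round prop * n parses as
-- (round prop) * n, so the proportion collapses to 0 or 1 before multiplying.
roundFirst :: Double -> Int -> Int
roundFirst prop n = round prop * n
```

With these definitions, 0.25 .* 20 is 5 and 0.75 .* 20 is 15, whereas roundFirst 0.25 20 is 0 and roundFirst 0.75 20 is 20.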

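I've added a type signature to make error messages more informative, and a second base case. Without it the function did not terminate, because rounding sometimes made it ask for a list of length 0.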

partition1 :: Int -> Int -> IO [Int]
partition1 0 total = return []
partition1 1 total = return [total]
partition1 n total = do
    prop <- randomRIO(0.0, 1.0)
    k1 <- partition1 (prop .* n) (prop .* total)
    k2 <- partition1 (n - (prop .* n)) (total - (prop .* total))
    return $ k1 ++ k2

I've also taken the liberty of using more descriptive names.

Getting the right total

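Unfortunately, although this compiles, it has a problem, as Will Ness pointed out in the comments: it usually gives numbers that add up to less than the total. This is because it calls partition1 0 n with non-zero n, asking for a list of length 0 that sums to something non-zero. Oops.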
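A deterministic sketch of a single step shows where the total leaks. The function splitOnce is a made-up helper, not part of the answer; it only reports the (length, total) pairs that partition1 would hand to its two recursive calls for a given prop.

```haskell
-- Same helper as in the answer: multiply first, then round down.
(.*) :: Double -> Int -> Int
d .* i = floor $ d * fromIntegral i

-- Hypothetical helper: the arguments of the two recursive calls that one
-- step of partition1 would make, for a fixed proportion instead of a random one.
splitOnce :: Double -> Int -> Int -> ((Int, Int), (Int, Int))
splitOnce prop n total =
    ( (prop .* n,       prop .* total)
    , (n - (prop .* n), total - (prop .* total)) )
```

For example, splitOnce 0.25 3 100 is ((0,25),(3,75)): the left call is asked for 0 numbers summing to 25, the base case returns [], and those 25 vanish from the result.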

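The idea behind your algorithm is to split the list and the total randomly, but to keep the proportion the same for both so that the distribution does not become one-sided (the problem in the original question).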

Let's use that idea, but stop it from asking for zero length: we need to make sure prop is neither 0 nor 1.

partition2 :: Int -> Int -> IO [Int]
partition2 0 total = return []
partition2 1 total = return [total]
partition2 n total = do
    new_n <- randomRIO(1,n-1)
    let prop = fromIntegral new_n / fromIntegral n
    k1 <- partition2 new_n (prop .* total)
    k2 <- partition2 (n - new_n) (total - (prop .* total))
    return $ k1 ++ k2

Now it never gives the wrong total. Hooray!

Random is not the same as fair

But oops: partition2 18 10000 gives us

[555,555,555,555,556,556,555,556,556,556,555,555,556,556,555,556,556,556]
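The near-identical values can be explained by how partition2 ties prop to new_n. As a rough derivation (ignoring the rounding done by floor, and writing n' for new_n and T for total), both halves receive the same share per element:

```latex
\[
\frac{\lfloor \tfrac{n'}{n}\, T \rfloor}{n'} \approx \frac{T}{n},
\qquad
\frac{T - \lfloor \tfrac{n'}{n}\, T \rfloor}{n - n'} \approx \frac{T}{n}
\]
```

Applying this at every level of the recursion leaves each element close to T/n, which for T = 10000 and n = 18 is about 555.56, whatever random new_n is drawn.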

The problem is that fair is not the same as random. This algorithm is very fair, but not very random. Let's have it choose the proportion separately from the length.

partition3 :: Int -> Int -> IO [Int]
partition3 0 total = return []
partition3 1 total = return [total]
partition3 n total = do
    new_n   <- randomRIO(1,n-1)
    new_total <- randomRIO(0,total)  -- it's fine to have zeros.
    k1 <- partition3 new_n new_total
    k2 <- partition3 (n - new_n) (total - new_total)
    return $ k1 ++ k2

That looks better: partition3 15 20000 gave me

[1134,123,317,725,1031,3897,8089,2111,164,911,25,0,126,938,409]

Random isn't fair, but it isn't biased either

This is clearly much better, but fundamentally, the binary partitioning we are doing introduces bias.

We can test over many runs by looking at

check :: (Int -> Int -> IO [Int]) -> Int -> Int -> Int -> IO ()
check f n total times = mapM_ print =<< map average.transpose.map (righttotal total) <$> replicateM times (f n total)
   where average xs = fromIntegral (sum xs)/fromIntegral total

righttotal tot xs | sum xs == tot = xs
                  | otherwise = error $ "wrong total: " ++ show (sum xs)

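I tested this with runs such as check partition3 11 10000 1000. Without going into reams of test data and analysis: interestingly, there is a disproportionate number of 0s when n is not a factor of total, and the distribution is not uniform but cup-shaped. The algorithm ends up cramming the data in at one end.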

A way out

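Instead of choosing bit by bit how much goes into each sublist, let's generate all the positions where the subtotals end in one go. Of course, one of them has to be the total, and we had better sort them once they are generated.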

stopgaps :: Int -> Int -> IO [Int]
stopgaps parts total = sort.(total:) <$> replicateM (parts-1) (randomRIO (0,total))

Here I use replicateM :: Int -> m a -> m [a] to generate parts-1 random numbers in the right range.

I'd like to plug an unsung hero:

mapAccumL :: (acc -> x -> (acc, y)) -> acc -> [x] -> (acc, [y])

which accumulates along a list, generating a new list.

gapsToLengths :: [Int] -> (Int,[Int])
gapsToLengths = mapAccumL between 0
   where between previous new = (new,new - previous)

partition4 :: Int -> Int -> IO [Int]
partition4 parts total = snd.gapsToLengths <$> stopgaps parts total
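To see the sort and mapAccumL steps without any randomness, here is a pure sketch in which the cut points are passed in by hand. The name lengthsFromCuts is an assumption made for this example; it is meant to mirror what partition4 does once stopgaps has produced its numbers.

```haskell
import Data.List (mapAccumL, sort)

-- Hypothetical pure counterpart of partition4: the cut points are supplied
-- by the caller instead of being drawn with randomRIO.
lengthsFromCuts :: Int -> [Int] -> [Int]
lengthsFromCuts total cuts = snd (mapAccumL between 0 stops)
  where
    -- Add the total as the final stop and sort, as stopgaps does.
    stops = sort (total : cuts)
    -- Carry the previous stop along and emit the gap up to the new one.
    between previous new = (new, new - previous)
```

For instance, lengthsFromCuts 10 [7,3] sorts the stops to [3,7,10] and yields [3,4,3]. Because the last stop is always the total, the gaps always add up to the total.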

Does it work?

A few test runs of partition4 11 10000, pretty-printed:

[ 786,   20,  607,  677, 1244, 1137,  990,   50, 1716,  813, 1960]
[ 406,  110, 2556,  126, 1289,  567,  348, 1230,  171,  613, 2584]
[ 368, 1794,  136, 1266,  583,   93, 1514,   66, 1594, 1685,  901]
[ 657, 1296, 1754,  411,  691, 1865,  531,  270, 1941,  286,  298]
[2905,  313,  842,  796,  698, 1104,   82, 1475,   22,  619, 1144]
[1411,  966,  530,  129,   81,  561, 1779, 1179,  301,  607, 2456]
[1143,  409,  903,   27,  855,  354,  887, 1898, 1880,  301, 1343]
[ 260,  643,   96,  323,  142,   74,  401,  977, 3685, 2690,  709]
[1350,  979,  377,  765,  137, 1295,  615,  592, 2099, 1088,  703]
[2411,  958,  330, 1433, 1355,  680, 1075,   41,  988,   81,  648]

That looks random. Let's check that there is no bias.

check partition4 11 10000 1000
92.6425
93.4513
92.3544
90.8508
88.0297
91.7731
88.7939
86.5268
86.3502
95.2499
93.9774

At last!
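As a rough sanity check on the numbers printed by check: its average function divides each column sum by total rather than by times. Assuming every position has an expected value of about total/n (which holds approximately if the cut points are treated as continuous uniform), the value to expect on each line is

```latex
\[
\frac{\text{times} \cdot \frac{\text{total}}{n}}{\text{total}}
  = \frac{\text{times}}{n}
  = \frac{1000}{11}
  \approx 90.9
\]
```

The eleven printed values all lie within a few units of that figure.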

score 0 · Accepted Answer

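This is a section of a module I keep around to make QuickCheck easier to use. The interesting part of the code was written by the inimitable Brent Yorgey and uses the binomial number system, as explained in the blog post linked in my comment above. The pickDistribution function is sample glue code for generating a list of non-negative numbers with a particular weight (you can use resize to choose a particular weight).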

{-# LANGUAGE MultiParamTypeClasses #-}
module QuickCheckUtils where

import Control.Monad.Reader
import Test.QuickCheck
import Test.QuickCheck.Gen

instance MonadReader Int Gen where
    ask = MkGen (\r n -> n)
    local f (MkGen g) = MkGen (\r -> g r . f)

-- pickDistribution n chooses uniformly at random from all lists of length n of
-- non-negative numbers that sum to the current weight
pickDistribution :: Int -> Gen [Int]
pickDistribution n = do
    m <- ask
    let j = fromIntegral (m+n-1)
        k = fromIntegral (n-1)
    i <- choose (1, binom j k)
    return . map fromIntegral . combToComposition $ toComb j k (i-1)

-- code from Brent {{{
-- Comb n cs represents a choice cs of distinct numbers between 0 and
-- (n-1) inclusive.
data Comb = Comb Integer [Integer] deriving Show
type Comp = [Integer]

-- Convert a choice of (n-1) out of (m+n-1) things into a composition
-- of m, that is, an ordered list of natural numbers with sum m.
combToComposition :: Comb -> Comp
combToComposition (Comb n cs) = map pred $ zipWith (-) cs' (tail cs')
    where cs' = [n] ++ cs ++ [-1]

-- Convert a number into "base binomial", i.e. generate the
-- ith combination in lexicographical order.  See TAOCP 7.2.1.3, Theorem L.
toComb :: Integer -- ^ Total number of things
       -> Integer -- ^ Number to select
       -> Integer -- ^ Index into the lexicographic ordering of combinations
       -> Comb    -- ^ Corresponding combination
toComb n k i = Comb n (toComb' k i (n-1) (binom (n-1) k))

binom _ 0 = 1
binom 0 _ = 0
binom n k = binom (n-1) (k-1) * n `div` k

toComb' 0 _ _ _ = []
toComb' k i j jCk
    | jCk > i   =     toComb' k     i         (j-1) (jCk * (j-k) `div` j)
    | otherwise = j : toComb' (k-1) (i - jCk) (j-1) (jCk *     k `div` j)
-- }}}
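To make the combination-to-composition step more tangible, the sketch below copies Brent's definitions from the module above (with type signatures added for binom and toComb') and enumerates every list of 3 non-negative numbers that sums to 3. The name compositionsOf3 is invented for this example. It corresponds to n = 3 and weight m = 3, so combinations of k = n-1 = 2 items are chosen out of m+n-1 = 5.

```haskell
data Comb = Comb Integer [Integer] deriving Show
type Comp = [Integer]

-- Gaps between consecutive chosen positions, minus one, give the parts.
combToComposition :: Comb -> Comp
combToComposition (Comb n cs) = map pred $ zipWith (-) cs' (tail cs')
    where cs' = [n] ++ cs ++ [-1]

binom :: Integer -> Integer -> Integer
binom _ 0 = 1
binom 0 _ = 0
binom n k = binom (n-1) (k-1) * n `div` k

toComb :: Integer -> Integer -> Integer -> Comb
toComb n k i = Comb n (toComb' k i (n-1) (binom (n-1) k))

toComb' :: Integer -> Integer -> Integer -> Integer -> [Integer]
toComb' 0 _ _ _ = []
toComb' k i j jCk
    | jCk > i   =     toComb' k     i         (j-1) (jCk * (j-k) `div` j)
    | otherwise = j : toComb' (k-1) (i - jCk) (j-1) (jCk *     k `div` j)

-- Hypothetical example value: one composition per index 0 .. binom 5 2 - 1.
compositionsOf3 :: [Comp]
compositionsOf3 = map (combToComposition . toComb 5 2) [0 .. binom 5 2 - 1]
```

Index 0 maps to [3,0,0] and index 9 to [0,0,3]. Since each of the binom 5 2 = 10 indices yields a different composition, picking the index uniformly, as pickDistribution does with choose, picks the composition uniformly.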
