algorithm - 最も近い2点間の距離を見つけるHaskell

Question

2 次元空間内の点のリストが与えられた場合、Haskell で関数を実行して、最も近い 2 つの点の間の距離を見つけたいとします。例: 入力: project [(1,5), (3,4), (2,8), (-1,2), (-8.6), (7.0), (1.5), (5.5), (4.8) ), (7.4)] 出力: 2.0

リスト内の最も遠い 2 点間の距離が最大で 10000 であると仮定します。

これが私のコードです：

import Data.List
import System.Random

sort_ :: Ord a => [a] -> [a]
sort_ []    =  []
sort_ [x]  =  [x]
sort_ xs   =  merge (sort_ left) (sort_ right)
  where
    (left, right) = splitAt (length xs `div` 2) xs
    merge [] xs = xs
    merge xs [] = xs
    merge (x:xs) (y:ys)=
    if x <= y then 
        x : merge xs (y:ys)
    else  y : merge (x:xs) ys     

project :: [(Float,Float)] -> Float
project [] = 0
project (x:xs)=
    if null (xs) then 
        error "The list have only 1 point"
    else head(sort_(dstList(x:xs)))

distance :: (Float,Float)->(Float,Float) -> Float
distance (x1,y1) (x2,y2) = sqrt((x1 - x2)^2 + (y1 - y2)^2)


dstList :: [(Float,Float)] -> [Float]
dstList (x:xs)=
    if length xs == 1 then 
        (dstBetween x xs):[]
    else (dstBetween x xs):(dstList xs)


dstBetween :: (Float,Float) -> [(Float,Float)] -> Float
dstBetween pnt (x:xs)=
    if null (xs) then 
        distance pnt x
    else  minimum ((distance pnt ):((dstBetween pnt xs)):[])

{-
Calling generator to create a file created at random points
-}
generator = do
    putStrLn "Enter File Name"
    file <- getLine
    g <- newStdGen
    let pts = take 1000 . unfoldr (Just . (\([a,b],c)->((a,b),c)) . splitAt 2) 
                $ randomRs(-1,1) g :: [(Float,Float)]
    writeFile file . show $ pts

{-
Call the main to read a file and pass it to the function of project
The function of the project should keep the name 'project' as described 
in the statement
-}
main= do
    putStrLn "Enter filename to read"
    name <- getLine
    file <- readFile name
    putStrLn . show . project $ readA file

readA::String->[(Float,Float)]
readA = read

例のように、または次のようにジェネレーターを使用して、プログラムの実行を実行できます。

Haskell インタプリタでは、「generator」と入力する必要があります。プログラムは、ここに 1000 個のポイントを含むファイル名を尋ねます。Haskell インタープリターでファイルが生成された後、main を書き込んで、"generator" で作成するファイルの名前であるファイル名を要求する必要があります。

問題は、ランダムに生成された 1000 ポイントの場合、プログラムに時間がかかり、デュアルコアプロセッサを搭載したコンピューターで約 3 分かかることです。私は何を間違っていますか？コードを最適化してより高速に動作させるにはどうすればよいですか?

score 13 · Accepted Answer

二次アルゴリズムを使用しています:

project []  = error "Empty list of points"
project [_] = error "Single point is given"
project ps  = go 10000 ps
  where
    go a [_]    = a
    go a (p:ps) = let a2 = min a $ minimum [distance p q | q<-ps]
                  in a2 `seq` go a2 ps

より良いアルゴリズムを使用する必要があります。より良いアルゴリズムについては、SO の計算幾何学タグを検索してください。

http://en.wikipedia.org/wiki/Closest_pair_of_points_problemも参照してください。

@maxtaldykin は、ランダムデータに大きな違いをもたらすアルゴリズムへの素晴らしくシンプルで効果的な変更を提案しています。ポイントを X 座標で事前に並べ替え、X 座標でd現在のポイントから単位以上離れたポイントを試行しないでください。 (dは現在知られている最小距離です):

import Data.Ord (comparing)
import Data.List (sortBy)

project2 ps@(_:_:_) = go 10000 p1 t 
  where
    (p1:t) = sortBy (comparing fst) ps
    go d _         [] = d
    go d p1@(x1,_) t  = g2 d t
      where
        g2 d []          = go d (head t) (tail t)
        g2 d (p2@(x2,_):r)
           | x2-x1 >= d  = go d (head t) (tail t)
           | d2 >= d     = g2 d  r
           | otherwise   = g2 d2 r   -- change it "mid-flight"
               where
                 d2 = distance p1 p2

ランダムなデータでg2は、O(1)時間内に機能するので、それgoがかかりO(n)、全体がソートによって制限されます~ n log n.

~ n^2.1最初のコード (1k/2k の範囲) と2 番目のコード (10k/20k の範囲)の成長の経験的な順序を示し~n^1.1、GHCi にクイックアンドダーティコンパイルロードをテストします (2 番目のコードは最初のコードよりも 50 倍速く実行されます)。 2,000 ポイント、3,000 ポイントで 80 倍高速）。

score 6 · Accepted Answer

ブルートフォース検索を少し変更して、ランダムデータのパフォーマンスを向上させることができます。

主なアイデアは、ポイントを x 座標で並べ替え、ループで距離を比較しながら、現在の最小距離よりも大きくない水平距離を持つポイントのみを考慮することです。

これは一桁速くなる可能性がありますが、最悪の場合でもO(n^2)です。
実際、2000 ポイントでは、私のマシンでは 50 倍高速です。

project points = loop1 10000 byX
  where
    -- sort points by x coordinate
    --  (you need import Data.Ord to use `comparing`)
    byX = sortBy (comparing fst) points

    -- loop through all points from left to right
    -- threading `d` through iterations as a minimum distance so far
    loop1 d = foldl' loop2 d . tails

    -- `tail` drops leftmost points one by one so `x` is moving from left to right
    -- and `xs` contains all points to the right of `x`
    loop2 d [] = d
    loop2 d (x:xs) = let
        -- we take only those points of `xs` whose horizontal distance
        -- is not greater than current minimum distance
        xs' = takeWhile ((<=d) . distanceX x) xs
        distanceX (a,_) (b,_) = b - a

        -- then just get minimum distance from `x` to those `xs'`
      in minimum $ d : map (distance x) xs'

ところで、あまり多くの括弧を使用しないでください。Haskell では、関数の引数を囲む必要はありません。

algorithm - 最も近い2点間の距離を見つけるHaskell

2 に答える 2

Related

Reference