algorithm - 幾何中央値を見つける方法

Question

質問は：

x 座標と y 座標を持つ (2D の) N 点が与えられた場合、他の (N-1) 点から P までの距離の合計が最小になるような点 P (N 与えられた点) を見つけます。

この点は、一般に幾何中央値として知られています。単純なもの以外に、この問題を解決するための効率的なアルゴリズムはありますO(N^2)か?

score 24 · Accepted Answer

シミュレーテッドアニーリングを使用して、ローカルのオンラインジャッジに似た問題を解決したことがあります。それが公式の解決策でもあり、プログラムは AC を得ました。

唯一の違いは、見つけなければならない点がN与えられた点の一部である必要がないことでした。

これは私の C++ コードでありN、50000. プログラムは0.1s2ghz pentium 4 で実行されます。

// header files for IO functions and math
#include <cstdio>
#include <cmath>

// the maximul value n can take
const int maxn = 50001;

// given a point (x, y) on a grid, we can find its left/right/up/down neighbors
// by using these constants: (x + dx[0], y + dy[0]) = upper neighbor etc.
const int dx[] = {-1, 0, 1, 0};
const int dy[] = {0, 1, 0, -1};

// controls the precision - this should give you an answer accurate to 3 decimals
const double eps = 0.001;

// input and output files
FILE *in = fopen("adapost2.in","r"), *out = fopen("adapost2.out","w");

// stores a point in 2d space
struct punct
{
    double x, y;
};

// how many points are in the input file
int n;

// stores the points in the input file
punct a[maxn];

// stores the answer to the question
double x, y;

// finds the sum of (euclidean) distances from each input point to (x, y)
double dist(double x, double y)
{
    double ret = 0;

    for ( int i = 1; i <= n; ++i )
    {
        double dx = a[i].x - x;
        double dy = a[i].y - y;

        ret += sqrt(dx*dx + dy*dy); // classical distance formula
    }

    return ret;
}

// reads the input
void read()
{
    fscanf(in, "%d", &n); // read n from the first 

    // read n points next, one on each line
    for ( int i = 1; i <= n; ++i )
        fscanf(in, "%lf %lf", &a[i].x, &a[i].y), // reads a point
        x += a[i].x,
        y += a[i].y; // we add the x and y at first, because we will start by approximating the answer as the center of gravity

    // divide by the number of points (n) to get the center of gravity
    x /= n; 
    y /= n;
}

// implements the solving algorithm
void go()
{
    // start by finding the sum of distances to the center of gravity
    double d = dist(x, y);

    // our step value, chosen by experimentation
    double step = 100.0;

    // done is used to keep track of updates: if none of the neighbors of the current
    // point that are *step* steps away improve the solution, then *step* is too big
    // and we need to look closer to the current point, so we must half *step*.
    int done = 0;

    // while we still need a more precise answer
    while ( step > eps )
    {
        done = 0;
        for ( int i = 0; i < 4; ++i )
        {
            // check the neighbors in all 4 directions.
            double nx = (double)x + step*dx[i];
            double ny = (double)y + step*dy[i];

            // find the sum of distances to each neighbor
            double t = dist(nx, ny);

            // if a neighbor offers a better sum of distances
            if ( t < d )
            {
                update the current minimum
                d = t;
                x = nx;
                y = ny;

                // an improvement has been made, so
                // don't half step in the next iteration, because we might need
                // to jump the same amount again
                done = 1;
                break;
            }
        }

        // half the step size, because no update has been made, so we might have
        // jumped too much, and now we need to head back some.
        if ( !done )
            step /= 2;
    }
}

int main()
{
    read();
    go();

    // print the answer with 4 decimal points
    fprintf(out, "%.4lf %.4lf\n", x, y);

    return 0;
}

(x, y)次に、このアルゴリズムによって返されるに最も近いものをリストから選択するのが正しいと思います。

このアルゴリズムは、幾何中央値に関するウィキペディアの次の段落を利用しています。

ただし、各ステップでより正確な近似値が生成される反復手順を使用して幾何中央値の近似値を計算するのは簡単です。このタイプの手順は、サンプル点までの距離の合計が凸関数であるという事実から導き出すことができます。これは、各サンプル点までの距離が凸であり、凸関数の合計が凸のままであるためです。したがって、各ステップで距離の合計を減少させる手順は、局所最適にトラップされることはありません。

Endre Weiszfeld [4] の研究の後に Weiszfeld のアルゴリズムと呼ばれる、このタイプの一般的なアプローチの 1 つは、反復的に再重み付けされた最小二乗法の形式です。このアルゴリズムは、現在の推定値からサンプルまでの距離に反比例する一連の重みを定義し、これらの重みに従ってサンプルの加重平均である新しい推定値を作成します。あれは、

上記の最初の段落は、これが機能する理由を説明しています。最適化しようとしている関数には局所的な最小値がないため、繰り返し改善することで貪欲に最小値を見つけることができます。

これは一種の二分探索と考えてください。まず、結果を概算します。適切な近似値は、入力を読み取るときにコードが計算する重心になります。次に、これに隣接する点がより良い解を与えるかどうかを確認します。stepこの場合、ポイントが現在のポイントから離れている場合、そのポイントは隣接していると見なされます。より良い場合は、現在のポイントを破棄しても問題ありません。これは、前述したように、最小化しようとしている関数の性質により、これにより局所最小値にトラップされないためです。

epsこの後、二分探索の場合と同様に、ステップサイズを半分にし、(定数によって制御された) 十分な近似と見なされる値が得られるまで続行します。

したがって、アルゴリズムの複雑さは、結果をどの程度正確にするかによって異なります。

score 2 · Accepted Answer

Weiszfeld メソッドを実装しました (探しているものではないことはわかっていますが、Point を近似するのに役立つ場合があります)。複雑さは O(N*M/k) で、N はポイントの数、M はポイント（あなたの場合は2）とkは望ましいエラーです：

https://github.com/j05u3/weiszfeld-implementation

score 2 · Accepted Answer

ステップ 1:ポイントコレクションを x 次元 (nlogn) で並べ替えますステップ
2:各ポイントとその左側のすべてのポイントの間の x 距離を計算します。

xLDist[0] := 0
for i := 1 to n - 1
       xLDist[i] := xLDist[i-1] + ( ( p[i].x - p[i-1].x ) * i)

ステップ 3:各点とその右側のすべての点の間の X 距離を計算します。

xRDist[n - 1] := 0
for i := n - 2 to 0
       xRDist[i] := xRDist[i+1] + ( ( p[i+1].x - p[i].x ) * i)

ステップ 4:両方を合計すると、各ポイントから他の N-1 ポイントまでの合計 x 距離が得られます

for i := 0 to n - 1
       p[i].xDist = xLDist[i] + xRDist[i]

取得するy次元でステップ1、2、3、4を繰り返しますp[i].yDist

xDistとの合計が最も小さい点がyDist答えです

総複雑度 O(nlogn)

C++ で答える

詳細な説明:
アイデアは、前のポイントの既に計算された合計距離を再利用することです。
3 点 ABCD がソートされているとしましょう。D から他の点までの距離の合計は次のようになります。

AD + BD + CD = (AC + CD) + (BC + CD) + CD = AC + BC + 3CD

これ(AC + BC)は、C からその前の他のものまでの距離の合計です。これを利用して、計算するだけで済みます。ldist(C) + 3CD

score 0 · Accepted Answer

凸計画法として問題を解くことができます (目的関数は常に凸であるとは限りません)。凸プログラムは、L-BFGS などの反復法を使用して解くことができます。各反復のコストは O(N) であり、通常、必要な反復回数は多くありません。必要な反復回数を減らすための重要なポイントの 1 つは、最適な答えが入力の 1 つのポイントであることがわかっていることです。そのため、最適化の答えが入力ポイントの 1 つに近づくと、最適化を停止できます。

algorithm - 幾何中央値を見つける方法

7 に答える 7

Related

Reference