algorithm - データマイニングのためのDBSCANアルゴリズムとクラスタリングアルゴリズム

Question

カテゴリデータ (キノコデータセット) に DBSCAN アルゴリズムをどのように実装しますか?

ワンパスクラスタリングアルゴリズムとは何ですか?

ワンパスクラスタリングアルゴリズムの疑似コードを提供していただけますか?

score 1 · Accepted Answer

You can run DBSCAN with an arbitrary distance function without any changes to it. The indexing part will be more difficult, so you will likely only get O(n^2) complexity.

But if you look closely at DBSCAN, all it does is compute distances, compare them to a threshold, and count objects. This is a key strength of it, it can easily be applied to various kinds of data, all you need is to define a distance function and thresholds.

I doubt there is a one-pass version of DBSCAN, as it relies on pairwise distances. You can prune some of these computations (this is where the index comes into play), but essentially you need to compare every object to every other object, so it is in O(n log n) and not one-pass.

One-pass: I believe the original k-means was a one-pass algorithm. The first k objects are your initial means. For every new object, you choose the closes mean and update it (incrementally) with the new object. As long as you don't do another iteration over your data set, this was "one-pass". (The result will be even worse than lloyd-style k-means though).

score -2 · Accepted Answer

最初の k 個の項目を読み取り、それらを保持します。それらの間の距離を計算します。

残りの各項目について:

k 個のアイテムのうちどれに最も近いか、およびこれら 2 つのアイテム間の距離を調べます。
これが k 個の項目のうちの任意の 2 つの間の最も近い距離よりも長い場合、新しい項目をこれら 2 つのうちの 1 つと交換することができ、少なくとも新しい k 個の項目のうちの任意の 2 つの間の最も近い距離を減少させることはできません。この距離をできるだけ長くするようにしてください。

すべてのアイテムのセットを l <= k 個のクラスターに分割できると仮定して、同じクラスター内の任意の 2 点間の距離が、異なるクラスター内の任意の 2 点間の距離よりも小さくなるようにします。次に、このアルゴリズムを実行した後、各クラスターから少なくとも 1 つのポイントを保持します。

algorithm - データマイニングのためのDBSCANアルゴリズムとクラスタリングアルゴリズム

2 に答える 2

Related

Reference