algorithm - Minimum confidence and minimum support for Apriori

Question

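What are appropriate values for the minimum confidence and minimum support in the Apriori algorithm? How can I fine-tune them? Are they fixed values, or do they change while the algorithm runs? If you have used this algorithm before, what values did you use?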

score 9 · Accepted Answer

I would suggest starting with 0.05 for support and 0.80 for confidence. But I agree that you need to understand exactly what they represent in order to set them appropriately. For a rule A ⇒ B (where A and B are non-empty sets):

Support (A ⇒ B): s = P(A, B)
Confidence (A ⇒ B): c = P(B | A)
Lift (A ⇒ B): L = c/P(B)
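
To make the three definitions concrete, here is a minimal Python sketch that estimates them as relative frequencies over a list of transactions. It then checks one rule against the suggested starting thresholds (support 0.05, confidence 0.80). The helper names and the four toy transactions are hypothetical and are not part of the answer.

```python
from typing import FrozenSet, List, Tuple

def support(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def rule_measures(a: FrozenSet[str], b: FrozenSet[str],
                  transactions: List[FrozenSet[str]]) -> Tuple[float, float, float]:
    """Support, confidence and lift of the rule A => B."""
    s = support(a | b, transactions)     # s = P(A, B)
    c = s / support(a, transactions)     # c = P(B | A) = P(A, B) / P(A)
    lift = c / support(b, transactions)  # L = c / P(B)
    return s, c, lift

# Hypothetical toy data, for illustration only.
transactions = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
)]

s, c, lift = rule_measures(frozenset({"bread"}), frozenset({"milk"}), transactions)
# Check the rule against the suggested starting thresholds.
keep = s >= 0.05 and c >= 0.80
print(f"support={s:.3f} confidence={c:.3f} lift={lift:.3f} keep={keep}")
```

On this toy data, bread ⇒ milk has support 0.5 but confidence only 2/3, so a 0.80 confidence threshold discards it. Its lift of 8/9 is also below 1.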

Lift is important for assessing how interesting a rule is, because you usually end up with hundreds of rules. More than twenty measures of interestingness have been proposed, including the φ-coefficient, kappa, mutual information, the J-measure and the Gini index. Personally, I rank my rules by the J-measure.

J-measure (A ⇒ B): J = s/c * (c*log(L) + (1-c)*log(L*(1-c)/(L-c)))
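
The J-measure is easier to read in its usual form, J = P(A)·[c·log(c/P(B)) + (1−c)·log((1−c)/(1−P(B)))]. Substituting P(A) = s/c and P(B) = c/L turns this into the expression above. Below is a minimal Python sketch of that formula, used to rank rules. The function name and the two example rules are hypothetical. It uses the natural log, although base 2 is also common.

```python
import math
from typing import List, Tuple

def j_measure(s: float, c: float, lift: float) -> float:
    """J-measure of A => B from s = P(A, B), c = P(B | A) > 0 and L = c / P(B).

    J = P(A) * [c * log(c / P(B)) + (1 - c) * log((1 - c) / (1 - P(B)))],
    using P(A) = s / c and P(B) = c / L. Natural log here; base 2 is also common.
    """
    p_a = s / c
    p_b = c / lift
    first = c * math.log(lift)  # c * log(c / P(B)) = c * log(L)
    # When c == 1 the second term is 0 * log(0), taken as 0 by convention.
    second = (1 - c) * math.log((1 - c) / (1 - p_b)) if c < 1 else 0.0
    return p_a * (first + second)

# Hypothetical rules as (name, support, confidence, lift).
rules: List[Tuple[str, float, float, float]] = [
    ("bread => milk", 0.5, 2 / 3, 8 / 9),
    ("butter => bread", 0.5, 1.0, 4 / 3),
]
ranked = sorted(rules, key=lambda r: j_measure(r[1], r[2], r[3]), reverse=True)
for name, s, c, lift in ranked:
    print(f"{name}: J={j_measure(s, c, lift):.4f}")
```

When A and B are independent (c = P(B), so L = 1), both terms vanish and J is 0. This is one reason the measure works well for ranking.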

score 2 · Accepted Answer

You must set the minsup and minconf values before running the algorithm, and they do not change during the mining process.
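
To show that minsup is a fixed input, here is a minimal level-wise Apriori sketch in Python. It is my own illustration, not code from either answer. The function name `apriori` and the toy data are hypothetical. minsup is passed in once as a relative threshold, and the same value is applied at every level of the search.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

def apriori(transactions: List[FrozenSet[str]], minsup: float) -> Dict[FrozenSet[str], float]:
    """Level-wise Apriori: every itemset with relative support >= minsup.

    minsup is read once and applied unchanged at every level of the search.
    """
    n = len(transactions)
    candidates = {frozenset([i]) for t in transactions for i in t}
    frequent: Dict[FrozenSet[str], float] = {}
    k = 1  # size of the itemsets in `candidates`
    while candidates:
        # Count step: keep only candidates that reach the fixed threshold.
        level = {}
        for cand in candidates:
            sup = sum(1 for t in transactions if cand <= t) / n
            if sup >= minsup:
                level[cand] = sup
        frequent.update(level)
        # Join step: merge pairs of frequent k-itemsets into (k+1)-itemsets.
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            # Prune step: every k-subset of a candidate must itself be frequent.
            if len(union) == k + 1 and all(
                frozenset(sub) in level for sub in combinations(union, k)
            ):
                candidates.add(union)
        k += 1
    return frequent

# Hypothetical toy data, for illustration only.
transactions = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
)]
for itemset, sup in sorted(apriori(transactions, minsup=0.5).items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), sup)
```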

The right choice of the minsup parameter depends on the data.

For some datasets I use 80%; for others I use 0.05%. It all depends on the dataset. I usually start with a high value and lower it until I find one that produces enough patterns.
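
The "start high, then lower it" strategy can be sketched as a simple loop. This sketch rests on several assumptions. `count_frequent` is a hypothetical brute-force stand-in for a real Apriori run and is only usable on tiny data. Halving the threshold at each step is an arbitrary choice. The 80% start and 0.05% floor simply mirror the figures mentioned above.

```python
from itertools import combinations
from typing import FrozenSet, List, Optional, Tuple

def count_frequent(transactions: List[FrozenSet[str]], minsup: float) -> int:
    """Brute-force count of itemsets with relative support >= minsup (tiny data only)."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    count = 0
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            if sum(1 for t in transactions if itemset <= t) / n >= minsup:
                count += 1
    return count

def tune_minsup(transactions: List[FrozenSet[str]], wanted: int, start: float = 0.80,
                floor: float = 0.0005, factor: float = 0.5) -> Optional[Tuple[float, int]]:
    """Start with a high minsup and lower it until at least `wanted` patterns appear."""
    minsup = start
    while minsup >= floor:
        found = count_frequent(transactions, minsup)
        if found >= wanted:
            return minsup, found
        minsup *= factor  # lower the threshold and re-run the mining
    return None  # even the floor did not yield enough patterns

# Hypothetical toy data, for illustration only.
transactions = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
)]
print(tune_minsup(transactions, wanted=5))
```

On this toy data, nothing reaches 80% support, so the loop drops to 40% and stops there with 5 frequent itemsets. Each iteration is a full re-run of the mining, which is why minsup itself never changes inside a single run.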

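Confidence is a little easier to choose, because it directly expresses how reliable you need the rules to be. I usually use around 60%, but that also depends on the data.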
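
minconf acts as a filter when rules are generated from the frequent itemsets, which can be sketched as follows. The function name and the hard-coded supports are hypothetical. The sketch assumes `frequent` holds every frequent itemset with its relative support. By the downward-closure (Apriori) property, every antecedent is then guaranteed to be in it.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str], float, float]

def generate_rules(frequent: Dict[FrozenSet[str], float], minconf: float) -> List[Rule]:
    """All rules A => B with A ∪ B frequent and confidence >= minconf."""
    rules: List[Rule] = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):  # single items yield no rules
            for lhs in combinations(itemset, r):
                a = frozenset(lhs)
                # Every subset of a frequent itemset is frequent, so a is in the dict.
                conf = sup / frequent[a]  # P(B | A) = supp(A ∪ B) / supp(A)
                if conf >= minconf:
                    rules.append((a, itemset - a, sup, conf))
    return rules

# Hypothetical frequent itemsets with their relative supports.
frequent = {
    frozenset({"bread"}): 0.75,
    frozenset({"milk"}): 0.75,
    frozenset({"butter"}): 0.5,
    frozenset({"bread", "milk"}): 0.5,
    frozenset({"bread", "butter"}): 0.5,
}
for a, b, sup, conf in generate_rules(frequent, minconf=0.6):
    print(f"{sorted(a)} => {sorted(b)}  support={sup:.2f} confidence={conf:.2f}")
```

At minconf = 0.6, all four two-item rules survive. Raising it to 0.8 leaves only butter ⇒ bread (confidence 1.0).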

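Also, if you would rather not use a minsup parameter at all, you can use a top-k mining algorithm. In that case you specify, for example, k=1000, and instead of using minsup the algorithm finds the top 1000 rules. I designed one such algorithm for association rule mining. It is called TopKRules, and its source code can be downloaded. A paper describing it is due to be published soon. It uses only two parameters: k and minconf.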
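
The top-k idea can be illustrated with a deliberately naive Python sketch. This is not TopKRules, and a real top-k algorithm would prune the search rather than enumerate everything. The hypothetical `top_k_rules` simply enumerates every rule, keeps those that reach minconf, and returns the k rules with the highest support (ties broken by confidence). Like the algorithm described above, it takes only k and minconf, but it only works on tiny data.

```python
import heapq
from itertools import combinations
from typing import FrozenSet, List, Tuple

RuleT = Tuple[float, float, Tuple[str, ...], Tuple[str, ...]]

def top_k_rules(transactions: List[FrozenSet[str]], k: int, minconf: float) -> List[RuleT]:
    """Naive top-k: the k highest-support rules with confidence >= minconf.

    Enumerates every itemset and every split into A => B; NOT TopKRules.
    """
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})

    def sup(s: FrozenSet[str]) -> float:
        return sum(1 for t in transactions if s <= t) / n

    candidates: List[RuleT] = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            x = frozenset(combo)
            sx = sup(x)
            if sx == 0:
                continue  # no rule can be built from an itemset that never occurs
            for r in range(1, size):
                for lhs in combinations(combo, r):
                    a = frozenset(lhs)
                    conf = sx / sup(a)  # sup(a) >= sx > 0
                    if conf >= minconf:
                        candidates.append((sx, conf, tuple(sorted(a)), tuple(sorted(x - a))))
    # Tuples compare by support first, then confidence.
    return heapq.nlargest(k, candidates)

# Hypothetical toy data, for illustration only.
transactions = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
)]
for s, c, a, b in top_k_rules(transactions, k=2, minconf=0.6):
    print(f"{list(a)} => {list(b)}  support={s:.2f} confidence={c:.2f}")
```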
