mysql - SQL - Selecting the most similar product

Question

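OK, I have a relation that stores two keys, a product ID and an attribute ID. I want to find the product that is most similar to a given product. (The attributes are actually numbers, but that makes the example confusing, so they have been changed to letters to simplify the visual representation.)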

Prod_att

Product | Attributes  
   1   |    A     
   1   |    B  
   1   |    C  
   2   |    A  
   2   |    B  
   2   |    D  
   3   |    A  
   3   |    E  
   4   |    A

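At first this seems fairly simple: select the attributes a product has, then count the number of shared attributes per product. That count is then compared with the number of attributes the product has, which tells you how similar the two products are. This works for products that have many attributes relative to the products they are compared with, but problems arise when a product has very few attributes. For example, product 3 will tie with almost every other product (because A is very common).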

SELECT Product, count(Attributes)  
FROM Prod_att  
WHERE Attributes IN  
(SELECT Attributes  
FROM Prod_att  
WHERE Product = 1)  
GROUP BY Product
;

Any suggestions on how to fix this, or improvements to my current query?
Thanks!

*Edit: Product 4 will return count() = 1 for all products. I would like to show that product 3 is more similar, because it has fewer differing attributes.
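To make the tie concrete, here is a minimal Python sketch over the sample rows. It is not from the original post: it assumes that "fewer differing attributes" can be expressed as Jaccard similarity (shared attributes divided by the size of the union), which is one common choice rather than the only one. The names ROWS, attribute_sets and jaccard_ranking are made up for illustration.

```python
from collections import defaultdict

# Sample rows from the Prod_att table in the question: (product, attribute).
ROWS = [
    (1, "A"), (1, "B"), (1, "C"),
    (2, "A"), (2, "B"), (2, "D"),
    (3, "A"), (3, "E"),
    (4, "A"),
]


def attribute_sets(rows):
    """Group the (product, attribute) rows into one attribute set per product."""
    sets = defaultdict(set)
    for product, attribute in rows:
        sets[product].add(attribute)
    return dict(sets)


def jaccard_ranking(rows, target):
    """Rank every other product against `target`.

    Returns (product, shared_count, jaccard) tuples, best match first.
    Jaccard = |shared| / |union|, so attributes that only one of the two
    products has lower the score (assumption: this is the penalty wanted).
    """
    sets = attribute_sets(rows)
    base = sets[target]
    scores = []
    for product, attrs in sets.items():
        if product == target:
            continue
        shared = len(base & attrs)
        union = len(base | attrs)
        scores.append((product, shared, shared / union))
    # Highest similarity first; ties broken by product id for determinism.
    scores.sort(key=lambda s: (-s[2], s[0]))
    return scores


if __name__ == "__main__":
    for product, shared, score in jaccard_ranking(ROWS, 4):
        print(product, shared, round(score, 3))
```

For product 4, every other product shares exactly one attribute, so the raw count cannot separate them, while the ratio puts product 3 first because it has only one attribute that product 4 lacks.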

score 2 · Accepted Answer

Try this:

SELECT 
  a_product_id, 
  COALESCE( b_product_id, 'no_matches_found' ) AS closest_product_match
FROM (
  SELECT 
    *,  
    /* number the rows within each a_product_id; row 1 is the best match */
    @row_num := IF(@prev_value=a_product_id,@row_num+1,1) AS row_num,
    @prev_value := a_product_id
  FROM 
    (SELECT @prev_value := 0, @row_num := 0) r
    JOIN (
        SELECT 
         a.product_id as a_product_id,
         b.product_id as b_product_id,
         /* attributes that a and b have in common */
         count( distinct b.Attributes ),
         /* total number of attributes of b */
         count( distinct b2.Attributes ) as total_products
        FROM
          products a
          LEFT JOIN products b ON ( a.Attributes = b.Attributes AND a.product_id <> b.product_id )
          LEFT JOIN products b2 ON ( b2.product_id = b.product_id )
       /*WHERE */
         /*  a.product_id = 3 */
        GROUP BY
         a.product_id,
         b.product_id
        /* most shared attributes first, then the candidate with the fewest attributes */
        ORDER BY 
          1, 3 desc, 4
  ) t
) t2 
WHERE 
  row_num = 1

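The query above returns the closest match for every product. To get the result for one particular product_id, you can include a WHERE condition on product_id in the innermost query (see the commented-out WHERE clause). I used LEFT JOIN so that a product is still displayed even if it has no matches.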
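The ordering used by that query (most shared attributes first, then the candidate with the fewest attributes in total) can be sketched in plain Python as follows. This is only an illustration of the ranking rule, not the answerer's code: ROWS and closest_matches are hypothetical names, and breaking any remaining tie by the lowest product id is an assumption added here to keep the output deterministic.

```python
# Sample rows from the question: (product_id, attribute).
ROWS = [
    (1, "A"), (1, "B"), (1, "C"),
    (2, "A"), (2, "B"), (2, "D"),
    (3, "A"), (3, "E"),
    (4, "A"),
]


def closest_matches(rows):
    """Return {product_id: closest product_id, or None if nothing matches}."""
    attrs = {}
    for product, attribute in rows:
        attrs.setdefault(product, set()).add(attribute)

    result = {}
    for a in sorted(attrs):
        # One candidate per other product that shares at least one attribute.
        # Sort key: more shared attributes is better (hence the negation),
        # then fewer total attributes, then lowest id (assumed tie-break).
        candidates = [
            (-len(attrs[a] & attrs[b]), len(attrs[b]), b)
            for b in attrs
            if b != a and attrs[a] & attrs[b]
        ]
        result[a] = min(candidates)[2] if candidates else None
    return result


if __name__ == "__main__":
    print(closest_matches(ROWS))
```

With the sample data, product 4 is matched to product 3 rather than to product 1 or 2, because all three share one attribute with it and product 3 has the fewest attributes overall. A product with no shared attributes maps to None, which corresponds to the role of the LEFT JOIN and COALESCE in the query.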


Hope this helps.

score 0 · Accepted Answer

You can create a small view that gives the total number of attributes shared between two products:

create view vw_shared_attributes as
select a.product, 
      b.product 'product_match', 
      count(*) 'shared_attributes'
from  your_table a
  inner join your_table b on b.attribute = a.attribute and b.product <> a.product
group by a.product, b.product

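Then use that view to select the top match. (Note that top 1 is SQL Server syntax; in MySQL you would put limit 1 at the end of the subquery instead.)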

   select product,
      (select top 1 s.product_match from vw_shared_attributes s where t.product = s.product order by s.shared_attributes desc)
    from your_table t
    group by product

For an example, see http://www.sqlfiddle.com/#!6/53039/1.
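For anyone who wants to try the view approach without a database server, here is a rough sketch that runs it against an in-memory SQLite database from Python. Several things are assumptions for illustration: SQLite stands in for MySQL or SQL Server, LIMIT 1 replaces top 1, the column aliases are written with AS, and the extra ORDER BY term on product_match is added only so that ties resolve the same way on every run. build_db and top_matches are hypothetical helper names.

```python
import sqlite3


def build_db():
    """Create an in-memory database holding the sample data and the view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE your_table (product INTEGER, attribute TEXT);
        INSERT INTO your_table (product, attribute) VALUES
            (1, 'A'), (1, 'B'), (1, 'C'),
            (2, 'A'), (2, 'B'), (2, 'D'),
            (3, 'A'), (3, 'E'),
            (4, 'A');

        -- Number of attributes shared by each ordered pair of products.
        CREATE VIEW vw_shared_attributes AS
        SELECT a.product AS product,
               b.product AS product_match,
               COUNT(*)  AS shared_attributes
        FROM your_table a
        INNER JOIN your_table b
                ON b.attribute = a.attribute AND b.product <> a.product
        GROUP BY a.product, b.product;
        """
    )
    return conn


def top_matches(conn):
    """Return (product, top_match) rows, one per product, ordered by product."""
    query = """
        SELECT t.product,
               (SELECT s.product_match
                FROM vw_shared_attributes s
                WHERE s.product = t.product
                ORDER BY s.shared_attributes DESC, s.product_match
                LIMIT 1) AS top_match
        FROM (SELECT DISTINCT product FROM your_table) t
        ORDER BY t.product
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in top_matches(build_db()):
        print(row)
```

On the sample data, products 3 and 4 each share exactly one attribute with every other product, so for them the raw shared count alone still leaves the top match to the tie-break.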

score 0 · Accepted Answer

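Try the "lower bound of the Wilson score confidence interval for a Bernoulli parameter". It deals explicitly with the problem of statistical confidence when n is small. It looks like a lot of maths, but it is actually about the minimum amount of maths you need to do this sort of thing right. And the website explains it quite well.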

This assumes it is possible to make the step from positive/negative scoring to your problem of matching/non-matching attributes.

Here is an example for positive and negative scoring with a 95% confidence level:

SELECT widget_id, ((positive + 1.9208) / (positive + negative) - 
1.96 * SQRT((positive * negative) / (positive + negative) + 0.9604) / 
(positive + negative)) / (1 + 3.8416 / (positive + negative)) 
AS ci_lower_bound FROM widgets WHERE positive + negative > 0 
ORDER BY ci_lower_bound DESC;
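As a rough sketch of how that step might look for this question, the Python below computes the same lower bound and applies it to attribute matches. The mapping is an assumption, not something the answer specifies: "positive" is taken as the number of shared attributes and "negative" as the number of attributes that only one of the two products has. wilson_lower_bound and rank_by_wilson are hypothetical names.

```python
import math


def wilson_lower_bound(positive, negative, z=1.96):
    """Lower bound of the Wilson score interval; z = 1.96 is roughly 95%."""
    n = positive + negative
    if n == 0:
        return 0.0  # no evidence at all (the SQL above filters these rows out)
    phat = positive / n
    centre = phat + z * z / (2 * n)
    margin = z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)
    return (centre - margin) / (1 + z * z / n)


def rank_by_wilson(attrs, target):
    """Rank the other products against `target`, best first.

    `attrs` maps product id -> set of attributes.
    positive = shared attributes, negative = attributes in exactly one of
    the two products (assumed mapping, see the lead-in).
    """
    base = attrs[target]
    scored = []
    for product, other in attrs.items():
        if product == target:
            continue
        positive = len(base & other)
        negative = len(base ^ other)
        scored.append((product, wilson_lower_bound(positive, negative)))
    scored.sort(key=lambda s: (-s[1], s[0]))
    return scored


if __name__ == "__main__":
    sample = {1: {"A", "B", "C"}, 2: {"A", "B", "D"}, 3: {"A", "E"}, 4: {"A"}}
    for product, score in rank_by_wilson(sample, 4):
        print(product, round(score, 4))
```

Under this mapping product 3 ranks first for product 4, since one match out of two attributes scores higher than one match out of three. The bound also rises as the evidence grows: ten matches out of twenty scores well above one match out of two, which is the small-n behaviour the answer is pointing at.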
