python - Pearson algorithm from Programming Collective Intelligence still doesn't work

Question

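When I run the code to compute the Pearson correlation coefficient, the function (pasted below) stubbornly returns 0.

Following earlier suggestions on SO about this problem (see #1, #2 below), I made sure the function can perform floating-point arithmetic, but that did not help. I would appreciate some guidance on this.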

    from __future__ import division
    from math import sqrt

    def sim_pearson(prefs,p1,p2):
    # Get the list of mutually rated items
       si={}
       for item in prefs[p1]:
          if item in prefs[p2]: si[item]=1


          # Find the number of elements
          n=float(len(si))


          # if they are no ratings in common, return 0
          if n==0: return 0


          # Add up all the preferences
          sum1=float(sum([prefs[p1][it] for it in si]))
          sum2=float(sum([prefs[p2][it] for it in si]))

          # Sum up the squares
          sum1Sq=sum([pow(prefs[p1][it],2) for it in si])
          sum2Sq=sum([pow(prefs[p2][it],2) for it in si])

          # Sum up the products
          pSum=sum([prefs[p1][it]*prefs[p2][it] for it in si])


          # Calculate Pearson score
          num=pSum-(1.0*sum1*sum2/n)
          den=sqrt((sum1Sq-1.0*pow(sum1,2)/n)*(sum2Sq-1.0*pow(sum2,2)/n))
          if den==0: return 0

          r=num/den

          return r

My dataset:

    # A dictionary of movie critics and their ratings of a small
    # set of movies

    critics={'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
          'Just My Luck': 3.0, 'Superman Returns': 3.5, 'You, Me and Dupree': 2.5,
          'The Night Listener': 3.0},
         'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
          'Just My Luck': 1.5, 'Superman Returns': 5.0, 'The Night Listener': 3.0,
          'You, Me and Dupree': 3.5},
         'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
          'Superman Returns': 3.5, 'The Night Listener': 4.0},
         'Claudia Puig': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
          'The Night Listener': 4.5, 'Superman Returns': 4.0,
          'You, Me and Dupree': 2.5},
         'Mick LaSalle': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
          'Just My Luck': 2.0, 'Superman Returns': 3.0, 'The Night Listener': 3.0,
          'You, Me and Dupree': 2.0},
         'Jack Matthews': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
          'The Night Listener': 3.0, 'Superman Returns': 5.0, 'You, Me and Dupree': 3.5},
         'Toby': {'Snakes on a Plane':4.5,'You, Me and Dupree':1.0,'Superman Returns':4.0}}

Other similar questions:

score 2 · Accepted Answer

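Thanks to everyone's help in the comments, I identified the problem. Just kidding: there were a lot of problems. The last thing I noticed was that the for loop (on line 6) was not doubled, and it needed to be. The step before that was me desperately wrapping everything in `float`, sorry about that; in any case, we do want floats. Before that, there was the fact that the code did not call `keys()` on the critics it needed. Also, the computation of the Pearson coefficient was wrong, which took a mathematician to fix (I have a bachelor's degree in mathematics). The tested example of Gene Seymour and Lisa Rose now works correctly. Anyway, save this as pearson.py, or something similar: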

    from __future__ import division
    from math import sqrt

    def sim_pearson(prefs,p1,p2):
    # Get the list of mutually rated items
       si={}
       for item in prefs[p1].keys():
          for item in prefs[p2].keys():
             if item in prefs[p2].keys():
                si[item]=1


          # Find the number of elements
          n=float(len(si))


          # if they are no ratings in common, return 0
          if n==0:
             print 'n=0'
             return 0


          # Add up all the preferences
          sum1=float(sum([prefs[p1][it] for it in si.keys()]))
          sum2=float(sum([prefs[p2][it] for it in si.keys()]))
          print 'sum1=', sum1, 'sum2=', sum2
          # Sum up the squares
          sum1Sq=float(sum([pow(prefs[p1][it],2) for it in si.keys()]))
          sum2Sq=float(sum([pow(prefs[p2][it],2) for it in si.keys()]))
          print 'sum1s=', sum1Sq, 'sum2s=', sum2Sq
          # Sum up the products
          pSum=float(sum([prefs[p1][it]*prefs[p2][it] for it in si.keys()]))


          # Calculate Pearson score
          num=(pSum/n)-(1.0*sum1*sum2/pow(n,2))
          den=sqrt(((sum1Sq/n)-float(pow(sum1,2))/float(pow(n,2)))*((sum2Sq/n)-float(pow(sum2,2))/float(pow(n,2))))
          if den==0:
             print 'den=0'
             return 0

          r=num/den

          return r

    critics={'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
       'Just My Luck': 3.0, 'Superman Returns': 3.5, 'You, Me and Dupree': 2.5,
       'The Night Listener': 3.0},
     'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
      'Just My Luck': 1.5, 'Superman Returns': 5.0, 'The Night Listener': 3.0,
      'You, Me and Dupree': 3.5},
     'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
      'Superman Returns': 3.5, 'The Night Listener': 4.0},
     'Claudia Puig': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
      'The Night Listener': 4.5, 'Superman Returns': 4.0,

Then type:

    import pearson
    pearson.sim_pearson(pearson.critics, pearson.critics.keys()[1], pearson.critics.keys()[2])

Or simply:

    import pearson
    pearson.sim_pearson(pearson.critics, 'Lisa Rose', 'Gene Seymour')
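
A side note on the first call: indexing `keys()` relies on Python 2, where `keys()` returns a list. In Python 3 it returns a view object that cannot be indexed, so the keys have to be converted to a list first. A minimal Python 3 sketch, using a shortened stand-in dictionary (`critics_sample` is a hypothetical name, not part of the answer):

```python
# Python 3 sketch: dict.keys() returns a view, so build a list before indexing.
# critics_sample is a shortened, hypothetical stand-in for the critics dictionary.
critics_sample = {
    'Lisa Rose': {'Snakes on a Plane': 3.5},
    'Gene Seymour': {'Snakes on a Plane': 3.5},
    'Toby': {'Snakes on a Plane': 4.5},
}

names = list(critics_sample)          # same keys as list(critics_sample.keys())
first, second = names[0], names[1]    # insertion order is kept in Python 3.7+
print(first, second)
```

Passing the critic names directly, as in the second call, avoids depending on key order at all.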

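Let me know if you have any trouble getting it to work. I left in the `print` statements I used for troubleshooting so you can see how I worked it out, but obviously they are not needed.

If you run into more problems with this book and cannot fix them with the help of SO, send me an email: raphael[at]postacle.com. I downloaded it a while ago too, but I have been a bit lazy ;)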
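
For readers wondering why the function in the question always returned 0: as far as I can tell from its indentation, everything after the `if` sits inside the `for` loop, so the score is computed during the first iteration, when `si` holds at most one item. With a single shared item, each term of the form `sumSq - pow(sum,2)/n` is exactly zero, so `den` is 0 and the function returns 0. A small Python 3 sketch of that arithmetic (the helper name `denominator_terms` is made up for illustration):

```python
from math import sqrt

def denominator_terms(xs, ys):
    """Return the two terms multiplied under the square root in the question's `den`."""
    n = float(len(xs))
    sum1, sum2 = sum(xs), sum(ys)
    sum1_sq = sum(x ** 2 for x in xs)
    sum2_sq = sum(y ** 2 for y in ys)
    return sum1_sq - sum1 ** 2 / n, sum2_sq - sum2 ** 2 / n

# One shared rating only, i.e. the state after a single loop iteration
# (here: Lisa Rose and Gene Seymour on 'Lady in the Water').
t1, t2 = denominator_terms([2.5], [3.0])
print(t1, t2, sqrt(t1 * t2))

# All six shared ratings of the same two critics give non-zero terms.
lisa = [2.5, 3.5, 3.0, 3.5, 2.5, 3.0]
gene = [3.0, 3.5, 1.5, 5.0, 3.5, 3.0]
full1, full2 = denominator_terms(lisa, gene)
print(full1, full2)
```

The two lists pair up the ratings movie by movie; only the one-item case collapses to zero.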

score 0 · Accepted Answer

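@rofls was right:

For loop: the for loop was the problem.
Floats: some of the terms needed to be converted from int to float.
Keys: this is something I definitely missed at first, and @rofls spotted it.
Pearson coefficient: I tightened up the denominator of the Pearson coefficient to match the formula on the Wikipedia page. It is the last formula in the Mathematical properties section.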
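
For reference, the formula can be written out as follows. The symbols are my own notation rather than anything from the book: x_i and y_i are the two critics' ratings of the i-th shared movie, and n is the number of shared movies, so `sum1` is Σx_i, `sum1Sq` is Σx_i², and `pSum` is Σx_i y_i.

```latex
r = \frac{n\sum_{i} x_i y_i \;-\; \sum_{i} x_i \sum_{i} y_i}
         {\sqrt{n\sum_{i} x_i^2 - \left(\sum_{i} x_i\right)^2}\;
          \sqrt{n\sum_{i} y_i^2 - \left(\sum_{i} y_i\right)^2}}
```

If I have the algebra right, dividing both the numerator and the denominator by n gives the form `pSum - sum1*sum2/n` over `sqrt((sum1Sq - sum1^2/n)(sum2Sq - sum2^2/n))`, and dividing both by n² gives the form with `pSum/n` and `pow(n,2)`. That suggests the variants in this thread differ only in scaling and should produce the same r once the shared items are collected correctly.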

The code works now. I tried it with various combinations of inputs.

    from __future__ import division
    from math import sqrt

    def sim_pearson(prefs,p1,p2):
    # Get the list of mutually rated items
       si={}
       for item in prefs[p1].keys():
           if item in prefs[p2].keys():
               print 'item=', item
               si[item]=1

       # Find the number of elements
       n=float(len(si))
       print 'n=', n

       # if they are no ratings in common, return 0
       if n==0:
           print 'n=0'
           return 0

       # Add up all the preferences
       sum1=float(sum([prefs[p1][it] for it in si.keys()]))
       sum2=float(sum([prefs[p2][it] for it in si.keys()]))
       print 'sum1=', sum1, 'sum2=', sum2

       # Sum up the squares
       sum1Sq=float(sum([pow(prefs[p1][it],2) for it in si.keys()]))
       sum2Sq=float(sum([pow(prefs[p2][it],2) for it in si.keys()]))
       print 'sum1s=', sum1Sq, 'sum2s=', sum2Sq

       # Sum up the products
       pSum=float(sum([prefs[p1][it]*prefs[p2][it] for it in si.keys()]))
       print 'pSum=', pSum

       # Calculate Pearson score
       num=(n*pSum)-(1.0*sum1*sum2)
       print 'num=', num
       den1=sqrt((n*sum1Sq)-float(pow(sum1,2)))
       print 'den1=', den1
       den2=sqrt((n*sum2Sq)-float(pow(sum2,2)))
       print 'den2=', den2
       den=1.0*den1*den2

       if den==0:
           print 'den=0'
           return 0

       r=num/den
       return r

    critics={'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
          'Just My Luck': 3.0, 'Superman Returns': 3.5, 'You, Me and Dupree': 2.5,
          'The Night Listener': 3.0},
         'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
          'Just My Luck': 1.5, 'Superman Returns': 5.0, 'The Night Listener': 3.0,
          'You, Me and Dupree': 3.5},
         'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
          'Superman Returns': 3.5, 'The Night Listener': 4.0},
         'Claudia Puig': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
          'The Night Listener': 4.5, 'Superman Returns': 4.0,
          'You, Me and Dupree': 2.5},
         'Mick LaSalle': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
          'Just My Luck': 2.0, 'Superman Returns': 3.0, 'The Night Listener': 3.0,
          'You, Me and Dupree': 2.0},
         'Jack Matthews': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
          'The Night Listener': 3.0, 'Superman Returns': 5.0, 'You, Me and Dupree': 3.5},
         'Toby': {'Snakes on a Plane':4.5,'You, Me and Dupree':1.0,'Superman Returns':4.0}}

    # Done
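
The code above targets Python 2 (`print` statements, `from __future__ import division`). For anyone on Python 3, here is a minimal sketch of the same calculation without the debugging output. The function name `sim_pearson_py3` and the two-critic `critics_subset` are my own choices for illustration; the `max(..., 0.0)` guard is an added assumption to keep rounding error from producing a negative value under the square root.

```python
from math import sqrt

def sim_pearson_py3(prefs, p1, p2):
    """Pearson correlation between the ratings of p1 and p2 (Python 3 sketch)."""
    # Items rated by both people
    shared = [item for item in prefs[p1] if item in prefs[p2]]
    n = len(shared)
    if n == 0:
        return 0

    xs = [prefs[p1][item] for item in shared]
    ys = [prefs[p2][item] for item in shared]

    sum1, sum2 = sum(xs), sum(ys)
    sum1_sq = sum(x * x for x in xs)
    sum2_sq = sum(y * y for y in ys)
    p_sum = sum(x * y for x, y in zip(xs, ys))

    num = n * p_sum - sum1 * sum2
    # Guard against tiny negative values caused by floating-point rounding
    den = sqrt(max(n * sum1_sq - sum1 ** 2, 0.0)) * sqrt(max(n * sum2_sq - sum2 ** 2, 0.0))
    if den == 0:
        return 0
    return num / den

# Two critics copied from the dataset above
critics_subset = {
    'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
                  'Just My Luck': 3.0, 'Superman Returns': 3.5,
                  'You, Me and Dupree': 2.5, 'The Night Listener': 3.0},
    'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
                     'Just My Luck': 1.5, 'Superman Returns': 5.0,
                     'The Night Listener': 3.0, 'You, Me and Dupree': 3.5},
}

r = sim_pearson_py3(critics_subset, 'Lisa Rose', 'Gene Seymour')
print(round(r, 4))  # approximately 0.396
```

The statements after the list comprehension run once, after all shared items have been collected, which is the behaviour the dedented loop in the code above is meant to achieve.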
