mahout - Implementing offline Item based recommendation using Mahout

Question

I am trying to add recommendations to our e-commerce website using Mahout. I have decided to use an item-based recommender. I have around 60K products, 200K users and 4M user-product preferences. I am looking for a way to provide recommendations by calculating the item similarities offline, so that the recommender.recommend() method returns results in under 100 milliseconds.

DataModel dataModel = new FileDataModel(new File("/FilePath")); // FileDataModel takes a java.io.File, not a String

_itemSimilarity = new TanimotoCoefficientSimilarity(dataModel);

_recommender = new CachingRecommender(new GenericBooleanPrefItemBasedRecommender(dataModel,_itemSimilarity));

I was hoping someone could point me to a method or a blog post that explains the procedure and challenges of computing the item similarities offline. Also, what is the recommended way to store the pre-computed item similarities: in a separate DB, or in memcache?

PS - I plan to refresh the user-product preference data every 10-12 hours.

score 1 · Accepted Answer

If you need responses within 100 milliseconds, I recommend running a batch process in the background on your server. It can include the following jobs:

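Fetch the data from your own user database (60K products, 200K users and 4M user-product preferences).
Prepare the data model based on the nature of your data (number of parameters, size of the data, preference values and so on). This can be an important step.
Run the algorithm on the data model (you need to choose the right algorithm for your requirements). This step produces the recommendation data.
Process the resulting data further if your requirements call for it.
Store this data in a database (NoSQL in all of my projects).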

The steps above should run periodically as a batch process.
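
As a rough illustration of the batch steps above, here is a minimal plain-Java sketch. It does not use Mahout's API: the class name OfflineItemSimilarity, the in-memory maps and the top-N cutoff are all assumptions made for the example. It computes the Tanimoto coefficient (the similarity used in the question) over boolean preferences and only scores item pairs that share at least one user, so it never has to walk all 60K x 60K pairs.

```java
import java.util.*;

// Hypothetical helper for illustration only; not part of Mahout.
class OfflineItemSimilarity {

    // Builds an item -> users index from user -> items boolean preferences.
    static Map<Long, Set<Long>> invert(Map<Long, Set<Long>> itemsByUser) {
        Map<Long, Set<Long>> usersByItem = new HashMap<>();
        for (Map.Entry<Long, Set<Long>> e : itemsByUser.entrySet()) {
            for (Long item : e.getValue()) {
                usersByItem.computeIfAbsent(item, k -> new HashSet<>()).add(e.getKey());
            }
        }
        return usersByItem;
    }

    // Tanimoto coefficient: size of the intersection divided by size of the union.
    static double tanimoto(int sizeA, int sizeB, int common) {
        int union = sizeA + sizeB - common;
        return union == 0 ? 0.0 : (double) common / union;
    }

    // For every item, returns up to topN most similar items, most similar first.
    static Map<Long, List<Long>> topSimilar(Map<Long, Set<Long>> itemsByUser, int topN) {
        Map<Long, Set<Long>> usersByItem = invert(itemsByUser);
        Map<Long, List<Long>> result = new HashMap<>();
        for (Map.Entry<Long, Set<Long>> e : usersByItem.entrySet()) {
            long item = e.getKey();
            int itemUsers = e.getValue().size();

            // Count shared users, visiting only items co-preferred with this one.
            Map<Long, Integer> common = new HashMap<>();
            for (Long user : e.getValue()) {
                for (Long other : itemsByUser.get(user)) {
                    if (other != item) {
                        common.merge(other, 1, Integer::sum);
                    }
                }
            }

            // Turn the shared-user counts into Tanimoto scores.
            Map<Long, Double> score = new HashMap<>();
            for (Map.Entry<Long, Integer> c : common.entrySet()) {
                int otherUsers = usersByItem.get(c.getKey()).size();
                score.put(c.getKey(), tanimoto(itemUsers, otherUsers, c.getValue()));
            }

            // Highest score first; ties broken by lower item ID for stable output.
            List<Long> ranked = new ArrayList<>(score.keySet());
            ranked.sort((a, b) -> {
                int byScore = Double.compare(score.get(b), score.get(a));
                return byScore != 0 ? byScore : Long.compare(a, b);
            });
            result.put(item, new ArrayList<>(ranked.subList(0, Math.min(topN, ranked.size()))));
        }
        return result;
    }
}
```

The map this returns (item ID to its most similar item IDs) is the kind of result the batch job would then write to the database on each refresh.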

Whenever a user requests recommendations, your service responds by reading the recommendation data from the precomputed DB.
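
The request-time side could look roughly like the sketch below. It is only an assumption-laden illustration in plain Java, not Mahout code: the class PrecomputedRecommender is hypothetical, an in-memory map stands in for the NoSQL store or memcache, and candidates are ranked by a simple vote count rather than by Mahout's own scoring. The batch job is assumed to deliver a table from item ID to a ranked list of similar item IDs.

```java
import java.util.*;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical serving-side helper for illustration only; not part of Mahout.
class PrecomputedRecommender {

    // Current snapshot of item -> similar items; replaced as a whole on refresh.
    private final AtomicReference<Map<Long, List<Long>>> similarItems =
            new AtomicReference<>(Collections.<Long, List<Long>>emptyMap());

    // Called by the batch job after each run; readers never see a half-built table.
    void swapIn(Map<Long, List<Long>> freshTable) {
        similarItems.set(Collections.unmodifiableMap(new HashMap<>(freshTable)));
    }

    // Recommends items similar to what the user already has, using lookups only.
    List<Long> recommend(Set<Long> ownedItems, int howMany) {
        Map<Long, List<Long>> table = similarItems.get();

        // Each owned item "votes" for its precomputed neighbours.
        Map<Long, Integer> votes = new HashMap<>();
        for (Long owned : ownedItems) {
            for (Long candidate : table.getOrDefault(owned, Collections.emptyList())) {
                if (!ownedItems.contains(candidate)) {
                    votes.merge(candidate, 1, Integer::sum);
                }
            }
        }

        // Most votes first; ties broken by lower item ID for stable output.
        List<Long> ranked = new ArrayList<>(votes.keySet());
        ranked.sort((a, b) -> {
            int byVotes = Integer.compare(votes.get(b), votes.get(a));
            return byVotes != 0 ? byVotes : Long.compare(a, b);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(howMany, ranked.size())));
    }
}
```

Because a request only does map lookups and a small sort, the expensive similarity computation stays entirely in the batch job, and swapping the whole table at once means a refresh every 10-12 hours does not block readers.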

For this kind of task, take a look at Apache Mahout (for recommendations).

These are the steps in brief... hope this helps!
