dataset - Supermarket dataset for the Apriori algorithm

Question

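I need to develop software for the business analysts of the "FutureStores" supermarket. The software performs association rule mining on a given set of the supermarket's sales transaction data in order to prepare combos and create a discount policy. It uses a data mining algorithm, namely the Apriori algorithm. The association rules are displayed in a user-friendly way so that a discount policy can be generated from the positive association rules.

Where can I get a supermarket dataset to verify the Apriori algorithm I have coded?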

score 7 · Accepted Answer

To get a market basket dataset, go to fimi.ua.ac.be/data/ and download the retail dataset.

It is an anonymized dataset of transactions from a Belgian store.

It is well suited for testing Apriori and other frequent itemset mining and association rule mining algorithms.
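As a rough illustration of how such a file could be fed to your own implementation, here is a minimal sketch. It assumes the download is a plain-text file with one transaction per line and items written as whitespace-separated integer IDs; the file name `retail.dat` and the helper names `parse_transactions` and `item_support` are assumptions for the example, not part of the repository.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List


def parse_transactions(lines: Iterable[str]) -> List[FrozenSet[int]]:
    """Parse one transaction per line; items are whitespace-separated integers (assumed format)."""
    transactions = []
    for line in lines:
        items = line.split()
        if items:  # skip blank lines
            transactions.append(frozenset(int(item) for item in items))
    return transactions


def item_support(transactions: List[FrozenSet[int]]) -> Counter:
    """Count how many transactions contain each single item (the first Apriori pass)."""
    counts = Counter()
    for transaction in transactions:
        counts.update(transaction)
    return counts


# Hypothetical sample in the assumed format. With the real file you would use
# something like: with open("retail.dat") as f: transactions = parse_transactions(f)
sample = ["1 2 3", "2 3", "", "3 4 2"]
transactions = parse_transactions(sample)
support = item_support(transactions)
```

Storing each transaction as a `frozenset` makes the subset checks used in later Apriori passes straightforward.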

score 0 · Accepted Answer

Instead of looking for a real-world dataset, you should design a small, specific dataset for each unit test. The dataset should provide the minimal necessary precondition to verify a single feature of the system. This will make it easier to detect bugs, maintain tests over time, and demonstrate the capabilities and usage patterns of the system to other developers.

An example from a different domain would be tests for a User Subsystem that creates and validates logins to a website.

addsNewUser - empty dataset
throwsExceptionForDuplicateUsername - single-user dataset
correctPasswordPasses - same dataset
throwsExceptionForIncorrectUsername - same dataset
throwsExceptionForIncorrectPassword - same dataset
throwsExceptionWhenNewUsernameExists - two-user dataset
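Applied to frequent itemset mining, the same idea might look like the sketch below. `frequent_itemsets` is a hypothetical brute-force stand-in for the implementation under test (you would swap in your own Apriori function), and the item names and thresholds are made up for illustration. Each test builds the smallest dataset that exercises one behavior.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force stand-in: map each itemset found in >= min_support transactions to its count."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            candidate = frozenset(candidate)
            count = sum(1 for t in transactions if candidate <= t)
            if count >= min_support:
                result[candidate] = count
    return result


def test_empty_dataset_has_no_frequent_itemsets():
    # empty dataset
    assert frequent_itemsets([], min_support=1) == {}


def test_pair_below_threshold_is_pruned():
    # three-transaction dataset: the pair occurs only once
    transactions = [{"bread", "milk"}, {"bread"}, {"milk"}]
    result = frequent_itemsets(transactions, min_support=2)
    assert result == {frozenset({"bread"}): 2, frozenset({"milk"}): 2}


def test_pair_at_threshold_is_kept():
    # three-transaction dataset: the pair occurs exactly min_support times
    transactions = [{"bread", "milk"}, {"bread", "milk"}, {"eggs"}]
    result = frequent_itemsets(transactions, min_support=2)
    assert result[frozenset({"bread", "milk"})] == 2
    assert frozenset({"eggs"}) not in result
```

Because each dataset has only a handful of transactions, the expected result can be checked by hand, so a failing test points directly at the behavior that broke.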

Update: If you need a very large dataset to perform integration or performance testing, you are probably left with writing a program to generate a random collection of purchases. I doubt any existing supermarkets are willing (or able) to part with their real datasets.

That being said, while working as a contractor for a health insurance provider many years ago (pre-HIPAA) I was given a sample dataset to work with. It contained real patient information including SSNs and confidential medical history. :(
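For the generated-data route mentioned in the update, a minimal sketch could look like this. The function names, parameters, and sizes are assumptions chosen for illustration. Note that baskets drawn uniformly at random contain no real associations beyond chance, so data like this is mainly useful for load and performance testing rather than for checking rule quality.

```python
import random


def generate_transactions(num_transactions, num_items, max_basket_size, seed=None):
    """Generate random baskets of distinct item IDs drawn uniformly from 1..num_items."""
    rng = random.Random(seed)  # a fixed seed makes the dataset reproducible
    catalog = list(range(1, num_items + 1))
    transactions = []
    for _ in range(num_transactions):
        basket_size = rng.randint(1, min(max_basket_size, num_items))
        transactions.append(sorted(rng.sample(catalog, basket_size)))
    return transactions


def to_lines(transactions):
    """Format baskets as one line per transaction with space-separated item IDs."""
    return [" ".join(str(item) for item in basket) for basket in transactions]


baskets = generate_transactions(num_transactions=1000, num_items=50, max_basket_size=8, seed=42)
lines = to_lines(baskets)
```

Passing the same `seed` reproduces the same dataset, which keeps performance runs comparable between code changes.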
