0

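I'm looking for a cloud computing solution for the following scenario, but I can't find a service on Amazon AWS or similar providers that matches my problem description. Do you know of a cloud computing platform that fits my problem?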

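General problem: I want to run data analysis on a data stream (only about 1k per second). The analysis is performed by a set of independent threads operating on that data stream. Each thread simply computes a boolean value. The more threads there are, the better the result of the computation.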
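To make the thread model concrete, here is a minimal sketch of fanning one sample out to several independent analyzers, each returning a boolean. The threshold analyzers and the "fraction of True verdicts" aggregation are assumptions made up for illustration; the question does not say what each thread computes or how the booleans are combined.

```python
from concurrent.futures import ThreadPoolExecutor


def make_threshold_analyzer(threshold):
    """Hypothetical analyzer: reports whether a sample exceeds a threshold."""
    def analyze(sample):
        return sample > threshold
    return analyze


def analyze_sample(pool, analyzers, sample):
    """Run every analyzer on one sample and collect the boolean verdicts."""
    futures = [pool.submit(analyzer, sample) for analyzer in analyzers]
    return [future.result() for future in futures]


def fraction_true(verdicts):
    """Assumed aggregation: share of analyzers that returned True."""
    return sum(verdicts) / len(verdicts)


# Four independent analyzers and a tiny stand-in for the data stream.
analyzers = [make_threshold_analyzer(t) for t in (10, 20, 30, 40)]
stream = [5, 25, 45]

with ThreadPoolExecutor(max_workers=len(analyzers)) as pool:
    scores = [fraction_true(analyze_sample(pool, analyzers, s)) for s in stream]

print(scores)
```

Note that CPython threads do not execute CPU-bound Python code in parallel, so for compute-heavy analyzers a process pool (or a JVM, as the question suggests) would be the more realistic choice; the structure of the fan-out stays the same.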

My current solution: I got hold of a box with an Intel Core i7 from another department, but now they want it back :-).

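Ideal solution: a service that provides an abstract machine (something like a JVM with unlimited resources) on which I can spawn a large number of threads. I also need some kind of connection to stream the input data in and fetch the computed results (<1k/sec). Things need to happen in real time (as opposed to being scheduled to run "sometime in the next few minutes").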
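As a rough sketch of the "stream in, results out, in real time" requirement, the snippet below pushes samples through an in-process queue to a worker and records how long each sample waited before its result was ready. The queues stand in for whatever network connection a real service would provide, and the `SENTINEL` convention, `run_stream`, and the even-number analyzer are all hypothetical names chosen for the example.

```python
import queue
import threading
import time

SENTINEL = None  # hypothetical end-of-stream marker


def worker(inbox, outbox, analyze):
    """Process samples as soon as they arrive and report per-sample latency."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)
            break
        sent_at, sample = item
        verdict = analyze(sample)
        outbox.put((sample, verdict, time.monotonic() - sent_at))


def run_stream(samples, analyze):
    """Feed samples to the worker and gather (sample, verdict, latency) tuples."""
    inbox, outbox = queue.Queue(), queue.Queue()
    thread = threading.Thread(target=worker, args=(inbox, outbox, analyze))
    thread.start()
    for sample in samples:
        inbox.put((time.monotonic(), sample))
    inbox.put(SENTINEL)

    results = []
    while True:
        result = outbox.get()
        if result is SENTINEL:
            break
        results.append(result)
    thread.join()
    return results


results = run_stream([1, 2, 3, 4], lambda x: x % 2 == 0)
for sample, verdict, latency in results:
    print(sample, verdict, f"{latency * 1000:.3f} ms")
```

Measuring latency per sample like this is one way to check whether a candidate platform actually meets the real-time requirement rather than batching work behind the scenes.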

So the bottleneck is not memory or disk space, only computing power and latency. (And since I only need the data analysis from time to time, cloud computing seems economically reasonable here.)

4 Answers

2

Interestingly enough, I was just writing a post on Making Hadoop Run Faster in which I pointed to stream-based processing as a way to speed up the processing of feeds as they come in, rather than processing them in batch. The solution uses an open-source project named Cloudify.

Cloudify allows me to spawn this entire environment on Amazon or any other cloud through a single command and also auto-scale the processing as the load grows.

A demo environment with the source code and a step-by-step guide is available here

It sounds to me like this may address your needs. Let me know if this isn't the case and I'll dig in further to see if I can come up with other solutions.

answered 2012-08-23T23:54:13.987
1

For your case, I highly recommend Amazon Elastic MapReduce. You can refer to this document for details: Amazon EMR

It might be a bit of a struggle initially if you are new to AWS, but it will be great once you know how it works.

answered 2012-08-23T13:25:11.627
1

I noticed you tagged google-app-engine. That's probably not what you're looking for; it's more for web services. Google's relatively new Compute Engine matches your description, though.

http://cloud.google.com/products/compute-engine.html

answered 2012-08-23T14:36:03.330
1

For completeness, the major vendors offer a few categories of choices:

  1. Cloud compute which scales, from AWS it's EC2; from Google it's Google Compute Engine (still in private beta); from Microsoft it's Azure Virtual Machines (also still in private beta). There are, of course, many other vendors, such as Rackspace (which uses OpenStack and more). Given your scenario, I believe something in this category would be the best choice for you.

  2. Cloud-based MapReduce (running on Hadoop) - from AWS that's Elastic MapReduce; from Google that's BigQuery; from Microsoft that's Hadoop on Azure (which is still in beta). There are other vendors in this space as well (Cloudera, Hortonworks, etc.); here's a list.

  3. Cloud-based Database (either RDBMS or NoSQL) - there are many choices here. Because you describe your scenario as 'compute intensive', I am thinking this may not be needed. However, depending on the amount and frequency of up/down traffic, if your scenario allows for batching, then you may elect to upload, process and store in the cloud and then pull down results on a schedule. From AWS, there are many ways to host an RDBMS; RDS or EC2 are the usual choices. From Google, you can access MySQL via Google Cloud SQL. From Microsoft, your choice is SQL Azure or SQL Server on an Azure VM (the latter still in beta). For cloud-hosted NoSQL, you have AWS DynamoDB; from Google you have Google Cloud Storage or the High Replication store (the latter requires you to use GAE); from Microsoft you have Azure storage (tables, blobs and queues).
answered 2012-08-24T00:27:14.273