amazon-kinesis - Kinesis shard with many producers

Question

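I need to collect data from many data sources (mobile phones, for example). Say there are 1,000 phones, each uploading a 1 MB batch every 20 minutes. I am thinking of using a Kinesis stream with a single shard to ingest the data (the total throughput is about 1 MB/s). Does it make sense for the individual phones to hit the Kinesis API directly? Or should I put my own front end (a web server, for example) in front of it? What are the main limitations and considerations to keep in mind when making this decision?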

P.S. The alternative of using the AWS IoT infrastructure would be quite expensive.

score 3 · Accepted Answer

You should have a web service that receives the data from your clients and forwards it to Kinesis. This web service can use the Kinesis Producer Library (KPL), which offers the best performance in terms of message delivery rate, timeouts, retry policy, and scalability. The KPL can create many workers and can be tuned to optimize the message rate without exceeding the write limits imposed by Kinesis shards.
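As a rough illustration of the forwarding step, here is a minimal sketch that assumes a Python front end calling the PutRecords API directly instead of using the KPL (which is a Java library). The function name `forward_batch` and its parameters are hypothetical; `client` is assumed to be anything that exposes `put_records(StreamName=..., Records=...)`, such as `boto3.client("kinesis")`.

```python
import time
from typing import Any, Dict, List


def forward_batch(client: Any, stream_name: str, records: List[Dict[str, Any]],
                  max_attempts: int = 3, backoff_s: float = 0.2) -> int:
    """Send records via PutRecords, retrying only the entries that failed.

    `client` is assumed to expose put_records(StreamName=..., Records=...),
    for example boto3.client("kinesis"). Returns the number of records that
    were still undelivered after the last attempt.
    """
    pending = list(records)
    for attempt in range(max_attempts):
        if not pending:
            break
        response = client.put_records(StreamName=stream_name, Records=pending)
        # PutRecords is not all-or-nothing: each result entry matches the
        # request entry at the same index, and failed entries carry an ErrorCode.
        pending = [rec for rec, result in zip(pending, response["Records"])
                   if "ErrorCode" in result]
        if pending and attempt < max_attempts - 1:
            time.sleep(backoff_s * (2 ** attempt))  # exponential backoff
    return len(pending)
```

Each record is a dict of the form `{"Data": b"...", "PartitionKey": "..."}`. Retrying only the failed entries keeps a throttled shard from receiving the whole batch again, which is the kind of policy the front end can own so that the phones do not have to.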

Having every single client send data to Kinesis directly could be a poor choice in terms of performance, maintenance costs, and delivery. What happens if a client starts sending data at a high rate? A shard has a rate limit for write operations (up to 1,000 records/s and up to 1 MB/s of data). An 'aggressive' client could generate excessive traffic and cause the shard to reject writes for a while, blocking all the other clients whose records should be stored in the same shard.
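To see how close the scenario in the question sits to those limits, here is a small back-of-the-envelope sketch. The helper names are hypothetical, 1 MB is treated as 10^6 bytes, and each 1 MB batch is assumed to be sent as a single record.

```python
SHARD_MAX_BYTES_PER_S = 1_000_000  # 1 MB/s write limit per shard (assumed 10^6 bytes)
SHARD_MAX_RECORDS_PER_S = 1_000    # 1,000 records/s write limit per shard


def average_load(clients: int, batch_bytes: int, interval_s: float,
                 records_per_batch: int = 1):
    """Average write load if uploads are spread evenly over the interval."""
    bytes_per_s = clients * batch_bytes / interval_s
    records_per_s = clients * records_per_batch / interval_s
    return bytes_per_s, records_per_s


def shard_utilization(bytes_per_s: float, records_per_s: float) -> float:
    """Fraction of one shard's write capacity used (the tighter limit wins)."""
    return max(bytes_per_s / SHARD_MAX_BYTES_PER_S,
               records_per_s / SHARD_MAX_RECORDS_PER_S)


# Numbers from the question: 1,000 phones, 1 MB each, every 20 minutes.
bytes_per_s, records_per_s = average_load(1000, 1_000_000, 20 * 60)
utilization = shard_utilization(bytes_per_s, records_per_s)
print(f"{bytes_per_s / 1e6:.2f} MB/s, {utilization:.0%} of one shard")
```

Under these assumptions the average load uses most of a single shard's byte budget, so there is little headroom. The average also hides bursts: two phones uploading 1 MB within the same second already exceed the 1 MB/s limit, which is where a front end that buffers and paces writes helps.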

Moreover, think about the cost of rolling out changes across thousands of clients. What happens if you want to change the stream name? Or rotate the access key ID/secret key? Or just switch from Kinesis to Kafka? You would have to manage the update of thousands of clients.

With a web server, you can hide that complexity and make any change transparent to the clients. You could run the web service directly on EC2. Having the producer inside AWS should reduce the network latency to Kinesis. Moreover, you can take advantage of the scalability, resiliency, and fault-tolerance features offered by AWS.
