
This is my first attempt at building an iOS/Android poker application along the lines of Zynga Poker. The application needs to handle up to 300,000 client connections at once, and more as the user base grows. Clients connect to open rooms where 5 to 9 players play against each other and can send each other messages. I've done a fair amount of research, and Twisted Matrix seems to be the de facto standard, but I'm also very interested in gevent, which handles asynchronous programming with coroutines rather than callbacks. One thing that worries me, though, is this comment from a programmer on this site:

"That's why I don't understand why so many renowned real-time web startups (such as Convore) still use inferior solutions like Python's eventlib or gevent, which can handle a few hundred to a few thousand clients at best."

Is this true? It has left me torn over which framework to use. Or is there something better? And is 300k connections asking too much of Python? If so, I would run the server in Java or C++, but I would much rather use Python.


1 Answer


300,000 concurrent, active connections is too many to reasonably support in a single process or on a single computer, regardless of what language you're using. If your program needs to do anything with the data at all, you're going to need more hardware.

Let's do some back of the envelope math to back this up.

Let's say that, on average, your users click their mouse or tap a key or otherwise do something once every 5 seconds or so. That's 60,000 read events per second. Now, let's say there are 5 players in each game, which means an additional 300,000 write events per second, assuming you have to update all the players for each of these events. So: 360,000 events of one kind or another to process per second - assuming you just need to compute some bytes for input and output and you don't have any computationally intensive game logic (like A.I. players) to invoke.
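To make that arithmetic concrete, here is a minimal sketch of the same back-of-the-envelope calculation in Python. The constants (300,000 users, one action every 5 seconds, 5 players per table) are just the assumptions stated above, not measured figures:

    # Back-of-the-envelope event-rate estimate under the assumptions above.
    CONCURRENT_USERS = 300000     # peak concurrent connections
    SECONDS_PER_ACTION = 5        # each user acts roughly once every 5 seconds
    PLAYERS_PER_TABLE = 5         # every action is fanned out to each player at the table

    reads_per_second = CONCURRENT_USERS / SECONDS_PER_ACTION        # 60,000
    writes_per_second = reads_per_second * PLAYERS_PER_TABLE        # 300,000
    total_events_per_second = reads_per_second + writes_per_second  # 360,000

    print("reads/s: ", int(reads_per_second))
    print("writes/s:", int(writes_per_second))
    print("total/s: ", int(total_events_per_second))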

Let's say you're using an M3 Double Extra Large instance, the largest that Amazon currently offers. That's 8 virtual cores. Assuming for the moment that your program is purely parallelizable, that means that every one of these input or output events in your game (including all database activity, external web service API calls, etc) needs to be dealt with in 0.00002 seconds. Now, that leaves you with absolutely no overhead to deal with load spikes, so you'll really want to cut that in half to have any reasonable hope of holding up in case traffic varies, which means 0.00001 seconds per event. That is a pretty hard limit of ten microseconds for all of your game's code to execute; most of the time this sort of responsiveness is measured in milliseconds instead. On my (reasonably fast) desktop computer, it takes almost two microseconds just to go from one call of time.time() to the next, if there is no other code in between. Even finely tuned C code can't get a lot of useful work done in a microsecond.
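As a rough illustration of that budget, and of how expensive even a trivial Python call is, here is a small sketch: the 8-core and 360,000-events-per-second figures come straight from the estimate above (the prose rounds the results to 20 and 10 microseconds), and the time.time() measurement will of course vary from machine to machine:

    import timeit

    VIRTUAL_CORES = 8            # the 8-core instance assumed above
    EVENTS_PER_SECOND = 360000   # from the back-of-the-envelope estimate

    budget = VIRTUAL_CORES / EVENTS_PER_SECOND   # ~22 microseconds per event
    safe_budget = budget / 2                     # leave headroom for load spikes

    # Measure how long a single bare time.time() call takes on this machine.
    calls = 1000000
    per_call = timeit.timeit("time.time()", setup="import time", number=calls) / calls

    print("budget per event:     %6.1f microseconds" % (budget * 1e6))
    print("budget with headroom: %6.1f microseconds" % (safe_budget * 1e6))
    print("one time.time() call: %6.2f microseconds" % (per_call * 1e6))

Comparing the last line against the first two makes the point: a single no-op library call already eats a noticeable fraction of the per-event budget.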

Which means that, if you want to approach this level of scale, you really, really need to be able to run your service on more than one server at a time. And, once your service can run on two servers, generally it's not a big deal to put it on three, or five, or a hundred.

While I would very much like to tell you to use Twisted (and there are lots of great reasons that you should), the real conclusion here is that you can use anything you please and it will "scale up" to your hundreds of thousands of connections just fine, provided that you write it in such a way that it doesn't depend on a single server servicing all your requests. At the point where you have a service actually processing 300k live, concurrent connections, the difference in performance between using gevent or Twisted or Tornado or Eventlet or EventMachine – if there is in fact any difference at all – will be the difference between, let's say, leasing 50 and 55 instances from Amazon. (And, it's hard to say which one will be faster, since it sort of depends what you'll do with it.) The difference between profiling your own code and keeping close watch over its performance as you develop it, however, is the difference between leasing 500 machines and leasing 50.

Answered 2012-11-07T08:31:46.533