ruby-on-rails - Multi-threaded rake task

Question

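I'm writing a rake task that will be called every minute (possibly every 30 seconds in the future). It contacts a polling API endpoint once per user in the database. Obviously this is not efficient to run as a single thread, but is it possible to multithread it? If not, is there a good event-based HTTP library that could get the job done?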

score 13 · Accepted Answer

"I'm writing a rake task that will be called every minute (possibly every 30 seconds in the future)."

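Beware of Rails startup times: a rake task boots Rails on every run, so a long-running worker model such as Resque or Sidekiq may be a better fit. Resque has https://github.com/bvandenbos/resque-scheduler, which should be able to do what you need. I can't speak for Sidekiq, but I'm sure something similar is available (Sidekiq is much newer than Resque).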

"Obviously this is not efficient to run as a single thread, but is it possible to multithread it? If not, is there a good event-based HTTP library that could get the job done?"

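I'd suggest looking at ActiveRecord's batch finders (find_each and its sibling find_in_batches) for tips on making your finder process more efficient. Once you have your batches, you can easily do something with threads, such as: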

#
# find_in_batches yields batches of 1000 records by default;
# pass batch_size: to tune that for your available RAM. One
# thread is started per user, so the batch size is also the
# number of concurrent API calls.
#
User.find_in_batches(batch_size: 50) do |batch_of_users|
  #
  # Each batch is an Array of users, always smaller than or
  # equal to the batch size chosen in `find_in_batches`
  #
  #
  # We collect a bunch of new threads, one for each user
  #
  batch_threads = batch_of_users.collect do |user|
    #
    # We pass the user to the thread, this is good
    # habit for shared variables, in this case
    # it doesn't make much difference
    #
    Thread.new(user) do |u|
      #
      # Do the API call here, using `u` (not `user`)
      # to access the user instance
      #
      # We shouldn't need an evented HTTP library: a Ruby
      # thread gives up control while it waits on IO and gets
      # it back when the scheduler decides. HTTP and network
      # IO are about the best-suited work for threads in Ruby.
      #
    end
  end
  #
  # Joining threads means waiting for them to finish
  # before moving onto the next batch.
  #
  batch_threads.each(&:join)
end

This starts batch_size threads at a time and waits for every thread in the batch to finish before moving on to the next batch.

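You could do it this way, but you would end up with an uncontrolled number of threads. There is an alternative that may serve you better, although it gets a lot more complicated: a ThreadPool plus a shared list of work to do. I've posted it on GitHub so as not to spam Stack Overflow: https://gist.github.com/6767fbad1f0a66fa90ac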
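For readers who cannot open the gist, the thread pool idea can be sketched with nothing but Ruby's built-in Queue. This is not the gist's code, only a minimal illustration under a few assumptions: `poll_in_pool` and its `pool_size` parameter are hypothetical names, and the block stands in for whatever API call is made per user.

```ruby
# Hypothetical helper: runs the block once per item on at most
# `pool_size` threads and returns the results in completion order.
# Assumes no item is nil or false, since that would end a worker's loop.
def poll_in_pool(items, pool_size: 10, &block)
  # The shared list of work to do.
  work = Queue.new
  items.each { |item| work << item }
  # Closing the queue makes `pop` return nil once it is drained,
  # which is how each worker knows there is nothing left.
  work.close

  results = Queue.new
  workers = Array.new(pool_size) do
    Thread.new do
      while (item = work.pop)
        results << block.call(item)
      end
    end
  end

  # Wait for every worker; an exception raised in a worker
  # is re-raised here by `join`.
  workers.each(&:join)
  Array.new(results.size) { results.pop }
end

# Usage with stand-in work instead of a real HTTP call:
# poll_in_pool([1, 2, 3, 4], pool_size: 2) { |n| n * n }
```

The number of threads is now fixed by `pool_size` regardless of how many users there are. If the block touches the database, keeping `pool_size` at or below the ActiveRecord connection pool size is a reasonable precaution.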

score 3 · Accepted Answer

I would suggest using sidekiq, which is great at multithreading. You can then enqueue a separate job per user to poll the API. clockwork can be used to make the jobs you enqueue recurring.
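A rough sketch of that setup, assuming the sidekiq and clockwork gems are installed in a Rails app. The job name `PollUserJob`, the file locations, and the `poll_api_for` helper are hypothetical placeholders, not anything from the question.

```ruby
# app/jobs/poll_user_job.rb (hypothetical location)
require "sidekiq"

class PollUserJob
  # Sidekiq::Job is the current module name; older Sidekiq
  # versions use Sidekiq::Worker instead.
  include Sidekiq::Job

  def perform(user_id)
    user = User.find(user_id)
    # poll_api_for is a placeholder for the real API call.
    poll_api_for(user)
  end
end

# clock.rb (hypothetical location, run with `clockwork clock.rb`)
require "clockwork"
require_relative "config/environment" # boots Rails once, not every minute

module Clockwork
  # Every minute, enqueue one job per user; Sidekiq's worker
  # threads then run the API calls concurrently.
  every(1.minute, "poll_users") do
    User.ids.each { |id| PollUserJob.perform_async(id) }
  end
end
```

Passing the user id rather than the user object keeps the job arguments small and serializable. Both the Sidekiq process and the clockwork process stay running, so the Rails startup cost is paid once.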
