mysql - Working with large datasets and Ruby

Question

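I could really use some help here. I'm struggling to display a dashboard that works with large data.

When working with about 2k records, it averages about 2 seconds.

The query in the MySQL console takes less than 3.5 seconds to return 150k rows. Running the same query from Ruby takes over 4 minutes from the time the query is executed until all the objects are built and ready.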
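To see where the four minutes go, it can help to time the raw query separately from the Ruby-side object building. A minimal sketch using the standard library's Benchmark module; `time_phases` and the two lambdas are hypothetical stand-ins for the real query and the real mapping step:

```ruby
require 'benchmark'

# Times two phases separately: fetching raw rows and turning them into
# Ruby objects. `fetch` and `build` are hypothetical callables standing in
# for the real query and the real object-mapping step.
def time_phases(fetch, build)
  rows = nil
  result = nil
  fetch_seconds = Benchmark.realtime { rows = fetch.call }
  build_seconds = Benchmark.realtime { result = build.call(rows) }
  { :fetch_seconds => fetch_seconds, :build_seconds => build_seconds, :result => result }
end

# Stand-in data: in the real application `fetch` would run the SQL query.
fetch = lambda { (1..1000).map { |i| { 'car_id' => i, 'filled' => i % 7 } } }
build = lambda { |rows| rows.map { |row| row['filled'] } }

timings = time_phases(fetch, build)
puts "fetch: #{timings[:fetch_seconds]}s, build: #{timings[:build_seconds]}s"
```

If the build phase dominates, the cost is in Ruby object creation rather than in MySQL, which is what the numbers above suggest.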

Goal: Optimize the data handling further before adding a cache server. Working with Ruby 1.9.2, Rails 3.0, and MySQL (Mysql2 gem).

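Questions:

Does working with hashes hurt performance?
Should I first put everything into one primary hash and then manipulate the data I need afterwards?
Is there anything else I can do to improve performance?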

Rows in the DB:

GasStations and US Census have about 150,000 records
People has about 100,000 records
Cars has about 200,000 records
FillUps has about 2.3 million records

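Required for the dashboard (queries based on time periods such as the last 24 hours, the last week, etc.). All data is returned in JSON format for JS.

Gas stations, with FillUps and US Census data (zip, name, city, population)
Top 20 cities with the most fill-ups
Top 10 cars by number of fill-ups
Cars grouped by the number of times they filled their tank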
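The "top N" items in this list all reduce to the same step once per-label counts exist: sort by count, keep the first N, and serialize. A small sketch, assuming the counts are already in a plain Hash; `top_n` and the sample data are hypothetical and only illustrate the shape of the JSON handed to the front end:

```ruby
require 'json'

# Returns the `limit` entries with the highest counts as an array of hashes,
# ready to be serialized for the dashboard.
# `counts` maps a label (for example a city name) to a fill-up count.
def top_n(counts, limit)
  counts.sort_by { |_label, count| -count }
        .first(limit)
        .map { |label, count| { 'label' => label, 'fill_ups' => count } }
end

# Hypothetical sample data; real counts would come from the database.
city_counts = { 'City A' => 120, 'City B' => 45, 'City C' => 300, 'City D' => 80 }

puts JSON.generate(top_n(city_counts, 2))
```

The same helper would serve the top 20 cities and the top 10 cars by passing a different hash and limit.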

Code (sample over 6 months; returns 100k+ records):

# For simplicity, the select clause is omitted here; it dropped columns I don't need (updated_at, gas_station.created_at, etc.) instead of returning every column of each table.
@primary_data = FillUp.includes([:car, {:gas_station => :uscensus}]).where('fill_ups.created_at >= ?', 6.months.ago) # this took about 4+ minutes

# then tried

@primary_data = FillUp.find_by_sql('some long sql query...') # took longer than before
# Note for others: the SQL query did some preprocessing for me, which added attributes to the result. The query took < 4 seconds in the DB console. With these extra attributes the call took longer, as if Ruby were checking each row to map the attributes.

# then tried

# as seen at http://stackoverflow.com/questions/4456834/ruby-on-rails-storing-and-accessing-large-data-sets
MY_MAP = Hash[ActiveRecord::Base.connection.select_all('SELECT thingone, thingtwo from table').map{|one| [one['thingone'], one['thingtwo']]}]
# that took 23 seconds and also gave me the mapping of additional data that I was processing later, so much faster

# currently using the code below, which takes about 10 seconds
# Although this is faster, the query itself still only takes 3.5 seconds; parsing it into the hashes adds the overhead.
cars = {}
gasstations = {}
cities = {}
filled = {}

client = Mysql2::Client.new(:host => "localhost", :username => "root")
client.query("SELECT sum(fill_ups_grouped_by_car_id) as filled, fillups.car_id, cars.make as make, gasstations.name as name,  ....", :stream => true, :as => :hash).each do |row|
  # each row holds: fill-ups grouped by car, fill_ups.car_id, car make, gas station name, gas station zip, gas station city, city population
  # accumulate fill-ups per city, keeping the population alongside
  if cities[row['city']]
    cities[row['city']]['fill_ups'] += row['filled']
  else
    cities[row['city']] = {'fill_ups' => row['filled'], 'population' => row['population']}
  end
  # accumulate fill-ups per gas station
  # ('zip' assumes the query aliases the station zip code as zip; the original stored row['city'] here)
  if gasstations[row['name']]
    gasstations[row['name']]['fill_ups'] += row['filled']
  else
    gasstations[row['name']] = {'city' => row['city'], 'zip' => row['zip'], 'fill_ups' => row['filled']}
  end
  # accumulate fill-ups per car make
  if cars[row['make']]
    cars[row['make']] += row['filled']
  else
    cars[row['make']] = row['filled']
  end
  # count how many cars share each fill-up total
  if filled[row['filled']]
    filled[row['filled']] += 1
  else
    filled[row['filled']] = 1
  end
end
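The accumulation loop above can also be written with default-valued hashes and `||=`, which removes the if/else branches. This is only a sketch of the same bookkeeping, not a measured speed-up; `summarize`, the sample rows, and the 'zip' key are assumptions made for illustration:

```ruby
# Folds query rows into per-city, per-station, per-make and per-total tallies.
# Each row is a Hash with the string keys used in the loop above; the 'zip'
# key is an assumption about how the query aliases the station zip code.
def summarize(rows)
  cities      = {}
  gasstations = {}
  cars        = Hash.new(0)  # missing makes start at 0
  filled      = Hash.new(0)  # missing totals start at 0

  rows.each do |row|
    city = (cities[row['city']] ||= { 'fill_ups' => 0, 'population' => row['population'] })
    city['fill_ups'] += row['filled']

    station = (gasstations[row['name']] ||=
      { 'city' => row['city'], 'zip' => row['zip'], 'fill_ups' => 0 })
    station['fill_ups'] += row['filled']

    cars[row['make']]     += row['filled']
    filled[row['filled']] += 1
  end

  { 'cities' => cities, 'gasstations' => gasstations, 'cars' => cars, 'filled' => filled }
end

# Hypothetical rows for illustration only.
rows = [
  { 'filled' => 3, 'make' => 'Make X', 'name' => 'Station 1', 'city' => 'City A', 'zip' => '00001', 'population' => 1000 },
  { 'filled' => 2, 'make' => 'Make X', 'name' => 'Station 1', 'city' => 'City A', 'zip' => '00001', 'population' => 1000 },
  { 'filled' => 3, 'make' => 'Make Y', 'name' => 'Station 2', 'city' => 'City B', 'zip' => '00002', 'population' => 500 }
]
summary = summarize(rows)
puts summary['cars'].inspect
```

Wrapping the loop in a method also makes it possible to test the aggregation against a handful of rows without touching the database.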

I have the following models:

class Person < ActiveRecord::Base
  has_many :cars
end

class Car < ActiveRecord::Base
  belongs_to :person
  belongs_to :uscensus, :foreign_key => :zipcode, :primary_key => :zipcode
  has_many :fill_ups
  has_many :gas_stations, :through => :fill_ups
end

class GasStation < ActiveRecord::Base
  belongs_to :uscensus, :foreign_key => :zipcode, :primary_key => :zipcode
  has_many :fill_ups
  has_many :cars, :through => :fill_ups
end

class FillUp < ActiveRecord::Base
  # log of every time a person fills up their gas
  belongs_to :car
  belongs_to :gas_station
end

class Uscensus < ActiveRecord::Base
  # basic data about an area, keyed by zip code
end

score 2 · Accepted Answer

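I don't use RoR, but returning 100k rows for a dashboard is never going to be very fast. I strongly suggest building or maintaining summary tables and running GROUP BYs in the database to summarize your dataset before presentation.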
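As a rough illustration of pushing the grouping into the database, the sketch below asks MySQL for the top 20 cities directly, so Ruby receives 20 rows instead of 100k+. The table and column names are assumptions inferred from the models in the question, `top_cities` is a hypothetical helper, and `FakeClient` is a stand-in so the sketch runs without a database; a real `Mysql2::Client` would be passed in its place:

```ruby
# SQL that lets the database do the grouping.
# Table and column names are assumptions inferred from the question's models
# (zipcode join keys, fill_ups.created_at); adjust to the real schema.
TOP_CITIES_SQL = <<-SQL
  SELECT uscensus.city AS city, COUNT(*) AS fill_up_count
  FROM fill_ups
  JOIN gas_stations ON gas_stations.id = fill_ups.gas_station_id
  JOIN uscensus ON uscensus.zipcode = gas_stations.zipcode
  WHERE fill_ups.created_at >= '%s'
  GROUP BY uscensus.city
  ORDER BY fill_up_count DESC
  LIMIT 20
SQL

# Runs the summary query through any client that responds to `query(sql)`
# and returns an enumerable of row hashes with string keys.
# `since` is a Time; it is formatted here rather than taken from user input.
def top_cities(client, since)
  sql = format(TOP_CITIES_SQL, since.strftime('%Y-%m-%d %H:%M:%S'))
  client.query(sql).map do |row|
    { 'city' => row['city'], 'fill_ups' => row['fill_up_count'].to_i }
  end
end

# Stand-in for a database client so the sketch runs on its own.
class FakeClient
  attr_reader :last_sql

  def query(sql)
    @last_sql = sql
    [{ 'city' => 'City A', 'fill_up_count' => 42 }]
  end
end

client = FakeClient.new
puts top_cities(client, Time.utc(2012, 1, 1)).inspect
```

A summary table would go one step further: the same GROUP BY would be run periodically and its result stored, so the dashboard reads precomputed totals.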
