ruby-on-rails - Efficient bulk update of a Rails database

Question

I'm trying to build a rake utility that will update my database regularly.

This is the code I have so far:

namespace :utils do

  # utils:update_ip
  # Downloads the file from <url> to the temp folder then unzips it in <file_path>
  # Then updates the database.

  desc "Update ip-to-country database"
  task :update_ip => :environment do

    require 'open-uri'
    require 'zip/zipfilesystem'
    require 'csv'

    file_name = "ip-to-country.csv"
    file_path = "#{RAILS_ROOT}/db/" + file_name
    url = 'http://ip-to-country.webhosting.info/downloads/ip-to-country.csv.zip'


    #check last time we updated the database.
    mod_time = ''
    mod_time = File.new(file_path).mtime.httpdate    if File.exists? file_path

    begin
      puts 'Downloading update...'
      #send conditional GET to server
      zipped_file = open(url, {'If-Modified-Since' => mod_time})
    rescue OpenURI::HTTPError => the_error
      if the_error.io.status[0] == '304'
        puts 'Nothing to update.'
      else
        puts 'HTTPError: ' + the_error.message
      end
    else # file was downloaded without error.

      Rails.logger.info 'ip-to-country: Remote database was last updated: ' + zipped_file.meta['last-modified']
      delay = Time.now - zipped_file.last_modified
      Rails.logger.info "ip-to-country: Database was outdated for: #{delay} seconds (#{delay / 60 / 60 / 24 } days)"

      puts 'Unzipping...'
      File.delete(file_path) if File.exists? file_path
      Zip::ZipFile.open(zipped_file.path) do |zipfile|
        zipfile.extract(file_name, file_path)
      end

      Iptocs.delete_all

      puts "Importing new database..."


      # TODO: way, way too heavy find a better solution.


      CSV.open(file_path, 'r') do |row|
        ip = Iptocs.new(  :ip_from        => row.shift,
                        :ip_to          => row.shift,
                        :country_code2  => row.shift,
                        :country_code3  => row.shift,
                        :country_name   => row.shift)
        ip.save
      end #CSV
      puts "Complete."

    end #begin-rescue
  end #task
end #namespace

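The problem I have is that it takes several minutes to insert the 100,000-plus entries. I would like to find a more efficient way of updating my database. Ideally this would remain independent of the database type, but if that is not possible, my production server runs on MySQL.

Thank you for any insight.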

score 9 · Accepted Answer

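Have you tried using AR Extensions for bulk import? You get a large performance improvement when inserting thousands of rows into the DB. Visit their website for more details.

For more information, see these examples:

Usage example 1

Usage example 2

Usage example 3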

score 3 · Answer

Use the database-level utilities for high speed, Luke!

Unfortunately, they are database-specific, but they are fast. For MySQL, see http://dev.mysql.com/doc/refman/5.1/en/load-data.html
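As a rough sketch of how this could be wired into the rake task, the statement can be built as a string and handed to the MySQL connection. The helper name load_data_sql is hypothetical, and the field and line terminators are assumptions about the CSV layout (comma-separated, optionally double-quoted values, newline-terminated rows). LOCAL also has to be allowed by both the client and the server.

```ruby
# Hypothetical helper: builds a MySQL LOAD DATA statement for a CSV file.
# The terminators below are assumptions about the file layout; adjust to match.
def load_data_sql(file_path, table, columns)
  # Escape backslashes and single quotes so the path is a valid SQL string literal.
  escaped_path = file_path.gsub("\\") { "\\\\" }.gsub("'") { "\\'" }
  [
    "LOAD DATA LOCAL INFILE '#{escaped_path}'",
    "INTO TABLE #{table}",
    %q(FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'),
    %q(LINES TERMINATED BY '\n'),
    "(#{columns.join(', ')})"
  ].join(" ")
end

# In the rake task this might replace the CSV loop, for example:
#
#   columns = %w[ip_from ip_to country_code2 country_code3 country_name]
#   ActiveRecord::Base.connection.execute(load_data_sql(file_path, "iptocs", columns))
```

The table name iptocs is taken from the INSERT example elsewhere in this thread; the actual table name depends on how the Iptocs model is mapped.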

score 1 · Answer

I'm currently experimenting with activerecord-import, which sounds very promising:

https://github.com/zdennis/activerecord-import
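A minimal sketch of how the import loop from the question might be restructured around that gem. The batching helper each_batch and the batch size of 1,000 are illustrative assumptions, not part of the gem. The gem call itself appears only in comments, because it needs the gem and the Iptocs model to be loaded.

```ruby
# Column order matches the CSV fields used in the question.
COLUMNS = [:ip_from, :ip_to, :country_code2, :country_code3, :country_name]

# Hypothetical helper: yields rows in arrays of at most batch_size,
# so each batch can be handed to a single bulk-insert call.
def each_batch(rows, batch_size = 1000)
  rows.each_slice(batch_size) { |batch| yield batch }
end

# Inside the rake task, with activerecord-import installed, the per-row
# Iptocs.new / ip.save loop could become something like:
#
#   each_batch(CSV.foreach(file_path)) do |batch|
#     Iptocs.import COLUMNS, batch, :validate => false
#   end
```

Skipping validations is a trade-off: it is faster, but it assumes the downloaded CSV is already trustworthy.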

score 1 · Answer

You can generate a text file containing all the INSERT statements you need and then run:

mysql -u user -p db_name < mytextfile.txt

I don't know whether this will be any faster, but it's worth a try...

score 0 · Answer

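As Larry says, use your DB-specific import utilities if the file already comes in the format you want. However, if you need to manipulate the data before inserting it, you can generate a single INSERT query with data for many rows, which is faster than issuing a separate query for each row (as ActiveRecord does). For example: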

INSERT INTO iptocs (ip_from, ip_to, country_code) VALUES
  ('xxx', 'xxx', 'xxx'),
  ('yyy', 'yyy', 'yyy'),
  ...;
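A statement of that shape could be assembled in Ruby roughly as follows. This is only a sketch: bulk_insert_sql and quote_value are hypothetical helpers, and the quoting here is a simplified stand-in. In a Rails app, ActiveRecord::Base.connection.quote is the safer way to escape values.

```ruby
# Stand-in quoting for illustration only: wraps the value in single quotes
# and doubles any embedded single quotes.
def quote_value(value)
  "'" + value.to_s.gsub("'", "''") + "'"
end

# Hypothetical helper: builds one INSERT statement covering all given rows.
def bulk_insert_sql(table, columns, rows)
  values = rows.map { |row| "(" + row.map { |v| quote_value(v) }.join(", ") + ")" }
  "INSERT INTO #{table} (#{columns.join(', ')}) VALUES\n  " + values.join(",\n  ") + ";"
end

# Placeholder rows, mirroring the 'xxx' / 'yyy' values in the SQL above.
rows = [%w[xxx xxx xxx], %w[yyy yyy yyy]]
puts bulk_insert_sql("iptocs", %w[ip_from ip_to country_code], rows)
```

With 100,000-plus rows it is probably wise to send the data in slices (for example with each_slice) rather than as one statement, since MySQL limits the size of a single statement through max_allowed_packet.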
