ruby - How to group items by a specific column from a tab-delimited file

Question

I have the following records in a tab-delimited text file.

sku         title                  Product Type
19686940    This is test Title1    toys
19686941    This is test Title2    toys
19686942    This is test Title3    toys
20519300    This is test Title1    toys2
20519301    This is test Title2    toys2
20580987    This is test Title1    toys3
20580988    This is test Title2    toys3
20582176    This is test Title1    toys4

How can I group the items by Product Type and find all the unique words in title?

Output format:

Product Type   Unique_words 
------------   ------------ 
toys           This is test Title1 Title2 Title3
toys2          This is test Title1 Title2
toys3          This is test Title1 Title2
toys4          This is test Title1

Update
So far, I have written code that reads the file and stores the records in an array.

class Product
    attr_reader :sku, :title, :productType
    def initialize(sku,title,productType)
      @sku = sku
      @title = title
      @productType = productType
    end

    def sku
      @sku
    end

    def title
      @title
    end

    def productType
      @productType
    end
end

class FileReader
  def ReadFile(m_FilePath)
    array = Array.new
    lines = IO.readlines(m_FilePath)

    lines.each_with_index do |line, i|
      current_row = line.split("\t")
      product = Product.new(current_row[0],current_row[1],current_row[2])

      array.push product
    end
  end
end

filereader_method = FileReader.new.method("ReadFile")
Reading =  filereader_method.to_proc

puts Reading.call("Input.txt")

score 0 · Accepted Answer

To get the grouping, you can use Enumerable#group_by.

Product = Struct.new(:sku, :title, :product_type)

def products_by_type(file_path)
  # File.foreach closes the file when done; drop(1) skips the header row.
  File.foreach(file_path)
      .drop(1)
      .map { |line| Product.new(*line.chomp.split("\t")) }
      .group_by { |product| product.product_type }
end
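
To make the shape of the result concrete, here is a minimal sketch of what group_by returns. The names Item, SAMPLE_ITEMS, and GROUPED are hypothetical, chosen only for this illustration; the rows are copied from the question.

```ruby
# Hypothetical stand-in for the Product struct, so this snippet runs on its own.
Item = Struct.new(:sku, :title, :product_type)

# A few rows copied from the question's sample data.
SAMPLE_ITEMS = [
  Item.new("19686940", "This is test Title1", "toys"),
  Item.new("19686941", "This is test Title2", "toys"),
  Item.new("20519300", "This is test Title1", "toys2")
].freeze

# group_by returns a Hash: each key is a block result, each value is the
# array of elements that produced that key, in their original order.
GROUPED = SAMPLE_ITEMS.group_by(&:product_type)

GROUPED.each do |type, items|
  puts "#{type}: #{items.map(&:sku).join(', ')}"
end
```

Each value is a plain array, so any further per-group processing (such as collecting title words) can use ordinary Enumerable methods.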

The beauty of Ruby is that it gives you so many options. Since this is just a data object, you could also look at the CSV library and OpenStruct.

require 'csv'
require 'ostruct'

def products_by_type(file_path)
  csv_opts = { col_sep: "\t",
               headers: true,
               header_converters: [:downcase, :symbol] }

  # Pass the options as keywords (**); Ruby 3 no longer converts a trailing hash.
  CSV.foreach(file_path, **csv_opts)
     .map { |row| OpenStruct.new(row.to_hash) }
     .group_by { |product| product.product_type }
end

Alternatively, using the idiom of initializing an object from hash keys, you can drop the #to_hash call on row shown above.

class Product
  attr_accessor :sku, :title, :product_type

  def initialize(data)
    # Call the matching setter (e.g. title=) for each key/value pair.
    data.each { |key, value| public_send("#{key}=", value) }
  end
end

def products_by_type(file_path)
  csv_opts = { } # ... (elided in the original; presumably the same options as above)

  CSV.foreach(file_path, **csv_opts)
     .map { |row| Product.new(row) }
     .group_by { |product| product.product_type }
end

Then, working from that hash, format the output however you need.

def unique_title_words(products)
  products.flat_map { |product| product.title.scan(/\w+/) }
          .uniq
end

puts "Product Type\tUnique Words"
products_by_type("./file.txt").each do |type, products|
  puts "#{type}\t#{unique_title_words(products).join(' ')}"
end
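
For a quick check without a file on disk, the same steps can be sketched end to end against inline data. This is only an illustration under stated assumptions: SAMPLE_ROWS, SAMPLE_TSV, and unique_words_by_type are hypothetical names, the rows are copied from the question, and plain String#split stands in for the CSV library.

```ruby
# Inline stand-in for the tab-delimited file (rows copied from the question).
SAMPLE_ROWS = [
  ["sku", "title", "Product Type"],
  ["19686940", "This is test Title1", "toys"],
  ["19686941", "This is test Title2", "toys"],
  ["19686942", "This is test Title3", "toys"],
  ["20519300", "This is test Title1", "toys2"],
  ["20519301", "This is test Title2", "toys2"],
  ["20580987", "This is test Title1", "toys3"],
  ["20580988", "This is test Title2", "toys3"],
  ["20582176", "This is test Title1", "toys4"]
].freeze

SAMPLE_TSV = SAMPLE_ROWS.map { |row| row.join("\t") }.join("\n")

# Hypothetical helper: maps each product type to the unique words that
# appear in the titles of that type, in order of first appearance.
def unique_words_by_type(tsv_text)
  rows = tsv_text.each_line.map { |line| line.chomp.split("\t") }

  rows.drop(1)                       # skip the header row
      .group_by { |row| row[2] }     # third column is the product type
      .transform_values do |group|
        group.flat_map { |row| row[1].scan(/\w+/) }.uniq
      end
end

puts "Product Type\tUnique Words"
unique_words_by_type(SAMPLE_TSV).each do |type, words|
  puts "#{type}\t#{words.join(' ')}"
end
```

Returning a hash of word arrays, rather than printing inside the helper, keeps the grouping separate from the formatting, so the output layout can change without touching the parsing.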
