ruby - Ruby-重複する2つの配列の比較を最適化する

Question

私は次の配列を持っています：

A = "cheddar".split(//)  # ["c", "h", "e", "d", "d", "a", "r"]
B = "cheddaar".split(//) # ["c", "h", "e", "d", "d", "a", "a", "r"]

AアレイはBアレイのサブセットです。A配列に別の「d」要素がある場合、それはサブセットではありません。

重複している場合でも、一方が他方のサブセットであるかどうかを比較して見つけたいと思います。A-BまたはA＆Bは重複をキャプチャせず、それらを比較して一致するものを見つけるだけです。だから私はそれが重複をキャプチャする次のように書いた：

B.each do |letter|
    A.delete_at(A.index(letter)) rescue ""
end

p A.empty?

これが最善の方法ですか、それとも最適化できますか？

score 3 · Accepted Answer

これが実際にあなたのアプローチよりも速いかどうかはわかりませんが、その実行時間は O(N+M) である必要があります。ここで、N、M は a、b のサイズです。(ハッシュルックアップと挿入が O(1) で償却されると仮定すると、ハッシュは通常キーサイズの関数であるため、厳密には当てはまりません。ただし、この例では、すべてのキーは単一の文字です。)余分なデータモーションであり、最悪の場合は O(N^2 * M) になる可能性があります。

def ary_subset?(a,b) # true iff a is subset of b
  a_counts = a.reduce(Hash.new(0)) { |m,v| m[v] += 1; m }
  b_counts = b.reduce(Hash.new(0)) { |m,v| m[v] += 1; m }
  a_counts.all? { |a_key,a_ct| a_ct <= b_counts[a_key] }
end

OPは最速の方法を求めたので、この要点で利用可能な小さなマイクロベンチマークを作成しました。

OPの元のアプローチ（op_del）、reduce counts（ct）を使用する私のバージョン、およびcount配列が再利用されるバリアント（ct_acc）、およびMultiSetアプローチ（mset）、EDITをテストし、カウントの非常に簡潔な検索を追加しました比較(slow_ct) . OPの小さな配列入力の例（s）、カーディナリティ10,000のより大きなセット（b）、および大きなセットに対する小さなセット（sb）に対して各バリアントを実行しました。(_slow_ct_ を適切な時間内に完了するには、大きなセットのケースの反復回数を 1 桁減らす必要がありました。) 結果は次のとおりです。

                     user     system      total        real
s_op_del         1.850000   0.000000   1.850000 (  1.853931)
s_ct             2.260000   0.000000   2.260000 (  2.264028)
s_ct_acc         1.700000   0.000000   1.700000 (  1.706881)
s_mset           5.460000   0.000000   5.460000 (  5.484833)
s_slow_ct        1.720000   0.000000   1.720000 (  1.731367)
b_op_del         0.310000   0.000000   0.310000 (  0.312804)
b_ct             0.120000   0.000000   0.120000 (  0.123329)
b_ct_acc         0.100000   0.000000   0.100000 (  0.101532)
b_mset           0.310000   0.000000   0.310000 (  0.319697)
b_slow_ct       82.910000   0.000000  82.910000 ( 83.013747)
sb_op_del        0.710000   0.020000   0.730000 (  0.734022)
sb_ct            0.050000   0.000000   0.050000 (  0.054416)
sb_ct_acc        0.040000   0.000000   0.040000 (  0.059032)
sb_mset          0.110000   0.000000   0.110000 (  0.117027)
sb_slow_ct       0.010000   0.000000   0.010000 (  0.011287)

カウントアキュムレータを再利用してカウントを減らすことは、明らかな勝者です。マルチセットは残念なほど遅かった。

score 2 · Accepted Answer

私の記憶が正しければ、あなたの解決策は O(n^2) です。これは少し面倒ですが、少なくとも大きな入力 (これは O(n)) の場合はより効率的です。もう少し加工が必要かも…

def is_subset?(a, b)
    letters = Hash.new(0)
    a.each_char{|x| letters[x] += 1}
    b.each_char{|x| letters[x] -= 1}
    letters.values.all?{|v| v >= 0 }
end

編集：もう少し効率的：

def is_subset?(a, b)
    letters = Hash.new(0)
    a.each_char{|x| letters[x] += 1}
    b.each_char.all?{|x| (letters[x] -= 1) > 0}
end

score 2 · Accepted Answer

ここで列挙子を利用したいのは間違いありません。これを行う最善の方法は、おそらく group_by を使用して、各文字の出現回数を比較することです。

def subset?(a, b)
   a = a.each_char.group_by { |char| char }
   b = b.each_char.group_by { |char| char }
   a.each_key.all? do |letter|
     b[letter] && a[letter].size < b[letter].size
   end
end

したがって、ハッシュルックアップを O(1) 操作としてカウントすると、これは O(m + n) アルゴリズムになります。

score 2 · Accepted Answer

要件を正しく理解していれば、multiset gem を使用できます。

require 'multiset'
a = Multiset.new "cheddar".split(//)
b = Multiset.new "cheddaar".split(//)

a.subset? b #=> true

score 1 · Accepted Answer

同様の質問が数週間前に投稿され、次のような回答が受け入れられました。

def is_subset?(a,b)
  !a.find{|x| a.count(x) > b.count(x)}
end

ベンチマークの更新

require 'benchmark'

def random_char
  ('a'..'z').to_a.sample
end

A = 8.times.map{random_char}
B = 8.times.map{random_char}

def ary_subset?(a,b) # true iff a is subset of b
  a_counts = a.reduce(Hash.new(0)) { |m,v| m[v] += 1; m }
  b_counts = b.reduce(Hash.new(0)) { |m,v| m[v] += 1; m }
  a_counts.all? { |a_key,a_ct| a_ct <= b_counts[a_key] }
end

Benchmark.bm do |x|
  x.report('me')     {100000.times{is_subset?(A,B)}}
  x.report('dbenhur'){100000.times{ary_subset?(A,B)}}
end

       user     system      total        real
me  0.375000   0.000000   0.375000 (  0.384022)
dbenhur  2.558000   0.000000   2.558000 (  2.550146)

score 1 · Accepted Answer

これを試して：

class String
  def subset_of?(str)
    e2 = str.each_char
    c2 = c2p = nil
    each_char do |c1|
      c2p, c2 = c2, e2.next
      next if c2 == c1    
      c2p, c2 = c2, e2.next until (c2 != c2p) # move until we exclude duplicates
      return false if c2 != c1
    end
    true
  rescue StopIteration
    false
  end
end

関数をテストします。

>> "chedddar".subset_of?("cheddaaaaaar")
=> false
>> "cheddar".subset_of?("cheddaaaaaar")
=> true
>> "cheddar".subset_of?("cheddaaaaaarkkkk")
=> true
>> "chedddar".subset_of?("cheddar")
=> false
>> "chedddar".subset_of?("chedd")
=> false

編集1

提供された追加情報に基づいてソリューションを更新しました。

class String
  def subset_of?(str)
    h1, h2 = [self, str].map {|s| s.each_char.reduce(Hash.new(0)){|h, c| h[c] += 1; h}}
    h1.all?{|c, k| h2[c] >= k}
  end
end

score -1 · Accepted Answer

-1

最初に重複を削除するのはどうですか？

(A.uniq - B.uniq).empty?

于 2012-07-06T07:12:05.517 に答える

ruby - Ruby-重複する2つの配列の比較を最適化する

7 に答える 7

Related

Reference