ruby - 二分探索を使用して配列内の挿入点を見つける方法は？

Question

配列でのバイナリ検索の基本的な考え方は単純ですが、検索で正確なアイテムが見つからない場合は、「近似」インデックスが返される可能性があります。（値が検索された値よりも大きいまたは小さいインデックスが返される場合があります）。

正確な挿入ポイントを探すために、おおよその位置を取得した後、正確な挿入位置を左または右に「スキャン」する必要があるようです。たとえば、Rubyでは次のことができます。arr.insert(exact_index, value)

私は次の解決策を持っていますが、パーツの取り扱いがbegin_index >= end_index少し面倒です。よりエレガントなソリューションを使用できるかどうか疑問に思いますか？

（このソリューションは、完全一致が見つかった場合に複数の一致をスキャンする必要がないため、完全一致に対して返されるインデックスは、値に対応する任意のインデックスを指している可能性があります...しかし、それらがすべて整数である場合、a - 1完全に一致するものが見つかった後、左側の境界を検索するか、右側の境界を検索するために、いつでも検索できますa + 1。）

私の解決策：

DEBUGGING = true

def binary_search_helper(arr, a, begin_index, end_index)
  middle_index = (begin_index + end_index) / 2
  puts "a = #{a}, arr[middle_index] = #{arr[middle_index]}, " +
           "begin_index = #{begin_index}, end_index = #{end_index}, " +
           "middle_index = #{middle_index}" if DEBUGGING
  if arr[middle_index] == a
    return middle_index
  elsif begin_index >= end_index
    index = [begin_index, end_index].min
    return index if a < arr[index] && index >= 0  #careful because -1 means end of array
    index = [begin_index, end_index].max
    return index if a < arr[index] && index >= 0
    return index + 1
  elsif a > arr[middle_index]
    return binary_search_helper(arr, a, middle_index + 1, end_index)
  else
    return binary_search_helper(arr, a, begin_index, middle_index - 1)
  end
end

# for [1,3,5,7,9], searching for 6 will return index for 7 for insertion
# if exact match is found, then return that index
def binary_search(arr, a)
  puts "\nSearching for #{a} in #{arr}" if DEBUGGING
  return 0 if arr.empty?
  result = binary_search_helper(arr, a, 0, arr.length - 1)
  puts "the result is #{result}, the index for value #{arr[result].inspect}" if DEBUGGING
  return result
end


arr = [1,3,5,7,9]
b = 6
arr.insert(binary_search(arr, b), b)
p arr

arr = [1,3,5,7,9,11]
b = 6
arr.insert(binary_search(arr, b), b)
p arr

arr = [1,3,5,7,9]
b = 60
arr.insert(binary_search(arr, b), b)
p arr

arr = [1,3,5,7,9,11]
b = 60
arr.insert(binary_search(arr, b), b)
p arr

arr = [1,3,5,7,9]
b = -60
arr.insert(binary_search(arr, b), b)
p arr

arr = [1,3,5,7,9,11]
b = -60
arr.insert(binary_search(arr, b), b)
p arr

arr = [1]
b = -60
arr.insert(binary_search(arr, b), b)
p arr

arr = [1]
b = 60
arr.insert(binary_search(arr, b), b)
p arr

arr = []
b = 60
arr.insert(binary_search(arr, b), b)
p arr

結果：

Searching for 6 in [1, 3, 5, 7, 9]
a = 6, arr[middle_index] = 5, begin_index = 0, end_index = 4, middle_index = 2
a = 6, arr[middle_index] = 7, begin_index = 3, end_index = 4, middle_index = 3
a = 6, arr[middle_index] = 5, begin_index = 3, end_index = 2, middle_index = 2
the result is 3, the index for value 7
[1, 3, 5, 6, 7, 9]

Searching for 6 in [1, 3, 5, 7, 9, 11]
a = 6, arr[middle_index] = 5, begin_index = 0, end_index = 5, middle_index = 2
a = 6, arr[middle_index] = 9, begin_index = 3, end_index = 5, middle_index = 4
a = 6, arr[middle_index] = 7, begin_index = 3, end_index = 3, middle_index = 3
the result is 3, the index for value 7
[1, 3, 5, 6, 7, 9, 11]

Searching for 60 in [1, 3, 5, 7, 9]
a = 60, arr[middle_index] = 5, begin_index = 0, end_index = 4, middle_index = 2
a = 60, arr[middle_index] = 7, begin_index = 3, end_index = 4, middle_index = 3
a = 60, arr[middle_index] = 9, begin_index = 4, end_index = 4, middle_index = 4
the result is 5, the index for value nil
[1, 3, 5, 7, 9, 60]

Searching for 60 in [1, 3, 5, 7, 9, 11]
a = 60, arr[middle_index] = 5, begin_index = 0, end_index = 5, middle_index = 2
a = 60, arr[middle_index] = 9, begin_index = 3, end_index = 5, middle_index = 4
a = 60, arr[middle_index] = 11, begin_index = 5, end_index = 5, middle_index = 5
the result is 6, the index for value nil
[1, 3, 5, 7, 9, 11, 60]

Searching for -60 in [1, 3, 5, 7, 9]
a = -60, arr[middle_index] = 5, begin_index = 0, end_index = 4, middle_index = 2
a = -60, arr[middle_index] = 1, begin_index = 0, end_index = 1, middle_index = 0
a = -60, arr[middle_index] = 9, begin_index = 0, end_index = -1, middle_index = -1
the result is 0, the index for value 1
[-60, 1, 3, 5, 7, 9]

Searching for -60 in [1, 3, 5, 7, 9, 11]
a = -60, arr[middle_index] = 5, begin_index = 0, end_index = 5, middle_index = 2
a = -60, arr[middle_index] = 1, begin_index = 0, end_index = 1, middle_index = 0
a = -60, arr[middle_index] = 11, begin_index = 0, end_index = -1, middle_index = -1
the result is 0, the index for value 1
[-60, 1, 3, 5, 7, 9, 11]

Searching for -60 in [1]
a = -60, arr[middle_index] = 1, begin_index = 0, end_index = 0, middle_index = 0
the result is 0, the index for value 1
[-60, 1]

Searching for 60 in [1]
a = 60, arr[middle_index] = 1, begin_index = 0, end_index = 0, middle_index = 0
the result is 1, the index for value nil
[1, 60]

Searching for 60 in []
[60]

score 17 · Accepted Answer

これは、OraclesJavaに含まれているJavaのjava.util.Arrays.binarySearchのコードです。

    /**
     * Searches the specified array of ints for the specified value using the
     * binary search algorithm.  The array must be sorted (as
     * by the {@link #sort(int[])} method) prior to making this call.  If it
     * is not sorted, the results are undefined.  If the array contains
     * multiple elements with the specified value, there is no guarantee which
     * one will be found.
     *
     * @param a the array to be searched
     * @param key the value to be searched for
     * @return index of the search key, if it is contained in the array;
     *         otherwise, <tt>(-(<i>insertion point</i>) - 1)</tt>.  The
     *         <i>insertion point</i> is defined as the point at which the
     *         key would be inserted into the array: the index of the first
     *         element greater than the key, or <tt>a.length</tt> if all
     *         elements in the array are less than the specified key.  Note
     *         that this guarantees that the return value will be &gt;= 0 if
     *         and only if the key is found.
     */
    public static int binarySearch(int[] a, int key) {
        return binarySearch0(a, 0, a.length, key);
    }

    // Like public version, but without range checks.
    private static int binarySearch0(int[] a, int fromIndex, int toIndex,
                                     int key) {
        int low = fromIndex;
        int high = toIndex - 1;

        while (low <= high) {
            int mid = (low + high) >>> 1;
            int midVal = a[mid];

            if (midVal < key)
                low = mid + 1;
            else if (midVal > key)
                high = mid - 1;
            else
                return mid; // key found
        }
        return -(low + 1);  // key not found.
    }

アルゴリズムが適切であることが証明されており、結果から、それが完全に一致するのか、挿入ポイントのヒントであるのかがすぐにわかるという事実が気に入っています。

これは私がこれをルビーに翻訳する方法です：

# Inserts the specified value into the specified array using the binary
# search algorithm. The array must be sorted prior to making this call.
# If it is not sorted, the results are undefined.  If the array contains
# multiple elements with the specified value, there is no guarantee
# which one will be found.
#
# @param [Array] array the ordered array into which value should be inserted
# @param [Object] value the value to insert
# @param [Fixnum|Bignum] from_index ordered sub-array starts at
# @param [Fixnum|Bignum] to_index ordered sub-array ends the field before
# @return [Array] the resulting array
def self.insert(array, value, from_index=0,  to_index=array.length)
  array.insert insertion_point(array, value, from_index, to_index), value
end

# Searches the specified array for an insertion point ot the specified value
# using the binary search algorithm.  The array must be sorted prior to making
# this call. If it is not sorted, the results are undefined.  If the array
# contains multiple elements with the specified value, there is no guarantee
# which one will be found.
#
# @param [Array] array the ordered array into which value should be inserted
# @param [Object] value the value to insert
# @param [Fixnum|Bignum] from_index ordered sub-array starts at
# @param [Fixnum|Bignum] to_index ordered sub-array ends the field before
# @return [Fixnum|Bignum] the position where value should be inserted
def self.insertion_point(array, value, from_index=0,  to_index=array.length)
  raise(ArgumentError, 'Invalid Range') if from_index < 0 || from_index > array.length || from_index > to_index || to_index > array.length
  binary_search = _binary_search(array, value, from_index, to_index)
  if binary_search < 0
    -(binary_search + 1)
  else
    binary_search
  end
end

# Searches the specified array for the specified value using the binary
# search algorithm.  The array must be sorted prior to making this call.
# If it is not sorted, the results are undefined.  If the array contains
# multiple elements with the specified value, there is no guarantee which
# one will be found.
#
# @param [Array] array the ordered array in which the value should be searched
# @param [Object] value the value to search for
# @param [Fixnum|Bignum] from_index ordered sub-array starts at
# @param [Fixnum|Bignum] to_index ordered sub-array ends the field before
# @return [Fixnum|Bignum] if > 0 position of value, otherwise -(insertion_point + 1)
def self.binary_search(array, value, from_index=0,  to_index=array.length)
  raise(ArgumentError, 'Invalid Range') if from_index < 0 || from_index > array.length || from_index > to_index || to_index > array.length
  _binary_search(array, value, from_index, to_index)
end

private
# Like binary_search, but without range checks.
#
# @param [Array] array the ordered array in which the value should be searched
# @param [Object] value the value to search for
# @param [Fixnum|Bignum] from_index ordered sub-array starts at
# @param [Fixnum|Bignum] to_index ordered sub-array ends the field before
# @return [Fixnum|Bignum] if > 0 position of value, otherwise -(insertion_point + 1)
def self._binary_search(array, value, from_index, to_index)
  low = from_index
  high = to_index - 1

  while low <= high do
    mid = (low + high) / 2
    mid_val = array[mid]

    if mid_val < value
      low = mid + 1
    elsif mid_val > value
      high = mid - 1
    else
      return mid # value found
    end
  end
  -(low + 1) # value not found.
end

コードは、テストデータに提供されたOPと同じ値を返します。

score 1 · Accepted Answer

2020年の更新

実際、二分探索の挿入問題はよく研究されています。左挿入点と右挿入点があります。コードはウィキペディアとロゼッタコードで見つけることができます。たとえば、左側の挿入ポイントを見つけるためのコードは次のとおりです。

  BinarySearch_Left(A[0..N-1], value) {
      low = 0
      high = N - 1
      while (low <= high) {
          // invariants: value > A[i] for all i < low
                         value <= A[i] for all i > high
          mid = (low + high) / 2
          if (A[mid] >= value)
              high = mid - 1
          else
              low = mid + 1
      }
      return low
  }

1つの注意点はオーバーフローのバグに関するものなので、mid実際にはとして見つける必要がありますlow + floor((high - low) / 2)。

以前の回答：

実際には、をチェックする代わりに、begin_index >= end_indexを使用してより適切に処理できbegin_index > end_index、ソリューションははるかにクリーンです。

def binary_search_helper(arr, a, begin_index, end_index)    
  if begin_index > end_index
    return begin_index
  else
    middle_index = (begin_index + end_index) / 2
    if arr[middle_index] == a
      return middle_index
    elsif a > arr[middle_index]
      return binary_search_helper(arr, a, middle_index + 1, end_index)
    else
      return binary_search_helper(arr, a, begin_index, middle_index - 1)
    end
  end
end

# for [1,3,5,7,9], searching for 6 will return index for 7 for insertion
# if exact match is found, then return that index
def binary_search(arr, a)
  return binary_search_helper(arr, a, 0, arr.length - 1)
end

また、再帰の代わりに反復を使用すると、より高速になり、スタックオーバーフローの心配が少なくなります。

ruby - 二分探索を使用して配列内の挿入点を見つける方法は？

2 に答える 2

2020年の更新

以前の回答：

Related

Reference