c# - What does the "distribution" of a hash function mean?

Question

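While reading the MSDN documentation for the Object.GetHashCode method, I came across phrases such as "a hash function must provide a random or useful distribution in a hash table." What does this distribution mean with respect to a hash function or a hash table?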

score 13 · Accepted Answer

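A hash function produces a 32-bit integer whose purpose is to "balance" a hash table. Suppose the table has 100 "buckets", and you place each item into a bucket based on the last two decimal digits of its hash value.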

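Now suppose the hash function always produces a number that is a multiple of 100. Every item goes into the same bucket, and the hash table becomes unbalanced. That is a bad hash function.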

A good hash algorithm produces a roughly even distribution regardless of the number of buckets and regardless of how the bucket number is extracted from the hash.
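To make the 100-bucket scenario concrete, here is a minimal sketch. The names `BucketDemo`, `BadHash`, `IdentityHash` and `CountPerBucket` are made up for illustration, and the bucket rule (last two decimal digits) is the simplified one from this answer, not how `Dictionary<TKey, TValue>` is actually implemented.

```csharp
using System;
using System.Linq;

public static class BucketDemo
{
    // Deliberately bad hash: every value is a multiple of 100.
    public static int BadHash(int key) => key * 100;

    // For comparison: the key itself. This happens to spread the
    // consecutive keys used below evenly; it is not a good hash in general.
    public static int IdentityHash(int key) => key;

    // Counts how many hash values land in each bucket when the bucket
    // index is the hash modulo bucketCount (the last two decimal digits
    // for 100 buckets and non-negative hashes).
    public static int[] CountPerBucket(int[] hashes, int bucketCount = 100)
    {
        var counts = new int[bucketCount];
        foreach (int hash in hashes)
        {
            // Clear the sign bit so the index is never negative.
            int index = (hash & int.MaxValue) % bucketCount;
            counts[index]++;
        }
        return counts;
    }

    public static void Main()
    {
        int[] keys = Enumerable.Range(0, 1000).ToArray();

        int[] bad = CountPerBucket(keys.Select(k => BadHash(k)).ToArray());
        int[] even = CountPerBucket(keys.Select(k => IdentityHash(k)).ToArray());

        Console.WriteLine($"bad:  {bad.Count(c => c > 0)} bucket(s) used, largest holds {bad.Max()}");
        Console.WriteLine($"even: {even.Count(c => c > 0)} bucket(s) used, largest holds {even.Max()}");
    }
}
```

With the 1000 keys above, the bad hash puts all 1000 items into bucket 0, while the identity hash puts 10 items into each of the 100 buckets. Lookups in the first table degrade to a linear scan of one long bucket.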

score 2 · Accepted Answer

For hash tables to function with maximum efficiency, hash values should be as unique as possible to prevent collisions. For example, let's consider an extremely naïve hash function: let's say your objects are first and last names, and for your hash value, you choose the initials. So Ginger Rogers' hash value is GR and Fred Astaire's hash value is FA. So far so good, but what happens when Frank Allen comes along with a hash value of FA? Now you have a collision between Fred Astaire and Frank Allen, and the hash table implementation has to handle this as a special case, which reduces efficiency.
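The initials example can be sketched in a few lines. `InitialsDemo`, `Initials` and `GroupByInitials` are hypothetical names used only here; the dictionary of lists stands in for the buckets of a hash table.

```csharp
using System;
using System.Collections.Generic;

public static class InitialsDemo
{
    // The naive "hash" from the example: first letter of each name.
    public static string Initials(string first, string last) => $"{first[0]}{last[0]}";

    // Groups people by their initials; each list plays the role of a bucket.
    public static Dictionary<string, List<string>> GroupByInitials(
        IEnumerable<(string First, string Last)> people)
    {
        var buckets = new Dictionary<string, List<string>>();
        foreach (var (first, last) in people)
        {
            string key = Initials(first, last);
            if (!buckets.TryGetValue(key, out var names))
            {
                names = new List<string>();
                buckets[key] = names;
            }
            names.Add($"{first} {last}");
        }
        return buckets;
    }

    public static void Main()
    {
        var people = new[] { ("Ginger", "Rogers"), ("Fred", "Astaire"), ("Frank", "Allen") };
        foreach (var bucket in GroupByInitials(people))
        {
            // A bucket with more than one name is a collision.
            Console.WriteLine($"{bucket.Key}: {string.Join(", ", bucket.Value)}");
        }
    }
}
```

The FA bucket ends up holding both Fred Astaire and Frank Allen, so a lookup for either one has to compare against both entries.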

The best hash functions take an input (Fred Astaire) and produce a random-looking value that is (ideally) unique to that input. As long as the size of your hash is smaller than the size of your data, there's no way to completely avoid collisions, but they should be minimized by carefully choosing the hash algorithm.
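In C#, one common way to get a reasonably well-mixed hash from several fields is `System.HashCode.Combine`. This is only a sketch: the `Person` class is made up for illustration, and it assumes a runtime where `System.HashCode` is available (.NET Core 2.1 or later, for example).

```csharp
using System;

public sealed class Person : IEquatable<Person>
{
    public string FirstName { get; }
    public string LastName { get; }

    public Person(string firstName, string lastName)
    {
        FirstName = firstName;
        LastName = lastName;
    }

    // Two people are equal when both names match.
    public bool Equals(Person other) =>
        other != null && FirstName == other.FirstName && LastName == other.LastName;

    public override bool Equals(object obj) => Equals(obj as Person);

    // Uses every character of both names, not just the initials.
    // Equal objects must return equal hash codes; unequal objects
    // usually, but not necessarily, return different ones.
    public override int GetHashCode() => HashCode.Combine(FirstName, LastName);

    public static void Main()
    {
        var fred = new Person("Fred", "Astaire");
        var fredAgain = new Person("Fred", "Astaire");
        var frank = new Person("Frank", "Allen");

        Console.WriteLine(fred.GetHashCode() == fredAgain.GetHashCode());
        Console.WriteLine(fred.GetHashCode() == frank.GetHashCode());
    }
}
```

The values produced by `HashCode.Combine` are only meaningful within a single process, so they should not be stored or compared across runs.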

As pointed out by Eric below, hash algorithms used to balance hash tables have to be very fast, so you have to strike a balance between speed and collisions. You can study cryptographic hash algorithms like SHA-1 (http://en.wikipedia.org/wiki/SHA-1) to understand the complexities of generating unique hashes, but hash algorithms for balancing hash tables need to be as quick as possible.
