I'm trying to optimize a program which needs to compute a hash for a constant size window in a data stream at every position (byte) of stream. It is needed for a lookup of repetitions in disk files much larger than available RAM. Currently I compute separate md5 hash for every window, but it costs a lot of time (window size is a few kilobytes, so each byte of data is processed few thousand times). I wonder if there exist a way to compute every subsequent hash in constant (window-size-independent) time (like addition and subtraction of 1 element in moving average)? The hash function may be anything as long as it gives not to long hashes (50-100 bits is ok) and its computation is reasonably fast. It also must give virtually no colisions on up to trillions of not-so-random windows (TB of data) - every collision means a disk access in my case (crc32 is much to weak, md5 is OK in this aspect).
Linux で利用できる既存のライブラリ関数があれば教えていただければ幸いです。
これは私の最初の質問なので、何か間違ったことをした場合はご容赦ください。
よろしく、バルトス