I am writing a file comparison program. One of the features I would like to implement is calculating the similarity and difference of the two chosen files. I would like this comparison to be fast (if possible) on large files. I am not sure which method to use, but in the end I want a percentage.

Refer to this gif to get a visual idea.

2 Answers

If you can use LINQ, this is not a problem.

var results = your1stEnumerable.Intersect(your2ndEnumerable);
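
Intersect on its own returns only the distinct elements common to both sequences, not a percentage. One way to get a percentage from it, as a minimal sketch, is to divide the number of shared distinct lines by the number of distinct lines overall (the Jaccard index). The names LineSimilarity and SharedLinePercent are hypothetical, and treating each file as a set of lines is an assumption that ignores line order and repeated lines.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LineSimilarity
{
    // Hypothetical helper: percentage of distinct lines shared by both inputs,
    // computed as |A intersect B| / |A union B| (the Jaccard index).
    public static double SharedLinePercent(IEnumerable<string> first, IEnumerable<string> second)
    {
        var a = new HashSet<string>(first);
        var b = new HashSet<string>(second);
        if (a.Count == 0 && b.Count == 0)
            return 100.0; // assumption: two empty inputs count as identical

        int shared = a.Intersect(b).Count();
        int total = a.Count + b.Count - shared; // size of the union
        return 100.0 * shared / total;
    }
}
```

For files, this could be called as LineSimilarity.SharedLinePercent(File.ReadLines(path1), File.ReadLines(path2)), which needs System.IO. Both line sets are held in memory, so this is likely a poor fit for very large files.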
answered 2013-10-05T01:58:32.417

You will probably want something like the similarity measure used by binary diff utilities rather than a dumb byte-by-byte comparison. But hey, just for fun...

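unsafe static long DumbDifference(string file1Path, string file2Path)
{
    // Untested sketch. Needs System, System.IO, System.IO.MemoryMappedFiles and /unsafe.
    // Map views in chunks if you plan to use it on large files.

    // Take the lengths from the files themselves: a view's ByteLength
    // is rounded up to a page boundary and may overstate the file size.
    long length1 = new FileInfo(file1Path).Length;
    long length2 = new FileInfo(file2Path).Length;
    long minLength = Math.Min(length1, length2);

    // Every byte past the end of the shorter file counts as a difference.
    long differences = Math.Abs(length1 - length2);
    if (minLength == 0)
        return differences; // an empty file cannot be memory-mapped

    using (MemoryMappedFile file1 = MemoryMappedFile.CreateFromFile(
             file1Path, FileMode.Open,
             null, 0, MemoryMappedFileAccess.Read))
    using (MemoryMappedFile file2 = MemoryMappedFile.CreateFromFile(
             file2Path, FileMode.Open,
             null, 0, MemoryMappedFileAccess.Read))
    // A read-only mapping needs read-only views; the parameterless
    // CreateViewAccessor() requests ReadWrite access and would throw here.
    using (MemoryMappedViewAccessor view1 =
             file1.CreateViewAccessor(0, 0, MemoryMappedFileAccess.Read))
    using (MemoryMappedViewAccessor view2 =
             file2.CreateViewAccessor(0, 0, MemoryMappedFileAccess.Read))
    {
        byte* ptr1 = null, ptr2 = null;
        try
        {
            view1.SafeMemoryMappedViewHandle.AcquirePointer(ref ptr1);
            view2.SafeMemoryMappedViewHandle.AcquirePointer(ref ptr2);

            for (long i = 0; i < minLength; ++i)
            {
                // if you expect your files to be pretty similar,
                // you could optimize this by comparing long-sized chunks.
                if (ptr1[i] != ptr2[i])
                    ++differences;
            }
        }
        finally
        {
            // Every successful AcquirePointer must be paired with ReleasePointer.
            if (ptr1 != null) view1.SafeMemoryMappedViewHandle.ReleasePointer();
            if (ptr2 != null) view2.SafeMemoryMappedViewHandle.ReleasePointer();
        }
    }

    return differences;
}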
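
The function above returns a raw count, while the question asks for a percentage. A rough sketch of one way to get there without unsafe code: read both inputs in fixed-size chunks, count mismatching positions, and divide by the length of the longer input. The names ByteSimilarity, CountDifferences and SimilarityPercent are hypothetical, the 64 KB buffer size is an arbitrary choice, and counting the length difference as mismatches follows the same convention as the function above.

```csharp
using System;
using System.IO;

static class ByteSimilarity
{
    // Hypothetical helper: counts positions where the two streams differ,
    // plus every byte past the end of the shorter one.
    public static long CountDifferences(Stream first, Stream second, out long longerLength)
    {
        var buffer1 = new byte[64 * 1024];
        var buffer2 = new byte[64 * 1024];
        long differences = 0, total1 = 0, total2 = 0;

        while (true)
        {
            int read1 = Fill(first, buffer1);
            int read2 = Fill(second, buffer2);
            if (read1 == 0 && read2 == 0)
                break;

            int common = Math.Min(read1, read2);
            for (int i = 0; i < common; i++)
                if (buffer1[i] != buffer2[i])
                    differences++;

            // Bytes present in only one stream count as differences.
            differences += Math.Abs(read1 - read2);
            total1 += read1;
            total2 += read2;
        }

        longerLength = Math.Max(total1, total2);
        return differences;
    }

    // Returns the percentage of matching byte positions, relative to the longer input.
    public static double SimilarityPercent(Stream first, Stream second)
    {
        long longer;
        long differences = CountDifferences(first, second, out longer);
        return longer == 0 ? 100.0 : 100.0 * (longer - differences) / longer;
    }

    // Stream.Read may return fewer bytes than requested, so keep reading
    // until the buffer is full or the stream ends. This keeps both streams aligned.
    static int Fill(Stream stream, byte[] buffer)
    {
        int filled = 0;
        while (filled < buffer.Length)
        {
            int n = stream.Read(buffer, filled, buffer.Length - filled);
            if (n == 0)
                break;
            filled += n;
        }
        return filled;
    }
}
```

With files, the streams could come from File.OpenRead(path1) and File.OpenRead(path2), each wrapped in a using block; the difference percentage is then 100 minus the returned value. Like any position-by-position comparison, a single inserted byte near the start shifts everything after it and drives the score toward zero, which is where diff-style algorithms would do better.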

Unfortunately, .NET does not have built-in SIMD support at the time of writing.

answered 2013-10-05T01:30:01.687