language-agnostic - Delete consecutive identical duplicate files

Question

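I have a server running Windows Server 2003 R2 Enterprise where each directory holds 50,000 to 250,000 text files of 1 KB each. The file names are sequential (MLLP000001.rcv, MLLP000002.rcv, and so on), and identical files are always consecutive. Once a subsequent file differs, I can expect never to receive another copy of the earlier file.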

I need a script that does the following, but I don't know where to start.

for each file in the target directory index 'i'
{
  for each file in the target directory index 'j' = i+1
  {
    compare the hash values of files i and j

    if the hashes are identical
      delete file j
    if the hashes differ
      set i = j // to skip past the files that are now deleted
      break
  }
}

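I tried a DOS batch script, but it is very cumbersome. I can't break out of the inner loop, and the outer loop itself fails because it iterates over a list of the files in the directory while that list keeps changing. As far as I know, VBScript has no hash function.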

score 1 · Accepted Answer

Since the files are only 1 KB each, why not do a bit-for-bit comparison and skip hashing altogether?
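A minimal sketch of that idea in Python, assuming Python is available on the server (the question does not say so). The function name `delete_consecutive_duplicates` and the `*.rcv` pattern are illustrative choices, and the sort relies on the zero-padded file names shown in the question.

```python
from pathlib import Path


def delete_consecutive_duplicates(directory, pattern="*.rcv"):
    """Delete each file whose bytes match the most recently kept file.

    Returns the list of deleted paths.
    """
    # Snapshot the listing up front so deletions cannot disturb the iteration.
    # Assumption: names are zero-padded (MLLP000001.rcv, ...), so a plain
    # lexicographic sort gives the numeric order.
    files = sorted(Path(directory).glob(pattern))
    deleted = []
    previous = None  # contents of the last file that was kept
    for path in files:
        current = path.read_bytes()  # files are about 1 KB, so read them whole
        if previous is not None and current == previous:
            path.unlink()  # same bytes as the last kept file: remove it
            deleted.append(path)
        else:
            previous = current  # a new run starts here; keep this file
    return deleted
```

Because only the contents of the last kept file are held in memory, this stays cheap even with 250,000 files in one directory.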

score 0 · Accepted Answer

It sounds like you could do something like this:

Set Files to an array of files in a given directory.
Set PreviousHash to hash of the first file in the Files.

For each CurrentFile after the first in Files,
    Set CurrentHash to hash of the CurrentFile.
    If CurrentHash is equal to PreviousHash, then delete CurrentFile.
    Else, set PreviousHash to CurrentHash.
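
The pseudocode above could be sketched in Python roughly as follows. This is one possible reading, not a definitive implementation: SHA-256 is an arbitrary choice of hash, and the names `file_digest`, `dedupe_by_hash`, and the `dry_run` flag are hypothetical additions for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def dedupe_by_hash(directory, pattern="*.rcv", dry_run=False):
    """Find (and unless dry_run, delete) files repeating the previous hash."""
    # Take the listing once, sorted by name, before deleting anything.
    # Assumption: zero-padded names sort in the intended sequence.
    files = sorted(Path(directory).glob(pattern))
    if not files:
        return []
    previous_hash = file_digest(files[0])
    duplicates = []
    for current_file in files[1:]:
        current_hash = file_digest(current_file)
        if current_hash == previous_hash:
            duplicates.append(current_file)
            if not dry_run:
                current_file.unlink()
        else:
            # Contents changed: this file becomes the new reference.
            previous_hash = current_hash
    return duplicates
```

Calling it with `dry_run=True` first returns the files that would be removed without touching the directory, which may be worth doing before running it against live data.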
