.net - Compare a TIFF file and a PDF file using VB.NET

Question

I am looking for sample code or a third-party tool that, in a VB.NET environment, compares a TIFF file with a PDF file (a visual comparison) and returns true or false.

My requirement was to convert TIFF files to PDF, which I did using iTextSharp. After the conversion, however, I have to prove with a VB.NET program that nothing was changed. (Why? I don't know, but I have to provide such a service.)

Please let me know if you know of such a tool. I have been searching, but I only found tools that convert one format to another or that compare files of the same format.

score 2 · Accepted Answer

Try re-extracting the TIFF image from the PDF and comparing its raw data with the raw data of the original TIFF file.

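Because PDF can embed TIFF image data as-is, your customer probably wants to make sure that the image was not recompressed into another format and that no quality was lost in the process. That is a reasonable concern.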

Getting the raw data from the image file:

Since you are using iText, for single-page TIFF files you may be able to get this data with the method Image.rawData(). You can create an instance of this Image class from a TIFF file with the method TiffImage.getTiffImage.

Getting the raw data from the PDF file:

You can follow the process described here. After that, you can use the method PdfReader.GetStreamBytes to get the raw data.

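You can compare the streams byte by byte, save the streams to files while creating the PDF and compare them later with a command-line tool, or compute an MD5 hash of each stream and compare the hashes instead.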
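The comparison step itself is language-independent. As a minimal sketch, written in Python for brevity and assuming the two raw streams have already been read into memory as byte strings, it could look like this (the function names `md5_of_stream` and `streams_match` are hypothetical, not part of iText):

```python
import hashlib


def md5_of_stream(data: bytes) -> str:
    """Return the hex MD5 digest of a raw byte stream.

    Useful when only the digest is stored at conversion time and the
    comparison happens later.
    """
    return hashlib.md5(data).hexdigest()


def streams_match(tiff_raw: bytes, pdf_raw: bytes) -> bool:
    """Return True if the two raw streams are byte-for-byte identical."""
    return tiff_raw == pdf_raw


def digests_match(stored_digest: str, pdf_raw: bytes) -> bool:
    """Compare a digest stored earlier with the digest of the extracted stream."""
    return stored_digest.lower() == md5_of_stream(pdf_raw)
```

The direct comparison is the stricter check; the digest variant is only a convenience for the case where the original stream is no longer at hand.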

I have not tested this approach, but I think it should work, because the TIFF metadata is not included.

score 2 · Accepted Answer

ImageMagick's compare command can do that very easily.

 compare file.tif file.pdf -compose src delta.pdf

or, assuming multipage TIFFs and multipage PDF, comparing page by page:

 compare file.tif[0] file.pdf[0] -compose src delta_page1.pdf
 compare file.tif[1] file.pdf[1] -compose src delta_page2.pdf
 compare file.tif[2] file.pdf[2] -compose src delta_page3.pdf
 [....]

(ImageMagick's indexing of pages/images starts with [0], not [1]!).
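For documents with many pages, the per-page commands above can be generated rather than typed. The following is a rough sketch in Python; it assumes the page count is known to the caller (`page_count` is a hypothetical parameter, nothing here detects it) and that the `compare` executable is on the PATH. The helper names are illustrative only.

```python
import subprocess


def compare_command(tiff_path, pdf_path, page_index, delta_path):
    """Build the argument list for one page, mirroring the commands above."""
    return [
        "compare",
        f"{tiff_path}[{page_index}]",  # ImageMagick page indices start at 0
        f"{pdf_path}[{page_index}]",
        "-compose", "src",
        delta_path,
    ]


def page_commands(tiff_path, pdf_path, page_count):
    """Return one compare command per page, writing delta_page1.pdf, delta_page2.pdf, ..."""
    return [
        compare_command(tiff_path, pdf_path, i, f"delta_page{i + 1}.pdf")
        for i in range(page_count)
    ]


def run_page_comparisons(tiff_path, pdf_path, page_count):
    """Run every command and return the exit codes.

    check=False is deliberate: compare may exit with a non-zero status
    when the images differ, which is not a failure to run.
    """
    return [
        subprocess.run(cmd, check=False).returncode
        for cmd in page_commands(tiff_path, pdf_path, page_count)
    ]
```

Passing the arguments as a list avoids shell quoting issues with the square brackets in the page selectors.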

Understanding the delta.pdf:

The resulting delta.pdf will be completely white if there is no visual difference.
The differing pixels will be red.
The resulting file will use the default 72 dpi resolution, which may be too coarse to reveal very small pixel differences.

You can even simplify the command like this:

 compare file.tif file.pdf delta.pdf

The resulting delta.pdf will show (for context) the first file from the command line as a light gray background image, and overlay the differences as red pixels. Of course, in theory you can also reverse the order of the files in each of the commands:

 compare file.pdf file.tif delta.pdf

However, you should be aware that backgrounds which appear white in a PDF are in reality very often transparent, whereas TIFF backgrounds are real white. This will lead to a lot of pixel differences showing up. Better stick with the order I named first :-)

Note 1: All these comparisons assume (of course) the same page image dimensions and aspect ratios. (Otherwise you may need to scale one of the two page images first.)

Note 2: You will almost always discover minor pixel differences, depending on your overall processing chain. It all depends on what kind of errors you want to uncover with this comparison. There are quite a few ways to fine-tune this.

Note 3: If this approach works for you in principle, you can change the output format: you do not really need the visual difference as a "red pixel image". Instead, you could count the white (equal) and the red (differing) pixels, decide from the percentage of red relative to white whether the result is 'good' or 'bad', and finally return 'true' or 'false' accordingly (the example command is shown for two PDFs instead of one PDF and one TIFF):

Sample command:

 compare \
   http://qtrac.eu/boson1.pdf[1] http://qtrac.eu/boson2.pdf[1] -compose src \
   -define histogram:unique-colors=true \
   -format %c \
   histogram:info:-

Sample output:

 56934: (61937,    0, 7710,52428) #F1F100001E1ECCCC srgba(241,0,30,0.8)
444056: (65535,65535,65535,52428) #FFFFFFFFFFFFCCCC srgba(255,255,255,0.8)

This output lends itself well to automated unit testing. You can evaluate the two numbers, easily compute the "red pixel" vs. "white pixel" ratio, and then decide to return PASSED or FAILED based on a certain threshold (if you don't strictly need "zero red" pixels).
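As an illustration of that evaluation, here is a small sketch in Python that parses histogram text in the format of the sample output and turns it into a true/false result. It rests on several assumptions: each line has the form `count: (R,G,B,A) ...`, any entry whose R, G and B values are equal counts as "unchanged" (white or gray), and everything else counts as "differing". It also computes the differing share of all pixels rather than the red-to-white ratio, which is a slight variation. The function names and the `threshold` parameter are hypothetical.

```python
import re

# Matches the leading "count: (c1,c2,...)" part of one histogram line.
HISTOGRAM_LINE = re.compile(r"^\s*(\d+):\s*\(([^)]*)\)")

# Sample output copied from the answer above.
SAMPLE_OUTPUT = """\
 56934: (61937,    0, 7710,52428) #F1F100001E1ECCCC srgba(241,0,30,0.8)
444056: (65535,65535,65535,52428) #FFFFFFFFFFFFCCCC srgba(255,255,255,0.8)
"""


def parse_histogram(text):
    """Return a list of (pixel_count, channel_values) tuples."""
    entries = []
    for line in text.splitlines():
        match = HISTOGRAM_LINE.match(line)
        if match is None:
            continue  # skip blank or unrecognised lines
        count = int(match.group(1))
        channels = tuple(int(v) for v in match.group(2).split(","))
        entries.append((count, channels))
    return entries


def differing_fraction(entries):
    """Fraction of pixels whose colour is not white/gray (R == G == B)."""
    total = sum(count for count, _ in entries)
    if total == 0:
        raise ValueError("no histogram entries found")
    differing = sum(
        count for count, channels in entries if len(set(channels[:3])) != 1
    )
    return differing / total


def images_match(histogram_text, threshold=0.0):
    """True if the differing fraction does not exceed the threshold."""
    return differing_fraction(parse_histogram(histogram_text)) <= threshold
```

With the default `threshold=0.0` the check demands zero differing pixels; a caller who tolerates small rendering differences would pass a larger value chosen for their own processing chain.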
