私は多かれ少なかれPDFソフトプルーフを説明するものを書き込もうとしています。
抽出したい情報がいくつかありますが、その方法がわかりません。
抽出する必要があるもの:
Bleed: I got this somewhat working with pyPdf, given
that the document uses 72 dpi, which sadly isn't
always the case. I need to be able to calculate
the bleed in millimeters.
Print resolution (dpi): If I read the PDF spec[1] correctly this ought to
always be 72 dpi, unless a page has UserUnit set,
which was only introduced in PDF-1.6, but shouldn't
print documents always be at least 300 dpi? I'm
afraid that I misunderstood something…
I'd also need the print resolution for images, if
they can differ from the default page resolution,
that is.
Text color: I don't have the slightest clue on how to extract
this, the string 'text colour' only shows up once
in the whole spec, without any explanation how it
is set.
Image colormodel: If I understand it correctly I can read this out
in pyPdf with page['/Group']['/CS'] which can be:
- /DeviceRGB
- /DeviceCMY
- /DeviceCMYK
- /DeviceGray
- /DeviceRGBK
- /DeviceN
Font 'embeddedness': I read in another post on stackoverflow that I
can just iterate over the font resources and if a
resource has a '/FontFile'-key that means that
the font is embedded. Is this correct?
pyPdf以外のライブラリがこの情報(またはそれらの組み合わせ)をより適切に抽出できる場合、それらは大歓迎です。これまでのところ、pyPdf、pdfrw、pdfminerをいじくり回しました。これらすべてに、最も広範なドキュメントが正確に含まれているわけではありません。
[1] http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf