linux - Inserting PDF images into text extracted with pdftotext and pdfimages?

Source: https://stackoverflow.com/questions/11321070 (2012-07-04T00:42:18.690)


I was able to install `pdftotext`, which converts a PDF to text, and `pdfimages`, a utility that extracts its images (both probably ship with Linux), on my Mac:

```shell
# install poppler, xpdf, and imagemagick
brew install imagemagick
brew install poppler # not sure if this worked, had to install `xpdf` from online .dmg
pdftotext sample.pdf output.txt
pdfimages sample.pdf pdf-images
# then convert .ppm to .jpg
# one at a time:
# convert pdf-images-001.ppm pdf-images-001.jpg
# batch:
mogrify -format jpg *.ppm
```

So now I have the (impressively well-formatted) text from the PDF in `output.txt`, plus a bunch of images that I had to convert from `.ppm` to `.jpg` with ImageMagick.
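Part of that conversion step can probably be skipped. A minimal sketch, assuming poppler's `pdfimages` and ImageMagick's `mogrify` are on PATH: the `-j` flag of `pdfimages` writes images that are stored in the PDF as JPEG (DCT) data straight to `.jpg`, and only the remaining `.ppm`/`.pbm` files are converted in one batch. The function name `extract_images_as_jpg` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: extract all images from a PDF and end up with .jpg files only.
# Assumes poppler's pdfimages and ImageMagick's mogrify are on PATH;
# extract_images_as_jpg is a hypothetical helper name.
extract_images_as_jpg() {
  local pdf="$1" root="$2"
  local f leftovers=()
  # -j: images stored in the PDF as JPEG (DCT) data are written as .jpg as-is.
  pdfimages -j "$pdf" "$root" || return
  # Everything else is still written as .ppm (color/gray) or .pbm (monochrome).
  for f in "$root"-*.ppm "$root"-*.pbm; do
    if [[ -e "$f" ]]; then
      leftovers+=("$f")
    fi
  done
  # Convert the leftovers in a single mogrify call, then delete the originals.
  if ((${#leftovers[@]} > 0)); then
    mogrify -format jpg "${leftovers[@]}" && rm -f -- "${leftovers[@]}"
  fi
}
# Usage: extract_images_as_jpg sample.pdf pdf-images
```

Because `pdfimages` numbers all images in one sequence, the directly written `.jpg` files and the converted ones share the same `pdf-images-NNN` naming without colliding.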

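My question: is there a way to insert references to these images at the appropriate places in `output.txt`? Or is there a way to combine these two commands so that a single pass extracts both the text and the images and creates links to the images within the text? I'm wondering whether I'll have to write the parsing code myself to insert the images into the text.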
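As far as I know, neither tool links its output to the other, but both `pdftotext` and `pdfimages` accept `-f` and `-l` (first and last page). One way to combine them is therefore to run both page by page and write each page's image references right after that page's text. A minimal sketch, assuming poppler's `pdfinfo`, `pdftotext` and `pdfimages` are on PATH; `pdf_to_text_with_images` and the `[image: FILE]` marker are hypothetical choices. Images land at the end of the page they came from, not at their exact position within it.

```shell
#!/usr/bin/env bash
# Sketch: interleave pdftotext output with references to the images that
# pdfimages extracts, one page at a time. Assumes poppler's pdfinfo,
# pdftotext and pdfimages are on PATH. pdf_to_text_with_images and the
# "[image: FILE]" marker are hypothetical names chosen for this example.
pdf_to_text_with_images() {
  local pdf="$1" out="$2" imgdir="$3"
  local pages p img
  mkdir -p -- "$imgdir" || return
  # pdfinfo prints a line such as "Pages:          12".
  pages=$(pdfinfo "$pdf" | awk '/^Pages:/ {print $2}')
  [[ -n "$pages" ]] || return 1
  : > "$out"  # create or truncate the output file
  for ((p = 1; p <= pages; p++)); do
    # -f/-l limit both tools to page p; -nopgbrk drops the form feed that
    # pdftotext would otherwise emit, and "-" sends the text to stdout.
    pdftotext -f "$p" -l "$p" -nopgbrk "$pdf" - >> "$out"
    # Writes $imgdir/page-$p-000.ppm, -001, ... (.pbm for monochrome images).
    pdfimages -f "$p" -l "$p" "$pdf" "$imgdir/page-$p"
    # The '-' after the page number keeps page 1 from matching page 10's files.
    for img in "$imgdir/page-$p"-*; do
      if [[ -e "$img" ]]; then
        printf '[image: %s]\n' "$img" >> "$out"
      fi
    done
  done
}
# Usage: pdf_to_text_with_images sample.pdf output.txt pdf-images
```

Calling the tools once per page is slower than one call each for a long PDF, but it keeps the page number attached to every image without parsing anything afterwards.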
