pdf - How can I programmatically export PDF annotations (such as equations enclosed in rectangles) as images?

Question

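I'm wondering whether some annotations can be exported as images. I already know how to export highlighted text as text, but that doesn't work well for equations. If the equations are marked by annotations (for example, a box drawn around each equation), can they all be converted to images in one go with a PDF snapshot tool?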

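With a PDF snapshot tool I can easily do each one by hand. Does any PDF library or program offer a way to programmatically take image snapshots of individual equations that are marked by an annotation, rather than of the whole page?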

For the purposes of this question, the programs don't have to be free. Thanks.

score 2 · Accepted Answer

I came up with a complete Ruby-based solution using the Ruby gems pdf-reader and rmagick (together with an ImageMagick installation).

require 'pdf-reader'
require 'rmagick' # older RMagick releases used require 'RMagick'

pdf_file_name='statmech' #without extension
doc = PDF::Reader.new(File.expand_path(pdf_file_name+".pdf"))
$objects = doc.objects

def convertpagetojpgandcrop(filename,pagenum,croprect,imgname)
   # ImageMagick page indices are zero-based, PDF page numbers are one-based
   pagename = filename+".pdf[#{pagenum-1}]"
   # higher density used for quality purposes (otherwise fuzzy)
   pageim = Magick::Image.read(pagename){ |opts| opts.density = 216}.first
   # factors of 3 needed because density is 216 = 3*72 TODO: generalize to pdf density!=72
   # SouthWestGravity puts coordinate origin in bottom left to match pdf coords
   eqim = pageim.crop(Magick::SouthWestGravity,
      3*croprect[0],3*croprect[1],3*croprect[2]-3*croprect[0],3*croprect[3]-3*croprect[1])
   eqim.write(imgname)
end

def is_square?(object)
   object[:Type] == :Annot && object[:Subtype] == :Square
end
def is_highlight?(object)
   object[:Type] == :Annot && object[:Subtype] == :Highlight
end

def annots_on_page(page)
   references = (page.attributes[:Annots] || [])
   lookup_all(references).flatten
end

def lookup_all(refs)
   refs = *refs
   refs.map { |ref| lookup(ref) }
end

def lookup(ref)
   object = $objects[ref]
   return object unless object.is_a?(Array)
   lookup_all(object)
end

def highlights_on_page(page)
   all_annots = annots_on_page(page)
   all_annots.select { |a| is_highlight?(a) }
end

def squares_on_page(page)
   all_annots = annots_on_page(page)
   all_annots.select { |a| is_square?(a) }
end
def restricted_annots_on_page(page)
   all_annots = annots_on_page(page)
   all_annots.select { |a| is_square?(a)||is_highlight?(a) }
end

# This block exports a jpg for each 'square' annotation in the pdf
doc.pages.each do |page|
   eqnum=0
   all_squares = squares_on_page(page)
   all_squares.each do |annot|
      eqnum = eqnum+1
      puts "#{annot[:Rect]}"
      convertpagetojpgandcrop(pdf_file_name,page.number,annot[:Rect],
         pdf_file_name+"page#{page.number}eq#{eqnum}.jpg")
   end
end

# This block gives the text of the highlights and wikilinks to the images
# TODO:(needs to go in text file)
doc.pages.each do |page|
   eqnum = 0
   annots = restricted_annots_on_page(page)
   if annots.length>0
      puts "# Page #{page.number}"
   end
   annots.each do |annot|
      if is_square?(annot)
         eqnum = eqnum+1
         puts "{{wiki:#{pdf_file_name}page#{page.number}eq#{eqnum}.jpg}}"
      else
         puts "#{annot[:Contents]}"
      end
   end
end

This code extends sample code for the pdf-reader and rmagick gems that I found online. A few lines are original.
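The TODO about the hard-coded factor of 3 could be handled by deriving the scale from the render density. The following is only a minimal sketch, not part of the original answer: `rect_to_crop` and `PDF_POINTS_PER_INCH` are hypothetical names, and it assumes the page is unrotated, its coordinate origin is the bottom-left corner, and user space uses the default 72 points per inch.

```ruby
# Points per inch in default PDF user space (assumed; no custom user unit).
PDF_POINTS_PER_INCH = 72.0

# Convert a PDF /Rect array [x1, y1, x2, y2] (points, origin at the bottom-left
# of the page) into [x, y, width, height] in pixels for a page rendered at
# `density` DPI. The offsets are still measured from the bottom-left corner,
# so they are meant to be used with Magick::SouthWestGravity.
def rect_to_crop(rect, density)
  scale = density / PDF_POINTS_PER_INCH
  x1, y1, x2, y2 = rect.map(&:to_f)
  # The corners may be listed in either order, so normalise them first.
  left, right = [x1, x2].minmax
  bottom, top = [y1, y2].minmax
  [
    (left * scale).floor,
    (bottom * scale).floor,
    ((right - left) * scale).ceil,
    ((top - bottom) * scale).ceil
  ]
end

x, y, width, height = rect_to_crop([100, 200, 300, 250], 216)
puts "crop at (#{x}, #{y}), size #{width}x#{height}"
```

Inside convertpagetojpgandcrop the result could then be splatted into the crop call, for example `pageim.crop(Magick::SouthWestGravity, *rect_to_crop(croprect, 216))`, as long as the density passed here is the same one used when reading the page.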

score 1 · Accepted Answer

This code sample uses Amyuni PDF Creator .Net and exports the page with only one annotation visible at a time.

using System;
using System.Collections;
using System.Drawing; // Rectangle, Bitmap, Graphics
using System.IO;
using Amyuni.PDFCreator;
// open a pdf document
FileStream testfile = new FileStream(filename, FileMode.Open, FileAccess.Read, FileShare.Read);
IacDocument document = new IacDocument(null);
document.SetLicenseKey("your license", "your code");
document.Open(testfile, "");

document.CurrentPageNumber = 1;
IacAttribute attribute = document.CurrentPage.AttributeByName("Objects");

// listobj is an array list of objects
ArrayList listobj = (System.Collections.ArrayList)attribute.Value;
ArrayList annotations = new ArrayList();
foreach (Amyuni.PDFCreator.IacObject iacObj in listobj)
{
    if ((bool)iacObj.AttributeByName("Annotation").Value)
    {
        annotations.Add(iacObj);
        // Put the annotation out of sight
        iacObj.Coordinates = Rectangle.FromLTRB(
                            -iacObj.Coordinates.Left,
                            -iacObj.Coordinates.Top,
                            -iacObj.Coordinates.Right,
                            -iacObj.Coordinates.Bottom);
    }
    else
        iacObj.Delete(false);
}

ArrayList images = new ArrayList();
int i = 0;
foreach (Amyuni.PDFCreator.IacObject iacObj in annotations)
{
    // Bring the annotation back into view
    iacObj.Coordinates = Rectangle.FromLTRB(
                        -iacObj.Coordinates.Left,
                        -iacObj.Coordinates.Top,
                        -iacObj.Coordinates.Right,
                        -iacObj.Coordinates.Bottom);
    //Draw the page
    Bitmap bmp = new Bitmap(1000, 1000);
    Graphics gr = Graphics.FromImage(bmp);
    IntPtr hdc = gr.GetHdc();
    document.DrawCurrentPage(hdc.ToInt32(), true);
    gr.ReleaseHdc();
    images.Add(bmp);
    bmp.Save("c:\\temp\\image" + i + ".png"); // the bitmap is saved as an image file, not a PDF

    iacObj.Delete(false); // object not needed anymore
    i++;
}

If needed, you can use the Coordinates property of the annotation object to extract the part of the resulting image that corresponds to the annotation.

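If you want to extract all the objects in a rectangular region (such as an annotation), you can replace the loop that collects the annotations with a call to the method IacDocument.GetObjectsInRectangle.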

The usual disclaimer applies.
