python - How do I split a PDF document that has multiple logical pages on each sheet?

Question

I want to split a 2x2 PDF document into its original pages. Each page consists of four logical pages arranged as in this example.

I am trying to use Python and pyPdf:

import copy, sys
from pyPdf import PdfFileWriter, PdfFileReader

def ifel(condition, trueVal, falseVal):
    if condition:
        return trueVal
    else:
        return falseVal

input  = PdfFileReader(file(sys.argv[1], "rb"))
output = PdfFileWriter()

for p in [input.getPage(i) for i in range(0,input.getNumPages())]:
    (w, h) = p.mediaBox.upperRight

    for j in range(0,4):
        t = copy.copy(p)        
        t.mediaBox.lowerLeft  = (ifel(j%2==1, w/2, 0), ifel(j<2, h/2, 0))
        t.mediaBox.upperRight = (ifel(j%2==0, w/2, w), ifel(j>1, h/2, h))
        output.addPage(t)

output.write(file("out.pdf", "wb"))

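Unfortunately, this script does not work as intended, because it outputs every fourth logical page four times. I have never written anything in Python before, so I suspect this is a very basic problem, probably caused by the copy operation. I would appreciate any help.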

Edit: Well, I did some experiments. I inserted the page width and height by hand, as follows:

import copy, sys
from pyPdf import PdfFileWriter, PdfFileReader

def ifel(condition, trueVal, falseVal):
    if condition:
        return trueVal
    else:
        return falseVal

input  = PdfFileReader(file(sys.argv[1], "rb"))
output = PdfFileWriter()

for p in [input.getPage(i) for i in range(0,input.getNumPages())]:
    (w, h) = p.mediaBox.upperRight

    for j in range(0,4):
        t = copy.copy(p)        
        t.mediaBox.lowerLeft  = (ifel(j%2==1, 841/2, 0),   ifel(j<2, 595/2, 0))
        t.mediaBox.upperRight = (ifel(j%2==0, 841/2, 841), ifel(j>1, 595/2, 595))
        output.addPage(t)

output.write(file("out.pdf", "wb"))

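This code produces the same wrong result as my original code, but if I now comment out the line (w, h) = p.mediaBox.upperRight, everything works! I don't understand why. The tuple (w, h) is no longer used, so how can removing its definition change anything?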

score 0 · Accepted Answer

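I believe the problem is that mediaBox is only a magic accessor for a variable that is shared between p and all of its copies t. As a result, assigning to t.mediaBox gives the mediaBox the same coordinates in all four copies.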

The variable behind the mediaBox field is created lazily on the first access to mediaBox, so when you comment out the line (w, h) = p.mediaBox.upperRight, the mediaBox variable is created separately for each t.
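The sharing effect can be reproduced without any PDF library. The following is a minimal sketch, not pyPdf's actual code: the Page and Box classes, the media_box property, and the split_boxes helper are hypothetical stand-ins that only imitate a lazily created, cached rectangle on a dict-like page.

```python
import copy


class Box(list):
    """Hypothetical stand-in for a rectangle: a mutable [x0, y0, x1, y1]."""


class Page(dict):
    """Hypothetical page: a dict whose box is wrapped lazily on first access."""

    @property
    def media_box(self):
        value = self["/MediaBox"]
        if not isinstance(value, Box):
            # First access: wrap the raw list and cache the wrapper on this page.
            value = Box(value)
            self["/MediaBox"] = value
        return value


def split_boxes(touch_original_first):
    """Shallow-copy a page twice, shrink the first copy, return both boxes."""
    page = Page({"/MediaBox": [0, 0, 841, 595]})
    if touch_original_first:
        _ = page.media_box  # caches a single Box on the original page
    copies = [copy.copy(page) for _ in range(2)]
    copies[0].media_box[2] = 420  # intended to change the first copy only
    return [list(c.media_box) for c in copies]


print(split_boxes(True))   # both copies changed: the cached Box is shared
print(split_boxes(False))  # only the first copy changed
```

When the original page is read before copying, every shallow copy refers to the same cached Box, so a change made through one copy shows up in all of them. When the original is left untouched, each copy builds its own Box on first access.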

Two possible solutions that determine the page size automatically:

Get the dimensions after making the copy:

for p in [input.getPage(i) for i in range(0,input.getNumPages())]:

    for j in range(0,4):
        t = copy.copy(p)       
        (w, h) = t.mediaBox.upperRight
        t.mediaBox.lowerLeft  = (ifel(j%2==1, w/2, 0),   ifel(j<2, h/2, 0))
        t.mediaBox.upperRight = (ifel(j%2==0, w/2, w), ifel(j>1, h/2, h))
        output.addPage(t)

Instantiate a new RectangleObject to use for the mediaBox variable:

# Requires "import pyPdf" in addition to the imports used in the question.
for p in [input.getPage(i) for i in range(0,input.getNumPages())]:
    (w, h) = p.mediaBox.upperRight

    for j in range(0,4):
        t = copy.copy(p)
        # Give each copy its own rectangle: [x0, y0, x1, y1]
        t.mediaBox = pyPdf.generic.RectangleObject([
                         ifel(j%2==1, w/2, 0),
                         ifel(j<2, h/2, 0),
                         ifel(j%2==0, w/2, w),
                         ifel(j>1, h/2, h)])
        output.addPage(t)

Using copy.deepcopy() would cause memory problems for large, complex PDFs.
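The quadrant arithmetic used in both solutions can also be checked on its own, separately from the copying issue. The following is a sketch in Python 3, where a conditional expression takes the place of the ifel helper; the function name quadrant_boxes is hypothetical and not part of pyPdf.

```python
def quadrant_boxes(width, height):
    """Return (lower_left, upper_right) corners for the four quadrants.

    Order: top-left, top-right, bottom-left, bottom-right. PDF coordinates
    start at the bottom-left corner, so the top row has the larger y values.
    """
    half_w, half_h = width / 2, height / 2
    boxes = []
    for j in range(4):
        left = half_w if j % 2 == 1 else 0   # odd j: right-hand column
        bottom = half_h if j < 2 else 0      # j < 2: top row
        boxes.append(((left, bottom), (left + half_w, bottom + half_h)))
    return boxes


for lower_left, upper_right in quadrant_boxes(800, 600):
    print(lower_left, upper_right)
```

Each pair corresponds to the values assigned to mediaBox.lowerLeft and mediaBox.upperRight for j = 0 to 3 in the loops above.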
