python - How to erase text from a PDF using Python

Question

I am writing a Python script to edit text in a PDF.

I have this Python code that lets me add text at a specific position in a PDF file.

import PyPDF2
import io
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter
import sys

packet = io.BytesIO()
# create a new PDF with Reportlab
can = canvas.Canvas(packet, pagesize=letter)
# Insert text at a specific position
can.drawString(300, 115, "Hello world")
can.save()
# move to the beginning of the BytesIO buffer
packet.seek(0)
new_pdf = PyPDF2.PdfFileReader(packet)
# read your existing PDF
existing_pdf = PyPDF2.PdfFileReader(open("original.pdf", "rb"))
num_pages = existing_pdf.numPages 
output = PyPDF2.PdfFileWriter()
# add the "watermark" (which is the new pdf) on the existing page
page = existing_pdf.getPage(num_pages-1) # get the last page of the original pdf
page.mergePage(new_pdf.getPage(0)) # merges my created text with my PDF.
x = existing_pdf.getNumPages()
#add all pages from original pdf into output pdf
for n in range(x):
    output.addPage(existing_pdf.getPage(n))
# finally, write "output" to a real file
outputStream = open("output.pdf", "wb")
output.write(outputStream)
outputStream.close()

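My problem: I want to replace the text at a specific position in the original PDF with my own custom text. A way of writing blank characters would do the trick, but I could not find anything that does this.

PS: It has to be Python code, because I will need to deploy this as an .exe file later and I only know how to do that with Python code.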

score 0 · Accepted Answer

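If you want to do a poor man's edit with ReportLab and PyPDF2, create the replacement content with ReportLab. Given a Canvas, a rectangle marking the area, a text string, and the point where the text string is to be inserted, you would then: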

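# set the fill color to white
c.setFillColorRGB(1, 1, 1)
# draw a filled rectangle over the area to hide; stroke=0 avoids a black outline
# (x, y, width, height are placeholders for your rectangle)
c.rect(x, y, width, height, fill=1, stroke=0)
# switch the fill color back to black for the text
c.setFillColorRGB(0, 0, 0)
# text_x, text_y and text are placeholders for the insert position and the text string
c.drawString(text_x, text_y, text)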
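The point passed to drawString is the start of the text baseline, while rect takes its lower-left corner, so the white rectangle has to reach a little below the text position to hide descenders. A minimal sketch of deriving the rectangle from the text position follows; the helper name `cover_rect`, the padding, and the ascent and descent fractions are assumptions for illustration, not values taken from any font.

```python
def cover_rect(x, y, text_width, font_size, pad=1.0):
    """Return (x, y, width, height) of a box that hides one line of text.

    (x, y) is the baseline start point, i.e. what was passed to drawString.
    """
    # Assumed rough fractions of the font size; real fonts differ.
    ascent = 0.8 * font_size    # height above the baseline
    descent = 0.25 * font_size  # depth below the baseline (letters like g, p, y)
    return (
        x - pad,                     # lower-left corner, shifted left by the padding
        y - descent - pad,           # ... and dropped below the baseline
        text_width + 2 * pad,        # width of the text plus padding on both sides
        ascent + descent + 2 * pad,  # full line height plus padding
    )
```

The returned tuple can be unpacked straight into the call, as in `c.rect(*cover_rect(300, 115, 60, 12), fill=1, stroke=0)`. If the font of the original text is known, its width can probably be measured with ReportLab's `pdfmetrics.stringWidth(text, fontName, fontSize)` instead of being guessed.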

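Save the PDF document you have just created to a temporary file. Open this PDF document and the document you want to modify using PyPDF2's PdfFileReader. Create a PdfFileWriter object and call it modifiedDoc. Get page 0 of the temporary PDF and call it updatePage. Get page n of the other document and call it toModifyPage.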

toModifyPage.mergePage(updatePage)

Once you have finished updating the pages:

modifiedDoc.cloneDocumentFromReader(srcDoc)
modifiedDoc.write(outStream)

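Keep in mind that with this method the user may see the original text before it is covered by the new content, and text extraction will likely pull out both the original and the new text for that area, possibly intermingled into something unintelligible.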
