python - PDF scraping: how to automate the creation of a txt file for each pdf scraped with Python?

Question

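Here is what I want to do: a program that takes a list of pdf files as input and returns one .txt file for each file in the list.

For example, given listA = ["file1.pdf", "file2.pdf", "file3.pdf"], I want Python to create three txt files (one per pdf file), say "file1.txt", "file2.txt" and "file3.txt".

Thanks to this guy, I have the conversion part working smoothly. The only change I made was to the maxpages statement, which I set to 1 instead of 0 so that only the first page is extracted. As I said, this part of my code works perfectly. Here is the code: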

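# Imports assumed from pdfminer (module paths as in pdfminer / pdfminer.six)
from io import StringIO
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfinterp import PDFPageInterpreter, PDFResourceManager
from pdfminer.pdfpage import PDFPage

def convert_pdf_to_txt(path):
    rsrcmgr = PDFResourceManager()
    retstr = StringIO()
    codec = 'utf-8'
    laparams = LAParams()
    device = TextConverter(rsrcmgr, retstr, codec=codec, laparams=laparams)
    fp = open(path, 'rb')  # the original used Python 2's file(path, 'rb')
    interpreter = PDFPageInterpreter(rsrcmgr, device)
    password = ""
    #maxpages = 0
    maxpages = 1  # extract only the first page
    caching = True
    pagenos = set()
    for page in PDFPage.get_pages(fp, pagenos, maxpages=maxpages, password=password, caching=caching, check_extractable=True):
        interpreter.process_page(page)
    fp.close()
    device.close()
    text = retstr.getvalue()  # renamed from str to avoid shadowing the builtin
    retstr.close()
    return text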

The problem is that I can't seem to get Python to return what I described in the second paragraph. I tried the following code:

def save(lst):
    i = 0

    while i < len(lst):
        txtfile = "enegep"+str(i)+".txt" #enegep is like the identifier of the files
        artigo = convert_pdf_to_txt(lst[0])
        with open(txtfile, "w") as textfile:
            textfile.write(artigo)
        i += 1

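I ran that save function with a list of two pdf files as input, but it generated only one txt file and then kept running for several minutes without generating the second one. What would be a better approach to achieve my goal?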

score 1 · Accepted Answer

You never update i, so your code gets stuck in an infinite loop. You need i += 1:

def save(lst):
    i = 0   # set to 0 but never changes
    while i < len(lst):
        txtfile = "enegep"+str(i)+".txt" #enegep is like the identifier of the files
        artigo = convert_pdf_to_txt(lista[0])
        with open(txtfile, "w") as textfile:
            textfile.write(artigo)
        i += 1 # you need to increment i inside the loop

A better option is simply to use range:

def save(lst):
    for i in range(len(lst)): 
        txtfile = "enegep{}.txt".format(i) #enegep is like the identifier of the files
        artigo = convert_pdf_to_txt(lista[0])
        with open(txtfile, "w") as textfile:
            textfile.write(artigo)

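You also only ever use lista[0], so every output file gets the text of the first pdf. You may want to change that code to move through the list on each iteration.

If lst is actually lista, you can use enumerate: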

def save(lst):
    for i, ele in enumerate(lst):
        txtfile = "enegep{}.txt".format(i) #enegep is like the identifier of the files
        artigo = convert_pdf_to_txt(ele)
        with open(txtfile, "w") as textfile:
            textfile.write(artigo)
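
The question asks for "file1.txt" from "file1.pdf" rather than numbered "enegep" names, so here is a minimal sketch of that variant. The names txt_name_for and save_all are hypothetical, and the converter is passed in as a parameter (assumed to be something like convert_pdf_to_txt from the question, taking a path and returning a string) so the naming logic does not depend on pdfminer.

```python
from pathlib import Path


def txt_name_for(pdf_path):
    """Return the .txt path matching a .pdf path, e.g. file1.pdf -> file1.txt."""
    return Path(pdf_path).with_suffix(".txt")


def save_all(pdf_paths, convert):
    """Convert each pdf with `convert` and write the text next to the pdf.

    `convert` is any callable that takes a path and returns a string
    (hypothetical parameter; convert_pdf_to_txt would fit here).
    """
    written = []
    for pdf_path in pdf_paths:  # iterate over the elements, not just index 0
        out_path = txt_name_for(pdf_path)
        text = convert(pdf_path)
        with open(out_path, "w", encoding="utf-8") as textfile:
            textfile.write(text)
        written.append(out_path)
    return written
```

It could then be called as save_all(["file1.pdf", "file2.pdf", "file3.pdf"], convert_pdf_to_txt), which should write one txt file per pdf in the same folder as each pdf.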
