python - Problem running a Python script (pypdf/hex error)

Question

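I am trying to write a Python script using the PyPDF module. The script takes a "root" folder, merges all the PDFs in it, writes the merged PDF to an "Output" folder, and names it "Root.pdf" (after the folder that contained the split PDFs). It should then do the same for each sub-directory, giving each final output the same name as its sub-directory.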
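
The layout described above could be sketched as a planning step that decides which PDFs feed which output file, before any PDF library is involved. This is only a sketch under assumptions: `plan_merges` and its `output_dirname` parameter are hypothetical names, not part of pyPdf, and two folders with the same name at different depths would map to the same output file.

```python
import os


def plan_merges(root, output_dirname="Output"):
    """Map each output file to the sorted list of PDFs that feed it.

    One entry is produced per directory that directly contains PDFs.
    """
    output_dir = os.path.join(root, output_dirname)
    plan = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into the output folder, otherwise merged
        # files would be merged again on the next run.
        dirnames[:] = [
            d for d in dirnames if os.path.join(dirpath, d) != output_dir
        ]
        pdfs = sorted(
            os.path.join(dirpath, name)
            for name in filenames
            if name.lower().endswith(".pdf")
        )
        if pdfs:
            # The output is named after the folder that held the PDFs.
            folder = os.path.basename(os.path.abspath(dirpath))
            plan[os.path.join(output_dir, folder + ".pdf")] = pdfs
    return plan
```

Each entry of the returned dictionary could then be handed to a fresh writer object, so that pages from one folder do not leak into the output of the next.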

I get stuck when trying to process the sub-directories, and I get an error that relates to some hex values. (It seems to be receiving a null value that is not valid hex.)

This is the error that is generated:

Traceback (most recent call last):
  File "C:\Documents and Settings\student3\Desktop\Test\pdfMergerV1.py", line 76, in <module>
    files_recursively(path)
  File "C:\Documents and Settings\student3\Desktop\Test\pdfMergerV1.py", line 74, in files_recursively
    os.path.walk(path, process_file, ())
  File "C:\Python27\lib\ntpath.py", line 263, in walk
    walk(name, func, arg)
  File "C:\Python27\lib\ntpath.py", line 259, in walk
    func(arg, top, names)
  File "C:\Documents and Settings\student3\Desktop\Test\pdfMergerV1.py", line 38, in process_file
    pdf = PdfFileReader(file( filename, "rb"))
  File "C:\Python27\lib\site-packages\pyPdf\pdf.py", line 374, in __init__
    self.read(stream)
  File "C:\Python27\lib\site-packages\pyPdf\pdf.py", line 775, in read
    newTrailer = readObject(stream, self)
  File "C:\Python27\lib\site-packages\pyPdf\generic.py", line 67, in readObject
    return DictionaryObject.readFromStream(stream, pdf)
  File "C:\Python27\lib\site-packages\pyPdf\generic.py", line 531, in readFromStream
    value = readObject(stream, pdf)
  File "C:\Python27\lib\site-packages\pyPdf\generic.py", line 58, in readObject
    return ArrayObject.readFromStream(stream, pdf)
  File "C:\Python27\lib\site-packages\pyPdf\generic.py", line 153, in readFromStream
    arr.append(readObject(stream, pdf))
  File "C:\Python27\lib\site-packages\pyPdf\generic.py", line 69, in readObject
    return readHexStringFromStream(stream)
  File "C:\Python27\lib\site-packages\pyPdf\generic.py", line 276, in readHexStringFromStream
    txt += chr(int(x, base=16))
ValueError: invalid literal for int() with base 16: '\x00\x00'
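
The last frame shows the hex-string reader handing two NUL characters to `int(..., base=16)`. A minimal sketch that reproduces the same `ValueError` outside pyPdf follows; `parse_hex_pair` and `try_parse` are hypothetical helpers written for illustration and are not pyPdf functions.

```python
def parse_hex_pair(pair):
    """Convert a two-character hex pair such as '4A' to one character."""
    return chr(int(pair, base=16))


def try_parse(pair):
    """Return the decoded character, or None when the pair is not hex."""
    try:
        return parse_hex_pair(pair)
    except ValueError:
        # int() rejects anything that is not a hex digit, including NULs.
        return None
```

Calling `parse_hex_pair("\x00\x00")` raises the same kind of `ValueError` as in the traceback, which suggests the file contains NUL bytes where the parser expected hex digits.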

This is the source code of the script:

#----------------------------------------------------------------------------------------------
# Name:        pdfMerger
# Purpose:     Automatic merging of all PDF files in a directory and its sub-directories and
#              rename them according to the folder itself. Requires the pyPDF Module
#
# Current:     Processes all the PDF files in the current directory
# To-Do:       Process the sub-directories.
#
# Version: 1.0
# Author:      Brian Livori
#
# Created:     03/08/2011
# Copyright:   (c) Brian Livori 2011
# Licence:     Open-Source
#---------------------------------------------------------------------------------------------
#!/usr/bin/env python

import os
import glob
import sys
import fnmatch

from pyPdf import PdfFileReader, PdfFileWriter

output = PdfFileWriter()

path = str(os.getcwd())

x = 0

def process_file(_, path, filelist):
    for filename in filelist:
        if filename.endswith('.pdf'):

            filename = os.path.join(path, filename)
            print "Merging " + filename

            pdf = PdfFileReader(file( filename, "rb"))

            x = pdf.getNumPages()

            i = 0

            while (i != x):

                output.addPage(pdf.getPage(i))
                print "Merging page: " + str(i+1) + "/" + str(x)

                i += 1

                output_dir = "\Output\\"

                ext = ".pdf"
                dir =  os.path.basename(path)
                outputpath = str(os.getcwd()) + output_dir
                final_output = outputpath

                if os.path.exists(final_output) != True:

                                os.mkdir(final_output)
                                outputStream = file(final_output + dir + ext, "wb")
                                os.path.join(outputStream)
                                output.write(outputStream)
                                outputStream.close()

                else:

                                outputStream = file(final_output + dir + ext, "wb")
                                os.path.join(outputStream)
                                output.write(outputStream)
                                outputStream.close()

def files_recursively(topdir):
        os.path.walk(path, process_file, ())

files_recursively(path)

score 0 · Accepted Answer

The PDF file you are reading is either not a valid PDF file, or it is more exotic than PyPDF is prepared for. Are you sure you have a good PDF file to read?
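
One way to find out which file is the problem is to test each one separately. The sketch below is an assumption-laden illustration: `looks_like_pdf` only checks for the `%PDF-` marker at byte 0 (some otherwise readable files have leading junk, and a file that passes can still have a damaged trailer), and `find_unreadable` takes a hypothetical `open_reader` callable that you supply.

```python
def looks_like_pdf(filename):
    """Return True when the file starts with the PDF header marker."""
    with open(filename, "rb") as handle:
        return handle.read(5) == b"%PDF-"


def find_unreadable(filenames, open_reader):
    """Return (filename, error) pairs for files that open_reader rejects."""
    failures = []
    for filename in filenames:
        try:
            open_reader(filename)
        except Exception as error:  # broad on purpose: parser errors vary
            failures.append((filename, error))
    return failures
```

With pyPdf, `open_reader` might be something like `lambda name: PdfFileReader(open(name, "rb"))`, so that the list of failures names the files the library cannot parse.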

Also, there are a few odd things in your code, and this one might really matter:

output_dir = "\Output\\"

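You have a \O escape sequence there, which is not what you want. Python happens to leave the unrecognized escape \O as a literal backslash followed by O, so the string still works, but only by accident. Writing it as "\\Output\\", or building the path with os.path.join, states the intent explicitly.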
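
A minimal sketch of building the output path without hand-written backslashes follows. The helper names `output_file_path` and `ensure_dir` are hypothetical, and the defaults simply mirror the "Output" folder and ".pdf" extension used in the question.

```python
import os


def output_file_path(base_dir, folder_name, output_dirname="Output", ext=".pdf"):
    """Build <base_dir>/<output_dirname>/<folder_name><ext> portably."""
    return os.path.join(base_dir, output_dirname, folder_name + ext)


def ensure_dir(directory):
    """Create the directory if it is missing and return its path."""
    if not os.path.isdir(directory):
        os.makedirs(directory)
    return directory
```

For example, `ensure_dir(os.path.dirname(output_file_path(os.getcwd(), "Root")))` would create the Output folder once, which also removes the need for the duplicated `if`/`else` branches in the script.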
