python - Convert all PDF files in a directory to text

Question

I downloaded PDFMiner to convert PDF files to text. I convert a file by running this command in my terminal:

python pdf2txt.py -o myOutput.txt simple1.pdf

That works fine. Now I would like to embed that call in a simple Python script, because I want to convert every PDF file in a directory.

# Let's say I have an array with filenames in it
files = [
    'file1.pdf', 'file2.pdf', 'file3.pdf'
]

# And convert all PDF files to text
# By repeatedly executing pdf2txt.py
for x in range(0, len(files)):
    # And run something like
    # python pdf2txt.py -o output.txt files[x]
    pass

I also tried os.system, but all I got was a flashing window (my terminal). I just want to convert every file in the array to text.

score 1 · Accepted Answer

Use the subprocess module.

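import os
import subprocess
import sys

files = [
    'file1.pdf', 'file2.pdf', 'file3.pdf'
]
for f in files:
    # Name the output after the input, e.g. file1.pdf -> file1.txt
    # (splitext keeps any earlier dots in the filename intact)
    out_name = os.path.splitext(f)[0] + '.txt'
    # Passing the arguments as a list avoids shell quoting problems
    cmd = [sys.executable, 'pdf2txt.py', '-o', out_name, f]
    run = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = run.communicate()

    # display errors if they occur
    if err:
        print(err.decode(errors='replace'))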

See the subprocess documentation for details.
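Since the question is about every PDF in a directory rather than a fixed list, the loop above can be combined with a directory scan. This is only a minimal sketch under a few assumptions: the helper names `pdf_files_in`, `build_command`, and `convert_all` are made up for illustration, and `pdf2txt.py` is assumed to sit in the current working directory and accept `-o` exactly as in the question.

```python
import subprocess
import sys
from pathlib import Path


def pdf_files_in(directory):
    """Return the PDF files directly inside `directory`, sorted by name."""
    return sorted(p for p in Path(directory).iterdir()
                  if p.is_file() and p.suffix.lower() == '.pdf')


def build_command(pdf_path, script='pdf2txt.py'):
    """Build the argument list for one conversion (no shell involved)."""
    pdf_path = Path(pdf_path)
    # report.pdf -> report.txt, written next to the input file
    out_path = pdf_path.with_suffix('.txt')
    return [sys.executable, script, '-o', str(out_path), str(pdf_path)]


def convert_all(directory):
    """Run pdf2txt.py once per PDF and report any failures."""
    for pdf in pdf_files_in(directory):
        result = subprocess.run(build_command(pdf),
                                capture_output=True, text=True)
        if result.returncode != 0:
            print('%s failed: %s' % (pdf.name, result.stderr.strip()))
```

Calling `convert_all('.')` would then convert every PDF in the current directory. Because the output is captured instead of being sent to a new console, this should also avoid the flashing window seen with os.system, though that has not been verified on every platform.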

score 0 · Accepted Answer

There is an API that helps with tasks like this. Please read the documentation.
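The answer does not say which API it means. As one possible reading, here is a sketch that assumes the pdfminer.six fork is installed and uses its `pdfminer.high_level.extract_text` function. The names `default_extract` and `convert_directory` are hypothetical helpers, not part of any library, and the `extract` parameter exists only so a different extractor can be swapped in.

```python
from pathlib import Path


def default_extract(pdf_path):
    """Extract text with pdfminer.six (assumed to be installed)."""
    # Imported lazily so the module loads even without pdfminer.six
    from pdfminer.high_level import extract_text
    return extract_text(str(pdf_path))


def convert_directory(directory, extract=default_extract):
    """Write a .txt file next to every *.pdf in `directory`.

    Returns the list of text files that were written.
    """
    written = []
    for pdf in sorted(Path(directory).glob('*.pdf')):
        txt = pdf.with_suffix('.txt')
        txt.write_text(extract(pdf), encoding='utf-8')
        written.append(txt)
    return written
```

With this approach no child process is started at all, so there is no command line to build and no terminal window to flash. Something like `convert_directory('.')` would be the whole script.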
