python - How to structure Python function so that it continues after error?

Question

I am new to Python, and with some really great assistance from StackOverflow, I've written a program that:

1) Looks in a given directory, and for each file in that directory:

2) Runs an HTML-cleaning program, which:

Opens each file with BeautifulSoup
Removes blacklisted tags & content
Prettifies the remaining content
Runs Bleach to remove all non-whitelisted tags & attributes
Saves out as a new file

It works very well, except when it hits a certain kind of file content that throws up a bunch of BeautifulSoup errors and aborts the whole thing. I want it to be robust against that, as I won't have control over what sort of content winds up in this directory.
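Why one bad file stops everything: an exception raised inside a function that is called from a loop propagates out of that loop, so the remaining iterations never run. Here is a minimal sketch, with a hypothetical `process()` standing in for `clean_file()` and `ValueError` assumed as the error type:

```python
def process(name):
    """Hypothetical stand-in for clean_file(); fails on one specific name."""
    if name == "bad.html":
        raise ValueError("could not parse %s" % name)
    return name.upper()


processed = []
try:
    for name in ["a.html", "bad.html", "c.html"]:
        # The exception from process() leaves the for loop immediately.
        processed.append(process(name))
except ValueError as exc:
    print("Aborted: %s" % exc)

# Only the file before the failure was handled; c.html was never reached.
print(processed)
```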

So, my question is: How can I re-structure the program so that when it errors on one file within the directory, it reports that it was unable to process that file, and then continues to run through the remaining files?

Here is my code so far (with extraneous detail removed):

import io
import os

from bs4 import BeautifulSoup  # assumes BeautifulSoup 4 (the bs4 package)
import bleach

def clean_dir(directory):
    os.chdir(directory)

    # After chdir, list the current directory; the body must be indented under the loop.
    for filename in os.listdir("."):
        clean_file(filename)

def clean_file(filename):

    tag_black_list = ['iframe', 'script']
    tag_white_list = ['p', 'div']
    attr_white_list = {'*': ['title']}

    # Read raw bytes so BeautifulSoup can detect the document's encoding itself.
    with open(filename, 'rb') as fhandle:
        text = BeautifulSoup(fhandle, "html.parser")
    print("Opened " + filename)

    # Step one, with BeautifulSoup: Remove tags in tag_black_list, destroy contents.
    for s in text(tag_black_list):
        s.decompose()
    pretty = text.prettify()
    print("Prettified")

    # Step two, with Bleach: Remove tags and attributes not in whitelists, leave tag contents.
    # strip must be the boolean True, not the string "TRUE".
    cleaned = bleach.clean(pretty, tags=tag_white_list, attributes=attr_white_list, strip=True)

    # Write the cleaned Unicode text out as UTF-8; "with" closes the file even on error.
    with io.open(os.path.join("../posts-cleaned", filename), "w", encoding="utf-8") as fout:
        fout.write(cleaned)

    print("Saved " + filename + " in /posts-cleaned")

clean_dir("../posts/")

print("Done")

I'm looking for any guidance on how to write this so that it keeps running after hitting a parsing, encoding, content, attribute, or other error within the clean_file function.

score 3 · Accepted Answer


You can handle errors using try-except-finally.

answered 2012-10-23T13:19:10.997
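A minimal sketch of try-except-finally applied to a per-file loop. `process()` is a hypothetical stand-in for `clean_file()` that fails on one input, and `ValueError` is assumed as the error type. The `else` clause runs only when no exception occurred, and `finally` runs either way:

```python
def process(name):
    """Hypothetical stand-in for clean_file(); fails on names starting with 'bad'."""
    if name.startswith("bad"):
        raise ValueError("cannot parse %s" % name)
    return name.upper()


def process_all(names):
    results, failures, attempted = [], [], 0
    for name in names:
        try:
            result = process(name)
        except ValueError as exc:
            # Report the failure, then carry on with the next name.
            print("Error processing file %s: %s" % (name, exc))
            failures.append(name)
        else:
            # Runs only if process() did not raise.
            results.append(result)
        finally:
            # Runs whether or not an exception was raised.
            attempted += 1
    return results, failures, attempted


results, failures, attempted = process_all(["a.html", "bad.html", "c.html"])
print(results, failures, attempted)
```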

score 1 · Accepted Answer

You can do the error handling either inside clean_file or in the for loop:

for filename in os.listdir(directory):
    try:
        clean_file(filename)
    except Exception as e:
        # Report which file failed and why, then continue with the next one.
        print("Error processing file %s: %s" % (filename, e))
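Handling the error inside `clean_file` itself might look like the sketch below. It is a simplification under stated assumptions: the BeautifulSoup/Bleach steps are replaced by a strict UTF-8 decode so the example runs with only the standard library, and the function signals failure by returning `False` instead of raising. Like the loop version, it catches `Exception` broadly.

```python
import os
import tempfile


def clean_file(path):
    """Return True on success, False on failure; errors are reported, not raised.

    Hypothetical simplification: 'cleaning' is a strict UTF-8 decode plus strip,
    standing in for the real BeautifulSoup/Bleach steps.
    """
    name = os.path.basename(path)
    try:
        with open(path, "rb") as fhandle:
            text = fhandle.read().decode("utf-8")  # may raise UnicodeDecodeError
        cleaned = text.strip()
    except Exception as exc:
        print("Unable to process %s: %s" % (name, exc))
        return False
    print("Cleaned %s (%d chars)" % (name, len(cleaned)))
    return True


def clean_dir(directory):
    ok = []
    for filename in sorted(os.listdir(directory)):
        if clean_file(os.path.join(directory, filename)):
            ok.append(filename)
    return ok


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "good.html"), "wb") as f:
        f.write(b"<p>hello</p>\n")
    with open(os.path.join(tmp, "bad.html"), "wb") as f:
        f.write(b"\xff\xfe\xfa")  # not valid UTF-8
    succeeded = clean_dir(tmp)

print(succeeded)
```

The loop in `clean_dir` stays simple because `clean_file` never lets an error escape; the trade-off is that every caller now has to check the return value.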

If you know which exceptions will be raised, you can catch them more specifically.
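For example, assuming the failures you expect are I/O and decoding problems, you could catch only those types and collect the failing filenames for a summary at the end. The `clean_file()` below is a hypothetical stand-in that raises a different error depending on the filename. An unexpected error such as `NameError` (simulating a bug in your own code) is not caught and still stops the run, which is usually what you want.

```python
def clean_file(filename):
    """Hypothetical stand-in for the real clean_file(); raises based on the name."""
    if filename.endswith(".bin"):
        # Simulates content that is not valid UTF-8.
        raise UnicodeDecodeError("utf-8", b"\xff", 0, 1, "invalid start byte")
    if filename == "missing.html":
        # Simulates a file that cannot be opened.
        raise IOError("cannot open %s" % filename)
    if filename == "buggy.html":
        # Simulates a programming mistake, which should NOT be swallowed.
        raise NameError("name 'pretty' is not defined")


def clean_all(filenames):
    failed = []
    for filename in filenames:
        try:
            clean_file(filename)
        except (IOError, OSError, UnicodeError) as exc:
            # Expected, content-related problems: report and continue.
            # IOError is listed for Python 2; in Python 3 it is an alias of OSError.
            print("Error processing file %s: %s" % (filename, exc))
            failed.append(filename)
    return failed


failed = clean_all(["a.html", "b.bin", "missing.html", "c.html"])
print("Could not process: %s" % ", ".join(failed))

try:
    clean_all(["a.html", "buggy.html"])
except NameError as exc:
    # Unexpected errors still propagate, so bugs are not hidden.
    print("Stopped on an unexpected error: %s" % exc)
```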
