
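This is a self-answered question. I would appreciate input, comments, and code review. I think it works, but I'm not sure. My testing appeared to give valid results, but email is a tricky and complex beast, and I don't know whether my logic is correct. If you want to test it, save a raw email file and put its filename in the code; the place to put it is obvious. Is there a better way to do this? If so, I'd like to hear it.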

Python 2.7 code:

import email

filename = 'xxx.eml'

with open(filename, 'rb') as f:
    msg = email.message_from_file(f)

    # count number of attachments in an email
    # this determines the 'real' attachments, ie those that a user might have attached to the email
    # it does not include the attachments that make up the message content
    totalattachments = 0
    firsttextattachmentseen = False
    lastseenboundary = ''
    # .walk steps through all the parts of an email including boundaries and attachments
    for part in msg.walk():
        if part.is_multipart():
            # this is a boundary, not an attachment, so we record it as the last seen boundary and continue to next part
            lastseenboundary = part.get_content_type()
            continue
        if lastseenboundary == 'multipart/alternative':
            #for HTML emails, the multipart/alternative part contains the HTML and its alternative 
            #text representation, so we skip anything within the multipart/alternative boundary
            continue
        if part.get_content_type() == 'text/plain':
            #if this is a plain text email, then the first txt attachment is the message body so we do not 
            #count it as an attachment
            if not firsttextattachmentseen:
                firsttextattachmentseen = True
                continue
            else:
                totalattachments += 1
                continue
        # any other part we encounter we shall assume is a user added attachment
        totalattachments += 1

    # format explicitly: in Python 2.7, print(a, b, c) would print a tuple
    print('%d: %s' % (totalattachments, filename))
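
For comparison, here is a minimal sketch of a different approach. It is not the code above and makes no claim to be the definitive method. Instead of tracking the last seen multipart container, it counts leaf parts that are explicitly marked `Content-Disposition: attachment` or that carry a filename. It assumes Python 3.5+ (for `get_content_disposition()`), and the names `count_attachments` and `build_sample` are made up for illustration. One case worth testing against the original loop is an attachment that follows a `multipart/alternative` block: `lastseenboundary` is only updated on multipart parts, so it still reads `multipart/alternative` when that attachment is reached.

```python
import email
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def count_attachments(msg):
    """Count leaf parts marked as attachments or carrying a filename.

    Assumption: a 'real' attachment either has
    Content-Disposition: attachment or declares a filename.
    """
    total = 0
    for part in msg.walk():
        if part.is_multipart():
            # containers (multipart/mixed, multipart/alternative, ...) are not attachments
            continue
        disposition = part.get_content_disposition()  # None, 'inline' or 'attachment'
        if disposition == 'attachment' or part.get_filename():
            total += 1
    return total


def build_sample():
    """Build a hypothetical test message: HTML body with a text alternative, plus two attachments."""
    outer = MIMEMultipart('mixed')

    body = MIMEMultipart('alternative')
    body.attach(MIMEText('hello', 'plain'))
    body.attach(MIMEText('<p>hello</p>', 'html'))
    outer.attach(body)

    pdf = MIMEApplication(b'%PDF-1.4 fake', 'pdf')
    pdf.add_header('Content-Disposition', 'attachment', filename='report.pdf')
    outer.attach(pdf)

    note = MIMEText('notes', 'plain')
    note.add_header('Content-Disposition', 'attachment', filename='notes.txt')
    outer.attach(note)

    # round-trip through text so the parser sees what a saved .eml would contain
    return email.message_from_string(outer.as_string())


sample = build_sample()
print(count_attachments(sample))
```

For the sample message this counts the PDF and the attached text file but not the two body alternatives. A known limitation of this sketch: inline images that declare a filename are counted too, and attachments sent with neither a disposition header nor a filename are missed, so it is a heuristic rather than a guarantee.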

1 Answer