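This is a self-answered question. I would appreciate input, comments, and code review. I think it works, but I'm not sure. My tests appear to give valid results, but email is a subtle and complex beast, and I don't know whether my logic is correct. If you want to test it, save a raw email file and put its file name in the code; where it goes is obvious. Is there a better way to do this? If so, I'd like to hear it.
Python 2.7 code: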
import email

filename = 'xxx.eml'
with open(filename, 'rb') as f:
    msg = email.message_from_file(f)

# count number of attachments in an email
# this determines the 'real' attachments, ie those that a user might have attached to the email
# it does not include the attachments that make up the message content
totalattachments = 0
firsttextattachmentseen = False
lastseenboundary = ''

# .walk steps through all the parts of an email including boundaries and attachments
for part in msg.walk():
    if part.is_multipart():
        # this is a boundary, not an attachment, so we record it as the last seen boundary and continue to next part
        lastseenboundary = part.get_content_type()
        continue
    if lastseenboundary == 'multipart/alternative':
        # for HTML emails, the multipart/alternative part contains the HTML and its alternative
        # text representation, so we skip anything within the multipart/alternative boundary
        continue
    if part.get_content_type() == 'text/plain':
        # if this is a plain text email, then the first txt attachment is the message body so we do not
        # count it as an attachment
        if not firsttextattachmentseen:
            firsttextattachmentseen = True
            continue
        else:
            totalattachments += 1
            continue
    # any other part we encounter we shall assume is a user added attachment
    totalattachments += 1

# use string formatting: in Python 2.7, print(a, b, c) would print a tuple
print('%d: %s' % (totalattachments, filename))
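
Since the question asks whether there is a better way, here is a minimal sketch of one alternative, offered as an assumption rather than a definitive answer: instead of tracking the last seen multipart type, count the leaf parts that carry a `Content-Disposition: attachment` header or a filename. Because `walk()` is depth-first, a file attached after a `multipart/alternative` body is still visited while the last seen container is `multipart/alternative`, so the sample message below uses exactly that layout. The names `count_attachments` and `build_sample_message` and the sample contents are hypothetical. The sketch misses attachments that have neither a disposition header nor a filename, and it counts inline images that carry a filename.

```python
import email
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def count_attachments(msg):
    """Count leaf parts marked as attachments or carrying a filename (assumed heuristic)."""
    total = 0
    for part in msg.walk():
        if part.is_multipart():
            # containers only group other parts; they are never attachments themselves
            continue
        disposition = str(part.get('Content-Disposition') or '').lower()
        if disposition.startswith('attachment') or part.get_filename():
            total += 1
    return total


def build_sample_message():
    """Hypothetical test message: HTML body with a text alternative, plus two attachments."""
    outer = MIMEMultipart('mixed')

    # message body: plain text and its HTML alternative
    body = MIMEMultipart('alternative')
    body.attach(MIMEText('hello', 'plain'))
    body.attach(MIMEText('<p>hello</p>', 'html'))
    outer.attach(body)

    # first attachment: a dummy binary file
    pdf = MIMEApplication(b'%PDF-1.4 dummy', 'pdf')
    pdf.add_header('Content-Disposition', 'attachment', filename='report.pdf')
    outer.attach(pdf)

    # second attachment: a text file, which must not be mistaken for the body
    note = MIMEText('notes', 'plain')
    note.add_header('Content-Disposition', 'attachment', filename='notes.txt')
    outer.attach(note)
    return outer


# round-trip through the serialized form so the parser sees what a saved .eml would contain
sample = email.message_from_string(build_sample_message().as_string())
print('%d attachments' % count_attachments(sample))
```

To run it against a saved file, parse the file as in the code above and pass the resulting message to `count_attachments`.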