python - Email finder (web)

Question

This app runs against a website, finds all the email addresses on it, and returns them.

import re
from urllib.request import urlopen

def testEmails(url):
    'Test the emails() function'
    email = ''
    content = urlopen(url).read().decode()
    pattern = r'[A-Za-z0-9_.]+\@[A-Za-z0-9_.]+\.'
    for attr in content:
        if attr[0] == 'href':
            print(attr)
            email += '{} '.format(attr)
    emails = re.findall(pattern, email)
    return emails

I keep getting a blank result. Does anyone know why?

Edit:

This is my current code:

import re
from urllib.request import urlopen

def emails(content):
    'return list of email addresses contained in string content'
    email = []
    content = urlopen(url).read().decode()
    pattern = r'[A-Za-z0-9_.]+\@[A-Za-z0-9_.]+\....'
    email.append(re.findall(pattern, content))
    print(email)

But for some reason I get:

[['somePERSON@university.ca"']]

instead of:

['somePERSON@university.ca']

score 2 · Accepted Answer

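urlopen().read().decode() returns a Unicode string, so looping over it loops over the individual characters, not over the HTML attributes you are looking for. Each attr is a one-character string, so attr[0] == 'href' is never true and email stays empty. You need to either use HTMLParser (html.parser.HTMLParser in Python 3) to extract the attributes, or run re.findall over the whole document (cruder, but it will also pick up email addresses that appear as plain text).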
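The two options above can be sketched roughly as follows. This is a minimal illustration, not the answerer's code: the names MailtoCollector, emails_from_hrefs, emails_from_text and EMAIL_RE are hypothetical, and the regex is an assumed, rough approximation of an email address (it requires at least one dot in the domain and does not try to cover every valid address). Both functions take the already-decoded HTML string, i.e. the value of urlopen(url).read().decode() from the question.

```python
import re
from html.parser import HTMLParser

# Hypothetical, pragmatic pattern: local part, "@", then dot-separated
# domain labels. Not a full RFC 5322 matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9_.+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


class MailtoCollector(HTMLParser):
    """Collect addresses from href="mailto:..." attributes (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.addresses = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names are lowercased by
        # HTMLParser, and value is None for attributes written without one.
        for name, value in attrs:
            if name == "href" and value and value.lower().startswith("mailto:"):
                # Drop the "mailto:" scheme and any "?subject=..." query part.
                address = value[len("mailto:"):].split("?", 1)[0]
                if address:
                    self.addresses.append(address)


def emails_from_hrefs(html):
    """Option 1: parse the HTML and read real href attributes."""
    parser = MailtoCollector()
    parser.feed(html)
    parser.close()
    return parser.addresses


def emails_from_text(html):
    """Option 2: run one regex over the whole document."""
    # re.findall already returns a flat list of strings.
    return EMAIL_RE.findall(html)


if __name__ == "__main__":
    sample = ('<a href="mailto:somePERSON@university.ca">Mail</a> '
              'or write to info@example.org.')
    print(emails_from_hrefs(sample))  # ['somePERSON@university.ca']
    print(emails_from_text(sample))   # ['somePERSON@university.ca', 'info@example.org']
```

Because re.findall already returns a flat list of strings, there is no need to append its result to another list; doing so is what produces the nested [['...']] in the edited code. And because this pattern ends with one or more dot-plus-label groups instead of \...., it stops at the closing quote rather than consuming the three characters ca" after the last dot.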
