python - Could not extract the href value from a tag using BS4

Question

score 3 · Accepted Answer

Doesn't this do the trick?

# href=True matches only <a> tags that have an href attribute
for a in soup.find_all('a', href=True):
    print(a['href'])

If needed, you can also pass an attrs dictionary to find_all:

soup.find_all("div", {"style": "display:inline; position:relative;"})
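To see the two filters side by side, here is a minimal sketch. The HTML snippet is made up for illustration (it is not the markup from the question), and it assumes the standard-library "html.parser" backend is acceptable for your page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; not taken from the question.
html = """
<div style="display:inline; position:relative;">
  <a href=" /aems/file/one.do ">first</a>
  <a name="anchor-without-href">no link here</a>
</div>
<a href="/other">second</a>
"""

soup = BeautifulSoup(html, "html.parser")

# href=True keeps only <a> tags that actually carry an href attribute,
# so the named anchor without href is skipped.
hrefs = [a["href"] for a in soup.find_all("a", href=True)]

# The attrs dict restricts matches to tags whose style attribute has this value.
divs = soup.find_all("div", {"style": "display:inline; position:relative;"})

# Collect only the links nested inside the matching div(s).
inner = [a["href"].strip() for d in divs for a in d.find_all("a", href=True)]

print(hrefs)
print(inner)
```

Searching inside the matched div first is handy when the style sits on a container rather than on the link itself.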

To strip whitespace and make the link absolute:

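# Python 3; in Python 2 this function lived in the urlparse module
from urllib.parse import urljoin
urljoin(url, a['href'].strip())  # url is the address of the page being parsed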
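As a rough illustration of what urljoin does with different kinds of href, here is a small standard-library sketch. The page URL, the href values, and the helper name absolutize are all hypothetical, picked only to show the behavior.

```python
from urllib.parse import urljoin

# Hypothetical page URL, chosen only for illustration.
page_url = "http://example.com/aems/index.html"


def absolutize(base, href):
    """Strip surrounding whitespace, then resolve href against base."""
    return urljoin(base, href.strip())


# Root-relative path with stray whitespace: resolved against the host.
root_relative = absolutize(page_url, "  /aems/file/get.do?id=1\n")

# Document-relative path: resolved against the directory of the page.
doc_relative = absolutize(page_url, "file/get.do?id=1")

# Already absolute: returned unchanged instead of being prefixed again.
already_absolute = absolutize(page_url, "http://other.example/x")

print(root_relative)
print(doc_relative)
print(already_absolute)
```

Unlike plain string concatenation with the site root, this also copes with hrefs that are already absolute or that lack a leading slash.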

score 2 · Accepted Answer

for a in soup.find_all('a', {"style": "display:inline; position:relative;"}, href=True):
    href = a['href'].strip()  # drop stray whitespace around the path
    href = "http://example.com" + href  # prefix the site root to the root-relative path
    print(href)

'http://example.com/aems/file/filegetrevision.do?fileEntityId=8120070&cs=LU31NT9us5P9Pvkb1BrtdwaCrEraskiCJcY6E2ucP5s.xyz'

The built-in string method strip() is very useful here. :)
