python - How to extract links from inside <h2 class="section-heading">: BeautifulSoup

Question

I am trying to extract a link that is written like this:

<h2 class="section-heading">
    <a href="http://www.nytimes.com/pages/arts/index.html">Arts »</a>
</h2>

My code is as follows:

from bs4 import BeautifulSoup
import requests, re

def get_data():
    url='http://www.nytimes.com/'
    s_code=requests.get(url)
    plain_text = s_code.text
    soup = BeautifulSoup(plain_text)
    head_links=soup.findAll('h2', {'class':'section-heading'})

    for n in head_links :
       a = n.find('a')
       print a
       print n.get['href'] 
       #print a['href']
       #print n.get('href')
       #headings=n.text
       #links = n.get('href')
       #print headings, links

get_data()

Something like "print a" simply prints the whole <a> line inside the <h2 class=section-heading>, i.e.

<a href="http://www.nytimes.com/pages/world/index.html">World »</a>

But when I execute "print n.get['href']", I get an error:

print n.get['href'] 
TypeError: 'instancemethod' object has no attribute '__getitem__'

Am I doing something wrong here? Please help.

I couldn't find a question covering a similar case here; my problem is a bit unusual. I am trying to extract links that sit inside section headings with a specific class name.

score 3 · Accepted Answer

First, you need to fetch the href of the a element, so on that line you should be accessing a, not n. Second, it should be either

a.get('href')

or

a['href']

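The latter form raises a KeyError if no such attribute is found, whereas the former returns None, like the usual dictionary/mapping interface. Because .get is a method, it has to be called (.get(...)); indexing or element access on it (.get[...]) does not work, which is what this question is about.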
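The difference between the three spellings can be seen with an ordinary dict, which follows the same mapping conventions. This is only a minimal sketch: the attrs dict is a stand-in for a tag's attributes, not a BeautifulSoup object, and the variable names are made up for illustration.

```python
# A plain dict standing in for a tag's attributes (an assumption for illustration).
attrs = {"href": "http://www.nytimes.com/pages/arts/index.html"}

# Calling .get(...) returns None when the key is missing.
missing = attrs.get("class")

# Indexing raises KeyError when the key is missing.
try:
    attrs["class"]
    index_error = None
except KeyError as exc:
    index_error = type(exc).__name__

# Subscripting the method itself fails, because .get is a method, not a mapping.
try:
    attrs.get["href"]
    subscript_error = None
except TypeError as exc:
    subscript_error = type(exc).__name__

print(missing, index_error, subscript_error)
```

The last case is the same kind of TypeError as in the question: the square brackets are applied to the method object rather than to the mapping.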

Note that find can fail there and return None; perhaps you wanted to iterate over n.find_all('a', href=True) instead:

for n in head_links:
   for a in n.find_all('a', href=True):
       print(a['href'])
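To see why the nested loop is safer, here is a small sketch that runs on inline markup instead of the live page. The sample HTML (including the heading without a link) and the explicit "html.parser" argument are assumptions added for illustration; naming the parser avoids the warning bs4 emits when none is given.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the second heading contains no link.
html = """
<h2 class="section-heading"><a href="http://www.nytimes.com/pages/arts/index.html">Arts »</a></h2>
<h2 class="section-heading">No link here</h2>
"""

soup = BeautifulSoup(html, "html.parser")
headings = soup.find_all("h2", {"class": "section-heading"})

# find returns None for the heading without an <a>, so a['href'] would fail there.
first_matches = [n.find("a") for n in headings]

# find_all returns an empty list instead, so the inner loop is simply skipped.
links = [a["href"] for n in headings for a in n.find_all("a", href=True)]
print(links)
```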

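Even easier than using find_all is the select method, which takes a CSS selector. Here, in a single operation and as easily as with JQuery, we get only the <a> elements that have an href attribute and sit inside an <h2 class="section-heading">.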

soup = BeautifulSoup(plain_text)
for a in soup.select('h2.section-heading a[href]'):
    print(a['href'])
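The selector approach can be wrapped in a small helper and checked against inline markup. This is a sketch under stated assumptions: the function name section_links, the sample HTML, and the "html.parser" argument are not from the original post.

```python
from bs4 import BeautifulSoup


def section_links(html):
    """Return the href of every <a href> inside an h2 with class section-heading."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.select("h2.section-heading a[href]")]


# Hypothetical sample markup with links that the selector should skip.
sample = """
<h2 class="section-heading"><a href="http://www.nytimes.com/pages/arts/index.html">Arts</a></h2>
<h2 class="section-heading"><a name="anchor-only">No href</a></h2>
<h2 class="other"><a href="http://www.nytimes.com/">Different class</a></h2>
<p><a href="http://www.nytimes.com/pages/world/index.html">Outside a heading</a></p>
"""

result = section_links(sample)
print(result)
```

Only the first link matches: the second <a> has no href, the third sits in an h2 with a different class, and the fourth is not inside an h2 at all.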

(Also, please use the lowercase method names, such as find_all rather than findAll, in any new code you write.)
