python - Using Beautiful Soup with the Python attribute "contents"

Question

I am using the following code from the book "Hello! Python":

import urllib2
from bs4 import BeautifulSoup
import os

def get_stock_html(ticker_name):
    opener = urllib2.build_opener(urllib2.HTTPRedirectHandler(),urllib2.HTTPHandler(debuglevel=0),)
    opener.addheaders = [('User-agent', "Mozilla/4.0 (compatible; MSIE 7.0; " "Windows NT 5.1; .NET CLR 2.0.50727; " ".NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)")]
    url = "http://finance.yahoo.com/q?s=" + ticker_name
    response = opener.open(url)
    return ''.join(response.readlines())

def find_quote_section(html):
    soup = BeautifulSoup(html)
    # quote = soup.find('div', attrs={'class': 'yfi_rt_quote_summary_rt_top'})
    quote = soup.find('div', attrs={'class': 'yfi_quote_summary'})
    return quote

def parse_stock_html(html, ticker_name):
    quote = find_quote_section(html)
    result = {}
    tick = ticker_name.lower()

    result['stock_name'] = quote.find('h2').contents[0]

if __name__ == '__main__':
    os.system("clear")
    html = get_stock_html('GOOG')
    # print find_quote_section(html)
    print parse_stock_html(html, 'GOOG')

I get the following error:

Traceback (most recent call last):
  File "dwlod.py", line 33, in <module>
    print parse_stock_html(html, 'GOOG')
  File "dwlod.py", line 25, in parse_stock_html
    result['stock_name'] = quote.find('h2').contents[0]
AttributeError: 'NoneType' object has no attribute 'contents'

I am a beginner and really don't know what to do. Is the book wrong?

Added

I just replaced result['stock_name'] = quote.find('h2').contents[0] with:

x = BeautifulSoup(html).find('h2').contents[0]
return x

Now nothing is returned, but the error no longer occurs. So is there something wrong with the original Python syntax?

score 2 · Accepted Answer

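Yahoo Finance hasn't really changed its layout in a while, but it seems they have tweaked it slightly since the book was released. The information you need, such as the h2 containing the stock symbol, is in the yfi_rt_quote_summary container, which sits on top of yfi_quote_summary.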

def find_quote_section(html):
    soup = BeautifulSoup(html)        
    quote = soup.find('div', attrs={'class': 'yfi_rt_quote_summary'})
    return quote
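
To illustrate why the original lookup failed, here is a minimal sketch. It assumes Python 3 with bs4 and the standard-library html.parser, whereas the code in this thread is Python 2. The HTML fragment and the helper name first_h2_text are hypothetical and greatly simplified; they are not the real Yahoo markup. The point is only that find returns None when nothing matches, so each step is checked before .contents is touched.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified fragment (NOT the real Yahoo page):
# the <h2> sits in one container and not in the other.
SAMPLE_HTML = """
<div class="yfi_rt_quote_summary"><h2>Example Corp. (EXMP)</h2></div>
<div class="yfi_quote_summary"><p>other details</p></div>
"""


def first_h2_text(soup, css_class):
    """Return the first child of the first <h2> inside the div, or None."""
    container = soup.find('div', attrs={'class': css_class})
    if container is None:
        return None  # no div with that class
    heading = container.find('h2')
    if heading is None or not heading.contents:
        return None  # no <h2> in the div, or the <h2> is empty
    return heading.contents[0]


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
print(first_h2_text(soup, 'yfi_quote_summary'))     # this div has no <h2>
print(first_h2_text(soup, 'yfi_rt_quote_summary'))  # this div has one
```

Returning None explicitly keeps the failure visible to the caller instead of raising AttributeError deep inside the parsing code.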

Also note that you need to return result if you want anything to be printed; otherwise None is returned.

def parse_stock_html(html, ticker_name):
    quote = find_quote_section(html)
    result = {}
    tick = ticker_name.lower()
    result['stock_name'] = quote.find('h2').contents[0]
    return result

>>> print parse_stock_html(html, 'GOOG')
{'stock_name': u'Google Inc. (GOOG)'}
>>>
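
The effect of the missing return can be reproduced without any scraping at all. The following sketch uses two hypothetical helper functions (they are not from the book) and is written for Python 3: a function that falls off its end without a return statement implicitly returns None.

```python
def build_without_return(name):
    result = {}
    result['stock_name'] = name
    # No return statement: Python implicitly returns None here.


def build_with_return(name):
    result = {}
    result['stock_name'] = name
    return result


print(build_without_return('Google Inc. (GOOG)'))  # prints None
print(build_with_return('Google Inc. (GOOG)'))     # prints the dictionary
```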

By the way, note that find simply finds the first match:

>>> help(BeautifulSoup(html).find)
find(self, name=None, attrs={}, recursive=True, text=None, **kwargs) method of BeautifulSoup.BeautifulSoup instance
    Return only the first child of this Tag matching the given
    criteria.

That first match appears to be empty, but BeautifulSoup also has findAll, which returns all matches:

>>> BeautifulSoup(html).findAll('h2')[3].contents[0]
u'Google Inc. (GOOG)'

It looks like the fourth value is the one you are looking for... Still, and I'm sure you are not doing this, please don't parse the whole document every time; that can be very expensive.
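
As a rough sketch of both points (parse once, and prefer collecting all matches over relying on a fixed index), the following assumes Python 3 with bs4, where find_all is the current spelling and findAll remains as an alias. The HTML fragment is hypothetical and only mimics a page whose first h2 is empty.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: the first <h2> is empty, later ones carry text.
SAMPLE_HTML = """
<h2></h2>
<h2>Markets</h2>
<h2>Example Corp. (EXMP)</h2>
"""

# Parse the document once and reuse the same object for every lookup.
soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

first = soup.find('h2')       # only the first match
all_h2 = soup.find_all('h2')  # every match, in document order

# Keep only the headings that actually contain text.
texts = [h.get_text(strip=True) for h in all_h2 if h.get_text(strip=True)]

print(first.contents)  # empty list: the first <h2> has no children
print(texts)
```

Filtering on the text rather than hard-coding an index such as [3] is one possible way to stay robust if the page gains or loses a heading; whether it suits the real page depends on its actual markup.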
