html - Getting text from HTML using BeautifulSoup

Question

I am trying to get the current "5-minute trend price" from my electricity provider's website using Python 2.7 and BeautifulSoup4.

The XPath is: xpath = "//html/body/div[2]/div/div/div[3]/p[1]"

and the corresponding HTML is:

<div class="instant prices">
  <p class="price">
    "5.2"  # this is what I'm ultimately after
    <small>¢</small>
    <strong> per kWh </strong>
  </p>

I have tried countless ways to get the "5.2" value. I can drill down to the "instant prices" object, but I cannot get anything out of it.

My current code looks like this:

import urllib2
from bs4 import BeautifulSoup

url = "https://rrtp.comed.com/live-prices/"

soup = BeautifulSoup(urllib2.urlopen(url).read())
#print soup

instantPrices = soup.findAll('div', 'instant prices')
print instantPrices

...and the output is:

[<div class="instant prices">
</div>]
[]

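In other words, the "instant prices" object appears to be empty, even though I can clearly see its contents when I inspect the element in Chrome. Any help would be greatly appreciated. Thank you!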

score 2 · Accepted Answer

Unfortunately, this data is generated by JavaScript when the browser renders the page, so it is not present in the source you download with urllib2. What you can do is query the backend directly:

>>> import urllib2
>>> import re

>>> url = "https://rrtp.comed.com/rrtp/ServletFeed?type=instant"
>>> s = urllib2.urlopen(url).read()
>>> s
"<p class='price'>4.5<small>&cent;</small><strong> per kWh </strong></p><p>5-minute Trend Price 7:40 PM&nbsp;CT</p>\r\n"

>>> # Raw string with an escaped dot, so only a literal decimal point matches
>>> float(re.findall(r"\d+\.\d+", s)[0])
4.5
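
For readers on Python 3, where urllib2 no longer exists, the same idea might be sketched as follows. This is only an illustration under a few assumptions: the feed URL from the answer may have changed or been removed since, the names FEED_URL, PRICE_RE, parse_price, and fetch_price are made up for this example, and the regular expression assumes the fragment keeps the shape shown above. It also accepts whole numbers and an optional minus sign as a precaution.

```python
import re
from urllib.request import urlopen

# Endpoint copied from the answer above; it may no longer be available.
FEED_URL = "https://rrtp.comed.com/rrtp/ServletFeed?type=instant"

# Hypothetical pattern: the number right after <p class='price'>,
# with or without a decimal part and an optional leading minus sign.
PRICE_RE = re.compile(r"""<p\s+class=['"]price['"]>\s*(-?\d+(?:\.\d+)?)""")


def parse_price(fragment):
    """Return the price (cents per kWh) as a float, or None if not found."""
    match = PRICE_RE.search(fragment)
    return float(match.group(1)) if match else None


def fetch_price(url=FEED_URL):
    """Download the feed and parse it (requires network access)."""
    with urlopen(url) as response:
        body = response.read().decode("utf-8", errors="replace")
    return parse_price(body)


# Offline check using the response quoted in the answer.
sample = ("<p class='price'>4.5<small>&cent;</small><strong> per kWh </strong></p>"
          "<p>5-minute Trend Price 7:40 PM&nbsp;CT</p>\r\n")
print(parse_price(sample))  # 4.5
```

Keeping the parsing in its own function means it can be checked against a saved response without touching the network. Returning None instead of raising leaves it to the caller to decide what to do if the page layout changes.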
