python - How to parse an HTML table with Python and BeautifulSoup and write it to CSV

Question

I am trying to parse an HTML page to get currency values and write them to a CSV file. I have the following code:

#!/usr/bin/env python

import urllib2
from BeautifulSoup import BeautifulSoup

contenturl = "http://www.bank.gov.ua/control/en/curmetal/detail/currency?period=daily"
soup = BeautifulSoup(urllib2.urlopen(contenturl).read())

table = soup.find('div', attrs={'class': 'content'})

rows = table.findAll('tr')
for tr in rows:
    cols = tr.findAll('td')
    for td in cols:
        text = td.find(text=True) + ';'
        print text,
    print

The problem is that I don't know how to retrieve only the currency values. I tried a regular expression such as '^[0-9]{3}' (starting with three digits), but it doesn't work.

score 9 · Accepted Answer

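It is much better to select specific cells in the table. The td cells with the class cell_c contain the data you are interested in, and the last cell is always the exchange rate.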

rows = table.findAll('tr')
for tr in rows:
    cols = tr.findAll('td')
    if 'cell_c' in cols[0]['class']:
        # currency row
        digital_code, letter_code, units, name, rate = [c.text for c in cols]
        print digital_code, letter_code, units, name, rate

With the data in separate variables, you can convert the text to decimal numbers, store it in a database, and so on.
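As a rough sketch of that last step, the unpacked values could be converted and written out with the standard csv and decimal modules. This is only an illustration under a few assumptions: it uses Python 3 syntax (the code above is Python 2), the function name write_rates_csv and the column headers are made up for this example, the semicolon delimiter simply mirrors the separator used in the question, and the sample rows are placeholder values, not real exchange rates.

```python
import csv
import io
from decimal import Decimal


def write_rates_csv(rows, out):
    """Write (digital_code, letter_code, units, name, rate) rows as CSV.

    `rows` holds the cell texts as strings, in the same order as the
    variables unpacked in the answer. `out` is any writable text stream.
    """
    writer = csv.writer(out, delimiter=';')
    # Hypothetical header names, chosen to match the variable names above.
    writer.writerow(['digital_code', 'letter_code', 'units', 'name', 'rate'])
    for digital_code, letter_code, units, name, rate in rows:
        writer.writerow([
            digital_code.strip(),   # keep as text so leading zeros survive
            letter_code.strip(),
            int(units),             # int() tolerates surrounding whitespace
            name.strip(),
            Decimal(rate.strip()),  # exact decimal, no float rounding
        ])


# Placeholder values for illustration only, not real exchange rates.
sample_rows = [
    ('036', 'AUD', '100', 'Australian Dollar', '123.4500'),
    ('840', 'USD', '100', 'US Dollar', '678.9000'),
]

buffer = io.StringIO()
write_rates_csv(sample_rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to a file instead of an in-memory buffer, pass a file opened with open('rates.csv', 'w', newline='') as the second argument. In the loop from the answer, each currency row would be appended to a list and that list passed in as rows.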
