python - Beautiful Soup HTML extraction

Question

I'm struggling to get the data I need. I'm sure this is very easy if you know how to use BS, but after reading the documentation I have spent hours trying to get it right with no success.

Currently, my code outputs this in Python:

[<td>0.32%</td>, <td><span class="neg color ">&gt;-0.01</span></td>, <td>0.29%</td>, <td>0.38%</td>, <td><span class="neu">0.00</span></td>]

How can I isolate the contents of the td tags that do not themselves contain any tags?

That is, I want to display only 0.32%, 0.29%, and 0.38%.

Thank you.

import urllib2
from bs4 import BeautifulSoup

fturl = 'http://markets.ft.com/research/Markets/Bonds'
ftcontent = urllib2.urlopen(fturl).read()
soup = BeautifulSoup(ftcontent)

ftdata = soup.find(name="div", attrs={'class':'wsodModuleContent'}).find_all(name="td", attrs={'class':''})

score 2 · Accepted Answer

Would this be a good solution for you:

html_txt = """<td>0.32%</td>, <td><span class="neg color">
    &gt;-0.01</span></td>, <td>0.29%</td>, <td>0.38%</td>, 
    <td><span class="neu">0.00</span></td>
    """
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_txt)
print [tag.text for tag in soup.find_all('td') if tag.text.strip().endswith("%")]

The output is:

[u'0.32%', u'0.29%', u'0.38%']
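
The filter above keeps cells whose text ends with "%". If the goal is literally "td tags that contain no other tags", one alternative is to test each cell for child tags instead. The following is only a minimal sketch, assuming Python 3 and the sample markup from the question; `plain_cells` is a hypothetical helper name, not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the output shown in the question.
html_txt = (
    '<td>0.32%</td>'
    '<td><span class="neg color ">&gt;-0.01</span></td>'
    '<td>0.29%</td>'
    '<td>0.38%</td>'
    '<td><span class="neu">0.00</span></td>'
)

# Naming the parser explicitly avoids the "no parser specified" warning.
soup = BeautifulSoup(html_txt, "html.parser")


def plain_cells(soup):
    """Return the text of every <td> that has no child tags (hypothetical helper)."""
    return [
        td.get_text(strip=True)
        for td in soup.find_all("td")
        # find(True) returns the first descendant tag, or None if there is only text.
        if td.find(True) is None
    ]


print(plain_cells(soup))
```

On the real page this could be applied to the result of the `find(...)` call on the `wsodModuleContent` div from the question, so it does not depend on the values happening to end in "%".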
