python - Parsing HTML with lxml (python)

Translated from: https://stackoverflow.com/questions/18433506 2013-08-25T20:43:31.790


I'm trying to save the content of an HTML page to an .html file, but I only want to keep the content inside the "table" tag. I'd also like to remove all empty tags such as <b></b>. I've already done all of this with BeautifulSoup:

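import urllib2
from bs4 import BeautifulSoup

f = urllib2.urlopen('http://test.xyz')
html = f.read()
f.close()
soup = BeautifulSoup(html, 'html.parser')

# Collect the markup of every <table class="main">
txt = ""
for table in soup.find_all("table", {'class': 'main'}):
    txt += str(table)

# Re-parse only the collected tables (txt, not the last loop variable)
# and drop <b> tags with no child tags and no non-whitespace text
text = BeautifulSoup(txt, 'html.parser')
empty_tags = text.find_all(lambda tag: tag.name == 'b' and tag.find(True) is None and (tag.string is None or tag.string.strip() == ""))
for empty_tag in empty_tags:
    empty_tag.extract()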

My question is: is this also possible with lxml? If so, roughly what would it look like? Thanks for your help.
