python - How to find all <li> within a <ul> of a specific class?

Question

Environment:

Beautiful Soup 4

Python 2.7.5

Logic:

find_all instances of <li> that are within a <ul> of class my_class:

<ul class='my_class'>
<li>thing one</li>
<li>thing two</li>
</ul>

Clarification: I just want the "text" between the <li> tags.

Python code:

(The find_all below is not correct; I am only including it for context.)

from bs4 import BeautifulSoup, Comment
import re

# open original file
fo = open('file.php', 'r')
# convert to string
fo_string = fo.read()
# close original file
fo.close()
# create beautiful soup object from fo_string
bs_fo_string = BeautifulSoup(fo_string, "lxml")
# get rid of html comments
my_comments = bs_fo_string.findAll(text=lambda text:isinstance(text, Comment))
[my_comment.extract() for my_comment in my_comments]

my_li_list = bs_fo_string.find_all('ul', 'my_class')

print my_li_list

score 18 · Accepted Answer

Like this?

>>> html = """<ul class='my_class'>
... <li>thing one</li>
... <li>thing two</li>
... </ul>"""
>>> from bs4 import BeautifulSoup as BS
>>> soup = BS(html)
>>> for ultag in soup.find_all('ul', {'class': 'my_class'}):
...     for litag in ultag.find_all('li'):
...             print litag.text
... 
thing one
thing two

Explanation:

soup.find_all('ul', {'class': 'my_class'}) finds all ul tags that have the class my_class.

We then find all the li tags inside each of those ul tags and print the text content of each li tag.
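For anyone on Python 3, the same nested search can be sketched roughly as below. This is only an illustration under a few assumptions: a current bs4 install, the built-in "html.parser" (so lxml is not needed), and a helper name li_texts plus a second ul with class 'other' that I made up to show that unrelated lists are skipped.

```python
from bs4 import BeautifulSoup

# Sample markup: the <ul> from the question plus a hypothetical second list
# (class 'other') that should NOT be picked up.
html = """<ul class='my_class'>
<li>thing one</li>
<li>thing two</li>
</ul>
<ul class='other'>
<li>not wanted</li>
</ul>"""


def li_texts(markup, css_class):
    """Return the text of every <li> inside any <ul> with the given class.

    li_texts is a hypothetical helper name, not part of bs4.
    """
    soup = BeautifulSoup(markup, "html.parser")
    texts = []
    # Outer search: only <ul> tags carrying the requested class.
    for ul in soup.find_all("ul", class_=css_class):
        # Inner search: <li> tags scoped to that one <ul>.
        for li in ul.find_all("li"):
            texts.append(li.get_text())
    return texts


print(li_texts(html, "my_class"))
```

The point is the scoping: calling find_all on the ul tag rather than on the soup limits the li search to that list.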

score 2 · Answer

This does the trick with BeautifulSoup 3. I don't have 4 on this machine.

>>> [li.string for li in bs_fo_string.find('ul', {'class': 'my_class'}).findAll('li')]
[u'thing one', u'thing two']

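The idea is to first search for the ul with class 'my_class', then find all the li tags inside that ul.

If there are additional ul tags with the same class, you can use findAll for the ul search as well and change the list comprehension to a nested one.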
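A rough sketch of that nested comprehension, written for Beautiful Soup 4 on Python 3 rather than the BeautifulSoup 3 used above (so find_all instead of findAll). The sample markup with two ul tags and the name li_strings are assumptions for illustration only.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two <ul> tags sharing the same class.
html = """<ul class='my_class'><li>thing one</li><li>thing two</li></ul>
<ul class='my_class'><li>thing three</li></ul>"""

soup = BeautifulSoup(html, "html.parser")

# Outer loop walks every matching <ul>; inner loop walks the <li> tags
# inside that <ul>. The result is one flat list in document order.
li_strings = [
    li.string
    for ul in soup.find_all("ul", {"class": "my_class"})
    for li in ul.find_all("li")
]

print(li_strings)
```

Note that .string is None for an li that contains more than one child node, so get_text() may be the safer choice for messier markup.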
