nested - BeautifulSoup: nested/embedded identical tags with different classes

Translated from: https://stackoverflow.com/questions/12325084 2012-09-07T20:40:38.073


I'm trying to parse some HTML files that frequently contain snippets like the following:

<p class="p5"><p class="s2">In the directory </p>/home/blah/<p class="s2"> there is a file, </p>plotData.dat</p>

This gets parsed as follows:

>>> [c for c in P.body.children]
[<p class="p5"></p>,
 <p class="s2">In the directory </p>,
 u'/home/blah/',
 <p class="s2"> there is a file, </p>,
 u'plotData.dat']

I expected it to come out as:

>>> [c for c in P.body.children]
[<p class="p5"></p>,
 <p class="s2">In the directory </p>,
 <p class="p5">u'/home/blah/'</p>,
 <p class="s2"> there is a file, </p>,
 <p class="p5">u'plotData.dat'</p>]

Is the input HTML simply malformed? Is there anything I can do to get the input HTML parsed as the latter? (I have no control over what the HTML looks like.)
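
On the first question: in HTML a `<p>` element cannot contain another `<p>`, so a spec-following parser closes the open `p5` paragraph as soon as the first `s2` paragraph starts, which is why the outer tag comes out empty. One possible workaround, sketched below, is to post-process the tree and wrap each loose string in its own `<p class="p5">`. The helper name `wrap_loose_text` is hypothetical, and the example feeds in the already-flattened markup (an assumption made so the result does not depend on which parser is installed).

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString


def wrap_loose_text(container, soup, class_name="p5"):
    """Wrap each non-blank bare string directly under `container` in a new <p>.

    Hypothetical helper, not part of BeautifulSoup itself.
    """
    # Copy the children first, because wrapping modifies the tree in place.
    for child in list(container.children):
        if isinstance(child, NavigableString) and child.strip():
            # `class` is a Python keyword, so pass it via a dict.
            child.wrap(soup.new_tag("p", **{"class": class_name}))
    return container


# The flattened structure from the question, written out explicitly (assumption).
flat = ('<p class="p5"></p><p class="s2">In the directory </p>/home/blah/'
        '<p class="s2"> there is a file, </p>plotData.dat')
soup = BeautifulSoup(flat, "html.parser")

wrap_loose_text(soup, soup)
result = [str(child) for child in soup.children]
print(result)
```

With a tree parsed as in the question, the equivalent call would presumably be `wrap_loose_text(P.body, P)`. The class name to apply is hard-coded here; recovering it from the preceding empty `<p>` would need extra logic.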

Edit: a complete MWE:

>>> from bs4 import BeautifulSoup as BS
>>> P = BS('<p class="p5"><p class="s2">In the directory </p>/home/blah/<p class="s2"> there is a file, </p>plotData.dat</p>')
>>> [c for c in P.body.children]
[<p class="p5"></p>, <p class="s2">In the directory </p>, u'/home/blah/', <p class="s2"> there is a file, </p>, u'plotData.dat']
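
For comparison, the parser handed to BeautifulSoup may also matter. The sketch below runs the same markup through Python's built-in `html.parser` backend, which, unlike lxml or html5lib, is assumed here not to apply the paragraph-closing rule, so the `s2` paragraphs and the loose strings stay inside the outer `p5` tag. The resulting tree still represents invalid HTML, so this is a workaround rather than a fix.

```python
from bs4 import BeautifulSoup

html = ('<p class="p5"><p class="s2">In the directory </p>/home/blah/'
        '<p class="s2"> there is a file, </p>plotData.dat</p>')

# html.parser does not add <html>/<body>, so the outer <p> is the first tag.
soup = BeautifulSoup(html, "html.parser")
outer = soup.p

# Direct children of the outer paragraph, serialized in document order.
pieces = [str(child) for child in outer.children]
print(outer["class"], pieces)
```

Here the loose strings are direct children of the `p5` paragraph rather than separate `<p class="p5">` elements, which may or may not be close enough to the expected structure.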
