python - Conflicting concepts of .next_element and .previous_element in BeautifulSoup4

Question

I was just going through the BS4 documentation and picking up some concepts from the "Going back and forth" section about the HTML family tree.

last_a_tag = soup.find("a", id="link3")
last_a_tag
# <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
last_a_tag.next_element
# u'Tillie'  
last_a_tag.previous_element
# u' and\n' ## up to here this is easy to understand!
last_a_tag.previous_element.next_element
# <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>

This is where the conflict comes to mind. According to the .previous_element concept, last_a_tag.previous_element.next_element should give <a class="sister" href="http://example.com/tillie" id="link3">, so why does it give the complete element as shown above?

Edit

last_a_tag.previous_element
# u' and\n'  <~~Perfect
last_a_tag.previous_element.next_element
# <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>

Why does the output not stop at the following?

#<a class="sister" href="http://example.com/tillie" id="link3">

Why does it also include the part below? Tillie</a> <~~ this is where I am confused

Please help me understand.

score 2 · Accepted Answer

You are still looking at a reference to a tag, and when that tag is printed, all the children it contains are printed too.

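A tag is not just the opening <a ...> element; it includes its children and the closing element too. To reach those children you have to move within the tree, for example via .next_element, which gives u'Tillie'.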
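The point above can be checked with a minimal sketch, assuming Python 3 and bs4 with the built-in "html.parser". The HTML snippet is shortened from the example document, and `opening_tag_only` is a hypothetical helper written for illustration, not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup, Tag

# Shortened version of the example document (an assumption for brevity).
html = ('<p>Lacie and\n'
        '<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;</p>')
soup = BeautifulSoup(html, "html.parser")

last_a_tag = soup.find("a", id="link3")
same_tag = last_a_tag.previous_element.next_element

# Stepping back and then forward lands on the very same Tag object.
print(same_tag is last_a_tag)     # True
print(isinstance(same_tag, Tag))  # True

# The Tag object owns its children; printing it renders all of them.
print(same_tag.name)              # a
print(same_tag.next_element)      # Tillie


def opening_tag_only(tag):
    """Hypothetical helper: render only the opening tag, without children.

    A sketch only: attribute values are not escaped.
    """
    parts = [tag.name]
    for key, value in tag.attrs.items():
        # Multi-valued attributes such as class are stored as lists.
        if isinstance(value, list):
            value = " ".join(value)
        parts.append(f'{key}="{value}"')
    return "<" + " ".join(parts) + ">"


print(opening_tag_only(last_a_tag))
```

If only the opening tag is wanted as text, it has to be built from `tag.name` and `tag.attrs` as in the helper, because the printed form of a tag always includes its contents.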

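When you navigate the tree, you are not moving between the start and end pieces of text; you are moving between elements in the tree. The original XML/HTML document defines these elements in a specific textual order, but that is not what you are looking at here. You are looking at a nested structure of tags and text that fit inside other tags, all the way up to the root.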

So the following HTML structure:

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

results in a structure along these lines:

p
\
  a
  \
    "Elsie"
  ", "
  a
  \
    "Lacie"
  " and "
  a
  \
    "Tillie"
  "; and they lived at the bottom of a well."

(simplified to remove much of the whitespace).
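A nested outline like this can be reproduced from the parsed document. The following is a rough sketch, assuming Python 3 and bs4 with "html.parser"; `outline` is a hypothetical helper name, and whitespace inside text nodes is collapsed for readability.

```python
from bs4 import BeautifulSoup, NavigableString

html = """<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>"""
soup = BeautifulSoup(html, "html.parser")


def outline(node, depth=0):
    """Hypothetical helper: list tags and text nodes, indented by nesting depth."""
    lines = []
    indent = "  " * depth
    for child in node.children:
        if isinstance(child, NavigableString):
            # Collapse runs of whitespace so each text node fits on one line.
            lines.append(indent + repr(" ".join(child.split())))
        else:
            lines.append(indent + child.name)
            # Recurse: a tag's children sit one level deeper.
            lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(soup)))
```

Each `a` line is followed by its text child one level deeper, which is why printing an `a` tag always shows that text as well.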

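If you have a reference to the last a element, the previous element in that set is the text " and ", and the next one is "Tillie". After "Tillie" comes the text "; and they lived at the bottom of a well.". Before the " and " text comes the "Lacie" text, and so on.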
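The order described above can be walked with the `.next_elements` and `.previous_elements` generators. This is a minimal sketch, assuming Python 3 and bs4 with "html.parser"; `describe` is a hypothetical helper, and only the first two backward steps are taken.

```python
from itertools import islice

from bs4 import BeautifulSoup, NavigableString

html = """<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>"""
soup = BeautifulSoup(html, "html.parser")
last_a_tag = soup.find("a", id="link3")


def describe(element):
    """Hypothetical helper: show text nodes as quoted strings, tags by name."""
    if isinstance(element, NavigableString):
        return repr(str(element))
    return f"<{element.name}> tag"


# Forward from the last <a>: first its own text child, then the text after it.
forward = [describe(e) for e in last_a_tag.next_elements]

# Backward from the last <a>: the text before it, then the text inside the previous <a>.
backward = [describe(e) for e in islice(last_a_tag.previous_elements, 2)]

print(forward)
print(backward)
```

The first forward step is the tag's own child "Tillie", not the text that follows the closing </a>, which matches the nested structure rather than the raw markup.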
