python - Python Beautiful Soup .content property

Question

What does BeautifulSoup's .content do? I am working through crummy.com's tutorial and I don't really understand what .content does. I have looked at the forums and have not seen any answers. Looking at the code below...

from BeautifulSoup import BeautifulSoup
import re



doc = ['<html><head><title>Page title</title></head>',
       '<body><p id="firstpara" align="center">This is paragraph <b>one</b>.',
        '<p id="secondpara" align="blah">This is paragraph <b>two</b>.',
        '</html>']

soup = BeautifulSoup(''.join(doc))
print soup.contents[0].contents[0].contents[0].contents[0].name

I would expect the last line of the code to print "body", but instead I get:

  File "pe_ratio.py", line 29, in <module>
    print soup.contents[0].contents[0].contents[0].contents[0].name
  File "C:\Python27\lib\BeautifulSoup.py", line 473, in __getattr__
    raise AttributeError, "'%s' object has no attribute '%s'" % (self.__class__.__name__, attr)
AttributeError: 'NavigableString' object has no attribute 'name'

Is .content only concerned with html, head and title? If so, why is that?

Thanks in advance for the help.

score 3 · Accepted Answer

It just gives you what is inside the tag. Let me demonstrate with an example:

html_doc = """
<html><head><title>The Dormouse's story</title></head>

<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

from bs4 import BeautifulSoup

# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(html_doc, "html.parser")
head = soup.head

print(head.contents)

The code above gives a list, [<title>The Dormouse's story</title>], because that is what is inside the head tag. So taking [0] of that list gives you its first item.

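The reason you get an error is that soup.contents[0].contents[0].contents[0].contents[0] returns something that contains no further tags (and therefore has no such attribute). In your code it returns Page title: the first contents[0] gives you the html tag, the second gives you the head tag, the third leads to the title tag, and the fourth gives you the actual text content. So when you ask for name on it, there is no tag to give you one.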
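The chain described above can be walked one step at a time to see where the tags run out. This is only a minimal sketch: it assumes bs4 with the standard-library "html.parser" backend rather than the BeautifulSoup 3 import used in the question, and the names node and steps are made up for illustration.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

doc = ['<html><head><title>Page title</title></head>',
       '<body><p id="firstpara" align="center">This is paragraph <b>one</b>.',
       '<p id="secondpara" align="blah">This is paragraph <b>two</b>.',
       '</html>']

# Assumption: bs4 + "html.parser", not the BeautifulSoup 3 module from the question
soup = BeautifulSoup(''.join(doc), 'html.parser')

node = soup
steps = []
for depth in range(4):
    # Descend into the first child, exactly like chaining .contents[0]
    node = node.contents[0]
    # Only Tag objects are asked for .name; plain text is shown as a string
    label = node.name if isinstance(node, Tag) else str(node)
    steps.append((type(node).__name__, label))
    print(depth + 1, type(node).__name__, label)
```

The first three steps land on Tag objects (html, head, title) and the fourth lands on a NavigableString holding the text, which is the object the traceback in the question complains about.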

If you want the body printed, you can do the following:

soup = BeautifulSoup(''.join(doc))
print(soup.body)

If you want to reach body using contents only, then use the following:

soup = BeautifulSoup(''.join(doc))
print(soup.contents[0].contents[1].name)

You will not get it using [0] as the index, because body is the second element, after head.
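To check which index body sits at, the direct children of the html tag can be listed. Again this is a sketch under assumptions: bs4 with "html.parser", and the variable names html and child_names are illustrative only.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

doc = ['<html><head><title>Page title</title></head>',
       '<body><p id="firstpara" align="center">This is paragraph <b>one</b>.',
       '<p id="secondpara" align="blah">This is paragraph <b>two</b>.',
       '</html>']

# Assumption: bs4 + "html.parser"; other parsers may build a slightly different tree
soup = BeautifulSoup(''.join(doc), 'html.parser')

html = soup.contents[0]

# Collect the names of the direct children that are tags (skip any bare text)
child_names = [child.name for child in html.contents if isinstance(child, Tag)]
print(child_names)  # ['head', 'body']

# Index 1 is the same object that soup.body returns
print(html.contents[1] is soup.body)  # True
```

In real pages there is often whitespace between tags, which shows up as extra string entries in contents and shifts the indexes, so soup.body is usually the more robust choice.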
