python - BeautifulSoup 4 does not read all of the HTML

Translated from: https://stackoverflow.com/questions/18608990 2013-09-04T08:32:13.147


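I have a web page to fetch. When I fetch it with urllib and print the content, I get the actual content at its full length. When I parse the HTML with the older BeautifulSoup package, the actual content is there, including the divs. I don't know what is going wrong, but bs4 simply removes some of the divs I need. How can I solve this problem? Here is my sample: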

# This one does not remove any necessary parts. This is okay.

import urllib
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(urllib.urlopen("http://example").read())


# But this one removes some necessary parts. This is not okay.

import urllib
from bs4 import BeautifulSoup
soup = BeautifulSoup(urllib.urlopen("http://example").read())

Thank you.
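
The question has no answers, so the cause is not confirmed here. One way to narrow it down is to count the div tags in the raw markup independently of Beautiful Soup and compare that number with what the parsed tree contains. The sketch below uses only the standard library; `TagCounter`, `count_raw_tags`, and the `sample` string are hypothetical names introduced for illustration, not part of the original post.

```python
try:
    from html.parser import HTMLParser   # Python 3
except ImportError:
    from HTMLParser import HTMLParser    # Python 2, as used in the question


class TagCounter(HTMLParser):
    """Counts opening tags with a given name in raw markup (hypothetical helper)."""

    def __init__(self, wanted):
        # Explicit base call instead of super(): HTMLParser is an
        # old-style class on Python 2.
        HTMLParser.__init__(self)
        self.wanted = wanted
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this hook.
        if tag == self.wanted:
            self.count += 1


def count_raw_tags(html, wanted="div"):
    """Return how many <wanted> opening tags appear in the markup."""
    counter = TagCounter(wanted)
    counter.feed(html)
    counter.close()
    return counter.count


# Deliberately malformed sample: two of the three divs are never closed.
sample = "<div><p>one<div>two</div><div>three"
print(count_raw_tags(sample))  # prints 3
```

If `count_raw_tags(html)` is larger than `len(soup.find_all("div"))`, the tree builder is discarding markup. bs4 accepts a parser name as its second argument, for example `BeautifulSoup(html, "html.parser")`, `BeautifulSoup(html, "lxml")`, or `BeautifulSoup(html, "html5lib")` (the last two need to be installed separately), and the parsers can recover differently from malformed HTML. Whether the parser choice explains the missing divs on this particular page is an assumption, so it is worth running the comparison with each one.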

