python - BeautifulSoup fails to parse HTML with `html5lib`

Asked 2015-12-25T13:49:28.040

Viewed 2540 times

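BeautifulSoup fails to parse an HTML page when given the `html5lib` option, but works normally with the `html.parser` option. According to the docs, `html5lib` should be more lenient than `html.parser`, so why do I get garbled characters when I use it to parse this HTML page?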

Below is a small runnable example. (After changing `html5lib` to `html.parser`, the Chinese output is normal.)

#_*_coding:utf-8_*_
import requests
from bs4 import BeautifulSoup

ss = requests.Session()
res = ss.get("http://tech.qq.com/a/20151225/050487.htm")
html = res.content.decode("GBK").encode("utf-8")
soup = BeautifulSoup(html, 'html5lib')
print str(soup)[0:800]  # where you can see if the html is parsed normally or not
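
One possible explanation, offered only as an assumption and not confirmed anywhere in this thread: the page may declare a GB2312/GBK charset in a `<meta>` tag, and a parser that trusts that declaration would decode the re-encoded UTF-8 bytes with the wrong codec. The sketch below uses only the standard library to show what that mismatch does to Chinese text. The helper name `decode_with_wrong_codec` and the sample string are hypothetical and are not taken from the page in the question.

```python
# -*- coding: utf-8 -*-
# Minimal sketch of an encoding mismatch. It does not call BeautifulSoup or
# html5lib; it only reproduces the effect of decoding bytes with a codec that
# differs from the one used to encode them.

def decode_with_wrong_codec(text, real_codec="utf-8", assumed_codec="gbk"):
    """Encode `text` with `real_codec`, then decode it as `assumed_codec`.

    This imitates (as an assumption) a parser that trusts a GBK/GB2312
    declaration in the markup while the bytes it receives are UTF-8.
    """
    raw = text.encode(real_codec)
    # errors="replace" keeps the demo from raising on invalid byte sequences.
    return raw.decode(assumed_codec, errors="replace")

sample = u"\u817e\u8baf\u79d1\u6280"  # hypothetical sample of Chinese text

garbled = decode_with_wrong_codec(sample)
round_trip = sample.encode("utf-8").decode("utf-8")

print(garbled != sample)     # the mismatched decode no longer equals the input
print(round_trip == sample)  # the matching codec restores the original text
```

If this assumption holds for the page in question, passing already-decoded text to the parser (for example `res.content.decode("GBK")` without the trailing `.encode("utf-8")`) might avoid the second, incorrect decode. That suggestion has not been tested against the URL above.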

1 Answer
