python - HTML and soup output are very different

Question

I have a problem with BeautifulSoup.

You can find the HTML here -> http://pastebin.com/Nr1k0dcM

Then I simply run:

soup = BeautifulSoup(html)
print soup.prettify()

The output should not differ from the HTML, but this is all I get -> http://pastebin.com/Y6DmEj40

I really don't understand what is going on here...

Edit:

This is one of the URLs I'm scraping, for example: http://fantasy.premierleague.com/entry/38861/event-history/8/

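I'm only scraping the HTML between the <!-- pitch view --> and <!-- end ismPitch --> comments, because otherwise I get the following error: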

HTMLParser.HTMLParseError: bad end tag: u"</scri'+'pt>", at line 89, column 222
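The error comes from the page's inline JavaScript, which writes a closing tag as '</scri'+'pt>' so that the browser does not treat it as the end of the enclosing <script> element. Python 2's HTMLParser, which BeautifulSoup is using here, rejects that as a bad end tag. Besides slicing out the region you need, another option is to remove <script> elements before parsing. The sketch below does that with a regular expression. strip_scripts is a hypothetical helper, and it assumes each script element ends with a literal </script>, which is typical but not guaranteed (regular expressions are a fragile way to process HTML in general). Passing a more lenient parser to BeautifulSoup, for example BeautifulSoup(html, "html5lib") or BeautifulSoup(html, "lxml"), may also avoid the error if that parser is installed.

```python
import re

# Matches a whole <script ...>...</script> element, non-greedily and across lines.
# Assumption: the real closing tag is written literally as </script>, while the
# page's JavaScript only emits the split form '</scri'+'pt>', which does not match.
SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)


def strip_scripts(html):
    """Return html with every <script> element removed (hypothetical helper)."""
    return SCRIPT_RE.sub("", html)


page = "<div><script>document.write('<scr'+'ipt src=\"x.js\"></scri'+'pt>');</script><p>Player</p></div>"
print(strip_scripts(page))  # prints <div><p>Player</p></div>
```

The cleaned string can then be passed to BeautifulSoup as usual.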

So this is what I'm doing now:

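import requests
from bs4 import BeautifulSoup

# url and headers are defined elsewhere (not shown in the question)
response = requests.get(url, headers=headers)
html = response.text
# keep only the markup between the two comments; 19 == len('<!-- pitch view -->')
tablestart = html.find('<!-- pitch view -->') + 19
tableend = html.find('<!-- end ismPitch -->')
html = html[tablestart:tableend]
soup = BeautifulSoup(html)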
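This slicing works, but the hard-coded 19 is just len('<!-- pitch view -->'), and if either comment is missing, str.find() returns -1: the start offset then silently becomes 18 and the end offset -1, so the slice returns the wrong text instead of failing. A minimal sketch of a safer version, using a hypothetical helper extract_between (not part of requests or BeautifulSoup):

```python
def extract_between(html, start_marker, end_marker):
    """Return the text between start_marker and end_marker (hypothetical helper).

    Uses len(start_marker) instead of a hard-coded 19 and raises ValueError
    when a marker is missing, instead of slicing with a -1 offset.
    """
    start = html.find(start_marker)
    if start == -1:
        raise ValueError("start marker not found: %r" % start_marker)
    start += len(start_marker)
    # Search for the end marker only after the start marker.
    end = html.find(end_marker, start)
    if end == -1:
        raise ValueError("end marker not found: %r" % end_marker)
    return html[start:end]


page = "<html><!-- pitch view --><div>pitch</div><!-- end ismPitch --></html>"
print(extract_between(page, "<!-- pitch view -->", "<!-- end ismPitch -->"))
# prints <div>pitch</div>
```

Searching for the end marker starting at the start offset also guarantees the end comes after the start, which the two independent find() calls do not.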

score 1 · Accepted Answer

I implemented the code above like this:

import urllib2
from bs4 import BeautifulSoup
response = urllib2.urlopen("http://fantasy.premierleague.com/entry/38861/event-history/8/")
html = response.read()
tablestart = html.find('<!-- pitch view -->') + 19
print tablestart
tableend = html.find('<!-- end ismPitch -->')
print tableend
html = html[tablestart:tableend]
soup = BeautifulSoup(html)
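The code above is Python 2 (urllib2, print statements). A minimal Python 3 sketch of the same steps follows, assuming the page is UTF-8; fetch and pitch_offsets are hypothetical names. Because it decodes the bytes first, its offsets are character offsets, which can differ from the byte offsets the Python 2 code prints if the page contains non-ASCII text.

```python
from urllib.request import urlopen

START = "<!-- pitch view -->"
END = "<!-- end ismPitch -->"


def pitch_offsets(html):
    """Return (tablestart, tableend) computed the same way as the answer."""
    tablestart = html.find(START) + len(START)  # len(START) == 19
    tableend = html.find(END)
    return tablestart, tableend


def fetch(url):
    """Download url and decode it (assumption: the page is UTF-8)."""
    with urlopen(url) as response:
        # read() returns bytes in Python 3, unlike urllib2 in Python 2
        return response.read().decode("utf-8")
```

Usage (needs network access and bs4): html = fetch("http://fantasy.premierleague.com/entry/38861/event-history/8/"), then tablestart, tableend = pitch_offsets(html) and soup = BeautifulSoup(html[tablestart:tableend], "html.parser").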

The output of the code above is:

55594
92366
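The two numbers can be read as follows, taking the printed values as given. In the Python 2 code html is a byte string, so these are byte offsets.

```python
# Values printed by the answer's code
tablestart = 55594  # offset just past '<!-- pitch view -->'
tableend = 92366    # offset where '<!-- end ismPitch -->' starts

# str.find() returns -1 when a marker is missing, which would give
# tablestart == -1 + 19 == 18 or tableend == -1; neither happened here.
both_found = tablestart != 18 and tableend != -1
slice_length = tableend - tablestart

print(both_found)    # True
print(slice_length)  # 36772
```

So both comments are present in the raw page, and the fragment handed to BeautifulSoup is about 37 KB of markup.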
