Trying to spider/crawl a third-party website, but I seem to have hit a snag:
Calling urlopen on a URL returns a response object, but reading it and printing the HTML shows I'm getting nothing back. Could this be due to some kind of blocking on the server's end, or something else?
Currently I'm trying to open New York Times articles. The main pages return HTML; the articles, uh, don't.
What I'm running (Python 2):

import urllib

source = urllib.urlopen(target_site)  # target_site is the article URL
html = source.read()
print "HTML: ", html.lower()
output:
HTML:
(other stuff)
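One thing I've considered: some sites serve an empty or stripped-down page when they see a non-browser User-Agent like the default "Python-urllib". A sketch of what I could try instead, spoofing a browser agent (written against Python 3's urllib.request; on Python 2 the equivalent class is urllib2.Request, and the URL and header string here are just placeholder values):

```python
import urllib.request

# Placeholder article URL; the real target_site would go here.
target_site = "https://www.nytimes.com/"

# Attach a browser-like User-Agent so the server doesn't see the
# default "Python-urllib/x.y" string. The value is an example only.
req = urllib.request.Request(
    target_site,
    headers={"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"},
)

# Confirm the header is set on the request before sending it;
# urllib normalizes the key's capitalization internally.
print(req.get_header("User-agent"))  # prints the spoofed agent string

# Then urllib.request.urlopen(req).read() would fetch with that header.
```

Would that kind of header spoofing plausibly explain the empty reads, or is something else going on?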
Oh, and it also times out once in a while, but that's a different story, I'm hoping.