urllib2
ページから情報を取得するためのコードをこのコードに置き換えようとしてrequests
います。ライブラリを移動する方法については、100% 確信が持てません。これは私がこれまでに持っているもので、エラーがあります。何が間違っていますか?
コード:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import requests, sys
from lxml import etree
# import urllib2
# UTF8
reload(sys)
sys.setdefaultencoding("utf-8")
# url = 'http://countrycode.org/Germany'
# opener = urllib2.build_opener()
# opener.addheaders = [('User-agent', 'USERAGENT')]
r = requests.get('http://countrycode.org/Germany')
response = r.text
htmlparser = etree.HTMLParser()
tree = etree.parse(response, htmlparser)
countryCodeXpath = '//*[@id="main_table_blue_2"]/tr[3]/td[2]'
countryCode = tree.xpath(countryCodeXpath)
destCountryCode = countryCode[0].text
print destCountryCode
エラー:
Traceback (most recent call last):
File "/home/ubuntu/test.py", line 16, in <module>
tree = etree.parse(response, htmlparser)
File "lxml.etree.pyx", line 3196, in lxml.etree.parse (src/lxml/lxml.etree.c:64039)
File "parser.pxi", line 1549, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:91262)
File "parser.pxi", line 1578, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:91546)
File "parser.pxi", line 1478, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:90613)
File "parser.pxi", line 1025, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:87527)
File "parser.pxi", line 565, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:83101)
File "parser.pxi", line 656, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:84083)
File "parser.pxi", line 594, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:83379)
IOError: Error reading file '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<SNIP>