python - How do I crawl a Twitter page using Python?

Question

When I try to crawl Twitter with this code:

import urllib2
s = "https://mobile.twitter.com/bing/"
html = urllib2.urlopen(s).read()
print html

... I get the following error:

Traceback (most recent call last):
  File "C:\Users\arpit\Downloads\Desktop\Wiki Code\final Crawler_wiki.py", line 14, in <module>
    html = urllib2.urlopen(s).read()
  File "C:\Python27\lib\urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 400, in open
    response = self._open(req, data)
  File "C:\Python27\lib\urllib2.py", line 418, in _open
    '_open', req)
  File "C:\Python27\lib\urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 1215, in https_open
    return self.do_open(httplib.HTTPSConnection, req)
  File "C:\Python27\lib\urllib2.py", line 1177, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno 10061] No connection could be made because the target machine actively refused it>

Replacing mobile.twitter.com with twitter.com works, but I want it to work with mobile.twitter.com.

score 0 · Accepted Answer

The Twitter site is probably looking for a User-Agent header, which you are not setting when you make the request through the urllib API.

You will probably need to use something like mechanize to fake a user agent.

However, I would strongly recommend using the Twitter API, which gives you a simple and convenient way to work with the data.
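
As a rough illustration of the User-Agent point above, here is a minimal sketch that sets the header with the standard library instead of mechanize. It assumes Python 3, where `urllib.request` replaces the `urllib2` module used in the question. The names `build_request`, `fetch`, and `DEFAULT_USER_AGENT`, and the User-Agent string itself, are hypothetical and chosen only for this example. Whether the server accepts such a request is not something this answer confirms.

```python
import urllib.request

# Hypothetical browser-like User-Agent string, chosen only for illustration.
DEFAULT_USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) ExampleCrawler/0.1"


def build_request(url, user_agent=DEFAULT_USER_AGENT):
    """Return a Request for url that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(request, timeout=10):
    """Send the request and return the response body decoded as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling `fetch(build_request("https://mobile.twitter.com/bing/"))` would send the request with the header attached. Building the request is kept separate from sending it so the header can be inspected without touching the network.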
