
I want to write a crawler in C#. The problem is that some websites use their robots.txt file to shut out crawlers, for example:

User-agent: *
Disallow: /

Is there a way to fake the request so that it identifies itself as, for example, Googlebot?


HttpWebRequest has a UserAgent property, but I would simply say: don't.

Besides, your point about robots.txt is rather moot: robots.txt is not enforced by the site, it is a set of rules your crawler is expected to read and follow. If you write a badly behaved tool that ignores robots.txt, then regardless of what you claim as your user-agent, you should expect to be blacklisted fairly quickly.
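Since obeying robots.txt is the crawler's job, here is a minimal C# sketch of that check. It is a simplified subset of the rules, not a complete parser: the class and method names are hypothetical, fetching the file is left out, and wildcards (`*`, `$`), Crawl-delay and Sitemap lines are ignored. The group selection (your own product token first, else `*`) and "longest matching rule wins, Allow wins a tie" follow RFC 9309.

```csharp
using System;
using System.Collections.Generic;

// Simplified robots.txt checker (sketch): prefix rules only, no wildcards.
public static class RobotsTxt
{
    public static bool IsAllowed(string robotsTxt, string userAgentToken, string path)
    {
        // Rules per User-agent value, matched case-insensitively.
        var groups = new Dictionary<string, List<(bool allow, string prefix)>>(
            StringComparer.OrdinalIgnoreCase);
        var currentAgents = new List<string>();
        bool lastWasAgent = false;

        foreach (var rawLine in robotsTxt.Split('\n'))
        {
            // Drop comments and surrounding whitespace (including '\r').
            var line = rawLine.Split('#')[0].Trim();
            int colon = line.IndexOf(':');
            if (colon < 0) continue;
            var key = line.Substring(0, colon).Trim().ToLowerInvariant();
            var value = line.Substring(colon + 1).Trim();

            if (key == "user-agent")
            {
                // Consecutive User-agent lines share the rules that follow.
                if (!lastWasAgent) currentAgents.Clear();
                currentAgents.Add(value);
                if (!groups.ContainsKey(value))
                    groups[value] = new List<(bool allow, string prefix)>();
                lastWasAgent = true;
            }
            else if (key == "allow" || key == "disallow")
            {
                lastWasAgent = false;
                if (value.Length == 0) continue; // empty rule restricts nothing
                foreach (var agent in currentAgents)
                    groups[agent].Add((key == "allow", value));
            }
            else
            {
                lastWasAgent = false; // Sitemap, Crawl-delay, etc. are ignored
            }
        }

        // Use our own group if there is one, otherwise the "*" group.
        if (!groups.TryGetValue(userAgentToken, out var rules) &&
            !groups.TryGetValue("*", out rules))
            return true; // no applicable group: everything is allowed

        // Longest matching prefix wins; on a tie, Allow wins.
        int bestLength = -1;
        bool allowed = true;
        foreach (var (allow, prefix) in rules)
        {
            if (!path.StartsWith(prefix, StringComparison.Ordinal)) continue;
            if (prefix.Length > bestLength || (prefix.Length == bestLength && allow))
            {
                bestLength = prefix.Length;
                allowed = allow;
            }
        }
        return allowed;
    }
}
```

With the file from the question, `RobotsTxt.IsAllowed("User-agent: *\nDisallow: /", "MyCrawler", "/index.html")` returns `false`: the site asks every crawler, whatever it calls itself, to stay away.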

In particular, trying to impersonate any of the major crawlers, such as Googlebot, is very dubious. Frankly, I'd expect most major sites to also check the IP range the request comes from.

answered 2012-04-09T10:34:19.617

Yes, HttpWebRequest has a UserAgent property, and you can set it to any string you like.
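A short sketch of where the User-Agent is set, both on HttpWebRequest (the API the answers mention) and on HttpClient, which .NET 6 and later recommend instead (WebRequest.Create is marked obsolete there as SYSLIB0014). The name "ExampleCrawler" and the example.com URL are placeholders; the point is to send an honest string that identifies your crawler and where to read about it, not to copy another bot's.

```csharp
using System;
using System.Net;
using System.Net.Http;

public static class CrawlerHttp
{
    // Hypothetical crawler name and info URL: identify your own bot honestly.
    public const string CrawlerUserAgent =
        "ExampleCrawler/1.0 (+https://example.com/crawler-info)";

    // Older API: creating the request sends nothing yet;
    // the request goes out when GetResponse() is called.
    public static HttpWebRequest CreateRequest(string url)
    {
#pragma warning disable SYSLIB0014 // WebRequest.Create is obsolete in .NET 6+
        var request = (HttpWebRequest)WebRequest.Create(url);
#pragma warning restore SYSLIB0014
        request.UserAgent = CrawlerUserAgent;
        return request;
    }

    // Newer API: set the header once on an HttpClient that you reuse.
    public static HttpClient CreateClient()
    {
        var client = new HttpClient();
        client.DefaultRequestHeaders.UserAgent.ParseAdd(CrawlerUserAgent);
        return client;
    }
}
```

Whatever string you send, the site sees the same request; the User-Agent does not change what robots.txt asks of you.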

answered 2012-04-09T10:34:11.447