web-crawler - robots.txt Disallow: /click What is disallowed?

Question

I would like to scrape a website. It has the following in its robots.txt file, but I'm not sure exactly what they don't want me to do:

User-agent: *
Disallow: /click

There is no click subdirectory. Or do they not want me to access anything that would normally require clicking (like submitting data via a form)? They certainly aren't making it easy in any case: the main page's form sends a GET request to a page that sets a cookie, which is then read by a third page.

score 2 · Accepted Answer

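It means that no bot should crawl any URL whose path starts with the string /click. The rule is a plain prefix match on the URL path, so it applies whether or not a click directory actually exists, and it has nothing to do with clicking or form submissions.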

For example, the following URLs should be blocked:

example.com/click
example.com/click.html
example.com/click/
example.com/click/foo/bar
example.com/clicker

The following URLs would still be allowed:

example.com/foo/click
example.com/fooclick
example.com/clic
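
If you want to check the prefix behaviour yourself, here is a minimal sketch using Python's standard urllib.robotparser module, fed with the two lines from the question. The host example.com, the user agent name "MyCrawler", and the helper is_allowed are placeholders chosen for illustration, not anything taken from the site in question.

```python
from urllib.robotparser import RobotFileParser

# The two rules quoted in the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /click
"""

parser = RobotFileParser()
# parse() takes the file's lines directly, so nothing is fetched over the network.
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(path: str, user_agent: str = "MyCrawler") -> bool:
    """Return True if the rules permit fetching http://example.com + path.

    "MyCrawler" and example.com are hypothetical placeholders.
    """
    return parser.can_fetch(user_agent, "http://example.com" + path)


paths = [
    "/click", "/click.html", "/click/", "/click/foo/bar", "/clicker",
    "/foo/click", "/fooclick", "/clic",
]
for path in paths:
    print(path, "allowed" if is_allowed(path) else "blocked")
```

The first five paths should come out as blocked and the last three as allowed, matching the two lists above. Note that the URL passed to can_fetch needs a scheme (http://), otherwise the host name is parsed as part of the path and the prefix never matches.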

You can find the original robots.txt specification at http://www.robotstxt.org/wc/robots.html.
