robots.txt - Googlebot not respecting robots.txt

Question

For some reason, when I use "Analyze robots.txt" in Google Webmaster Tools to see which URLs are blocked by our robots.txt file, the results are not what I expect. Here is a snippet from the beginning of the file:

Sitemap: http://[omitted]/sitemap_index.xml

User-agent: Mediapartners-Google
Disallow: /scripts

User-agent: *
Disallow: /scripts
# list of articles given by the Content group
Disallow: http://[omitted]/Living/books/book-review-not-stupid.aspx
Disallow: http://[omitted]/Living/books/book-review-running-through-roadblocks-inspirational-stories-of-twenty-courageous-athletic-warriors.aspx
Disallow: http://[omitted]/Living/sportsandrecreation/book-review-running-through-roadblocks-inspirational-stories-of-twenty-courageous-athletic-warriors.aspx

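Both Googlebot and Mediapartners-Google correctly block everything in the scripts folder. I can tell the two robots are reading the right directives, because Googlebot reports scripts as blocked by line 7 while Mediapartners-Google reports it as blocked by line 4. Yet none of the other URLs listed under the second User-agent directive are blocked!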

I'm wondering whether my comments or the use of absolute URLs are causing the problem...

Any insight is appreciated. Thanks.

score 11 · Accepted Answer

They are ignored because the Disallow entries in your robots.txt file contain fully qualified URLs, which the specification does not allow. (A Disallow value should be a path only, starting with /.) Try the following:

# Sitemap is the exception: it takes a fully qualified URL
Sitemap: http://[omitted]/sitemap_index.xml

User-agent: Mediapartners-Google
Disallow: /scripts

User-agent: *
Disallow: /scripts
# list of articles given by the Content group
Disallow: /Living/books/book-review-not-stupid.aspx
Disallow: /Living/books/book-review-running-through-roadblocks-inspirational-stories-of-twenty-courageous-athletic-warriors.aspx
Disallow: /Living/sportsandrecreation/book-review-running-through-roadblocks-inspirational-stories-of-twenty-courageous-athletic-warriors.aspx

As for caching, Google tries to fetch a fresh copy of the robots.txt file about every 24 hours on average.
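
If the list of articles is long, the rewrite from full URLs to paths can be scripted. The following is a minimal sketch, not part of the original answer: the function names `to_path_rule` and `fix_robots_txt` are hypothetical, and `example.com` stands in for the omitted domain. It only touches Allow and Disallow lines, leaves Sitemap lines alone (those are expected to hold a full URL), and does not handle inline `#` comments on a rule line.

```python
from urllib.parse import urlsplit


def to_path_rule(line: str) -> str:
    """Rewrite an Allow/Disallow line holding a full URL into a path-only rule."""
    field, sep, value = line.partition(":")
    if not sep or field.strip().lower() not in ("allow", "disallow"):
        # Comments, User-agent, Sitemap and blank lines pass through unchanged.
        return line
    parts = urlsplit(value.strip())
    if not parts.scheme or not parts.netloc:
        # Already a path (or an empty value): nothing to fix.
        return line
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return f"{field.strip()}: {path}"


def fix_robots_txt(text: str) -> str:
    """Apply to_path_rule to every line of a robots.txt body."""
    return "\n".join(to_path_rule(line) for line in text.splitlines())


if __name__ == "__main__":
    # example.com is a placeholder for the domain omitted in the question.
    sample = (
        "User-agent: *\n"
        "Disallow: /scripts\n"
        "Disallow: http://example.com/Living/books/book-review-not-stupid.aspx"
    )
    print(fix_robots_txt(sample))
```

Review the output before deploying it, since a rule that pointed at a different host would silently become a rule for this one.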

score 2 · Accepted Answer

It's the absolute URLs. robots.txt is only supposed to contain relative URIs; the domain is inferred from the domain the robots.txt was fetched from.
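
One way to see this effect locally is with Python's standard `urllib.robotparser`, which compares each rule against the path of the URL being checked. This is a rough sketch under stated assumptions: Python's parser is not Googlebot and may differ from it in details, the helper name `blocked` is hypothetical, and `example.com` stands in for the omitted domain.

```python
from urllib.robotparser import RobotFileParser

# example.com is a placeholder for the domain omitted in the question.
ARTICLE = "http://example.com/Living/books/book-review-not-stupid.aspx"

# Rules written the way the question has them: a full URL after Disallow.
BROKEN = f"""\
User-agent: *
Disallow: /scripts
Disallow: {ARTICLE}
"""

# Rules written as paths only.
FIXED = """\
User-agent: *
Disallow: /scripts
Disallow: /Living/books/book-review-not-stupid.aspx
"""


def blocked(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if this robots.txt body disallows `url` for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


if __name__ == "__main__":
    for name, body in (("broken", BROKEN), ("fixed", FIXED)):
        print(name, "blocks article:", blocked(body, ARTICLE))
```

With the path-only rules the article is reported as blocked, while the full-URL rule never matches because only the path portion of the requested URL is compared against it.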

score 0 · Accepted Answer

It has been up for at least a week, and Google says it was last downloaded 3 hours ago, so I'm sure it's recent.

score -1 · Accepted Answer

Did you make this change to the robots.txt file recently? In my experience, Google seems to cache that sort of thing for a very long time.
