coldfusion - Search robots.txt for a specific page exception in ColdFusion

Question

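I'm adding a feature to our CMS that lets users choose, when they create a page, whether to allow or disallow search engine indexing of that page.

If they select yes (that is, the page should not be indexed), something like the following would apply: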

<cfif request.variables.indexable eq 0>
<cffile 
    action = "append"
    file = "C:\websites\robots.txt"
    output = "Disallow: /blocked-page.cfm"
    addNewLine = "yes">
<cfelse>
<!--- check if the page is already disallowed in robots.txt and remove the line if it is --->
</cfif>

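It's the <cfelse> clause I need help with. How would I parse robots.txt to check whether this page is already disallowed? Would it be a cffile action="read", followed by a find() on the variable that was read?

Actually, the check for whether the page is already disallowed would probably go further up, to avoid adding the same line twice.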

score 4 · Accepted Answer

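You keep the list of pages in a database, and each page record has an indexable bit, right? If so, a simpler and more reliable approach is to generate a new robots.txt whenever a page is added or deleted, or its indexable bit changes.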

<!--- TODO: query for the non-indexable pages (result set named getPages) --->

<!--- lock the code to prevent concurrent changes --->

<cflock name="robots.txt" type="exclusive" timeout="30">

    <!--- flush the file, or simply start with writing something --->

    <cffile 
        action = "write"
        file = "C:\websites\robots.txt"
        output = "Sitemap: http://www.mywebsite.tld/sitemap.xml"
        addNewLine = "yes">

    <!--- append a Disallow entry to the file for each page returned by the query --->

    <cfloop query="getPages">

        <!--- we assume that page names are not entered by user (= safe names) --->

        <cffile 
            action = "append"
            file = "C:\websites\robots.txt"
            output = "Disallow: /#getPages.name#.cfm"
            addNewLine = "yes">

    </cfloop>

</cflock>

The sample code is untested, so watch out for typos and bugs.

score 2 · Accepted Answer

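Using the robots.txt file for this purpose is a bad idea. robots.txt is not a security measure, and you are handing the "bad guys" a list of the pages you don't want indexed.

You are much better off using the robots meta tag. It does not give anyone a list of pages you don't want indexed, and it gives you finer control over the individual actions a robot can perform.

With meta tags, you simply output the tag when you generate the page as usual.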
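As a minimal sketch of that idea, assuming the page template can see the same request.variables.indexable flag used in the question (0 meaning "do not index"), the tag could be emitted inside the page's <head> roughly like this. The exact directives (noindex, nofollow) are a choice for illustration, not something the question specifies.

```cfml
<!--- Sketch only: assumes request.variables.indexable is 0 for pages that must not be indexed --->
<cfif request.variables.indexable eq 0>
    <!--- ask robots not to index this page or follow its links --->
    <meta name="robots" content="noindex, nofollow">
<cfelse>
    <!--- default behaviour; this tag could also simply be omitted --->
    <meta name="robots" content="index, follow">
</cfif>
```

Because the tag is generated per request from the page record, there is no shared file to keep in sync and no need for locking.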

score 1 · Accepted Answer

<!--- dummy page to block --->
<cfset request.pageToBlock = "/blocked-page.cfm" />

<!--- read in current robots.txt --->
<cffile action="read" file="C:\websites\robots.txt" variable="data" />
<!--- build a struct of all blocked pages; split on CR and LF so Windows line endings leave no stray chr(13) in the keys --->
<cfset pages = {} />
<cfloop list="#data#" delimiters="#chr(13)##chr(10)#" index="i">
    <cfset pages[listLast(i,' ')] = '' />
</cfloop>


<cfif request.variables.indexable eq 0>
    <!--- If the page is not yet blocked add it --->
    <cfif not structKeyExists(pages, request.pageToBlock)>
        <cffile action="append" file="C:\websites\robots.txt" 
             output="Disallow: #request.pageToBlock#" addNewLine="yes" />
        <!--- not sure if this is in a loop, but if it is, add it to the struct for the next iteration --->
        <cfset pages[request.pageToBlock] = '' />
    </cfif>
</cfif>

This should do it. It reads the file in, loops over it, and builds a struct of the blocked pages. A new page is added only if it is not already blocked.
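The code above only covers adding a line. For the <cfelse> case from the question (removing the line when a page becomes indexable again), one possible approach is to rewrite the file without the matching entry. The sketch below is untested. The names robotsPath, lineToRemove and kept are made up for illustration, and the lock name is borrowed from the other answer. It also assumes request.pageToBlock is set as in the code above.

```cfml
<!--- Sketch only: robotsPath, lineToRemove and kept are illustrative names --->
<cfset robotsPath = "C:\websites\robots.txt" />
<cfset lineToRemove = "Disallow: #request.pageToBlock#" />

<!--- lock so that two requests cannot rewrite the file at the same time --->
<cflock name="robots.txt" type="exclusive" timeout="30">
    <cffile action="read" file="#robotsPath#" variable="data" />

    <!--- keep every line except the Disallow entry for this page --->
    <cfset kept = [] />
    <cfloop list="#data#" delimiters="#chr(13)##chr(10)#" index="line">
        <cfif trim(line) neq lineToRemove>
            <cfset arrayAppend(kept, line) />
        </cfif>
    </cfloop>

    <!--- write the remaining lines back, one per line --->
    <cffile action="write" file="#robotsPath#"
        output="#arrayToList(kept, chr(10))#" addNewLine="yes" />
</cflock>
```

Note that a CFML list loop skips empty elements, so any blank lines in the original file are dropped by this rewrite. If the file uses blank lines to separate User-agent groups, regenerating the whole file from the database is probably the safer route.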
