google-app-engine - Problem downloading a page that is updated daily

Question

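I am developing an application on GAE that fetches a web page and searches it for links.
The page is updated every morning, so a cron job runs every 15 minutes for a few hours each morning to fetch that day's page.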

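Here is the problem: if the application finds the old page (yesterday's) on the first run of the cron job, it keeps fetching that old page on later runs, even though the new page is already available at the same URL.
A cache seems to be involved somewhere, but I cannot disable it.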

The code the application uses to download the page is plain Java I/O:

    InputStream input = null;
    ByteArrayOutputStream output = null;
    HttpURLConnection conn = null;
    URL url = new URL("http://www.page.url.net");
    try {
        conn = (HttpURLConnection) url.openConnection();
        conn.setReadTimeout(0);
        conn.setUseCaches(false);
        int httpResponseCode = conn.getResponseCode();
        if (httpResponseCode == HttpURLConnection.HTTP_OK) {
            input = conn.getInputStream();
            output = writeByteArrayOutputStreamFromInputStream(input);
        } else {
            throw new IOException("response code " + httpResponseCode);
        }
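    } finally {
        // Close each resource independently; any of them may still be null here.
        if (input != null) {
            input.close();
        }
        if (output != null) {
            output.close();
        }
        if (conn != null) {
            conn.disconnect();
        }
    }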

What is going wrong?
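One possible cause, which the question does not confirm, is that `setUseCaches(false)` only affects caching on the client side and sends nothing to the server, so an intermediate cache can keep serving its stored copy. The sketch below shows two common workarounds under that assumption: sending `Cache-Control` and `Pragma` request headers, and appending a changing query parameter. The class name `FreshFetch`, both helper methods, the `_ts` parameter name, and the timeout values are hypothetical and not part of the original code.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

class FreshFetch {

    // Hypothetical helper: prepares a connection whose request headers ask
    // caches along the way to revalidate instead of serving a stored copy.
    // Nothing is sent until the caller reads the response.
    static HttpURLConnection openUncached(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setUseCaches(false);
        // Unlike setUseCaches, these headers travel with the request.
        conn.setRequestProperty("Cache-Control", "no-cache, max-age=0");
        conn.setRequestProperty("Pragma", "no-cache");
        // Illustrative finite timeouts (assumed values) instead of 0, which means "wait forever".
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(30_000);
        return conn;
    }

    // Hypothetical helper: appends a changing query parameter so that each
    // request uses a URL that a cache keyed on the full URL has not seen yet.
    static String withCacheBuster(String address, long stamp) {
        String separator = address.contains("?") ? "&" : "?";
        return address + separator + "_ts=" + stamp;
    }
}
```

A call such as `FreshFetch.openUncached(FreshFetch.withCacheBuster("http://www.page.url.net", System.currentTimeMillis()))` would combine both approaches. Whether either one helps depends on where the stale copy is actually cached, and the query parameter only works if the target server ignores parameters it does not know.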
