web-crawler - What does StatisticsDB do in the Crawler4j open-source code?

Question

I am trying to understand the Crawler4j open-source web crawler, and I have a few questions, which are as follows.

Questions:

1. What does StatisticsDB do in the Counters class? Please explain the following part of the code.

 public Counters(Environment env, CrawlConfig config) throws DatabaseException {
    super(config);

    this.env = env;
    this.counterValues = new HashMap<String, Long>();

    /*
     * When crawling is set to be resumable, we have to keep the statistics
     * in a transactional database to make sure they are not lost if crawler
     * is crashed or terminated unexpectedly.
     */
    if (config.isResumableCrawling()) {
        DatabaseConfig dbConfig = new DatabaseConfig();
        dbConfig.setAllowCreate(true);
        dbConfig.setTransactional(true);
        dbConfig.setDeferredWrite(false);
        statisticsDB = env.openDatabase(null, "Statistics", dbConfig);

        OperationStatus result;
        DatabaseEntry key = new DatabaseEntry();
        DatabaseEntry value = new DatabaseEntry();
        Transaction tnx = env.beginTransaction(null, null);
        Cursor cursor = statisticsDB.openCursor(tnx, null);
        result = cursor.getFirst(key, value, null);

        while (result == OperationStatus.SUCCESS) {
            if (value.getData().length > 0) {
                String name = new String(key.getData());
                long counterValue = Util.byteArray2Long(value.getData());
                counterValues.put(name, counterValue);
            }
            result = cursor.getNext(key, value, null);
        }
        cursor.close();
        tnx.commit();
    }
}

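As far as I understand, it stores the crawled URLs, which helps if the crawler crashes, because the web crawler then does not have to start again from the beginning. Please explain the code above line by line.

2. Crawler4j uses SleepyCat to store intermediate information, but I could not find a good link that explains SleepyCat. Please point me to a good resource for learning the basics of SleepyCat. (I do not understand what the transaction and the cursor in the code above mean.)

Please help. I look forward to your kind reply.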

score 1 · Accepted Answer

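Basically, Crawler4j restores its existing statistics by reading every key/value pair from the "Statistics" database into the in-memory counterValues map. The code is in fact slightly off: a transaction is opened, but no modifications are made to the DB. The lines dealing with tnx could therefore be removed.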
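To illustrate why the statistics are kept in a persistent store at all, here is a minimal sketch that uses only the Java standard library. It is not Crawler4j's implementation: the class name `ResumableCountersSketch`, the write-through `increment` method, and the counter name used in `main` are assumptions made for illustration, and a plain `Map` stands in for the on-disk "Statistics" database.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical stand-in for a resumable counter class.
 * The "store" map plays the role of the on-disk database in this sketch.
 */
final class ResumableCountersSketch {
    private final Map<String, Long> store;                           // survives a "restart" here
    private final Map<String, Long> counterValues = new HashMap<>(); // in-memory cache

    ResumableCountersSketch(Map<String, Long> store) {
        this.store = store;
        // On startup, copy whatever was persisted earlier into memory.
        counterValues.putAll(store);
    }

    long getValue(String name) {
        return counterValues.getOrDefault(name, 0L);
    }

    void increment(String name) {
        long updated = getValue(name) + 1;
        counterValues.put(name, updated);
        store.put(name, updated); // write through so the value outlives this object
    }

    public static void main(String[] args) {
        Map<String, Long> disk = new HashMap<>();

        ResumableCountersSketch first = new ResumableCountersSketch(disk);
        first.increment("Processed-Pages"); // illustrative counter name
        first.increment("Processed-Pages");

        // Simulate a crash and restart: a new object is built over the same store.
        ResumableCountersSketch second = new ResumableCountersSketch(disk);
        System.out.println(second.getValue("Processed-Pages")); // prints 2
    }
}
```

The constructor only reads from the store, which is why a read-only load like this does not need a transaction of its own.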

Line-by-line comments:

    // Create a database configuration object
    DatabaseConfig dbConfig = new DatabaseConfig();
    // Set some parameters: allow creation, make the DB transactional, and disable deferred writes
    dbConfig.setAllowCreate(true);
    dbConfig.setTransactional(true);
    dbConfig.setDeferredWrite(false);
    // Open the database called "Statistics" with the configuration created above
    statisticsDB = env.openDatabase(null, "Statistics", dbConfig);

    OperationStatus result;
    // Create empty entries that will receive each key and value
    DatabaseEntry key = new DatabaseEntry();
    DatabaseEntry value = new DatabaseEntry();
    // Start a transaction
    Transaction tnx = env.beginTransaction(null, null);
    // Open a cursor on the DB
    Cursor cursor = statisticsDB.openCursor(tnx, null);
    // Position the cursor on the first key/value pair
    result = cursor.getFirst(key, value, null);
    // Loop while the cursor keeps returning a record
    while (result == OperationStatus.SUCCESS) {
        // If the value at the current cursor position is not empty, read the name and
        // the value of the counter and add them to the HashMap counterValues
        if (value.getData().length > 0) {
            String name = new String(key.getData());
            long counterValue = Util.byteArray2Long(value.getData());
            counterValues.put(name, counterValue);
        }
        // Move the cursor to the next key/value pair
        result = cursor.getNext(key, value, null);
    }
    // Release the cursor
    cursor.close();
    // Commit the transaction (nothing was modified here, so there are no changes to write)
    tnx.commit();
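The cursor loop above can be mimicked without Berkeley DB to see what it does to each record. The following is a sketch under stated assumptions, not Crawler4j code: a `TreeMap` stands in for the database (iterating it in key order plays the role of `getFirst`/`getNext`), keys are already strings rather than raw bytes, and the value encoding is assumed to be 8 bytes in big-endian order. The names `CounterDecodingSketch`, `longToBytes`, `bytesToLong`, and `load` are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of the load loop; a TreeMap replaces the real database. */
final class CounterDecodingSketch {

    /** Assumed encoding for a counter value: 8 bytes, big-endian. */
    static byte[] longToBytes(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    /** Inverse of longToBytes; throws if fewer than 8 bytes are supplied. */
    static long bytesToLong(byte[] b) {
        return ByteBuffer.wrap(b).getLong();
    }

    /** Visits every record in key order and skips records with an empty value. */
    static Map<String, Long> load(TreeMap<String, byte[]> records) {
        Map<String, Long> counterValues = new HashMap<>();
        for (Map.Entry<String, byte[]> record : records.entrySet()) {
            byte[] value = record.getValue();
            if (value.length > 0) {
                counterValues.put(record.getKey(), bytesToLong(value));
            }
        }
        return counterValues;
    }

    public static void main(String[] args) {
        TreeMap<String, byte[]> records = new TreeMap<>();
        records.put("Processed-Pages", longToBytes(42L)); // illustrative counter names
        records.put("Scheduled-Pages", longToBytes(100L));
        records.put("Empty-Entry", new byte[0]);          // skipped by the length check

        Map<String, Long> loaded = load(records);
        System.out.println(loaded.get("Processed-Pages")); // prints 42
        System.out.println(loaded.containsKey("Empty-Entry")); // prints false
    }
}
```

In the real code the key arrives as a byte array and is turned into a `String` with the platform default charset, so the bytes must have been written with the same charset for the counter names to round-trip.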

I also answered a similar question here. As for SleepyCat, is this what you are talking about?
