java - Lucene skips years when using NumericRangeQuery on dates

Question

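I am running a Lucene query with the date range 20000101 to 20070531, but Lucene only returns documents whose publicationDate falls between 20000101 and 20000701 or between 20070101 and 20070531. Lucene skips several years. When I run the query with a different set of dates, the results are similar.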

Full insert code:

    Document doc = new Document();
    doc.add(new Field("pageNumber", article.getPageNumber(), Field.Store.YES, Field.Index.NOT_ANALYZED));
    // publicationDate (yyyyMMdd) is indexed as an int NumericField with precisionStep 8
    doc.add(new NumericField("publicationDate", 8, Field.Store.YES, true).setIntValue(Integer.parseInt(article.getPublicationDate())));
    doc.add(new Field("headline", article.getHeadline(), Field.Store.YES, Field.Index.ANALYZED));
    doc.add(new Field("text", article.getText(), Field.Store.YES, Field.Index.ANALYZED));
    doc.add(new Field("fileName", article.getFileName(), Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.add(new Field("mediaType", article.getMediaType(), Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.add(new Field("mediaSource", article.getMediaSource(), Field.Store.YES, Field.Index.NOT_ANALYZED));
    // NOTE: "overLap" and "status" both store getMediaType(); this may be a copy-paste slip
    doc.add(new Field("overLap", article.getMediaType(), Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.add(new Field("status", article.getMediaType(), Field.Store.YES, Field.Index.NOT_ANALYZED));
    indexWriter.addDocument(doc);

Document count code:

    StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
    Directory index = new SimpleFSDirectory(new File(LUCENE_INDEX_DIRECTORY));
    IndexReader reader = IndexReader.open(index);

    Query sourceQuery = new TermQuery(new Term("mediaSource", source));
    QueryParser queryParser = new QueryParser(Version.LUCENE_36, "text", analyzer);
    Query textQuery = queryParser.parse(terms);
    Query dateRangeQuery = NumericRangeQuery.newIntRange("publicationDate", startDate, endDate, true, true);

    BooleanQuery booleanQuery = new BooleanQuery();
    booleanQuery.add(sourceQuery, BooleanClause.Occur.MUST);
    booleanQuery.add(textQuery, BooleanClause.Occur.MUST);
    booleanQuery.add(dateRangeQuery, BooleanClause.Occur.MUST);

    IndexSearcher searcher = new IndexSearcher(reader);

    TotalHitCountCollector collector = new TotalHitCountCollector();
    searcher.search(booleanQuery, collector);

    System.out.println("start: " + startDate);
    System.out.println("end: " + endDate);
    System.out.println("total: " + collector.getTotalHits());

    String hitCount = String.valueOf(collector.getTotalHits());
    searcher.close();
    reader.close();
    analyzer.close();
    return hitCount;

Full document list code:

    StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
    Directory index = new SimpleFSDirectory(new File(LUCENE_INDEX_DIRECTORY));
    IndexReader reader = IndexReader.open(index);

    Query sourceQuery = new TermQuery(new Term("mediaSource", source));
    QueryParser queryParser = new QueryParser(Version.LUCENE_36, "text", analyzer);
    Query textQuery = queryParser.parse(terms);
    Query dateRangeQuery = NumericRangeQuery.newIntRange("publicationDate", startDate, endDate, true, true);

    BooleanQuery booleanQuery = new BooleanQuery();
    booleanQuery.add(sourceQuery, BooleanClause.Occur.MUST);
    booleanQuery.add(textQuery, BooleanClause.Occur.MUST);
    booleanQuery.add(dateRangeQuery, BooleanClause.Occur.MUST);

    IndexSearcher searcher = new IndexSearcher(reader);
    TotalHitCountCollector collector = new TotalHitCountCollector();
    searcher.search(booleanQuery, collector);

    Sort sort = new Sort(new SortField("publicationDate", SortField.INT));

    if (collector.getTotalHits() > 0) {
        TopDocs topDocs = searcher.search(booleanQuery, collector.getTotalHits(), sort);

        int i = 0;
        for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
            ArrayList<String> resultRow = new ArrayList<String>();
            Document doc = searcher.doc(scoreDoc.doc);
            resultRow.add(String.valueOf(i));
            resultRow.add(doc.get("publicationDate"));
            resultRow.add(doc.get("mediaSource"));
            resultRow.add(doc.get("fileName"));
            resultRow.add(doc.get("headline"));
            resultRow.add(doc.get("pageNumber"));
            ql.results.put(String.valueOf(i), resultRow);
            i++;
        }
    } else {
        ArrayList<String> resultRow = new ArrayList<String>();
        resultRow.add("0");
        resultRow.add("0");
        resultRow.add("0");
        resultRow.add("0");
        resultRow.add("0");
        resultRow.add("0");
        ql.results.put("0", resultRow);
    }

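Truncated results (last 10 of 2058 documents):

20021231 Iraq is put on the back burner
20021231 Muslim anger follows the spread of missionaries
20021231 White House cuts estimate of the cost of war with Iraq
20021231 Bring back the draft
20040101 Pakistani leader's new tactic: persuasion
20040101 What we will do in 2004
20040101 Ethnic morass stalls Afghan charter talks
20040101 U.S. looks for terror clues in the case of 2 brothers
20040101 Giving up arms: who is next after Libya?
20040101 A strange sight in Iran as relief tents go up: American flags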

score -1 · Accepted Answer

The problem is that NumericRangeQuery did not work correctly here. Using a RangeQuery with string values fixed the problem.
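The string-range workaround depends on fixed-width yyyyMMdd strings sorting lexicographically in the same order as their integer values. Here is a minimal plain-Java sketch of that property. It is not Lucene code, and the class `DateStringRange` and its methods are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: shows that an inclusive range
// over fixed-width yyyyMMdd strings selects the same dates as the
// corresponding numeric range.
public class DateStringRange {

    // True when date lies in [start, end], compared as plain strings.
    public static boolean inRange(String date, String start, String end) {
        return date.compareTo(start) >= 0 && date.compareTo(end) <= 0;
    }

    // Keeps only the dates inside the inclusive range, preserving input order.
    public static List<String> filter(List<String> dates, String start, String end) {
        List<String> hits = new ArrayList<String>();
        for (String d : dates) {
            if (inRange(d, start, end)) {
                hits.add(d);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> dates = Arrays.asList(
                "19991231", "20000101", "20021231", "20040101", "20070531", "20070601");
        // Same bounds as the query in the question.
        System.out.println(filter(dates, "20000101", "20070531"));
    }
}
```

In Lucene 3.6 the string-based range query is `TermRangeQuery`, and it needs publicationDate indexed as a plain NOT_ANALYZED string field rather than a `NumericField`. It may also be worth checking, as an assumption not confirmed in this thread, whether the precisionStep of 8 used at index time differs from the default precisionStep that the five-argument `NumericRangeQuery.newIntRange` applies; an overload that takes an explicit precisionStep exists.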
