java - Return all records in one query in Elasticsearch

Question

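I have an Elasticsearch database and want to fetch all of its records for a page on my website. I wrote a bean that connects to the Elasticsearch node, searches for records, and returns a response. My simple Java code that performs the search is: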

SearchResponse response = getClient().prepareSearch(indexName)
    .setTypes(typeName)              
    .setQuery(queryString("*:*"))
    .setExplain(true)
    .execute().actionGet();

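However, Elasticsearch sets the default size to 10, so the response contains only 10 hits, and my database holds more than 10 records. If I set the search size to Integer.MAX_VALUE, the search becomes very slow, which is not what I want.

How can I get all the records in one action, within an acceptable time, without setting the size of the response?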

score 20 · Accepted Answer

public List<Map<String, Object>> getAllDocs() {
    int scrollSize = 1000; // number of hits requested per page
    List<Map<String, Object>> esData = new ArrayList<Map<String, Object>>();
    SearchResponse response = null;
    int i = 0; // index of the page to fetch next
    // Keep requesting pages until one comes back empty
    while (response == null || response.getHits().hits().length != 0) {
        response = client.prepareSearch(indexName)
                .setTypes(typeName)
                .setQuery(QueryBuilders.matchAllQuery())
                .setSize(scrollSize)
                .setFrom(i * scrollSize)
                .execute()
                .actionGet();
        // Collect the source of every hit on this page
        for (SearchHit hit : response.getHits()) {
            esData.add(hit.getSource());
        }
        i++;
    }
    return esData;
}
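
The loop above pages with from and size, so the number of requests grows with the number of documents. More recent Elasticsearch versions also reject requests where from + size exceeds the index.max_result_window setting. The sketch below only does the arithmetic and calls no Elasticsearch API; the class name, the method names, and the 10,000 default are assumptions for illustration, so check the actual setting on your cluster.

```java
public class PagingWindow {

    // Assumption: commonly documented default of index.max_result_window.
    public static final int ASSUMED_MAX_RESULT_WINDOW = 10_000;

    /** Number of non-empty pages needed to read totalHits documents. */
    public static long pagesNeeded(long totalHits, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        return (totalHits + pageSize - 1) / pageSize; // ceiling division
    }

    /** True if the page at pageIndex (0-based) satisfies from + size <= maxWindow. */
    public static boolean fitsWindow(int pageIndex, int pageSize, int maxWindow) {
        long from = (long) pageIndex * pageSize; // long avoids int overflow
        return from + pageSize <= maxWindow;
    }

    public static void main(String[] args) {
        System.out.println("pages for 2500 docs: " + pagesNeeded(2500, 1000));
        System.out.println("page 9 fits: " + fitsWindow(9, 1000, ASSUMED_MAX_RESULT_WINDOW));
        System.out.println("page 10 fits: " + fitsWindow(10, 1000, ASSUMED_MAX_RESULT_WINDOW));
    }
}
```

If a page would fall outside the window, the scroll API is probably the better fit for reading everything.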

score 4 · Accepted Answer

You can use the scroll API. The other suggestion of using the SearchHit iterator also works well, but only if you do not want to update those hits.

import static org.elasticsearch.index.query.QueryBuilders.*;

QueryBuilder qb = termQuery("multi", "test");

SearchResponse scrollResp = client.prepareSearch(test)
        .addSort(FieldSortBuilder.DOC_FIELD_NAME, SortOrder.ASC)
        .setScroll(new TimeValue(60000))
        .setQuery(qb)
        .setSize(100).execute().actionGet(); //max of 100 hits will be returned for each scroll
//Scroll until no hits are returned
do {
    for (SearchHit hit : scrollResp.getHits().getHits()) {
        //Handle the hit...
    }

    scrollResp = client.prepareSearchScroll(scrollResp.getScrollId()).setScroll(new TimeValue(60000)).execute().actionGet();
} while(scrollResp.getHits().getHits().length != 0); // Zero hits mark the end of the scroll and the while loop.

score 0 · Accepted Answer

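You have to trade off the number of results returned against how long you want the user to wait and how much server memory is available. If you have indexed 1,000,000 documents, there is no realistic way to retrieve all of them in a single request. I am assuming the results are for a single user. You also need to consider how the system behaves under load.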
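
One way to keep memory bounded is to handle each batch as it arrives instead of collecting everything into one list. The following is a minimal sketch, not Elasticsearch code: PageSource is a hypothetical stand-in for whatever fetches one batch (for example a single scroll request), and the in-memory list only simulates an index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchedReader {

    /** Hypothetical stand-in for one paged request (e.g. one scroll call). */
    public interface PageSource<T> {
        // Returns up to `size` items starting at `offset`; an empty list means "done".
        List<T> fetch(int offset, int size);
    }

    /** Passes each batch to the handler in turn and returns the total item count. */
    public static <T> int forEachBatch(PageSource<T> source, int batchSize,
                                       Consumer<List<T>> handler) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int offset = 0;
        while (true) {
            List<T> batch = source.fetch(offset, batchSize);
            if (batch.isEmpty()) {
                break; // an empty batch marks the end, as in the scroll loop
            }
            handler.accept(batch); // only one batch is held at a time
            offset += batch.size();
        }
        return offset;
    }

    /** Simulated index backed by an in-memory list (illustration only). */
    public static PageSource<Integer> inMemory(List<Integer> docs) {
        return (offset, size) -> {
            if (offset >= docs.size()) {
                return new ArrayList<>();
            }
            int end = Math.min(docs.size(), offset + size);
            return new ArrayList<>(docs.subList(offset, end));
        };
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            docs.add(i);
        }
        int[] batches = {0};
        int total = forEachBatch(inMemory(docs), 1000, batch -> batches[0]++);
        System.out.println(total + " documents in " + batches[0] + " batches");
    }
}
```

A smaller batch size lowers peak memory per request at the cost of more round trips, which is the trade-off described above.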

score 0 · Accepted Answer

SearchResponse response = restHighLevelClient.search(new SearchRequest("Index_Name"), RequestOptions.DEFAULT);
SearchHit[] hits = response.getHits().getHits();

score -2 · Accepted Answer

1. Set a maximum size, e.g. MAX_INT_VALUE;

private static final int MAXSIZE=1000000;

@Override public List getAllSaleCityByCity(int cityId) throws Exception {

    List<EsSaleCity> list=new ArrayList<EsSaleCity>();

    Client client=EsFactory.getClient();
    SearchResponse response= client.prepareSearch(getIndex(EsSaleCity.class)).setTypes(getType(EsSaleCity.class)).setSize(MAXSIZE)
            .setQuery(QueryBuilders.filteredQuery(QueryBuilders.matchAllQuery(), FilterBuilders.boolFilter()
                    .must(FilterBuilders.termFilter("cityId", cityId)))).execute().actionGet();

    SearchHits searchHits=response.getHits();

    SearchHit[] hits=searchHits.getHits();
    for(SearchHit hit:hits){
        Map<String, Object> resultMap=hit.getSource();
        EsSaleCity saleCity=setEntity(resultMap, EsSaleCity.class);
        list.add(saleCity);
    }

    return list;

}

2. Count the documents in ES before searching

CountResponse countResponse = client.prepareCount(getIndex(EsSaleCity.class)).setTypes(getType(EsSaleCity.class)).setQuery(queryBuilder).execute().actionGet();

int size = (int) countResponse.getCount(); // this is the size you need

After that, you can run:

SearchResponse response = client.prepareSearch(getIndex(EsSaleCity.class)).setTypes(getType(EsSaleCity.class)).setQuery(queryBuilder).setSize(size).execute().actionGet();
