google-bigquery - How do I request paginated BigQuery query results using pageTokens with the Google Client lib for Java?

Question

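I want to run BigQuery queries that return thousands of rows in total, but I only want to retrieve one page of 100 results at a time (using the maxResults and pageToken parameters).

The BigQuery API supports the pageToken parameter on collection.list methods. However, I am running an asynchronous query and retrieving the results with the getQueryResults method, which does not seem to support the pageToken parameter. Is it possible to use pageTokens with getQueryResults?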

score 14 · Accepted Answer

Update: New documentation on how to page through list results is available here.

I am self-answering this question because a developer asked me about it privately, and I would like to share the answer on Stack Overflow.

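The pageToken parameter can be used when requesting paginated results from the Tabledata.list method. Result sets are paginated automatically if, for example, the result data is over 100k rows or 10 MB. You can also request pagination by explicitly setting the maxResults parameter. Each page of results returns a pageToken parameter, which can then be used to retrieve the next page of results.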

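Every query results in a new BigQuery table. If you do not name the table explicitly, it only lasts for 24 hours. However, even unnamed "anonymous" tables have an identifier. In either case, after inserting a query job, retrieve the name of the newly created table. Then use the tabledata.list method (with a combination of the maxResults and pageToken parameters) to request the results in paginated form. Loop, calling tabledata.list with the previously retrieved pageToken, until no pageToken is returned (meaning that the last page has been reached).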
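The loop described above can be sketched independently of the BigQuery client. This is only an illustration under stated assumptions: `Page`, `FakeTable`, and `PageTokenSketch` are hypothetical stand-ins that are not part of any Google library, and the token here is simply the next row offset, whereas a real pageToken is an opaque string that should be passed back unchanged.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one page of a paginated response. */
final class Page {
  final List<String> rows;
  final String nextPageToken; // null on the last page

  Page(List<String> rows, String nextPageToken) {
    this.rows = rows;
    this.nextPageToken = nextPageToken;
  }
}

/** Hypothetical in-memory source that mimics maxResults/pageToken semantics. */
final class FakeTable {
  private final List<String> data;

  FakeTable(List<String> data) {
    this.data = data;
  }

  Page list(long maxResults, String pageToken) {
    // Assumption for this sketch only: the token encodes the next row offset.
    int start = (pageToken == null) ? 0 : Integer.parseInt(pageToken);
    int end = (int) Math.min(data.size(), start + maxResults);
    List<String> rows = new ArrayList<>(data.subList(start, end));
    String next = (end < data.size()) ? Integer.toString(end) : null;
    return new Page(rows, next);
  }
}

final class PageTokenSketch {
  /** Requests pages until no page token is returned, collecting each page. */
  static List<List<String>> fetchAllPages(FakeTable table, long maxResults) {
    List<List<String>> pages = new ArrayList<>();
    String pageToken = null; // the first request carries no token
    do {
      Page page = table.list(maxResults, pageToken);
      pages.add(page.rows);
      pageToken = page.nextPageToken; // null means this was the last page
    } while (pageToken != null);
    return pages;
  }

  public static void main(String[] args) {
    List<String> data = new ArrayList<>();
    for (int i = 0; i < 250; i++) {
      data.add("row-" + i);
    }
    List<List<String>> pages = fetchAllPages(new FakeTable(data), 100);
    for (int i = 0; i < pages.size(); i++) {
      System.out.println("Page #" + (i + 1) + ": " + pages.get(i).size() + " rows");
    }
  }
}
```

The do/while shape matters: the first request must be sent without a token, so the termination check can only happen after at least one page has been fetched.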

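Using the Google API Client Library for Java, the code for inserting a query job, polling for query completion, and then retrieving page after page of query results might look something like this: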

// Create a new BigQuery client authorized via OAuth 2.0 protocol
// See: https://developers.google.com/bigquery/docs/authorization#installed-applications
Bigquery bigquery = createAuthorizedClient();

// Start a Query Job
String querySql = "SELECT TOP(word, 500), COUNT(*) FROM publicdata:samples.shakespeare";
JobReference jobId = startQuery(bigquery, PROJECT_ID, querySql);

// Poll for Query Results, return result output
TableReference completedJob = checkQueryResults(bigquery, PROJECT_ID, jobId);

// Return and display the results of the Query Job
displayQueryResults(bigquery, completedJob);

/**
 * Inserts a Query Job for a particular query
 */
public static JobReference startQuery(Bigquery bigquery, String projectId,
                                      String querySql) throws IOException {
  System.out.format("\nInserting Query Job: %s\n", querySql);

  Job job = new Job();
  JobConfiguration config = new JobConfiguration();
  JobConfigurationQuery queryConfig = new JobConfigurationQuery();
  config.setQuery(queryConfig);

  job.setConfiguration(config);
  queryConfig.setQuery(querySql);

  Insert insert = bigquery.jobs().insert(projectId, job);
  insert.setProjectId(projectId);
  JobReference jobId = insert.execute().getJobReference();

  System.out.format("\nJob ID of Query Job is: %s\n", jobId.getJobId());

  return jobId;
}

/**
 * Polls the status of a BigQuery job, returns TableReference to results if "DONE"
 */
private static TableReference checkQueryResults(Bigquery bigquery, String projectId, JobReference jobId)
    throws IOException, InterruptedException {
  // Variables to keep track of total query time
  long startTime = System.currentTimeMillis();
  long elapsedTime;

  while (true) {
    Job pollJob = bigquery.jobs().get(projectId, jobId.getJobId()).execute();
    elapsedTime = System.currentTimeMillis() - startTime;
    System.out.format("Job status (%dms) %s: %s\n", elapsedTime,
        jobId.getJobId(), pollJob.getStatus().getState());
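    if ("DONE".equals(pollJob.getStatus().getState())) {
      // A job can reach "DONE" with an error; surface it instead of returning a table
      if (pollJob.getStatus().getErrorResult() != null) {
        throw new IOException("Query job failed: "
            + pollJob.getStatus().getErrorResult().getMessage());
      }
      return pollJob.getConfiguration().getQuery().getDestinationTable();
    }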
    // Pause execution for one second before polling job status again, to
    // reduce unnecessary calls to the BigQuery API and lower overall
    // application bandwidth.
    Thread.sleep(1000);
  }
}

/**
 * Page through the result set
 */
private static void displayQueryResults(Bigquery bigquery,
                                        TableReference completedJob) throws IOException {
  long maxResults = 20;
  String pageToken = null;
  int page = 1;

  do {
    TableDataList queryResult = bigquery.tabledata().list(
            completedJob.getProjectId(),
            completedJob.getDatasetId(),
            completedJob.getTableId())
        .setMaxResults(maxResults)
        .setPageToken(pageToken)
        .execute();

    // getRows() returns null when the page contains no rows (e.g. an empty result table)
    List<TableRow> rows = queryResult.getRows();
    System.out.print("\nQuery Results, Page #" + page + ":\n------------\n");
    if (rows != null) {
      for (TableRow row : rows) {
        for (TableCell field : row.getF()) {
          System.out.printf("%-50s", field.getV());
        }
        System.out.println();
      }
    }

    // A null pageToken means the last page has been reached
    pageToken = queryResult.getPageToken();
    page++;
  } while (pageToken != null);
}
