google-bigquery - Google BigQuery JOIN EACH errors

Question

I'm trying to run a simple query that joins data sets that are too large, and I'm getting various errors. Reproduced here is a similar query that uses a public data set.

SELECT gn1.actor_attributes.blog, gn1.actor_attributes.company, gn1.actor_attributes.email, gn1.actor_attributes.gravatar_id, gn1.actor_attributes.location, gn1.actor_attributes.login, gn1.actor_attributes.name,gn2.actor_attributes.blog, gn2.actor_attributes.company, gn2.actor_attributes.email, gn2.actor_attributes.gravatar_id, gn2.actor_attributes.location, gn2.actor_attributes.login, gn2.actor_attributes.name
FROM [publicdata:samples.github_nested] as gn1 inner join (select actor_attributes.blog, actor_attributes.company,actor_attributes.email, actor_attributes.gravatar_id, actor_attributes.location, actor_attributes.login, actor_attributes.name from [publicdata:samples.github_nested] group by actor_attributes.blog, actor_attributes.company,actor_attributes.email, actor_attributes.gravatar_id, actor_attributes.location, actor_attributes.login, actor_attributes.name) as gn2 on gn1.payload.target.login=gn2.actor_attributes.login
WHERE gn1.type='FollowEvent'

Without "inner join each", I'm told the database is too large. When I run the query with "inner join each", I get an error like the following on the large query:

Cannot perform partitioned join as gn2 is not parallelizable: (SELECT [actor_attributes.blog], [actor_attributes.company], [actor_attributes.email], [actor_attributes.gravatar_id], [actor_attributes.location], [actor_attributes.login], [actor_attributes.name] FROM [publicdata:samples.github_nested] GROUP BY [actor_attributes.blog], [actor_attributes.company], [actor_attributes.email], [actor_attributes.gravatar_id], [actor_attributes.location], [actor_attributes.login], [actor_attributes.name])

Any help is greatly appreciated.

Thanks

score 1 · Accepted Answer

Thanks for providing a public example; it makes debugging much easier.

Reformatting the original query:

SELECT gn1.actor_attributes.blog, gn1.actor_attributes.company, gn1.actor_attributes.email, gn1.actor_attributes.gravatar_id, gn1.actor_attributes.location, gn1.actor_attributes.login, gn1.actor_attributes.name,gn2.actor_attributes.blog, gn2.actor_attributes.company, gn2.actor_attributes.email, gn2.actor_attributes.gravatar_id, gn2.actor_attributes.location, gn2.actor_attributes.login, gn2.actor_attributes.name
FROM [publicdata:samples.github_nested] AS gn1 
INNER JOIN EACH (
  SELECT actor_attributes.blog, actor_attributes.company,actor_attributes.email, actor_attributes.gravatar_id, actor_attributes.location, actor_attributes.login, actor_attributes.name
  FROM [publicdata:samples.github_nested]
  GROUP BY actor_attributes.blog, actor_attributes.company,actor_attributes.email, actor_attributes.gravatar_id, actor_attributes.location, actor_attributes.login, actor_attributes.name) 
AS gn2 ON gn1.payload.target.login=gn2.actor_attributes.login
WHERE gn1.type='FollowEvent'

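That query does indeed fail with the error message above. The error message could be improved, but the fix is simple: add EACH to the GROUP BY in the subquery so that the subquery can be parallelized. JOIN EACH performs a partitioned join, which requires both sides to be parallelizable, and a plain GROUP BY subquery is not.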

SELECT gn1.actor_attributes.blog, gn1.actor_attributes.company, gn1.actor_attributes.email, gn1.actor_attributes.gravatar_id, gn1.actor_attributes.location, gn1.actor_attributes.login, gn1.actor_attributes.name,gn2.actor_attributes.blog, gn2.actor_attributes.company, gn2.actor_attributes.email, gn2.actor_attributes.gravatar_id, gn2.actor_attributes.location, gn2.actor_attributes.login, gn2.actor_attributes.name
FROM [publicdata:samples.github_nested] AS gn1 
INNER JOIN EACH (
  SELECT actor_attributes.blog, actor_attributes.company,actor_attributes.email, actor_attributes.gravatar_id, actor_attributes.location, actor_attributes.login, actor_attributes.name
  FROM [publicdata:samples.github_nested]
  GROUP EACH BY actor_attributes.blog, actor_attributes.company,actor_attributes.email, actor_attributes.gravatar_id, actor_attributes.location, actor_attributes.login, actor_attributes.name) 
AS gn2 ON gn1.payload.target.login=gn2.actor_attributes.login
WHERE gn1.type='FollowEvent'

[Query complete (12.7s elapsed, 237 MB processed)]
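
The same pattern is easier to see when the query is cut down to a couple of columns per side. The following is a minimal sketch in BigQuery legacy SQL, trimmed from the query above and not run as part of the original answer; the aliases follower_login, target_login, and target_name are hypothetical names added for readability.

```sql
-- Sketch only: a reduced form of the query above (legacy SQL).
-- The output aliases are illustrative and not part of the original answer.
SELECT
  gn1.actor_attributes.login AS follower_login,
  gn2.actor_attributes.login AS target_login,
  gn2.actor_attributes.name AS target_name
FROM [publicdata:samples.github_nested] AS gn1
INNER JOIN EACH (
  SELECT actor_attributes.login, actor_attributes.name
  FROM [publicdata:samples.github_nested]
  -- GROUP EACH BY keeps the subquery parallelizable, which JOIN EACH requires.
  GROUP EACH BY actor_attributes.login, actor_attributes.name)
AS gn2 ON gn1.payload.target.login = gn2.actor_attributes.login
WHERE gn1.type = 'FollowEvent'
```

The two EACH keywords go together here: whenever the right-hand side of a JOIN EACH is a grouped subquery, that subquery needs GROUP EACH BY rather than GROUP BY.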
