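My goal is to test whether the grp values generated by one query are the same grp values output by that same query. However, when I change a single variable name, I get different results.
Below is an example that compares a query against itself, so we know the results should be the same. Yet if you run it, you'll find that one subquery produces different groups than the other.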
SELECT grp
FROM
(
  SELECT CONCAT(word, corpus) AS grp, rank1, rank2
  FROM (
    SELECT
      word, corpus,
      ROW_NUMBER() OVER (PARTITION BY word ORDER BY test1 DESC) AS rank1,
      ROW_NUMBER() OVER (PARTITION BY word ORDER BY word_count DESC) AS rank2,
      ROW_NUMBER() OVER (PARTITION BY word ORDER BY corpus DESC) AS rank3,
      ROW_NUMBER() OVER (PARTITION BY word ORDER BY corpus_date DESC) AS rank4
    FROM
    (
      SELECT *, (word_count * word_count * corpus_date) AS test1
      FROM [bigquery-public-data:samples.shakespeare]
    )
  )
)
WHERE rank1 <= 3 OR rank2 <= 3
HAVING grp NOT IN
(
  SELECT grp FROM (
    SELECT CONCAT(word, corpus) AS grp, rank1, rank2
    FROM
    (
      SELECT
        word, corpus,
        ROW_NUMBER() OVER (PARTITION BY word ORDER BY test2 DESC) AS rank1,
        ROW_NUMBER() OVER (PARTITION BY word ORDER BY word_count DESC) AS rank2,
        ROW_NUMBER() OVER (PARTITION BY word ORDER BY corpus DESC) AS rank3,
        ROW_NUMBER() OVER (PARTITION BY word ORDER BY corpus_date DESC) AS rank4
      FROM
      (
        SELECT *, (word_count * word_count * corpus_date) AS test2
        FROM [bigquery-public-data:samples.shakespeare]
      )
    )
  )
  WHERE rank1 <= 3 OR rank2 <= 3
)
Even worse, if I run the exact same query but change the variable name from test1 to test3, I get completely different results.
SELECT grp
FROM
(
  SELECT CONCAT(word, corpus) AS grp, rank1, rank2
  FROM (
    SELECT
      word, corpus,
      ROW_NUMBER() OVER (PARTITION BY word ORDER BY test3 DESC) AS rank1,
      ROW_NUMBER() OVER (PARTITION BY word ORDER BY word_count DESC) AS rank2,
      ROW_NUMBER() OVER (PARTITION BY word ORDER BY corpus DESC) AS rank3,
      ROW_NUMBER() OVER (PARTITION BY word ORDER BY corpus_date DESC) AS rank4
    FROM
    (
      SELECT *, (word_count * word_count * corpus_date) AS test3
      FROM [bigquery-public-data:samples.shakespeare]
    )
  )
)
WHERE rank1 <= 3 OR rank2 <= 3
HAVING grp NOT IN
(
  SELECT grp FROM (
    SELECT CONCAT(word, corpus) AS grp, rank1, rank2
    FROM
    (
      SELECT
        word, corpus,
        ROW_NUMBER() OVER (PARTITION BY word ORDER BY test2 DESC) AS rank1,
        ROW_NUMBER() OVER (PARTITION BY word ORDER BY word_count DESC) AS rank2,
        ROW_NUMBER() OVER (PARTITION BY word ORDER BY corpus DESC) AS rank3,
        ROW_NUMBER() OVER (PARTITION BY word ORDER BY corpus_date DESC) AS rank4
      FROM
      (
        SELECT *, (word_count * word_count * corpus_date) AS test2
        FROM [bigquery-public-data:samples.shakespeare]
      )
    )
  )
  WHERE rank1 <= 3 OR rank2 <= 3
)
I can't think of an explanation that accounts for both of these strange behaviors, and this is keeping me from validating my data. Any ideas?
EDIT:
I have updated the BigQuery SQL in the way the responses suggested, and the same inconsistencies occur.
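One check that may help narrow this down, offered as a sketch rather than a confirmed diagnosis: ROW_NUMBER() assigns an arbitrary order among rows that tie on the ORDER BY expression, so if the computed column has duplicate values within a word, the rows with rank1 <= 3 could differ between two otherwise identical subqueries. The legacy SQL query below counts such ties for test1. The alias n and the LIMIT of 20 are introduced here for illustration and are not part of the original queries.

```sql
-- Count rows that share the same ORDER BY value inside each partition.
-- Any row returned here is a tie that ROW_NUMBER() may break arbitrarily.
SELECT word, test1, COUNT(*) AS n
FROM
(
  SELECT word, (word_count * word_count * corpus_date) AS test1
  FROM [bigquery-public-data:samples.shakespeare]
)
GROUP BY word, test1
HAVING n > 1
ORDER BY n DESC
LIMIT 20
```

If this returns rows, adding a tiebreaker to each window (for example ORDER BY test1 DESC, corpus) would be one way to see whether the mismatches disappear.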