sql-server - Efficiently finding differences between two database tables

Question

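I have a set of stored procedures. Each one is supposed to synchronize a particular database table with the same table in another database.

The tables have up to several hundred million records. I need the simplest way to verify that these procedures really do keep everything in sync. For debugging, I also need to be able to find, per procedure, the records that differ between the two tables.

I was pointed to the following (found somewhere on SO, I believe, but it was a while ago so I don't have the link):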

Insert into target_table(columns)
select columns from table1
except
select columns from table2

Insert into target_table(columns)
select columns from table2
except
select columns from table1

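It doesn't run fast enough. Can you suggest a different, faster approach, whether a T-SQL procedure or external C# code? (I thought that with C# I could store the PKs for hashing, so that by tracking at least the primary keys, even without tracking the remaining fields, I could find which primary keys are extra or missing.)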
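A minimal sketch of the key-based comparison the question has in mind, written in Python rather than C# purely for illustration. It assumes each side can be read as a stream of `(primary key, value)` pairs already ordered by primary key (for example, a query with `ORDER BY` on the key); `row_digest` and `diff_sorted` are hypothetical helper names. Because the two streams are merged like a merge join, only the current row of each side is held in memory, which matters at hundreds of millions of rows.

```python
import hashlib


def row_digest(row):
    # SHA-1 of the row's text form; any stable hash would do (an assumption of this sketch)
    return hashlib.sha1(repr(row).encode("utf-8")).hexdigest()


def diff_sorted(source, target):
    """Merge two streams of (pk, value) pairs, both sorted by pk ascending.

    Returns (missing, extra, changed): pks only in source, pks only in
    target, and pks present in both whose values differ.
    """
    missing, extra, changed = [], [], []
    src, tgt = iter(source), iter(target)
    s, t = next(src, None), next(tgt, None)
    while s is not None or t is not None:
        if t is None or (s is not None and s[0] < t[0]):
            missing.append(s[0])          # key exists only in the source
            s = next(src, None)
        elif s is None or t[0] < s[0]:
            extra.append(t[0])            # key exists only in the target
            t = next(tgt, None)
        else:
            if s[1] != t[1]:              # same key, different contents
                changed.append(s[0])
            s, t = next(src, None), next(tgt, None)
    return missing, extra, changed


source = [(pk, row_digest(row)) for pk, row in [(1, ("a", 10)), (2, ("b", 20)), (3, ("c", 30))]]
target = [(pk, row_digest(row)) for pk, row in [(1, ("a", 10)), (3, ("X", 30)), (4, ("d", 40))]]
print(diff_sorted(source, target))
```

Hashing the non-key columns is optional: comparing keys alone already yields the extra and missing rows, while comparing digests also flags rows whose contents changed.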

score 3 · Accepted Answer

This is quite hard to do, but you can get some mileage out of checksums. One approach is to split the key range into several sub-ranges, which can then be verified a) in parallel and/or b) on different schedules. For example:

use master;
go

set nocount on;
go

if db_id('test') is not null
begin
    alter database test set single_user with rollback immediate;
    drop database test;
end
go

create database test;
go

use test;
go

create table data (id int identity(1,1) not null primary key, 
    data1 varchar(38),
    data2 bigint,
    created_at datetime not null default getdate());
go  

declare @i int = 0;
begin transaction   
while @i < 1000000
begin
    insert into data (data1, data2) values (newid(), @i);
    set @i += 1;
    if @i % 1000 = 0
    begin
        commit;
        raiserror (N'Inserted %d', 0, 0, @i);
        begin tran;
    end
end
commit  
raiserror (N'Inserted %d', 0, 0, @i);
go

backup database test to disk='c:\temp\test.bak' with init;
go

if db_id('copy') is not null
begin
    alter database copy set single_user with rollback immediate;
    drop database copy;
end
go

restore database copy from disk='c:\temp\test.bak'
with move 'test' to 'c:\temp\copy.mdf', move 'test_log' to 'c:\temp\copy_log.ldf';
go

-- create some differences
--
update test..data set data1 = newid() where id = cast(rand()*1000000 as int)
update copy..data set data1 = newid() where id = cast(rand()*1000000 as int)

delete from test..data where id = cast(rand()*1000000 as int);
insert into copy..data (data1, data2) values (newid(), -1);


-- do the check
--
declare @id int = 0;
while @id < 1010000
begin
    declare @chk1 int, @chk2 int;
    select @chk1 = checksum_agg(binary_checksum(*)) from test..data where id >= @id and id < @id + 10000
    select @chk2 = checksum_agg(binary_checksum(*)) from copy..data where id >= @id and id < @id + 10000
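    -- CHECKSUM_AGG returns NULL for an empty range, and NULL != x does not evaluate
    -- to true, so also treat "rows on one side only" as a mismatch
    if @chk1 != @chk2
        or (@chk1 is null and @chk2 is not null)
        or (@chk1 is not null and @chk2 is null)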
    begin
        -- locate the different row(s)
        --
        -- rows in test that are missing from copy or differ from it
        select t.id, binary_checksum(*) as chk
            from test..data t
            where t.id >= @id and t.id < @id + 10000
        except
        select c.id, binary_checksum(*) as chk
            from copy..data c
            where c.id >= @id and c.id < @id + 10000;

        -- rows in copy that are missing from test or differ from it
        select c.id, binary_checksum(*) as chk
            from copy..data c
            where c.id >= @id and c.id < @id + 10000
        except
        select t.id, binary_checksum(*) as chk
            from test..data t
            where t.id >= @id and t.id < @id + 10000;
    end
    else
    begin
        raiserror (N'Range %d is OK', 0,0, @id);
    end
    set @id += 10000;
end
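The same range-then-drill-down idea can be sketched without SQL Server. In this Python sketch, `row_hash`, `range_checksum` and `compare_ranges` are hypothetical names, and a 64-bit BLAKE2b hash combined with XOR merely stands in for BINARY_CHECKSUM and CHECKSUM_AGG, which use different algorithms. Python's `==` treats an empty range on both sides as equal and an empty range on one side only as a mismatch.

```python
import hashlib
from functools import reduce
from operator import xor


def row_hash(row):
    # 64-bit hash of the whole row; a stand-in for BINARY_CHECKSUM(*), not the same algorithm
    digest = hashlib.blake2b(repr(row).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def range_checksum(rows, lo, hi):
    # XOR of the row hashes whose key is in [lo, hi); a stand-in for CHECKSUM_AGG.
    # Returns None for an empty range, much as the SQL aggregate returns NULL.
    hashes = [row_hash(r) for r in rows if lo <= r[0] < hi]
    return reduce(xor, hashes) if hashes else None


def compare_ranges(rows_a, rows_b, max_id, step):
    """Return {range_start: (ids only in A, ids only in B)} for mismatching ranges."""
    report = {}
    for lo in range(0, max_id, step):
        hi = lo + step
        # Cheap check first: skip ranges whose aggregate checksums agree
        if range_checksum(rows_a, lo, hi) == range_checksum(rows_b, lo, hi):
            continue
        # Drill down, like the EXCEPT queries: compare (id, row hash) pairs in the range
        in_a = {(r[0], row_hash(r)) for r in rows_a if lo <= r[0] < hi}
        in_b = {(r[0], row_hash(r)) for r in rows_b if lo <= r[0] < hi}
        report[lo] = (sorted(i for i, _ in in_a - in_b),
                      sorted(i for i, _ in in_b - in_a))
    return report


# Example data: B has row 7 changed, row 23 deleted and row 55 added
rows_a = [(i, f"value {i}") for i in range(1, 50)]
rows_b = [(7, "changed") if r[0] == 7 else r for r in rows_a if r[0] != 23] + [(55, "extra")]
print(compare_ranges(rows_a, rows_b, 60, 10))
```

Each call rescans the in-memory lists, which is fine for a sketch; against real tables each range would be a separate keyed query, as in the T-SQL loop.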

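The main problem is that the differences can only be found by scanning every row, which is very expensive. Splitting the key space into ranges lets you verify different ranges on a rotating schedule. The usual CHECKSUM_AGG and BINARY_CHECKSUM(*) limitations apply, of course: a matching checksum detects most, but not all, changes, and, as the documentation notes:

BINARY_CHECKSUM ignores columns of noncomparable data types in its computation. Noncomparable data types include text, ntext, image, cursor, xml, and noncomparable common language runtime (CLR) user-defined types.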
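The rotating schedule mentioned above could be driven by something as simple as a round-robin over the range starts. `ranges_for_run` is a hypothetical helper, not part of the original answer; the 1010000 and 10000 in the usage lines mirror the loop bounds of the T-SQL example, and 30 ranges per run is an arbitrary choice.

```python
def ranges_for_run(run_index, max_id, step, ranges_per_run):
    """Pick which [lo, hi) key ranges to verify on a given scheduled run.

    Ranges are visited round-robin: each run starts where the previous one stopped.
    """
    starts = list(range(0, max_id, step))
    n = len(starts)
    first = (run_index * ranges_per_run) % n
    picked = [starts[(first + k) % n] for k in range(min(ranges_per_run, n))]
    return [(lo, lo + step) for lo in picked]


covered = set()
for run in range(4):
    todo = ranges_for_run(run, 1_010_000, 10_000, 30)
    covered.update(lo for lo, _ in todo)
    print(f"run {run}: {len(todo)} ranges, first {todo[0]}")
```

Because each run continues where the previous one stopped, any ceil(number of ranges / ranges_per_run) consecutive runs together cover every range; here 101 ranges at 30 per run take 4 runs.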

score 1 · Accepted Answer

These are the two queries I use for that purpose:

Table checksum

Select CheckSum_Agg(Binary_CheckSum(*)) From Table With (NOLOCK)

Row checksum

Select CheckSum_Agg(Binary_CheckSum(*)) From Table With (NOLOCK) Where Column = Value

Credit goes to Hidden Features of SQL Server.
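The two queries above can be tried end to end in a self-contained Python sketch using the standard-library `sqlite3` module, since SQL Server itself cannot be bundled into an example. SQLite has no BINARY_CHECKSUM or CHECKSUM_AGG, so `row_hash` and `xor_agg` are hypothetical user-defined functions standing in for them (registered with `create_function` and `create_aggregate`). The columns are listed explicitly because SQLite does not accept `f(*)`, and the `NOLOCK` hint has no SQLite equivalent and is dropped. The table and column names are illustrative only.

```python
import hashlib
import sqlite3


def row_hash(*cols):
    # Signed 64-bit hash of the column values, so it fits in an SQLite INTEGER
    digest = hashlib.blake2b(repr(cols).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big", signed=True)


class XorAgg:
    """XOR aggregate; plays the role CHECKSUM_AGG plays in SQL Server (not the same algorithm)."""

    def __init__(self):
        self.acc = None

    def step(self, value):
        if value is not None:
            self.acc = value if self.acc is None else self.acc ^ value

    def finalize(self):
        return self.acc


def checksum(conn, table, where="", params=()):
    # Table names cannot be bound parameters, so this demo formats them in directly
    sql = f"SELECT xor_agg(row_hash(id, data1, data2)) FROM {table} {where}"
    return conn.execute(sql, params).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.create_function("row_hash", -1, row_hash)
conn.create_aggregate("xor_agg", 1, XorAgg)
for name in ("test_data", "copy_data"):
    conn.execute(f"CREATE TABLE {name} (id INTEGER PRIMARY KEY, data1 TEXT, data2 INTEGER)")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)",
                     [(i, f"row {i}", i * 10) for i in range(1, 101)])

# Table checksum before and after changing one row in the copy
before = checksum(conn, "test_data") == checksum(conn, "copy_data")
conn.execute("UPDATE copy_data SET data1 = 'changed' WHERE id = 42")
table_match = checksum(conn, "test_data") == checksum(conn, "copy_data")

# Row checksum: the same aggregate restricted with a WHERE clause
row_match = {i: checksum(conn, "test_data", "WHERE id = ?", (i,))
                == checksum(conn, "copy_data", "WHERE id = ?", (i,))
             for i in (41, 42)}
print(before, table_match, row_match)
```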
