c# - SqlBulkCopy and DataTables with a parent/child relation on an Identity column

Question

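I need to update several tables that have a parent/child relationship based on an Identity primary key in the parent table, which one or more child tables reference as a foreign key.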

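Because of the high volume of data, I would like to build these tables in memory and then use SqlBulkCopy from C# to update the database in bulk, either from the DataSet or from the individual DataTables.
I would also like to do this in parallel, from multiple threads, processes, and possibly clients.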

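My prototype in F# shows a lot of promise, with a 34x performance increase, but this code forces known Identity values in the parent table. When they are not forced, the Identity column is generated correctly in the database when SqlBulkCopy inserts the rows, but the Identity values are not updated in the in-memory DataTable. Furthermore, even if they were, it is not clear whether the DataSet would correctly fix up the parent/child relationships so that the child tables could subsequently be written with correct foreign key values.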

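Can anyone explain how to make SqlBulkCopy update the Identity values, and further how to configure a DataSet so that it retains and updates parent/child relationships, if this is not done automatically when a DataAdapter's FillSchema is called on the individual DataTables?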

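Answers that I am not looking for:

Read the database to find the current highest Identity value, then manually increment it when creating each parent row. This does not work for multiple processes/clients, and as I understand it, failed transactions may cause some Identity values to be skipped, so this method could corrupt the relationships.
Write the parent rows one at a time and ask for the Identity value back. This defeats at least some of the gains from using SqlBulkCopy (yes, there are many more child rows than parent rows, but there are still a lot of parent rows).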

Similar to the following unanswered question:

How to update DataSet parent and child tables with an autogenerated identity key?

score 10 · Accepted Answer

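First of all, SqlBulkCopy cannot do what you want. As the name suggests, it is a "one way street": it moves data into SQL Server as quickly as possible. It is the .NET version of the old bulk copy command, which imports raw text files into tables. So there is no way to get the Identity values back if you are using SqlBulkCopy.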

I have done a lot of bulk data processing and have faced this problem several times. The solution depends on your architecture and data distribution. Here are some ideas:

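Create one set of target tables per thread and import into those tables. At the end, merge them into the real tables. Most of this can be implemented in a fairly generic way, where tables called TABLENAME_THREAD_ID are generated automatically from tables called TABLENAME.
Move ID generation completely out of the database. For example, implement a central web service that generates the IDs. In that case you should not generate one ID per call but rather ranges of IDs. Otherwise, network overhead usually becomes the bottleneck.
Try to generate the IDs from your data. If that is possible, your problem goes away. Don't be too quick to say "it's not possible". Perhaps you can use string IDs that can be cleaned up in a post-processing step?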
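
To make the ID-range idea concrete, here is a minimal sketch. It swaps the central web service for a SQL Server sequence (available from SQL Server 2012), which is a different mechanism from the one suggested above but hands out ranges in the same way. The sequence name dbo.parent_id_seq and the range size of 10000 are made up for illustration, and the sketch assumes the parent key column is a plain int rather than an identity column (if it stays an identity column, SqlBulkCopyOptions.KeepIdentity would be needed so that the supplied values are preserved).

```sql
-- Hypothetical sequence that takes over from the identity property on the parent key
create sequence dbo.parent_id_seq as int start with 1 increment by 1;

-- Each client reserves a block of 10000 ids in a single round trip
declare @first sql_variant;

exec sys.sp_sequence_get_range
     @sequence_name     = N'dbo.parent_id_seq' ,
     @range_size        = 10000 ,
     @range_first_value = @first output;

-- The client can now assign ids first_id .. last_id to its in-memory rows
select first_id = convert(int, @first) ,
       last_id  = convert(int, @first) + 9999;
```

Because every caller receives a disjoint range, several threads or clients can fill their DataTables with final key values before bulk copying parents and children, at the cost of gaps when a reserved range is not fully used.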

One more remark: a factor of 34 from using BulkCopy sounds too small to me. If you want to insert data fast, make sure your database is configured correctly.

score 4 · Answer

Read this article. I think it is exactly what you are looking for, and more. It is a very nice and elegant solution.

http://www.codinghelmet.com/?path=howto/bulk-insert

score 1 · Answer

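The only way you can do what you want with SqlBulkCopy is to insert the data into staging tables first, then use a stored procedure to distribute the data to the destination tables. Yes, this slows things down, but it will still be fast.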
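
One possible shape for that distribution step is sketched below. It is not taken from this answer; it assumes staging tables stage.parent ( id , data ) and stage.child ( id , data , parent_id ) whose id values are temporary keys assigned by the client, and destination tables dbo.parent and dbo.child whose id columns are identity columns. It relies on the fact that the OUTPUT clause of MERGE, unlike that of INSERT, may reference columns of the source table.

```sql
-- Map from the client-assigned staging key to the identity value the server generates
declare @parent_map table ( old_id int not null primary key , new_id int not null );

begin transaction;

-- "on 1 = 0" never matches, so every staged row is inserted;
-- MERGE is used only so that OUTPUT can see both src.id and inserted.id
merge into dbo.parent as tgt
using stage.parent as src
   on 1 = 0
when not matched then
  insert ( data ) values ( src.data )
output src.id , inserted.id into @parent_map ( old_id , new_id );

-- Children pick up the generated parent key through the map
insert dbo.child ( data , parent_id )
select staging.data , map.new_id
from stage.child staging
join @parent_map map on map.old_id = staging.parent_id;

commit transaction;
```

Since the server generates the identity values itself, this variant should not need identity_insert, which makes it a better fit when several loaders run at once, provided each loader works on its own staging tables.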

You might also consider redesigning your data, for example by splitting it up or denormalizing it.

score 1 · Answer

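set identity_insert <table> on and dbcc checkident are your friends here. This is something like what I have done in the past (see the code sample). The only real caveat is that the update process must be the only one inserting data: everybody else has to get out of the pool while the update is going on. You could, of course, do this sort of mapping programmatically before loading the production tables, but the same restriction on inserts applies: the update process is the only process that can play.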

--
-- start with a source schema -- doesn't actually need to be SQL tables
-- but from the standpoint of demonstration, it makes it easier
--
create table source.parent
(
  id   int         not null primary key ,
  data varchar(32) not null ,
)
create table source.child
(
  id        int         not null primary key ,
  data      varchar(32) not null ,
  parent_id int         not null foreign key references source.parent(id) ,
)

--
-- On the receiving end, you need to create staging tables.
-- You'll notice that while there are primary keys defined,
-- there are no foreign key constraints. Depending on the
-- cleanliness of your data, you might even get rid of the
-- primary key definitions (though you'll need to add
-- some sort of processing to clean the data one way or
-- another, obviously).
--
-- and, depending context, these could even be temp tables
--
create table stage.parent
(
  id   int         not null primary key ,
  data varchar(32) not null ,
)

create table stage.child
(
  id        int         not null primary key ,
  data      varchar(32) not null ,
  parent_id int         not null ,
)

--
-- and of course, the final destination tables already exist,
-- complete with identity properties, etc.
--
create table dbo.parent
(
  id int not null identity(1,1) primary key ,
  data varchar(32) not null ,
)
create table dbo.child
(
  id int not null identity(1,1) primary key ,
  data varchar(32) not null ,
  parent_id int not null foreign key references dbo.parent(id) ,
)

-----------------------------------------------------------------------
-- so, you BCP or otherwise load your staging tables with the new data
-- from the source tables. How this happens is left as an exercise for
-- the reader. We'll just assume that some sort of magic happens to
-- make it so. Don't forget to truncate the staging tables prior to
-- loading them with data.
-----------------------------------------------------------------------

-------------------------------------------------------------------------
-- Now we get to work to populate the production tables with the new data
--
-- First we need a map to let us create the new identity values.
-------------------------------------------------------------------------
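-- drop leftovers from an earlier run in this session, if there are any
if object_id('tempdb..#parent_map') is not null drop table #parent_map
if object_id('tempdb..#child_map')  is not null drop table #child_map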
create table #parent_map
(
  old_id int not null primary key nonclustered       ,
  offset int not null identity(1,1) unique clustered ,
  new_id int     null ,  
)
create table #child_map
(
  old_id int not null primary key nonclustered ,
  offset int not null identity(1,1) unique clustered ,
  new_id int     null ,
)

insert #parent_map ( old_id ) select id from stage.parent
insert #child_map  ( old_id ) select id from stage.child

-------------------------------------------------------------------------------
-- now that we've got the map, we can blast the data into the production tables
-------------------------------------------------------------------------------

--
-- compute the new ID values
--
-- isnull() keeps new_id from becoming null when dbo.parent is still empty
update #parent_map set new_id = offset + ( select isnull( max(id) , 0 ) from dbo.parent )

--
-- blast it into the parent table, turning on identity_insert
--
set identity_insert dbo.parent on

insert dbo.parent (id,data)
select id   = map.new_id   ,
       data = staging.data
from stage.parent staging
join #parent_map  map     on map.old_id = staging.id

set identity_insert dbo.parent off

--
-- reseed the identity properties high water mark
--
dbcc checkident ( 'dbo.parent' , reseed )


--
-- compute the new ID values
--
-- isnull() keeps new_id from becoming null when dbo.child is still empty
update #child_map set new_id = offset + ( select isnull( max(id) , 0 ) from dbo.child )

--
-- blast it into the child table, turning on identity_insert
--
set identity_insert dbo.child on

insert dbo.child ( id , data , parent_id )
select id        = map.new_id      ,
       data      = staging.data    ,
       parent_id = parent.new_id

from stage.child staging
join #child_map  map      on map.old_id    = staging.id
join #parent_map parent   on parent.old_id = staging.parent_id

set identity_insert dbo.child off

--
-- reseed the identity properties high water mark
--
dbcc checkident ( 'dbo.child' , reseed )

------------------------------------
-- That's about all there is to it.
------------------------------------

score 0 · Answer

The trade-off you are facing is the performance of the bulk insert versus the reliability of the Identity values.

Can you put the database into single-user mode temporarily to perform your insert?
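
For reference, switching modes looks roughly like this. MyDatabase is a placeholder name, and the sketch assumes a load window in which it is acceptable to disconnect every other session, which rules out the parallel clients mentioned in the question.

```sql
-- MyDatabase is a placeholder; "with rollback immediate" rolls back and
-- disconnects all other sessions so that only this connection remains
alter database MyDatabase set single_user with rollback immediate;

-- ... run the bulk load with known identity values here ...

-- reopen the database to everyone else
alter database MyDatabase set multi_user;
```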

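I faced a very similar issue in a conversion project where I was adding an Identity column to very large tables that had children. Fortunately, I was able to set up the identity values in the parent and child sources (I used a TextDataReader) before performing the bulk insert, and I generated the parent and child files at the same time.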

I also got the kind of performance gains you are talking about, going from OleDBDataReader source -> StreamWriter, and then TextDataReader -> SqlBulkCopy.
