.net - correct handling of duplicate rows in database in .NET

Question

Say I have hundreds of thousands of records in a text file that I'd like to insert into the database every day. Around half of them already exist in the database. A unique row is defined by, say, 6 columns.

What is the correct way to code the insert in .NET in this particular case? The two options I'm weighing are:

Do I SQL-insert straight away and catch the SqlException for duplicate entries? In this case, I'd be violating the principle that exceptions should be used only for exceptional cases and not for frequent ones.

or

Do I do a SQL-select first to check for the row before I do an insert? In this case, it'd seem that the database will do the insert and check for the uniqueness a second time automatically despite having just completed a select.

score 1 · Accepted Answer

Use a SQL statement that checks for the row before inserting. Below is a simple example for a table named person with two columns, forename and surname, which are checked for uniqueness.

/// <summary>
/// Insert a row into the person table
/// </summary>
/// <param name="connection">An open sql connection</param>
/// <param name="forename">The forename which will be inserted</param>
/// <param name="surname">The surname which will be inserted</param>
/// <returns>True if a new row was added, False otherwise</returns>
public static bool InsertPerson(SqlConnection connection, string forename, string surname)
{
    using (SqlCommand command = connection.CreateCommand())
    {
        command.CommandText =
            @"Insert into person (forename, surname)
                Select @forename, @surname
                Where not exists 
                    (
                        select 'X' 
                        from person 
                        where 
                            forename = @forename 
                            and surname=@surname
                    )";
        command.Parameters.AddWithValue("@forename", forename);
        command.Parameters.AddWithValue("@surname", surname);

        int rowsInserted = command.ExecuteNonQuery();

        // rowsInserted will be 0 if the row is already in the database
        return rowsInserted == 1;
    }
}

score 0 · Accepted Answer

I think you should go with the exception approach. Just do something like this:

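foreach (var elem in elementsFromFile)
{
    try
    {
        context.sometable.Add(elem);
        context.SaveChanges();
    }
    catch (DbUpdateException)
    {
        // Assuming an Entity Framework DbContext: a duplicate key surfaces as
        // DbUpdateException. Skip the row and detach it, otherwise the next
        // SaveChanges call would try to insert the same entity again.
        context.Entry(elem).State = EntityState.Detached;
    }
}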

I don't like that SaveChanges runs on every iteration, but I'm 100% sure it performs better than the "select first" approach. It works, and it works well.
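
A catch block in the loop above cannot by itself tell a duplicate from any other failure, such as a lost connection. A minimal sketch of making that distinction, assuming SQL Server and the `System.Data.SqlClient` provider (with `Microsoft.Data.SqlClient` the namespace differs): SQL Server reports error 2627 for a violated unique or primary key constraint and 2601 for a duplicate row in a unique index. The names `DuplicateKeyHelper` and `IsDuplicateKey` are hypothetical.

```csharp
using System;
using System.Data.SqlClient;

public static class DuplicateKeyHelper
{
    // SQL Server error numbers: 2627 = unique/primary key constraint violated,
    // 2601 = duplicate key row inserted into a unique index.
    private const int UniqueConstraintViolation = 2627;
    private const int UniqueIndexViolation = 2601;

    // Walks the InnerException chain, because Entity Framework wraps the
    // underlying SqlException in its own exception types.
    public static bool IsDuplicateKey(Exception ex)
    {
        for (Exception current = ex; current != null; current = current.InnerException)
        {
            if (current is SqlException sqlEx &&
                (sqlEx.Number == UniqueConstraintViolation ||
                 sqlEx.Number == UniqueIndexViolation))
            {
                return true;
            }
        }
        return false;
    }
}
```

With this helper, the loop could use an exception filter such as `catch (Exception ex) when (DuplicateKeyHelper.IsDuplicateKey(ex))`, so that anything other than a duplicate still propagates to the caller.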

score 0 · Accepted Answer

A simple way to ignore duplicates is to create a unique index with the option IGNORE_DUP_KEY = ON. This avoids the overhead of testing for duplicates or catching exceptions.

For example:

CREATE UNIQUE NONCLUSTERED INDEX [IX_IgnoreDuplicates] ON [dbo].[Test]
(
    [Id] ASC,
    [Col1] ASC,
    [Col2] ASC
)
WITH (IGNORE_DUP_KEY = ON)

You can also use BULK INSERT to load all the data efficiently, with duplicates removed automatically.

See CREATE INDEX.
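
BULK INSERT is a T-SQL statement that reads a file visible to the server. From .NET, the comparable client-side route is the `SqlBulkCopy` class, which is what the sketch below uses instead. It assumes the `dbo.Test` table carries the `IX_IgnoreDuplicates` index from the example, so that SQL Server should discard rows with an existing key rather than raise an error; that behavior is worth verifying against your own schema. The `BulkLoader` class, the `Load` method, and the batch size are hypothetical, and the `DataTable` is assumed to use the same column names as the table.

```csharp
using System.Data;
using System.Data.SqlClient;

public static class BulkLoader
{
    // Hypothetical helper: streams the rows into dbo.Test in batches.
    // Duplicate handling is left to the IGNORE_DUP_KEY index on the server.
    public static void Load(string connectionString, DataTable rows)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            using (var bulkCopy = new SqlBulkCopy(connection))
            {
                bulkCopy.DestinationTableName = "dbo.Test";
                bulkCopy.BatchSize = 5000; // arbitrary value, tune for your data

                // Map by name so the DataTable's column order does not matter.
                foreach (DataColumn column in rows.Columns)
                {
                    bulkCopy.ColumnMappings.Add(column.ColumnName, column.ColumnName);
                }

                bulkCopy.WriteToServer(rows);
            }
        }
    }
}
```

Loading in batches keeps the daily import to a handful of round trips instead of one statement per record.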
