sql-server - 引用符に一貫性のないCSVファイルのSQLServer一括挿入

Question

フィールドが引用符で囲まれているだけのCSVファイルをBULKINSERT（SQL Server）することは可能ですか？具体的には、引用符は「、」を含むフィールドのみを囲みます。

つまり、次のようなデータがあります（最初の行にはヘッダーが含まれています）。

id, company, rep, employees
729216,INGRAM MICRO INC.,"Stuart, Becky",523
729235,"GREAT PLAINS ENERGY, INC.","Nelson, Beena",114
721177,GEORGE WESTON BAKERIES INC,"Hogan, Meg",253

引用符が一貫していないため、区切り文字として'"、"'を使用できません。また、これを説明するフォーマットファイルを作成する方法がわかりません。

'、'をデリミタとして使用し、すべての列がvarcharである一時テーブルにロードしてから、いくつかのkludgy処理を使用して引用符を削除しようとしましたが、'、'を含むフィールドがあるため、これも機能しません。複数の列に分割されます。

残念ながら、CSVファイルを事前に操作することはできません。

これは絶望的ですか？

アドバイスをよろしくお願いします。

ちなみに、私はこのpost SQLのcsvからの一括インポートを見ましたが、その場合、すべてのフィールドは一貫して引用符で囲まれていました。したがって、その場合、彼は区切り文字として「、」を使用し、後で引用符を取り除くことができます。

score 19 · Accepted Answer

MSDNから、このファイルの一括挿入を行うことはできません。

一括インポート用のデータファイルとして使用するには、CSVファイルが次の制限に準拠している必要があります。

データフィールドにフィールドターミネータが含まれることはありません。
データフィールドの値がないか、すべてが引用符（ ""）で囲まれています。

（http://msdn.microsoft.com/en-us/library/ms188609.aspx）

ファイルをインポートできるようにするために必要なのは、いくつかの簡単なテキスト処理だけです。または、ユーザーは、seのガイドラインに従ってファイルをフォーマットするか、区切り文字としてコンマ以外のものを使用する必要があります（例：|）

score 18 · Accepted Answer

ファイル、期間を前処理する必要があります。

本当にこれを行う必要がある場合は、ここにコードがあります。絶対に仕方がなかったので書いた。これはユーティリティコードであり、私はそれを誇りに思っていませんが、機能します。アプローチは、SQLに引用符で囲まれたフィールドを理解させるのではなく、ファイルを操作してまったく異なる区切り文字を使用することです。

編集：これがgithubリポジトリのコードです。改善され、ユニットテストが付属しています！https://github.com/chrisclark/Redelim-it

この関数は入力ファイルを受け取り、すべてのフィールド区切りコンマ（引用符で囲まれたテキストフィールド内のコンマではなく、実際の区切りフィールドのみ）を新しい区切り文字に置き換えます。次に、コンマの代わりに新しいフィールド区切り文字を使用するようにSQLServerに指示できます。ここでの関数のバージョンでは、プレースホルダーは< TMP >です（これは元のcsvには表示されないことを確信しています。表示される場合は、爆発に備えてください）。

したがって、この関数を実行した後、次のようにしてsqlにインポートします。

BULK INSERT MyTable
FROM 'C:\FileCreatedFromThisFunction.csv'
WITH
(
FIELDTERMINATOR = '<*TMP*>',
ROWTERMINATOR = '\n'
)

そして、それ以上の苦労なしに、私があなたに負わせたことを前もって謝罪するひどい、ひどい機能（編集-私はここに私のブログの機能だけでなくこれを行う動作するプログラムを投稿しました）：

Private Function CsvToOtherDelimiter(ByVal InputFile As String, ByVal OutputFile As String) As Integer

        Dim PH1 As String = "<*TMP*>"

        Dim objReader As StreamReader = Nothing
        Dim count As Integer = 0 'This will also serve as a primary key'
        Dim sb As New System.Text.StringBuilder

        Try
            objReader = New StreamReader(File.OpenRead(InputFile), System.Text.Encoding.Default)
        Catch ex As Exception
            UpdateStatus(ex.Message)
        End Try

        If objReader Is Nothing Then
            UpdateStatus("Invalid file: " & InputFile)
            count = -1
            Exit Function
        End If

        'grab the first line
    Dim line = reader.ReadLine()
    'and advance to the next line b/c the first line is column headings
    If hasHeaders Then
        line = Trim(reader.ReadLine)
    End If

    While Not String.IsNullOrEmpty(line) 'loop through each line

        count += 1

        'Replace commas with our custom-made delimiter
        line = line.Replace(",", ph1)

        'Find a quoted part of the line, which could legitimately contain commas.
        'In that case we will need to identify the quoted section and swap commas back in for our custom placeholder.
        Dim starti = line.IndexOf(ph1 & """", 0)
        If line.IndexOf("""",0) = 0 then starti=0

        While starti > -1 'loop through quoted fields

            Dim FieldTerminatorFound As Boolean = False

            'Find end quote token (originally  a ",)
            Dim endi As Integer = line.IndexOf("""" & ph1, starti)

            If endi < 0 Then
                FieldTerminatorFound = True
                If endi < 0 Then endi = line.Length - 1
            End If

            While Not FieldTerminatorFound

                'Find any more quotes that are part of that sequence, if any
                Dim backChar As String = """" 'thats one quote
                Dim quoteCount = 0
                While backChar = """"
                    quoteCount += 1
                    backChar = line.Chars(endi - quoteCount)
                End While

                If quoteCount Mod 2 = 1 Then 'odd number of quotes. real field terminator
                    FieldTerminatorFound = True
                Else 'keep looking
                    endi = line.IndexOf("""" & ph1, endi + 1)
                End If
            End While

            'Grab the quoted field from the line, now that we have the start and ending indices
            Dim source = line.Substring(starti + ph1.Length, endi - starti - ph1.Length + 1)

            'And swap the commas back in
            line = line.Replace(source, source.Replace(ph1, ","))

            'Find the next quoted field
            '                If endi >= line.Length - 1 Then endi = line.Length 'During the swap, the length of line shrinks so an endi value at the end of the line will fail
            starti = line.IndexOf(ph1 & """", starti + ph1.Length)

        End While

            line = objReader.ReadLine

        End While

        objReader.Close()

        SaveTextToFile(sb.ToString, OutputFile)

        Return count

    End Function

score 8 · Accepted Answer

クリスの答えは非常に役に立ちましたが、SQL Server内からT-SQLを使用して（CLRを使用せずに）実行したかったので、彼のコードをT-SQLコードに変換しました。しかし、それから私はそれをさらに一歩進めて、次のことを行うストアドプロシージャにすべてをまとめました。

一括挿入を使用して、CSVファイルを最初にインポートします
クリスのコードを使用して行をクリーンアップします
結果を表形式で返します

必要に応じて、値の前後の引用符を削除し、2つの二重引用符を1つの二重引用符に変換することで、行をさらにクリーンアップしました（これが正しい方法だと思います）。

CREATE PROCEDURE SSP_CSVToTable

-- Add the parameters for the stored procedure here
@InputFile nvarchar(4000)
, @FirstLine int

AS

BEGIN

-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;

--convert the CSV file to a table
--clean up the lines so that commas are handles correctly

DECLARE @sql nvarchar(4000)
DECLARE @PH1 nvarchar(50)
DECLARE @LINECOUNT int -- This will also serve as a primary key
DECLARE @CURLINE int
DECLARE @Line nvarchar(4000)
DECLARE @starti int
DECLARE @endi int
DECLARE @FieldTerminatorFound bit
DECLARE @backChar nvarchar(4000)
DECLARE @quoteCount int
DECLARE @source nvarchar(4000)
DECLARE @COLCOUNT int
DECLARE @CURCOL int
DECLARE @ColVal nvarchar(4000)

-- new delimiter
SET @PH1 = '†'

-- create single column table to hold each line of file
CREATE TABLE [#CSVLine]([line] nvarchar(4000))

-- bulk insert into temp table
-- cannot use variable path with bulk insert
-- so we must run using dynamic sql
SET @Sql = 'BULK INSERT #CSVLine
FROM ''' + @InputFile + '''
WITH
(
FIRSTROW=' + CAST(@FirstLine as varchar) + ',
FIELDTERMINATOR = ''\n'',
ROWTERMINATOR = ''\n''
)'

-- run dynamic statement to populate temp table
EXEC(@sql)

-- get number of lines in table
SET @LINECOUNT = @@ROWCOUNT

-- add identity column to table so that we can loop through it
ALTER TABLE [#CSVLine] ADD [RowId] [int] IDENTITY(1,1) NOT NULL

IF @LINECOUNT > 0
BEGIN
    -- cycle through each line, cleaning each line
    SET @CURLINE = 1
    WHILE @CURLINE <= @LINECOUNT
    BEGIN
        -- get current line
        SELECT @line = line
          FROM #CSVLine
         WHERE [RowId] = @CURLINE

        -- Replace commas with our custom-made delimiter
        SET @Line = REPLACE(@Line, ',', @PH1)

        -- Find a quoted part of the line, which could legitimately contain commas.
        -- In that case we will need to identify the quoted section and swap commas back in for our custom placeholder.
        SET @starti = CHARINDEX(@PH1 + '"' ,@Line, 0)
        If CHARINDEX('"', @Line, 0) = 0 SET @starti = 0

        -- loop through quoted fields
        WHILE @starti > 0 
        BEGIN
            SET @FieldTerminatorFound = 0

            -- Find end quote token (originally  a ",)
            SET @endi = CHARINDEX('"' + @PH1, @Line, @starti)  -- sLine.IndexOf("""" & PH1, starti)

            IF @endi < 1
            BEGIN
                SET @FieldTerminatorFound = 1
                If @endi < 1 SET @endi = LEN(@Line) - 1
            END

            WHILE @FieldTerminatorFound = 0
            BEGIN
                -- Find any more quotes that are part of that sequence, if any
                SET @backChar = '"' -- thats one quote
                SET @quoteCount = 0

                WHILE @backChar = '"'
                BEGIN
                    SET @quoteCount = @quoteCount + 1
                    SET @backChar = SUBSTRING(@Line, @endi-@quoteCount, 1) -- sLine.Chars(endi - quoteCount)
                END

                IF (@quoteCount % 2) = 1
                BEGIN
                    -- odd number of quotes. real field terminator
                    SET @FieldTerminatorFound = 1
                END
                ELSE 
                BEGIN
                    -- keep looking
                    SET @endi = CHARINDEX('"' + @PH1, @Line, @endi + 1) -- sLine.IndexOf("""" & PH1, endi + 1)
                END

            END

            -- Grab the quoted field from the line, now that we have the start and ending indices
            SET @source = SUBSTRING(@Line, @starti + LEN(@PH1), @endi - @starti - LEN(@PH1) + 1) 
            -- sLine.Substring(starti + PH1.Length, endi - starti - PH1.Length + 1)

            -- And swap the commas back in
            SET @Line = REPLACE(@Line, @source, REPLACE(@source, @PH1, ','))
            --sLine.Replace(source, source.Replace(PH1, ","))

            -- Find the next quoted field
            -- If endi >= line.Length - 1 Then endi = line.Length 'During the swap, the length of line shrinks so an endi value at the end of the line will fail
            SET @starti = CHARINDEX(@PH1 + '"', @Line, @starti + LEN(@PH1))
            --sLine.IndexOf(PH1 & """", starti + PH1.Length)

        END

        -- get table based on current line
        IF OBJECT_ID('tempdb..#Line') IS NOT NULL
            DROP TABLE #Line

        -- converts a delimited list into a table
        SELECT *
        INTO #Line
        FROM dbo.iter_charlist_to_table(@Line,@PH1)

        -- get number of columns in line
        SET @COLCOUNT = @@ROWCOUNT

        -- dynamically create CSV temp table to hold CSV columns and lines
        -- only need to create once
        IF OBJECT_ID('tempdb..#CSV') IS NULL
        BEGIN
            -- create initial structure of CSV table
            CREATE TABLE [#CSV]([Col1] nvarchar(100))

            -- dynamically add a column for each column found in the first line
            SET @CURCOL = 1
            WHILE @CURCOL <= @COLCOUNT
            BEGIN
                -- first column already exists, don't need to add
                IF @CURCOL > 1 
                BEGIN
                    -- add field
                    SET @sql = 'ALTER TABLE [#CSV] ADD [Col' + Cast(@CURCOL as varchar) + '] nvarchar(100)'

                    --print @sql

                    -- this adds the fields to the temp table
                    EXEC(@sql)
                END

                -- go to next column
                SET @CURCOL = @CURCOL + 1
            END
        END

        -- build dynamic sql to insert current line into CSV table
        SET @sql = 'INSERT INTO [#CSV] VALUES('

        -- loop through line table, dynamically adding each column value
        SET @CURCOL = 1
        WHILE @CURCOL <= @COLCOUNT
        BEGIN
            -- get current column
            Select @ColVal = str 
              From #Line 
             Where listpos = @CURCOL

            IF LEN(@ColVal) > 0
            BEGIN
                -- remove quotes from beginning if exist
                IF LEFT(@ColVal,1) = '"'
                    SET @ColVal = RIGHT(@ColVal, LEN(@ColVal) - 1)

                -- remove quotes from end if exist
                IF RIGHT(@ColVal,1) = '"'
                    SET @ColVal = LEFT(@ColVal, LEN(@ColVal) - 1)
            END

            -- write column value
            -- make value sql safe by replacing single quotes with two single quotes
            -- also, replace two double quotes with a single double quote
            SET @sql = @sql + '''' + REPLACE(REPLACE(@ColVal, '''',''''''), '""', '"') + ''''

            -- add comma separater except for the last record
            IF @CURCOL <> @COLCOUNT
                SET @sql = @sql + ','

            -- go to next column
            SET @CURCOL = @CURCOL + 1
        END

        -- close sql statement
        SET @sql = @sql + ')'

        --print @sql

        -- run sql to add line to table
        EXEC(@sql)

        -- move to next line
        SET @CURLINE = @CURLINE + 1

    END

END

-- return CSV table
SELECT * FROM [#CSV]

END

GO

ストアドプロシージャは、文字列をテーブルに解析するこのヘルパー関数を利用します（Erland Sommarskogに感謝します！）：

CREATE FUNCTION [dbo].[iter_charlist_to_table]
                (@list      ntext,
                 @delimiter nchar(1) = N',')
     RETURNS @tbl TABLE (listpos int IDENTITY(1, 1) NOT NULL,
                         str     varchar(4000),
                         nstr    nvarchar(2000)) AS

BEGIN
  DECLARE @pos      int,
          @textpos  int,
          @chunklen smallint,
          @tmpstr   nvarchar(4000),
          @leftover nvarchar(4000),
          @tmpval   nvarchar(4000)

  SET @textpos = 1
  SET @leftover = ''
  WHILE @textpos <= datalength(@list) / 2
  BEGIN
     SET @chunklen = 4000 - datalength(@leftover) / 2
     SET @tmpstr = @leftover + substring(@list, @textpos, @chunklen)
     SET @textpos = @textpos + @chunklen

     SET @pos = charindex(@delimiter, @tmpstr)

     WHILE @pos > 0
     BEGIN
        SET @tmpval = ltrim(rtrim(left(@tmpstr, @pos - 1)))
        INSERT @tbl (str, nstr) VALUES(@tmpval, @tmpval)
        SET @tmpstr = substring(@tmpstr, @pos + 1, len(@tmpstr))
        SET @pos = charindex(@delimiter, @tmpstr)
     END

     SET @leftover = @tmpstr
  END

  INSERT @tbl(str, nstr) VALUES (ltrim(rtrim(@leftover)), ltrim(rtrim(@leftover)))

RETURN

END

これが私がT-SQLからそれを呼び出す方法です。この場合、結果を一時テーブルに挿入しているので、最初に一時テーブルを作成します。

    -- create temp table for file import
CREATE TABLE #temp
(
    CustomerCode nvarchar(100) NULL,
    Name nvarchar(100) NULL,
    [Address] nvarchar(100) NULL,
    City nvarchar(100) NULL,
    [State] nvarchar(100) NULL,
    Zip nvarchar(100) NULL,
    OrderNumber nvarchar(100) NULL,
    TimeWindow nvarchar(100) NULL,
    OrderType nvarchar(100) NULL,
    Duration nvarchar(100) NULL,
    [Weight] nvarchar(100) NULL,
    Volume nvarchar(100) NULL
)

-- convert the CSV file into a table
INSERT #temp
EXEC [dbo].[SSP_CSVToTable]
     @InputFile = @FileLocation
    ,@FirstLine = @FirstImportRow

パフォーマンスはあまりテストしていませんが、必要なもの、つまり1000行未満のCSVファイルをインポートする場合にはうまく機能します。ただし、非常に大きなファイルでは窒息する可能性があります。

うまくいけば、他の誰かもそれが役に立つと思うでしょう。

乾杯！

score 5 · Accepted Answer

また、CSVを一括挿入に使用できる形式に変換する関数も作成しました。次のC＃関数を作成するための開始点として、ChrisClarkによる回答済みの投稿を使用しました。

正規表現を使用してフィールドを検索することになりました。次に、ファイルを1行ずつ再作成し、ファイル全体をメモリにロードしないように、ファイルを新しいファイルに書き込みました。

private void CsvToOtherDelimiter(string CSVFile, System.Data.Linq.Mapping.MetaTable tbl)
{
    char PH1 = '|';
    StringBuilder ln;

    //Confirm file exists. Else, throw exception
    if (File.Exists(CSVFile))
    {
        using (TextReader tr = new StreamReader(CSVFile))
        {
            //Use a temp file to store our conversion
            using (TextWriter tw = new StreamWriter(CSVFile + ".tmp"))
            {
                string line = tr.ReadLine();
                //If we have already converted, no need to reconvert.
                //NOTE: We make the assumption here that the input header file 
                //      doesn't have a PH1 value unless it's already been converted.
                if (line.IndexOf(PH1) >= 0)
                {
                    tw.Close();
                    tr.Close();
                    File.Delete(CSVFile + ".tmp");
                    return;
                }
                //Loop through input file
                while (!string.IsNullOrEmpty(line))
                {
                    ln = new StringBuilder();

                    //1. Use Regex expression to find comma separated values 
                    //using quotes as optional text qualifiers 
                    //(what MS EXCEL does when you import a csv file)
                    //2. Remove text qualifier quotes from data
                    //3. Replace any values of PH1 found in column data 
                    //with an equivalent character
                    //Regex:  \A[^,]*(?=,)|(?:[^",]*"[^"]*"[^",]*)+|[^",]*"[^"]*\Z|(?<=,)[^,]*(?=,)|(?<=,)[^,]*\Z|\A[^,]*\Z
                    List<string> fieldList = Regex.Matches(line, @"\A[^,]*(?=,)|(?:[^"",]*""[^""]*""[^"",]*)+|[^"",]*""[^""]*\Z|(?<=,)[^,]*(?=,)|(?<=,)[^,]*\Z|\A[^,]*\Z")
                            .Cast<Match>()
                            .Select(m => RemoveCSVQuotes(m.Value).Replace(PH1, '¦'))
                            .ToList<string>();

                    //Add the list of fields to ln, separated by PH1
                    fieldList.ToList().ForEach(m => ln.Append(m + PH1));

                    //Write to file. Don't include trailing PH1 value.
                    tw.WriteLine(ln.ToString().Substring(0, ln.ToString().LastIndexOf(PH1)));

                    line = tr.ReadLine();
                }


                tw.Close();
            }
            tr.Close();

            //Optional:  replace input file with output file
            File.Delete(CSVFile);
            File.Move(CSVFile + ".tmp", CSVFile);
        }
    }
    else
    {
        throw new ArgumentException(string.Format("Source file {0} not found", CSVFile));
    }
}
//The output file no longer needs quotes as a text qualifier, so remove them
private string RemoveCSVQuotes(string value)
{
    //if is empty string, then remove double quotes
    if (value == @"""""") value = "";
    //remove any double quotes, then any quotes on ends
    value = value.Replace(@"""""", @"""");
    if (value.Length >= 2)
        if (value.Substring(0, 1) == @"""")
            value = value.Substring(1, value.Length - 2);
    return value;
}

score 3 · Accepted Answer

多くの場合、この問題は、ユーザーがExcelファイルをCSVにエクスポートすることによって発生します。

この問題を回避するには、次の2つの方法があります。

Microsoftの提案に従って、マクロを使用してExcelからエクスポートする
または本当に簡単な方法：
- CSVをExcelで開きます。
- Excelファイルとして保存します。（.xlsまたは.xlsx）。
- そのファイルをExcelファイルとしてSQLServerにインポートします。
- 上記のソリューションのようなものをコーディングする必要がなかったので、自分自身に笑いかけてください.... muhahahaha

Excelファイルとしてインポート

本当にスクリプトを作成したい場合（CSVをExcelとして保存した後）のSQLを次に示します。

select * 
into SQLServerTable FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0', 
    'Excel 8.0;Database=D:\testing.xls;HDR=YES', 
    'SELECT * FROM [Sheet1$]')

score 2 · Accepted Answer

これは、あなたが喜んで使用するものよりも複雑または複雑になる可能性がありますが...

行をVBまたはC＃のフィールドに解析するためのロジックを実装できる場合は、CLRテーブル値関数（TVF）を使用してこれを行うことができます。

CLR TVFは、C＃またはVBコードでデータを列に分割したり、値を調整したりする場合に、外部ソースからデータを読み込むための優れたパフォーマンスの方法です。

CLRアセンブリをデータベースに追加する必要があります（ファイルを開くことができるように、外部操作または安全でない操作を許可するアセンブリ）。これは少し複雑になったり複雑になったりする可能性がありますが、柔軟性を確保するためには価値があるかもしれません。

できるだけ速くテーブルに定期的にロードする必要のある大きなファイルがいくつかありましたが、一部の列で特定のコード変換を実行する必要があり、単純な一括挿入でデータ型エラーを引き起こす値をロードするには特別な処理が必要でした。

つまり、CLR TVFを使用すると、ファイルの各行に対してC＃またはVBコードを実行し、パフォーマンスのような一括挿入を行うことができます（ただし、ロギングについて心配する必要がある場合があります）。SQL Serverドキュメントの例では、開始点として使用できるイベントログから読み取るTVFを作成できます。

CLR TVFのコードは、最初の行が処理される前のinitステージでのみデータベースにアクセスできることに注意してください（たとえば、各行のルックアップはありません。このような処理を行うには、通常のTVFを使用します）。あなたの質問に基づいて、これは必要ないようです。

また、各CLR TVFには出力列を明示的に指定する必要があるため、異なるcsvファイルごとに再利用可能な汎用列を作成することはできません。

1つのCLRTVFを記述して、ファイルから行全体を読み取り、1列の結果セットを返し、通常のTVFを使用して、ファイルのタイプごとにそれから読み取ることができます。これには、T-SQLで記述される各行を解析するコードが必要ですが、多くのCLRTVFを記述する必要はありません。

score 2 · Accepted Answer

別の方法-フィールドの負荷がないか、データ自体に引用符が表示されることを期待している場合は、REPLACE関数を使用します。

UPDATE dbo.tablename 
        SET dbo.tablename.target_field = REPLACE(t.importedValue, '"', '')
FROM #tempTable t
WHERE dbo.tablename.target_id = t.importedID;

私はそれを使用しました。パフォーマンスに関しては何も主張できません。これは、問題を回避するための迅速で汚い方法です。

score 2 · Accepted Answer

前処理が必要です。

PowerShell関数Import-CSVは、このタイプのファイルをサポートしています。次に、Export-CSVは各値を引用符で囲みます。

単一ファイル：

Import-Csv import.csv | Export-Csv -NoTypeInformation export.csv

多くのファイルをパスC：\ year \ input_date.csvでマージするには：

$inputPath = 'C:\????\input_????????.csv'
$outputPath = 'C:\merged.csv'
Get-ChildItem $inputPath |
  Select -ExpandProperty FullName |
  Import-CSV |
  Export-CSV -NoTypeInformation -Path $outputPath

PowerShellは通常、PowerShellプロキシアカウントを使用してSQLServerエージェントで実行できます。

区切り文字が適切に処理されない場合は、別の区切り文字を明示的に指定してください。

 Export-CSV -NoTypeInformation -Delimiter ';' -Path $outputPath

score 1 · Accepted Answer

クリス、これをたくさんありがとう！あなたは私のビスケットを救った！XLがそのような素晴らしい仕事をするとき、私はバルクローダーがこのケースを処理しないとは信じられませんでした..これらの人はホールでお互いを見ませんか？とにかく...私はConsoleApplicationバージョンが必要だったので、これが私が一緒にハックしたものです。ダウンしていて汚れていますが、チャンピオンのように機能します！区切り文字はアプリに必要ないため、区切り文字をハードコーディングしてコメントアウトしました。

私もここに素敵な大きなビールを貼り付けられたらいいのにと思います。

ねえ、なぜエンドモジュールとパブリッククラスがコードブロックの外にあるのかわかりません...ごめんなさい！

    Module Module1

    Sub Main()

        Dim arrArgs() As String = Command.Split(",")
        Dim i As Integer
        Dim obj As New ReDelimIt()

        Console.Write(vbNewLine & vbNewLine)

        If arrArgs(0) <> Nothing Then
            For i = LBound(arrArgs) To UBound(arrArgs)
                Console.Write("Parameter " & i & " is " & arrArgs(i) & vbNewLine)
            Next


            obj.ProcessFile(arrArgs(0), arrArgs(1))

        Else
            Console.Write("Usage Test1 <inputfile>,<outputfile>")
        End If

        Console.Write(vbNewLine & vbNewLine)
    End Sub

 End Module

 Public Class ReDelimIt

    Public Function ProcessFile(ByVal InputFile As String, ByVal OutputFile As String) As Integer

        Dim ph1 As String = "|"

        Dim objReader As System.IO.StreamReader = Nothing
        Dim count As Integer = 0 'This will also serve as a primary key
        Dim sb As New System.Text.StringBuilder

        Try
            objReader = New System.IO.StreamReader(System.IO.File.OpenRead(InputFile), System.Text.Encoding.Default)
        Catch ex As Exception
            MsgBox(ex.Message)
        End Try

        If objReader Is Nothing Then
            MsgBox("Invalid file: " & InputFile)
            count = -1
            Exit Function
        End If

        'grab the first line
        Dim line = objReader.ReadLine()
        'and advance to the next line b/c the first line is column headings
        'Removed Check Headers can put in if needed.
        'If chkHeaders.Checked Then
        'line = objReader.ReadLine
        'End If

        While Not String.IsNullOrEmpty(line) 'loop through each line

            count += 1

            'Replace commas with our custom-made delimiter
            line = line.Replace(",", ph1)

            'Find a quoted part of the line, which could legitimately contain commas.
            'In that case we will need to identify the quoted section and swap commas back in for our custom placeholder.
            Dim starti = line.IndexOf(ph1 & """", 0)

            While starti > -1 'loop through quoted fields

                'Find end quote token (originally  a ",)
                Dim endi = line.IndexOf("""" & ph1, starti)

                'The end quote token could be a false positive because there could occur a ", sequence.
                'It would be double-quoted ("",) so check for that here
                Dim check1 = line.IndexOf("""""" & ph1, starti)

                'A """, sequence can occur if a quoted field ends in a quote.
                'In this case, the above check matches, but we actually SHOULD process this as an end quote token
                Dim check2 = line.IndexOf("""""""" & ph1, starti)

                'If we are in the check1 ("",) situation, keep searching for an end quote token
                'The +1 and +2 accounts for the extra length of the checked sequences
                While (endi = check1 + 1 AndAlso endi <> check2 + 2) 'loop through "false" tokens in the quoted fields
                    endi = line.IndexOf("""" & ph1, endi + 1)
                    check1 = line.IndexOf("""""" & ph1, check1 + 1)
                    check2 = line.IndexOf("""""""" & ph1, check2 + 1)
                End While

                'We have searched for an end token (",) but can't find one, so that means the line ends in a "
                If endi < 0 Then endi = line.Length - 1

                'Grab the quoted field from the line, now that we have the start and ending indices
                Dim source = line.Substring(starti + ph1.Length, endi - starti - ph1.Length + 1)

                'And swap the commas back in
                line = line.Replace(source, source.Replace(ph1, ","))

                'Find the next quoted field
                If endi >= line.Length - 1 Then endi = line.Length 'During the swap, the length of line shrinks so an endi value at the end of the line will fail
                starti = line.IndexOf(ph1 & """", starti + ph1.Length)

            End While

            'Add our primary key to the line
            ' Removed for now
            'If chkAddKey.Checked Then
            'line = String.Concat(count.ToString, ph1, line)
            ' End If

            sb.AppendLine(line)

            line = objReader.ReadLine

        End While

        objReader.Close()

        SaveTextToFile(sb.ToString, OutputFile)

        Return count

    End Function

    Public Function SaveTextToFile(ByVal strData As String, ByVal FullPath As String) As Boolean
        Dim bAns As Boolean = False
        Dim objReader As System.IO.StreamWriter
        Try
            objReader = New System.IO.StreamWriter(FullPath, False, System.Text.Encoding.Default)
            objReader.Write(strData)
            objReader.Close()
            bAns = True
        Catch Ex As Exception
            Throw Ex
        End Try
        Return bAns
    End Function

End Class

score 1 · Accepted Answer

マイク、「456 2nd St、Apt 5」のようなフィールド内に「、」があるときに、いくつかの問題が見つかりました。

この問題の解決策は、@ http://crazzycoding.blogspot.com/2010/11/import-csv-file-into-sql-server-using.htmlです。

ありがとう、-Ashish

score 1 · Accepted Answer

[、]である必要があるフィールド区切り文字だけでなく、この場合は["]であるテキスト修飾子も指定できる必要があります。[]を使用してそれを囲み、"と混同しないようにします。

score 0 · Accepted Answer

このコードは私のために働きます：

 public bool CSVFileRead(string fullPathWithFileName, string fileNameModified, string tableName)
    {
        SqlConnection con = new SqlConnection(ConfigurationSettings.AppSettings["dbConnectionString"]);
        string filepath = fullPathWithFileName;
        StreamReader sr = new StreamReader(filepath);
        string line = sr.ReadLine();
        string[] value = line.Split(',');
        DataTable dt = new DataTable();
        DataRow row;
        foreach (string dc in value)
        {
            dt.Columns.Add(new DataColumn(dc));
        }
        while (!sr.EndOfStream)
        {
            //string[] stud = sr.ReadLine().Split(',');
            //for (int i = 0; i < stud.Length; i++)
            //{
            //    stud[i] = stud[i].Replace("\"", "");
            //}
            //value = stud;
            value = sr.ReadLine().Split(',');
            if (value.Length == dt.Columns.Count)
            {
                row = dt.NewRow();
                row.ItemArray = value;
                dt.Rows.Add(row);
            }
        }
        SqlBulkCopy bc = new SqlBulkCopy(con.ConnectionString, SqlBulkCopyOptions.TableLock);
        bc.DestinationTableName = tableName;
        bc.BatchSize = dt.Rows.Count;
        con.Open();
        bc.WriteToServer(dt);
        bc.Close();
        con.Close();

        return true;
    }

score 0 · Accepted Answer

私のケースを解決するために、以下をまとめました。非常に大きなファイルを前処理し、一貫性のない見積もりを整理する必要がありました。空のC＃アプリケーションに貼り付け、constsを要件に設定するだけで、すぐに使用できます。これは、10GBを超える非常に大きなCSVで機能しました。

namespace CsvFixer
{
    using System.IO;
    using System.Text;

    public class Program
    {
        private const string delimiter = ",";
        private const string quote = "\"";
        private const string inputFile = "C:\\temp\\input.csv";
        private const string fixedFile = "C:\\temp\\fixed.csv";

        /// <summary>
        /// This application fixes inconsistently quoted csv (or delimited) files with support for very large file sizes.
        /// For example :  1223,5235234,8674,"Houston","London, UK",3425,Other text,stuff 
        /// Must become :  "1223","5235234","8674","Houston","London, UK","3425","Other text","stuff" 
        /// </summary>
        /// <param name="args"></param>
        static void Main(string[] args)
        {
            // Use streaming to allow for large files. 
            using (StreamWriter outfile = new StreamWriter(fixedFile))
            {
                using (FileStream fs = File.Open(inputFile, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
                using (BufferedStream bs = new BufferedStream(fs))
                using (StreamReader sr = new StreamReader(bs))
                {
                    string currentLine;

                    // Read each input line in and write each fixed line out
                    while ((currentLine = sr.ReadLine()) != null)
                    {
                        outfile.WriteLine(FixLine(currentLine, delimiter, quote));
                    }
                }
            }
        }

        /// <summary>
        /// Fully quote a partially quoted line 
        /// </summary>
        /// <param name="line">The partially quoted line</param>
        /// <returns>The fully quoted line</returns>
        private static string FixLine(string line, string delimiter, string quote)
        {
            StringBuilder fixedLine = new StringBuilder();

            // Split all on the delimiter, acceptinmg that some quoted fields 
            // that contain the delimiter wwill be split in to many pieces.
            string[] fieldParts = line.Split(delimiter.ToCharArray());

            // Loop through the fields (or parts of fields)
            for (int i = 0; i < fieldParts.Length; i++)
            {
                string currentFieldPart = fieldParts[i];

                // If the current field part starts and ends with a quote it is a field, so write it to the result
                if (currentFieldPart.StartsWith(quote) && currentFieldPart.EndsWith(quote))
                {
                    fixedLine.Append(string.Format("{0}{1}", currentFieldPart, delimiter));
                }
                // else if it starts with a quote but doesnt end with one, it is part of a lionger field.
                else if (currentFieldPart.StartsWith(quote))
                {
                    // Add the start of the field
                    fixedLine.Append(string.Format("{0}{1}", currentFieldPart, delimiter));

                    // Append any additional field parts (we will only hit the end of the field when 
                    // the last field part finishes with a quote. 
                    while (!fieldParts[++i].EndsWith(quote))
                    {
                        fixedLine.Append(string.Format("{0}{1}", fieldParts[i], delimiter));
                    }

                    // Append the last field part - i.e. the part containing the closing quote
                    fixedLine.Append(string.Format("{0}{1}", fieldParts[i], delimiter));
                }
                else
                {
                    // The field has no quotes, add the feildpart with quote as bookmarks 
                    fixedLine.Append(string.Format("{0}{1}{0}{2}", quote, currentFieldPart, delimiter));
                }
            }

            // Return the fixed string 
            return fixedLine.ToString();
        }
    }
}

score 0 · Accepted Answer

実践から言えば...SQLServer 2017では、二重引用符の「テキスト修飾子」を提供できますが、区切り文字に「取って代わる」ことはありません。OPの例のように見えるいくつかのファイルを一括挿入します。私のファイルは「.csv」であり、値にコンマが含まれている場合にのみ検出される一貫性のないテキスト修飾子があります。この機能がどのバージョンのSQLServerで機能し始めたかはわかりませんが、SQL Server2017Standardで機能することはわかっています。とても簡単。

score 0 · Accepted Answer

SQLの外部でファイルを前処理する必要はありません。

私のために働いたのは変化していました

ROWTERMINATOR ='\ n'

に

ROWTERMINATOR='0x0a'。

score 0 · Accepted Answer

WITH ( FORMAT='CSV')コマンドを指定するための新しいオプションがSQL2017に追加されましたBULK INSERT。

Microsoft GitHubページの例：

BULK INSERT Product
FROM 'product.csv'
WITH (  DATA_SOURCE = 'MyAzureBlobStorage',
        FORMAT='CSV', CODEPAGE = 65001, --UTF-8 encoding
        FIRSTROW=2,
        ROWTERMINATOR = '0x0a',
        TABLOCK);

このオプションの詳細なドキュメントは、 https ：//docs.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql？view = sql-server-2017＃input-fileで入手できます。 -format-options

OPが例を示したように、オプションの引用符を含むCSVデータでこのオプションを正常に使用しました。

score -1 · Accepted Answer

4.5 Frameworkを使用して新しい区切り文字に変換するVB.NETプログラムを作成しますTextFieldParserこれにより、テキスト修飾フィールドが自動的に処理されます

組み込みのTextFieldParserを使用するように上記のコードを変更しました

モジュールModule1

Sub Main()

    Dim arrArgs() As String = Command.Split(",")
    Dim i As Integer
    Dim obj As New ReDelimIt()
    Dim InputFile As String = ""
    Dim OutPutFile As String = ""
    Dim NewDelimiter As String = ""

    Console.Write(vbNewLine & vbNewLine)

    If Not IsNothing(arrArgs(0)) Then
        For i = LBound(arrArgs) To UBound(arrArgs)
            Console.Write("Parameter " & i & " is " & arrArgs(i) & vbNewLine)
        Next
        InputFile = arrArgs(0)
        If Not IsNothing(arrArgs(1)) Then
            If Not String.IsNullOrEmpty(arrArgs(1)) Then
                OutPutFile = arrArgs(1)
            Else
                OutPutFile = InputFile.Replace("csv", "pipe")
            End If
        Else
            OutPutFile = InputFile.Replace("csv", "pipe")
        End If
        If Not IsNothing(arrArgs(2)) Then
            If Not String.IsNullOrEmpty(arrArgs(2)) Then
                NewDelimiter = arrArgs(2)
            Else
                NewDelimiter = "|"
            End If
        Else
            NewDelimiter = "|"
        End If
        obj.ConvertCSVFile(InputFile,OutPutFile,NewDelimiter)

    Else
        Console.Write("Usage ChangeFileDelimiter <inputfile>,<outputfile>,<NewDelimiter>")
    End If
    obj = Nothing
    Console.Write(vbNewLine & vbNewLine)
    'Console.ReadLine()

End Sub

エンドモジュール

パブリッククラスReDelimIt

Public Function ConvertCSVFile(ByVal InputFile As String, ByVal OutputFile As String, Optional ByVal NewDelimiter As String = "|") As Integer
    Using MyReader As New Microsoft.VisualBasic.FileIO.TextFieldParser(InputFile)
        MyReader.TextFieldType = FileIO.FieldType.Delimited
        MyReader.SetDelimiters(",")
        Dim sb As New System.Text.StringBuilder
        Dim strLine As String = ""
        Dim currentRow As String()
        While Not MyReader.EndOfData
            Try
                currentRow = MyReader.ReadFields()
                Dim currentField As String
                strLine = ""
                For Each currentField In currentRow
                    'MsgBox(currentField)
                    If strLine = "" Then
                        strLine = strLine & currentField
                    Else
                        strLine = strLine & NewDelimiter & currentField
                    End If
                Next
                sb.AppendLine(strLine)
            Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException
                'MsgBox("Line " & ex.Message & "is not valid and will be skipped.")
                Console.WriteLine("Line " & ex.Message & "is not valid and will be skipped.")
            End Try
        End While
        SaveTextToFile(sb.ToString, OutputFile)
    End Using

    Return Err.Number

End Function

Public Function SaveTextToFile(ByVal strData As String, ByVal FullPath As String) As Boolean
    Dim bAns As Boolean = False
    Dim objReader As System.IO.StreamWriter
    Try
        If FileIO.FileSystem.FileExists(FullPath) Then
            Kill(FullPath)
        End If
        objReader = New System.IO.StreamWriter(FullPath, False, System.Text.Encoding.Default)
        objReader.Write(strData)
        objReader.Close()
        bAns = True
    Catch Ex As Exception
        Throw Ex
    End Try
    Return bAns
End Function

エンドクラス

sql-server - 引用符に一貫性のないCSVファイルのSQLServer一括挿入

17 に答える 17

Related

Reference