regex - Replace double quotes (inside qualifiers) in a CSV for SSIS import

Question

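I have an SSIS package that imports data from a .csv file. The file uses double quotes (") as qualifiers around each entry, but double quotes also appear inside the entries. I also use commas (,) as the column delimiter. I can't provide the original data I'm working with, but here is an example of how the data is passed to the Flat File Source: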

"ID-1","A "B"", C, D, E","Today"
"ID-2","A, B, C, D, E,F","Yesterday"
"ID-3","A and nothing else","Today"

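As you can see, the second column may contain quotes (and commas), which breaks my SSIS import with an error pointing at that row. I'm not very familiar with regular expressions, but I've heard they might help in this case.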

As I see it, I need to replace all double quotes (") with single quotes (') except:

...all quotes at the beginning of a line
...all quotes at the end of a line
...quotes that are part of ","

Can any of you help me out with this? That would be great!

Thanks in advance!

score 1 · Accepted Answer

To replace double quotes with single quotes according to your specification, use this simple regex. It tolerates whitespace at the beginning and/or end of a line.

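// Requires: using System.Text.RegularExpressions;
string pattern = @"(?<!^\s*|,)""(?!,""|\s*$)";
// Multiline makes ^ and $ match at every line break, not only at the ends of the input
string resultString = Regex.Replace(subjectString, pattern, "'", RegexOptions.Multiline);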

Here is an explanation of the pattern:

// (?<!^\s*|,)"(?!,"|\s*$)
// 
// Options: ^ and $ match at line breaks
// 
// Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind) «(?<!^\s*|,)»
//    Match either the regular expression below (attempting the next alternative only if this one fails) «^\s*»
//       Assert position at the beginning of a line (at beginning of the string or after a line break character) «^»
//       Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) «\s*»
//          Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
//    Or match regular expression number 2 below (the entire group fails if this one fails to match) «,»
//       Match the character “,” literally «,»
// Match the character “"” literally «"»
// Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?!,"|\s*$)»
//    Match either the regular expression below (attempting the next alternative only if this one fails) «,"»
//       Match the characters “,"” literally «,"»
//    Or match regular expression number 2 below (the entire group fails if this one fails to match) «\s*$»
//       Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) «\s*»
//          Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
//       Assert position at the end of a line (at the end of the string or before a line break character) «$»
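
As a quick check, here is a minimal sketch that runs the pattern above over the three sample rows from the question. The class and method names (QuoteFixer, Fix) are made up for illustration; only Regex.Replace and RegexOptions.Multiline come from .NET.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuoteFixer
{
    // Same pattern as in the answer: a double quote that is neither at the
    // start or end of a line nor part of a "," separator.
    private const string Pattern = @"(?<!^\s*|,)""(?!,""|\s*$)";

    // Hypothetical helper: replaces every matched quote with a single quote.
    public static string Fix(string csv)
    {
        return Regex.Replace(csv, Pattern, "'", RegexOptions.Multiline);
    }

    public static void Main()
    {
        // The three sample rows from the question, joined with line breaks.
        string sample =
            "\"ID-1\",\"A \"B\"\", C, D, E\",\"Today\"\n" +
            "\"ID-2\",\"A, B, C, D, E,F\",\"Yesterday\"\n" +
            "\"ID-3\",\"A and nothing else\",\"Today\"";

        Console.WriteLine(Fix(sample));
    }
}
```

Only the first row changes: its second field becomes "A 'B'', C, D, E", while the qualifiers and separators stay as they are. Note that a quote sitting directly after a comma inside a field (for example A,"B") is still treated as a qualifier by this pattern and is left alone.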

score 0 · Accepted Answer

You can split the columns with a regex match pattern:

/(?:(?<=^")|(?<=",")).*?(?:(?="\s*$)|(?=","))/g

See this demo.
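
For anyone who wants to try this pattern from C#, the following is a rough sketch rather than a tested SSIS component. It drops the /.../g wrapper (Regex.Matches already returns every match) and adds RegexOptions.Multiline, which is an assumption on my side so that ^ and $ also work on multi-line input. FieldSplitter and Split are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FieldSplitter
{
    // Pattern from the answer: lazily capture everything between an opening
    // qualifier (line-start quote or ",") and the next "," or line-end quote.
    private const string Pattern =
        @"(?:(?<=^"")|(?<="","")).*?(?:(?=""\s*$)|(?="",""))";

    // Hypothetical helper: returns the field values of one CSV row.
    public static List<string> Split(string line)
    {
        var fields = new List<string>();
        foreach (Match m in Regex.Matches(line, Pattern, RegexOptions.Multiline))
        {
            fields.Add(m.Value);
        }
        return fields;
    }

    public static void Main()
    {
        // First sample row from the question.
        string row = "\"ID-1\",\"A \"B\"\", C, D, E\",\"Today\"";
        foreach (string field in Split(row))
        {
            Console.WriteLine(field);
        }
    }
}
```

For the first sample row this yields three fields: ID-1, then A "B"", C, D, E, then Today. A field whose data itself contains the sequence "," would still be split in the wrong place.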

score 0 · Accepted Answer

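When you load a CSV that contains double quotes and commas, there is one limitation: extra double quotes are added and the data also ends up enclosed in double quotes, which you can see in the source file preview. So add a Derived Column task and use the following expression: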

REPLACE(REPLACE(REPLACE(RIGHT(SUBSTRING(TRIM(COL2),1,LEN(COL2)-1),LEN(COL2)-2)," ","@"),"\"\"","\""),"@"," ")

The inner RIGHT(SUBSTRING(...)) part strips the double quotes that enclose the data.

Try this and let me know whether it helps.
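
To make the steps easier to follow, here is a small C# sketch of the same string operations, assuming the expression is read as: trim, drop the first and last character, swap spaces to @, collapse doubled quotes into one, then swap @ back to spaces. It is not SSIS code, DerivedColumnSketch and Unwrap are hypothetical names, and it uses the trimmed length throughout, which is a simplification.

```csharp
using System;

public static class DerivedColumnSketch
{
    // Hypothetical helper mirroring the Derived Column expression step by step.
    public static string Unwrap(string col2)
    {
        string t = col2.Trim();
        if (t.Length < 2)
        {
            return t; // nothing to strip
        }

        // SUBSTRING(..., 1, LEN - 1): drop the last character (closing quote).
        string withoutLast = t.Substring(0, t.Length - 1);

        // RIGHT(..., LEN - 2): keep all but the first character (opening quote).
        string inner = withoutLast.Substring(withoutLast.Length - (t.Length - 2));

        // Spaces -> @, doubled quotes -> single quote, @ -> spaces.
        return inner.Replace(" ", "@")
                    .Replace("\"\"", "\"")
                    .Replace("@", " ");
    }

    public static void Main()
    {
        // A value as it might arrive with doubled inner quotes.
        Console.WriteLine(Unwrap("\"A \"\"B\"\" C\""));
    }
}
```

With the input "A ""B"" C" the sketch returns A "B" C. One side effect of the @ round trip is that any literal @ in the data is turned into a space.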

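score 0 · Accepted Answer

Use " as the text qualifier on the CSV destination, and add a Derived Column expression before the values are inserted into the CSV destination:

REPLACE(REPLACE([Column1],",",""),"\"","")

This will retain " in your text field.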
