linux - How to convert a tab-delimited file whose values contain commas to .CSV, enclosing the comma-containing values in double quotes?

Question

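I have a .CSV file (let's call it tab_delimited_file.csv) that I downloaded from a particular vendor's web portal. When I moved the file to one of my Linux directories, I noticed that this .CSV file is actually a tab-delimited file with a .CSV name. Some sample records from the file are shown below.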

"""column1"""   """column2"""   """column3"""   """column4"""   """column5"""   """column6"""   """column7"""  
12  455 string with quotes, and with a comma in between 4432    6787    890 88  
4432    6787    another, string with quotes, and with two comma in between  890 88  12  455  
11  22  simple string   77  777 333 22

The sample records above are separated by tabs. I know the file's header is very strange, but this is the format in which I received the file.

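I tried using the tr command to replace the tabs with commas, but because some record values contain commas of their own, the file ends up completely messed up. I need to enclose the record values that contain commas in double quotes. The command I used is shown below.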

tr '\t' ',' < tab_delimited_file.csv > comma_separated_file.csv

This converts the file into the following format:

"""column1""","""column2""","""column3""","""column4""","""column5""","""column6""","""column7"""
12,455,string with quotes, and with a comma in between,4432,6787,890,88
4432,6787,another, string with quotes, and with two comma in between,890,88,12,455
11,22,simple string,77,777,333,22
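One quick way to see the damage is to count the comma-separated fields on each line of the tr output. This is a minimal sketch, assuming a POSIX shell and awk; field_counts is a hypothetical helper name, not something from the question.

```sh
# field_counts (hypothetical name): prints "line: fields" for each input line,
# splitting on every comma, the same way a CSV reader treats commas that are
# not inside double quotes.
field_counts() {
    awk -F',' '{ print NR ": " NF }'
}

# Usage against the output of the tr command above:
#   field_counts < comma_separated_file.csv
```

On the four sample lines above this reports 7, 8, 9 and 7 fields. The header and the last row keep their seven columns, while each comma inside a value adds a spurious one.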

I need help converting the sample file into the following format:

column1,column2,column3,column4,column5,column6,column7
12,455,"string with quotes, and with a comma in between",4432,6787,890,88
4432,6787,"another, string with quotes, and with two comma in between",890,88,12,455
11,22,"simple string",77,777,333,22
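In the target format, a comma inside a double-quoted value is data, not a separator, so every line should still have seven fields. Below is a minimal sketch, assuming a POSIX shell and awk, that counts fields this way. csv_field_counts is a hypothetical name, and the helper is only a rough check, not a full CSV parser.

```sh
# csv_field_counts (hypothetical name): prints "line: fields" for each line,
# counting only commas outside double quotes as separators.
# Simplification: assumes no value spans several lines. A doubled quote ("")
# inside a quoted value toggles the state twice, so it does not change the count.
csv_field_counts() {
    awk '{
        n = 1; inq = 0
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c == "\"")
                inq = !inq              # entering or leaving a quoted value
            else if (c == "," && !inq)
                n++                     # a real field separator
        }
        print NR ": " n
    }'
}
```

Run on the four target lines above, it reports 7 fields for every line. Piping a converted file through it is therefore a cheap sanity check before loading the CSV elsewhere.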

A solution using sed or awk would be very helpful.

score 2 · Accepted Answer

This produces the output you asked for. However, it isn't clear whether the criterion I'm assuming for which fields to quote (those containing a comma or whitespace) is actually what you want, so test it yourself with other input.

$ awk 'BEGIN { FS=OFS="\t" }
  {
     gsub(/"/,"")
     for (i=1;i<=NF;i++)
         if ($i ~ /[,[:space:]]/)
             $i = "\"" $i "\""
     gsub(OFS,",")
     print
  }
  ' file
column1,column2,column3,column4,column5,column6,column7
12,455,"string with quotes, and with a comma in between",4432,6787,890,88
4432,6787,"another, string with quotes, and with two comma in between",890,88,12,455
11,22,"simple string",77,777,333,22
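The answer flags its quoting rule (quote any field containing a comma or whitespace) as an assumption, while the title asks only about values containing commas. Below is a minimal sketch that wraps the same awk program in a shell function whose quoting regex is a parameter. It assumes a POSIX shell and an awk that supports bracket character classes such as [:space:]. The function name tsv_to_csv and the awk variable quote_re are hypothetical and are not part of the original answer.

```sh
# tsv_to_csv (hypothetical name): reads tab-separated lines on stdin and
# writes comma-separated lines on stdout, using the same steps as the answer.
# $1: regex deciding which fields get double quotes.
#     Default [,[:space:]] matches the answer's criterion (comma or whitespace);
#     pass ',' to quote only fields that contain a comma.
tsv_to_csv() {
    awk -v quote_re="${1:-[,[:space:]]}" 'BEGIN { FS = OFS = "\t" }
    {
        gsub(/"/, "")                  # strip every double quote (cleans the header)
        for (i = 1; i <= NF; i++)
            if ($i ~ quote_re)         # dynamic regex taken from the -v variable
                $i = "\"" $i "\""
        gsub(OFS, ",")                 # no field can contain a tab, so swap them all
        print
    }'
}
```

Running tsv_to_csv < tab_delimited_file.csv > comma_separated_file.csv should reproduce the answer's output. By contrast, tsv_to_csv ',' < tab_delimited_file.csv leaves simple string unquoted because it contains no comma. Because gsub(/"/, "") removes every double quote, a value that legitimately contains one loses it. If that matters, the usual CSV convention is to keep such quotes and double them inside the quoted value instead.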
