csv - Get CSV fields by name using awk

Question

Suppose I have a CSV file with a header, in the following format:

Field1,Field2
3,262000
4,449000
5,650000
6,853000
7,1061000
8,1263000
9,1473000
10,1683000
11,1893000

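I want to write an awk script that takes a comma-separated list of field names in the variable target, splits it into an array, and selects only the columns with those names.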

This is what I have tried so far. I have verified that the head array contains the expected headers and that the targets array contains the targets passed on the command line.

BEGIN{
    FS=","
    split(target, targets, ",")

}

NR==1 {
    for (i = 1; i <= NF; i++) head[i] = $i
}

NR !=1{
    for (i = 1; i <= NF; i++) {
        if (head[i] in targets){
            print $i
        }
    }
}

When I invoke this script with the command

awk -v target=Field1 -f GetCol.awk Debug.csv

nothing is printed.

score 10 · Accepted Answer

I figured it out and am posting the answer in case others run into the same problem.

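It has to do with the in keyword I am using to test for array membership. This keyword only tests whether the left-hand operand is one of the indices of the array on the right, not one of its values. Since split() stores the names as values under the numeric indices 1, 2, ..., the test head[i] in targets is never true. The fix is to build a reverse-lookup array, like this: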

BEGIN{
    OFS=FS=","
    split(target, t_targets, ",")
    for (i in t_targets)
        targets[t_targets[i]] = i
}
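
To see the index-versus-value distinction in isolation, here is a minimal sketch run from the shell. The function name demo_in and the array name lookup are made up for illustration and are not part of the original answer.

```shell
# Hypothetical demo: the awk "in" operator checks array indices, not stored values.
demo_in() {
    awk 'BEGIN {
        n = split("Field1,Field2", targets, ",")  # targets[1]="Field1", targets[2]="Field2"
        print (("Field1" in targets) ? 1 : 0)     # 0: "Field1" is a value, not an index
        print ((1 in targets) ? 1 : 0)            # 1: 1 is an index
        for (i = 1; i <= n; i++)
            lookup[targets[i]] = i                # invert: the names become the indices
        print (("Field1" in lookup) ? 1 : 0)      # 1: now the name is an index
    }'
}

demo_in
```

The three printed lines should be 0, 1 and 1, which is why the membership test only starts working once the names are used as indices.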

score 6 · Accepted Answer

My two cents:

BEGIN{
    OFS=FS=","
    split(target,fields,FS)            # We just set FS; don't hard-code the comma here
    for (i in fields)                  # Distinct var name to avoid headaches
        field_idx[fields[i]] = i       # Reverse lookup
}
NR==1 {                                # Process header
    for (i=1;i<=NF;i++)                # For each field header
        head[i] = $i                   # Add to hash for comparison with target
    next                               # Skip to next line
}
{                                      # Don't need inverted condition (used next)
    sep=""                             # Start with no leading separator
    for (i=1;i<=NF;i++)                # For each field
        if (head[i] in field_idx) {    # Test whether current field is a target field
            printf "%s%s",sep,$i       # Print the column if matched
            sep=OFS                    # Set separator to OFS
        }
    printf "\n"                        # Print newline character
}
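
As a quick check of the approach above, the following sketch wraps a condensed copy of the same logic in a shell function and runs it on the first rows of the sample data from the question. The function name select_cols and the temporary file are assumptions made for this illustration only.

```shell
# Hypothetical helper: select_cols TARGETS FILE
select_cols() {
    awk -v target="$1" '
    BEGIN {
        OFS = FS = ","
        n = split(target, fields, FS)
        for (i = 1; i <= n; i++)
            field_idx[fields[i]] = i          # name -> requested position
    }
    NR == 1 {                                 # remember the header names
        for (i = 1; i <= NF; i++) head[i] = $i
        next
    }
    {
        sep = ""
        for (i = 1; i <= NF; i++)
            if (head[i] in field_idx) {       # column name was requested
                printf "%s%s", sep, $i
                sep = OFS
            }
        printf "\n"
    }' "$2"
}

csv=$(mktemp)                                 # scratch copy of the sample data
printf 'Field1,Field2\n3,262000\n4,449000\n' > "$csv"
select_cols Field2 "$csv"                     # prints 262000 then 449000
select_cols Field2,Field1 "$csv"              # prints 3,262000 then 4,449000
rm -f "$csv"
```

Note that the second call prints the columns in file order rather than in the requested order, because the loop walks the fields of each record and not the list of targets.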

score 1 · Accepted Answer

An extension of @sudo_O's solution (thanks).

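Prints fields from standard input based on command-line arguments.
Prints the fields in the requested order (possibly more than once).
Prints a placeholder (the requested name itself) when a field was requested but not found, and
warns on standard error about duplicate field names in the header.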

#!/usr/bin/awk -f
# Process standard input outputting named columns provided as arguments.
#
# For example, given foo.dat containing
#     a b c c
#     1a 1b 1c 1C
#     2a 2b 2c 2C
#     3a 3b 3c 3C
# Running
#   cat foo.dat | ./namedcols c b a a d
# will output
#   1c 1b 1a 1a d
#   2c 2b 2a 2a d
#   3c 3b 3a 3a d
# and will warn on standard error that it
#   Ignored duplicate 'c' in column 4
# Notice that the requested but missing column d contains "d".
#
# Using awk's -F feature it is possible to parse comma-separated data:
#   cat foo.csv | ./namedcols -F, c b a a d
BEGIN {
    for (i=1; i<ARGC; ++i)
        desired[i] = ARGV[i]
    delete ARGV
}
NR==1 {
    for (i=1; i<=NF; i++)
        if ($i in names)
            printf "Ignored duplicate '%s' in column %d\n", $i, i | "cat 1>&2"
        else
            names[$i] = i
    next
}
{
    for (i=1; i<ARGC; ++i)
        printf "%s%s",                                          \
               (i==1 ? "" : OFS),                               \
               ((ndx = names[name = desired[i]])>0 ? $ndx: name)
    printf "%s", ORS
}
