bash - How to compare numeric fields of two files with awk

Question

I have these two files: file1

and file2:

1070,1279960511,BR,USA,UNITED STATES
1278,1279960511,US,USA,UNITED STATES
1279,1279960527,CA,CAN,CANADA
1289,1279967231,US,USA,UNITED STATES
2679,1279971327,CA,CAN,CANADA
1279,1279971839,US,USA,UNITED STATES
1279,1279972095,CA,CAN,CANADA
1279,1279977471,US,USA,UNITED STATES
127997,1279977983,CA,CAN,CANADA
127997,1279980159,US,USA,UNITED STATES
127998,1279980543,CA,CAN,CANADA
107599,1075995007,US,USA,UNITED STATES
107599,1075995023,VG,VGB,VIRGIN ISLANDS, BRITISH
107599,1075996991,US,USA,UNITED STATES
107599,1075997071,CA,CAN,CANADA

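What I want: for each entry in file1, go through the first column of file2, and when the value in that column becomes greater than the file1 element, return the third element of that file2 row. I've tried many approaches, but none of them worked: I get either an empty file or, with my last attempt, output that differs from what I expect: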

awk -F, '
BEGIN {FS="," ; i=1 ; while (getline < "file2") { x[i] = $1 ; y[i] = $3 ; i++ }}

{ a[$1] = $1 ; h=1 ; while (x[h] <= a[$1]) { h++ } ; { print y[h] }}' file1

But this runs forever and never stops. Any help, please; this has been killing me for days and I'm about to give up. Thanks.
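The loop above never terminates when a file1 value is not smaller than any key in file2: `h` runs past the last loaded row, `x[h]` is then an uninitialized element (in gawk, for example, it compares numerically as 0), so `x[h] <= a[$1]` stays true forever. A minimal sketch of the same approach with a bounds check follows; the function name `first_greater` and the sample rows are hypothetical. It prints the third column of the first file2 row whose first column is greater than the file1 value, or nothing when no row qualifies.

```shell
# first_greater FILE1 FILE2: for each number in FILE1, print column 3 of the
# first FILE2 row whose column 1 is greater; print nothing if no row is.
first_greater() {
    awk -F, -v f2="$2" '
        BEGIN {
            n = 0
            # "> 0" stops at end of file and also if f2 cannot be read (-1)
            while ((getline < f2) > 0) { n++; x[n] = $1; y[n] = $3 }
            close(f2)
        }
        {
            h = 1
            # bounds check: never step past the last row loaded from f2
            while (h <= n && x[h] <= $1) h++
            if (h <= n) print y[h]
        }' "$1"
}

# Demo on a few hypothetical rows in the question's format
tmp=$(mktemp -d)
printf '%s\n' '1070,1279960511,BR,USA,UNITED STATES' \
    '1278,1279960511,US,USA,UNITED STATES' \
    '2679,1279971327,CA,CAN,CANADA' > "$tmp/file2"
printf '%s\n' 1075 2537 9999 > "$tmp/file1"
first_greater "$tmp/file1" "$tmp/file2"    # prints US, then CA; nothing for 9999
rm -r "$tmp"
```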

Desired output:

# This is a comment; I'll write file2 as if it were a matrix.

Because file1[1] > file2[1,1], file1[1] > file2[2,1], file1[1] > file2[3,1] and file1[1] > file2[4,1], but file1[1] < file2[5,1], print file2[4,3], which is "US".

Now go to file1[2]:

Since file1[2] > file2[1,1] and file1[2] > file2[2,1], but file1[2] <= file2[3,1], print file2[3,3].

To summarize, I want to print the third element (column) of the first file2 row after which the file1 element is no longer > the first element of the next file2 row.

score 2 · Accepted Answer

I took your AWK script as the basis for the following. I renamed the variables to more meaningful names, since that helps make the script self-documenting.

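#!/usr/bin/awk -f
# Usage: ./scannums file1   (file2 must be sorted numerically on column 1)
BEGIN {
    FS = ","
    count = 0
    # Load file2 into parallel arrays; "> 0" stops at EOF and on read errors
    while ((getline < "file2") > 0) {
        count++
        key[count] = $1
        countrycode[count] = $3
    }
    close("file2")
}

{
    # Print the code of the first file2 row whose key exceeds this file1 value
    for (idx = 1; idx <= count; idx++) {
        if ($1 < key[idx]) {
            print countrycode[idx]
            next
        }
    }
}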

Sample run (printing $0 instead of just $3; the code above prints only $3):

$ sort -n -k1,1 -t, file2 > tmp; mv tmp file2
$ ./scannums file1
2679,1279971327,CA,CAN,CANADA
1289,1279967231,US,USA,UNITED STATES
1278,1279960511,US,USA,UNITED STATES
127997,1279977983,CA,CAN,CANADA
2679,1279971327,CA,CAN,CANADA
1278,1279960511,US,USA,UNITED STATES
1278,1279960511,US,USA,UNITED STATES
1289,1279967231,US,USA,UNITED STATES
127997,1279977983,CA,CAN,CANADA

Note that nothing is printed for the file1 value 135441, because no value in file2 satisfies the criterion.

You can turn this into a one-liner if you want.
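One possible one-liner form, sketched with the common `NR == FNR` idiom in place of `getline`: file2 is named first on the command line and loaded into arrays, and every later line comes from file1. Like the script, it assumes file2 is already sorted numerically on column 1; the wrapper name `scan_nums` is hypothetical and only there to make the one-liner easy to call.

```shell
# scan_nums FILE1 FILE2: one-line form of the script above.
# NR == FNR is true only while the first file named (FILE2) is being read.
scan_nums() {
    awk -F, 'NR==FNR{k[++n]=$1; c[n]=$3; next} {for(i=1;i<=n;i++) if($1<k[i]){print c[i]; next}}' "$2" "$1"
}

# Usage, with file2 sorted as in the sample run:
#   scan_nums file1 file2
```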

score 2 · Accepted Answer

Does this work?

sort -n -t"," -k1,1 file1 file2 | awk -F"," '{if ($3 != "") {s = $3;} else {print $1 " " s;}}'

It produces:

1075 BR
1169 BR
1260 BR
1279 US
1281 US
1474 US
2537 US
10759 CA
12799 CA
135441 CA
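Why this works: sorting both files together places each file1 number right after the file2 rows with smaller keys, and because file1 lines have no third field, the awk step simply carries the most recent file2 code forward. The demo below uses a few hypothetical rows; it prints the merged stream that awk sees and then the final result. `prev_code` is a hypothetical name for the pipeline.

```shell
# prev_code FILE1 FILE2: the sort-and-carry-forward pipeline from this answer.
prev_code() {
    sort -n -t"," -k1,1 "$1" "$2" |
    awk -F"," '
        $3 != "" { s = $3; next }   # file2 row: remember its country code
        { print $1 " " s }          # file1 row: print the value and the last code
    '
}

tmp=$(mktemp -d)
printf '%s\n' '1070,1279960511,BR,USA,UNITED STATES' \
    '2679,1279971327,CA,CAN,CANADA' > "$tmp/file2"
printf '%s\n' 2537 1075 > "$tmp/file1"
sort -n -t"," -k1,1 "$tmp/file1" "$tmp/file2"   # merged stream that awk receives
prev_code "$tmp/file1" "$tmp/file2"
rm -r "$tmp"
```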

If the original order of file1 matters, you can use the following:

awk '{print NR "," $1}' file1 file2 | sort -t"," -n -k 2,2 | awk -F"," '{if ($4 != "") {s = $4;} else {print $1 " " s;}}' | sort -n -k1,1 | cut -d" " -f2

This produces:

US
CA
BR
BR
US
CA
US
BR
CA
US

score 1 · Accepted Answer

A long one-liner:

Here is one way to do it:

cat file1|grep -vE '^$'|while read min; do cat file2|while read line; do val=$(echo $line|cut -d, -f1); if [ $min -lt $val ]; then short_country=$(echo $line|cut -d, -f3); echo $min: $short_country "($val)"; break; fi; done; done

This gives the following output:

2537: CA (2679)
1279: US (1289)
1075: US (1278)
12799: CA (127997)
1474: CA (2679)
1260: US (1278)
1169: US (1278)
1281: US (1289)
10759: CA (127997)
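A variant sketch of the same loop that lets `read` split each file2 row on commas (via `IFS=,`) instead of running `echo | cut` twice per row, and reads file2 through a redirect rather than `cat`. The function name `first_greater_sh` is hypothetical, and it assumes every file2 row has a numeric first field.

```shell
# first_greater_sh FILE1 FILE2: for each number in FILE1, print
# "number: code (key)" for the first FILE2 row whose key is larger.
first_greater_sh() {
    grep -E '^[0-9]+$' "$1" | while read -r min; do
        # key, ts, code, rest = columns 1, 2, 3 and the remainder of the row
        while IFS=, read -r key ts code rest; do
            if [ "$min" -lt "$key" ]; then
                echo "$min: $code ($key)"
                break
            fi
        done < "$2"
    done
}

# Usage: first_greater_sh file1 file2
```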

Explanation

Rather than keeping it as a one-liner, it's easier to understand when broken out into a script:

#!/bin/bash

grep -E '^[0-9]+$' file1 |                   # keep only the lines in file1 that contain just a number
while read -r min; do                        # for each line in file1:
  grep -E '^([0-9]+,){2}[A-Z]{2},' file2 |   # keep only the lines in file2 that match the expected format
  while IFS= read -r line; do                # for each line in file2:
    val=$(echo "$line" | cut -d, -f1)        # pull out $val: the first comma-delimited value
    if [ "$min" -lt "$val" ]; then           # if it's greater than the $min value read from file1:
      short_country=$(echo "$line" | cut -d, -f3)  # get $short_country from the third comma-delimited value in file2
      echo "$min: $short_country ($val)"     # print it to stdout; drop ($val) if you're not interested in it
      break                                  # found a value in file2: stop this loop and go to the next line in file1
    fi
  done
done

You didn't originally specify an output format, so I guessed. Hopefully this works for you.

score 1 · Accepted Answer

Couldn't you use xargs for just the "read file1" part of the task? Then, in awk, the single "find the value in file2" part is very simple, and you avoid juggling two file pointers.

Edit: an example using xargs and awk:

cat file1 | xargs awk '$1 > ARGV[2] {print $3; return}' file2

Edit: this example works (I just tried it on my computer):

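Use -n 1 as an xargs option so that exactly one argument is passed on each run. Once the "val" argument has been stored, remove it (ARGC--) so that AWK is left with only the file name (file2) and knows what to do. Set a flag when a match is found, because awk has no return outside a function.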
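To see the argument order this relies on, you can have awk print ARGV for each value. With `xargs -n 1`, each file1 value is appended after `file2`, so it lands in `ARGV[2]`. A program consisting only of a BEGIN block does not read any input files, so file2 need not exist for this check.

```shell
# Each xargs run gets one value, appended after "file2":
# awk 'BEGIN {...}' file2 1075, then awk 'BEGIN {...}' file2 2537
printf '%s\n' 1075 2537 |
    xargs -n 1 awk 'BEGIN { print "ARGV[1]=" ARGV[1], "ARGV[2]=" ARGV[2] }' file2
```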

cat file1 | xargs -n 1 awk -F, 'BEGIN {val = ARGV[2]; ARGC--; found=0} $1 > val {if (found==0) { print val, $3; found = 1}}' file2

Edit: a shorter version:

cat file1 | xargs -n 1 awk -F, 'BEGIN {val = ARGV[2]; ARGC--} (!found) && ($1 > val)  {print val, $3; found = 1}' file2
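An alternative sketch (not from the original answer) that drops the `found` flag: `exit` in a main rule stops awk at the first matching row, so the rest of file2 is not read at all. The wrapper name `find_first` is hypothetical; values are read from standard input, as with `cat file1 | xargs ...`.

```shell
# find_first FILE2: for each value on stdin, print "value code" for the
# first FILE2 row whose column 1 is greater than the value.
find_first() {
    xargs -n 1 awk -F, '
        BEGIN { val = ARGV[2]; ARGC-- }     # keep the value, read only FILE2
        $1 > val { print val, $3; exit }    # first match: print and stop
    ' "$1"
}

# Usage: find_first file2 < file1
```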

Script version:

#!/usr/bin/awk -f
BEGIN {
  FS = ","        # file2 is comma-separated
  val = ARGV[2]   # the value passed after the file name
  ARGC--          # drop it so awk reads only the file
}
(!found) && ($1 <= val) {
  # cache 3rd column of previous line
  prev = $3
}
(!found) && ($1 > val) {
  # print cached value as soon as we cross the limit
  print val, prev
  found = 1
}

Save this as find_val.awk and make it executable with chmod +x. You then run it as ./find_val.awk somefile somevalue, and use it with xargs in the same way:

cat file1 | xargs -n 1 ./find_val.awk file2
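A self-contained sketch of the same pipeline that skips the chmod/PATH step by running the script with `awk -f`. It assumes the script sets `FS` to a comma (without that, `$1` would be the whole line rather than the first column); the function name `find_val_all` and its temporary script file are hypothetical.

```shell
# find_val_all FILE2: run the script version once per value on stdin.
find_val_all() {
    script=$(mktemp)
    cat > "$script" <<'EOF'
BEGIN { FS = ","; val = ARGV[2]; ARGC-- }
# cache column 3 of the last row that does not exceed the value
(!found) && ($1 <= val) { prev = $3 }
# print the cached code as soon as a row crosses the limit
(!found) && ($1 > val)  { print val, prev; found = 1 }
EOF
    xargs -n 1 awk -f "$script" "$1"
    rm -f "$script"
}

# Usage: find_val_all file2 < file1
```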
