replace - Search a specific column and replace that column on the following line with a specific value in gawk

Question

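I am trying to find every place in my data where a line is repeated and delete the repeated line. I am also looking for lines where the value in the second column is 90, and replacing the second column of the following line with a specific number that I designate.

My data looks like this: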

 #      Type    Response        Acc     RT      Offset    
   1      70  0    0   0.0000 57850
   2      31  0    0   0.0000 59371
   3      41  0    0   0.0000 60909
   4      70  0    0   0.0000 61478
   5      31  0    0   0.0000 62999 
   6      41  0    0   0.0000 64537
   7      41  0    0   0.0000 64537
   8      70  0    0   0.0000 65106
   9      11  0    0   0.0000 66627
  10      21  0    0   0.0000 68165
  11      90  0    0   0.0000 68700
  12      31  0    0   0.0000 70221

I would like my data to look like this:

 #      Type    Response        Acc     RT      Offset
   1      70  0    0   0.0000 57850
   2      31  0    0   0.0000 59371
   3      41  0    0   0.0000 60909
   4      70  0    0   0.0000 61478
   5      31  0    0   0.0000 62999
   6      41  0    0   0.0000 64537
   8      70  0    0   0.0000 65106
   9      11  0    0   0.0000 66627
  10      21  0    0   0.0000 68165
  11      90  0    0   0.0000 68700
  12       5  0    0   0.0000 70221

My code:

 BEGIN {
priorline = "";
ERROROFFSET = 50;
ERRORVALUE[10] = 1;
ERRORVALUE[11] = 2;
ERRORVALUE[12] = 3;
ERRORVALUE[30] = 4;
ERRORVALUE[31] = 5;
ERRORVALUE[32] = 6;

ORS = "\n";
}

NR == 1 {
print;
getline;
priorline = $0;
}

NF == 6 {

brandnewline = $0
mytype = $2
$0 = priorline
priorField2 = $2;   

if (mytype !~ priorField2) {
print;
priorline = brandnewline;
}

if (priorField2 == "90") {
    mytype = ERRORVALUE[mytype];
    }
}

END {print brandnewline}


##Here brandnewline is set to the current line and priorline is set to the
##line we just worked on; brandnewline then moves on to the next new line
##(i.e. line 1 = brandnewline, now we set priorline = brandnewline, thus
##priorline is line 1 and brandnewline takes on line 2). Next, the same was
##done with column 2: mytype is the current column 2 value and priorField2
##holds the previous value as mytype moves to the next column 2 value.
##Finally, we wrote an if statement where, if the value in column 2 of the
##current line !~ (does not equal) the value in column 2 of the previous
##line, the line is printed; otherwise it is skipped. The second if
##statement recognizes the lines in which the value 90 appeared and replaces
##the value in column 2 with a previously defined ERRORVALUE set for each
##specific type (type 10=1, 11=2, 12=3, 30=4, 31=5, 32=6).

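I was able to delete the repeated lines successfully, but I cannot get the next part of my code to work: replacing the value in the actual column with the values I designated in BEGIN as ERRORVALUE (10 = 1, 11 = 2, 12 = 3, 30 = 4, 31 = 5, 32 = 6). Essentially, I just want to replace that value in the line with its ERRORVALUE.

If anyone can help me with this, I would be very grateful.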

score 2 · Accepted Answer

One challenge is that you cannot simply compare a line with the previous line, because the ID numbers differ.

awk '
  BEGIN {
    ERRORVALUE[10] = 1
    # ... etc
  }

  # print the header
  NR == 1 {print; next}

  NR == 2 || $0 !~ prev_regex {
    prev_regex = sprintf("^\\s+\\w+\\s+%s\\s+%s\\s+%s\\s+%s\\s+%s",$2,$3,$4,$5,$6)
    if (was90) $2 = ERRORVALUE[$2]
    print
    was90 = ($2 == 90)
  }
'

For the line whose second column is changed, this will mess up the line's formatting:

 #      Type    Response        Acc     RT      Offset
   1      70  0    0   0.0000 57850
   2      31  0    0   0.0000 59371
   3      41  0    0   0.0000 60909
   4      70  0    0   0.0000 61478
   5      31  0    0   0.0000 62999
   6      41  0    0   0.0000 64537
   8      70  0    0   0.0000 65106
   9      11  0    0   0.0000 66627
  10      21  0    0   0.0000 68165
  11      90  0    0   0.0000 68700
12 5 0 0 0.0000 70221
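
The collapsed last line comes from how awk handles field assignment: once any field is assigned, awk rebuilds $0 by joining the fields with OFS (a single space by default), so the original padding is lost. A minimal sketch that reproduces the effect on one row from the question (the variable name rebuilt is only for illustration):

```shell
# Assigning to $2 makes awk rebuild $0 with OFS (one space by default),
# which discards the leading blanks and the column padding.
rebuilt=$(printf '%s\n' '  12      31  0    0   0.0000 70221' |
  awk '{ $2 = 5; print }')
printf '%s\n' "$rebuilt"
```

Rows whose fields are never assigned are printed byte for byte, which is why only the modified row loses its alignment.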

If that is a problem, you can pipe the output of gawk into column -t, or, if you know the line format is fixed, use printf() in the awk program.
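
As a rough sketch of the printf() route, the program below compares consecutive rows on every column except the ID (using a string key instead of the regex from the answer), applies ERRORVALUE to the row after a 90, and prints every data row with fixed widths. The widths 4/8/3/5/9/6 are an assumption read off the sample data, and only the one mapping needed for this shortened sample is defined.

```shell
# Shortened sample from the question: header, one duplicated row, a 90 row.
input=' #      Type    Response        Acc     RT      Offset
   6      41  0    0   0.0000 64537
   7      41  0    0   0.0000 64537
  11      90  0    0   0.0000 68700
  12      31  0    0   0.0000 70221'

result=$(printf '%s\n' "$input" | awk '
  BEGIN { ERRORVALUE[31] = 5 }       # assumption: only the mapping this sample needs
  NR == 1 { print; next }            # header passes through untouched
  {
    # Compare everything except the ID column with the previous row.
    key = $2 SUBSEP $3 SUBSEP $4 SUBSEP $5 SUBSEP $6
    if (key == prev_key) next        # consecutive duplicate: skip it
    prev_key = key
    if (was90 && ($2 in ERRORVALUE)) $2 = ERRORVALUE[$2]
    was90 = ($2 == 90)
    # Column widths assumed from the sample data.
    printf "%4d%8d%3d%5d%9.4f%6d\n", $1, $2, $3, $4, $5, $6
  }')
printf '%s\n' "$result"
```

Because every data row goes through the same printf, the replaced row keeps the same alignment as the untouched ones.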

score 1 · Accepted Answer

This might work for you:

v=99999
sed ':a;$!N;s/^\(\s*\S*\s*\)\(.*\)\s*\n.*\2/\1\2/;ta;s/^\(\s*\S*\s*\)   90 /\1'"$(printf "%5d" $v)"' /;P;D' file
 #      Type    Response        Acc     RT      Offset    
   1      70  0    0   0.0000 57850
   2      31  0    0   0.0000 59371
   3      41  0    0   0.0000 60909
   4      70  0    0   0.0000 61478
   5      31  0    0   0.0000 62999 
   6      41  0    0   0.0000 64537
   8      70  0    0   0.0000 65106
   9      11  0    0   0.0000 66627
  10      21  0    0   0.0000 68165
  11   99999  0    0   0.0000 68700
  12      31  0    0   0.0000 70221

score 1 · Accepted Answer

This might work for you:

awk 'BEGIN {
        ERROROFFSET = 50;
        ERRORVALUE[10] = 1;
        ERRORVALUE[11] = 2;
        ERRORVALUE[12] = 3;
        ERRORVALUE[30] = 4;
        ERRORVALUE[31] = 5;
        ERRORVALUE[32] = 6;
     }
     NR == 1 { print ; next }
     { if (a[$2 $6]) { next } else { a[$2 $6]++ }
       if ( $2 == 90) { print ; n++ ; next } 
       if (n>0) { $2 = ERRORVALUE[$2] ; n=0 }
       printf("% 4i% 8i%  3i% 5i% 9.4f% 6i\n", $1, $2, $3, $4, $5, $6)
     }' INPUTFILE

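See it in action at ideone.com.

IMO the BEGIN block is self-explanatory. After that, the following happens:

The NR == 1 rule prints the first line (and moves on to the next line; this rule applies only to the first line).
Then it checks whether a line with the same 2nd and 6th columns has already been seen, and if so, moves on to the next line; otherwise it marks the pair as seen in the array. Note that this can fail if you have large values in the 2nd column and small ones in the 6th (for example, 2 and 0020 concatenate to 20020, which is the same as 20 and 020), so a[$2 "-" $6] would be more precise.
If the line has 90 in the 2nd column, it prints the line, sets a flag so that the swap happens on the next line, and moves on to the next line (in the input file).
On the next line, it looks up the 2nd column in ERRORVALUE and, if it is found, replaces its content.
Then it prints the formatted line.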
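
The concatenation pitfall described above can be checked directly. This small sketch uses the hypothetical field values from the example (2 with 0020, and 20 with 020) and compares the keys built with and without a separator:

```shell
collision=$(awk 'BEGIN {
  # Two different (column 2, column 6) pairs, written as strings.
  k1 = "2" "0020";      k2 = "20" "020"        # plain concatenation
  s1 = "2" "-" "0020";  s2 = "20" "-" "020"    # explicit separator
  print (k1 == k2 ? "plain: collide" : "plain: distinct")
  print (s1 == s2 ? "separated: collide" : "separated: distinct")
}')
printf '%s\n' "$collision"
```

Writing the subscript as a[$2,$6] has the same effect as the explicit separator, since awk joins comma-separated subscripts with SUBSEP.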

score 0 · Accepted Answer

I agree with Glenn that two passes over the file is nicer. You can remove your duplicate, possibly non-consecutive, lines using a hash like this:

awk '!a[$2,$3,$4,$5,$6]++' file.txt

Then you need to edit the values as required. If you want to change the value 90 in the second column to 5000, try something like this:

awk 'NR == 1 { print; next } { sub(/^90$/, "5000", $2); printf("%4i% 8i% 3i% 5i% 9.4f% 6i\n", $1, $2, $3, $4, $5, $6) }' file.txt

You can see that I have stolen Zsolt's printf statement for the formatting (thanks Zsolt!), but you can edit this as needed. You can also pipe the output of the first statement into the second for a nice one-liner:

cat file.txt | awk '!a[$2,$3,$4,$5,$6]++' | awk 'NR == 1 { print; next } { sub(/^90$/, "5000", $2); printf("%4i% 8i% 3i% 5i% 9.4f% 6i\n", $1, $2, $3, $4, $5, $6) }'
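
If a single process is preferred, the two steps can probably be folded into one awk program. The sketch below is an assumption about how that might look, not part of the original answer: the array name seen and the shortened sample are illustrative, and the header is passed through before the duplicate check.

```shell
# Shortened sample: header, one duplicated row, a 90 row, a following row.
input=' #      Type    Response        Acc     RT      Offset
   6      41  0    0   0.0000 64537
   7      41  0    0   0.0000 64537
  11      90  0    0   0.0000 68700
  12      31  0    0   0.0000 70221'

result=$(printf '%s\n' "$input" | awk '
  NR == 1 { print; next }              # header: print as-is
  seen[$2,$3,$4,$5,$6]++ { next }      # duplicate (ID ignored): skip
  {
    sub(/^90$/, "5000", $2)            # rewrite a 90 in column 2
    printf("%4d%8d%3d%5d%9.4f%6d\n", $1, $2, $3, $4, $5, $6)
  }')
printf '%s\n' "$result"
```

This produces the same rows as the piped version while reading the input once and avoiding the extra cat.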

score 0 · Accepted Answer

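The previous options work for the most part, but here is the short and sweet way I would do it. After reviewing the other posts, I believe this is the most efficient. It also covers the extra request the OP added in the comments: replacing the value on the line after the 90 with the value from two lines before. This does everything in one pass.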

BEGIN {
    PC2=PC6=1337
    replacement=5
}
{
    if( $6 == PC6 ) next
    if( PC2 == 90 ) $2 = replacement
    replacement = PC2
    PC2 = $2 
    PC6 = $6
    printf "%4s%8s%3s%5s%9s%6s\n",$1, $2, $3, $4, $5, $6
}

Sample input

   1      70  0    0   0.0000 57850
   2      31  0    0   0.0000 59371
   3      41  0    0   0.0000 60909
   4      70  0    0   0.0000 61478
   5      31  0    0   0.0000 62999 
   6      41  0    0   0.0000 64537
   7      41  0    0   0.0000 64537
   8      70  0    0   0.0000 65106
   9      11  0    0   0.0000 66627
  10      21  0    0   0.0000 68165
  11      90  0    0   0.0000 68700
  12      31  0    0   0.0000 70221

Sample output

   1      70  0    0 0.000000 57850
   2      31  0    0 0.000000 59371
   3      41  0    0 0.000000 60909
   4      70  0    0 0.000000 61478
   5      31  0    0 0.000000 62999
   6      41  0    0 0.000000 64537
   8      70  0    0 0.000000 65106
   9      11  0    0 0.000000 66627
  10      21  0    0 0.000000 68165
  11      90  0    0 0.000000 68700
  12      21  0    0 0.000000 70221
