sed - count unique lines based on the first field of a file

Question

I'm trying to get a count of the unique lines in a file, based on the first field. The input lines look like this:

Forms.js     /forms/Forms.js     http://www.gumby.com/test.htm   404
Forms.js     /forms/Forms1.js    http://www.gumby.com/test.htm   404
Forms.js     /forms/Forms2.js    http://www.gumby.com/test.htm   404
Interpret.js     /forms/Interpret1.js    http://www.gumby.com/test.htm   404    
Interpret.js     /forms/Interpret2.js    http://www.gumby.com/test.htm   404
Interpret.js     /forms/Interpret3.js    http://www.gumby.com/test.htm   404

and I want to turn them into something like this:

3    Forms.js    /forms/Forms.js     http://www.gumby.com.mx/test.htm 404
3    Interpret.js    /forms/Interpret.js    http://www.gumby.com.mx/test.htm  404

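I've been trying various combinations of sort and uniq but haven't hit on it yet. I can get distinct lines using the whole line, but I need uniqueness on the first field only. I'm currently using Cygwin. I'm not awk-literate, but I suspect that's the way to go. Does anyone have a handy solution?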

score 4 · Accepted Answer

This:

<infile awk '{ h[$1]++ } END { for(k in h) print h[k], k }'

gets you:

3 Forms.js
3 Interpret.js

If you also want to keep the first matching line:

<infile awk '!h[$1] { g[$1]=$0 } { h[$1]++ } END { for(k in g) print h[k], g[k] }'

Output:

3 Forms.js /forms/Forms.js http://www.gumby.com/test.htm 404
3 Interpret.js /forms/Interpret1.js http://www.gumby.com/test.htm 404

Tested with GNU awk.

Note that this does not require the input to be sorted. Also note that the results come out in no particular order.
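Because `for (k in h)` visits the keys in an unspecified order, the output order can vary. A minimal sketch of getting ordered output, assuming a POSIX `sort` and using a hypothetical helper name `count_by_first_field`, is to pipe the awk result through `sort` on the key column:

```shell
#!/bin/sh
# Hypothetical helper: count lines per first field of the file "$1",
# then sort the resulting "count key" pairs by the key (second column).
count_by_first_field() {
  awk '{ h[$1]++ } END { for (k in h) print h[k], k }' "$1" | sort -k2,2
}

# Example: count_by_first_field infile
```

To order by count instead (highest first), replace `sort -k2,2` with `sort -k1,1nr`.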

score 2 · Accepted Answer

Awk is the tool for this, but if you want to be clever with uniq:

$ column -t file | uniq -w12 -c
      3 Forms.js      /forms/Forms.js       http://www.gumby.com/test.htm  404
      3 Interpret.js  /forms/Interpret1.js  http://www.gumby.com/test.htm  404

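column -t aligns all the columns, so column 1 gets a fixed width (12 characters here, the length of Interpret.js); uniq -w12 then compares only those first 12 characters. Note that -w is a GNU uniq extension.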

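Or, if column isn't available, a hack is to use awk to append the first column to the end of each line, use uniq -c -f4 to count uniques on that last column (skipping the first four fields), and then use awk again to print the first n-1 fields: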

$ awk '{print $0, $1}' file | uniq -c -f4 | awk '{$NF=""; NF--; print}'
3 Forms.js /forms/Forms.js http://www.gumby.com/test.htm 404
3 Interpret.js /forms/Interpret1.js http://www.gumby.com/test.htm 404
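This trick relies on uniq, which only merges *adjacent* matching lines, so it works on the sample only because lines with the same first field are already grouped together. A sketch for unsorted input, under the same assumption that every line has exactly four fields (the function name `count_unsorted` is hypothetical), sorts on the first field first. It uses `sort -s` (stable sort, supported by GNU and BSD sort) so the first line seen for each key is still the one uniq reports:

```shell
#!/bin/sh
# Hypothetical helper: the awk + uniq -f4 + awk pipeline, preceded by a
# sort on field 1 because uniq only merges adjacent matching lines.
# -s keeps the original order within each key.
# Assumes every input line has exactly 4 fields, as in the sample.
count_unsorted() {
  sort -s -k1,1 "$1" |
    awk '{ print $0, $1 }' |        # append field 1 as a 5th field
    uniq -c -f4 |                   # skip 4 fields, compare the 5th
    awk '{ $NF = ""; NF--; print }' # drop the appended copy
}

# Example: count_unsorted file
```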

It would be nice if uniq -f accepted a field range such as -f4,4 or -f1,1, but it can only skip the first N fields.

Or you could use rev to reverse each line, run uniq -c -f3, and then reverse it back (but then the count ends up at the end, and if you don't have column you probably don't have rev either):

$ rev file | uniq -c -f3 | rev
Forms.js /forms/Forms.js http://www.gumby.com/test.htm 404 3      
Interpret.js /forms/Interpret1.js http://www.gumby.com/test.htm 404 3

score 2 · Accepted Answer

$ awk '!c[$1]++{v[$1]=$0} END{for (i in c) print c[i],v[i]}' file
3 Forms.js     /forms/Forms.js     http://www.gumby.com/test.htm   404
3 Interpret.js     /forms/Interpret1.js    http://www.gumby.com/test.htm   404

The above uses the common awk idiom '!array[$n]++' to tell whether a key value ($n being $0, $1, $4, $5, or ...) has been seen before.
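On its own, the idiom also deduplicates by the first field while preserving input order. A minimal sketch (the helper name `first_per_key` is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: print only the first line seen for each first field.
# seen[$1]++ yields the old count (0 the first time, so !0 is true) and then
# increments it, so each key passes the pattern exactly once.
# Input order is preserved and no sorting is needed.
first_per_key() {
  awk '!seen[$1]++' "$1"
}

# Example: first_per_key file
```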

score 1 · Accepted Answer

Assuming file.txt contains the sample input:

sort file.txt | awk -f counts.awk

Output:

3:Forms.js     /forms/Forms.js     http://www.gumby.com/test.htm   404
3:Interpret.js     /forms/Interpret1.js    http://www.gumby.com/test.htm   404

The awk script file:

cat counts.awk

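#  output format is:
#+ TimesFirstFieldIsRepeated:FirstMatchingLineContents
#  (input must already be sorted on the first field)

BEGIN {

  plmatch="";   # first field of the previous line
  outline="";   # first line of the current group
  n=0;          # number of lines in the current group

 }

{

 # first field changed: print the finished group and start a new one
 if($1 != plmatch && NR != 1)
  {
   print n ":" outline;
   n=0;
  }

 # remember the first line of every group, including single-line groups
 if(n == 0)
  {
   outline=$0;
  }

 n+=1;
 plmatch=$1;

}

END {
  # print the last group (nothing on empty input)
  if(NR > 0)
    print n ":" outline;
 }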

score 0 · Accepted Answer

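I'd just use cut -f 1 | uniq -c. That doesn't give you the whole line, but since the lines differ, printing one of them doesn't mean much anyway. It depends on what you're trying to achieve.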
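Two caveats apply to that pipeline: cut splits on tabs by default, so on space-separated input like the sample it would return whole lines, and uniq -c only counts adjacent duplicates. A minimal sketch that handles both, using awk to take the first whitespace-separated field and sort to group the keys (the helper name `count_first_field` is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: count occurrences of each first field.
# awk splits on runs of blanks (unlike cut's default tab delimiter);
# sort groups equal keys because uniq -c only counts adjacent duplicates.
count_first_field() {
  awk '{ print $1 }' "$1" | sort | uniq -c
}

# Example: count_first_field file
```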

score 0 · Accepted Answer

You can count occurrences of the first field with cut, but what do you want to print after that field?

cat file | cut -d " " -f 1 | uniq -c
