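I have x lines like the following:
Unable to find latest released revision of 'CONTRIB_046578'.
I need to extract the word between "revision of '" and the closing "'", which in this example is CONTRIB_046578, and, if possible, count the number of occurrences of that word using grep, sed, or any other command.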
The cleanest solution is grep -Po "(?<=')[^']+(?=')" (GNU grep, since -P enables Perl-compatible regular expressions):
$ cat file
Unable to find latest released revision of 'CONTRIB_046578'
Unable to find latest released revision of 'foo'
Unable to find latest released revision of 'bar'
Unable to find latest released revision of 'CONTRIB_046578'
# Print occurrences
$ grep -Po "(?<=')[^']+(?=')" file
CONTRIB_046578
foo
bar
CONTRIB_046578
# Count occurrences (-c counts matching lines)
$ grep -Pc "(?<=')[^']+(?=')" file
4
# Count unique occurrences
$ grep -Po "(?<=')[^']+(?=')" file | sort | uniq -c
2 CONTRIB_046578
1 bar
1 foo
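One caveat: -c counts matching lines rather than individual matches, and the lookaround pattern can also match the text between two quoted words on the same line (for example, " and " in 'a' and 'b'). A minimal sketch for counting every quoted word, assuming plain single quotes with no nesting; the function name count_quoted is only illustrative:

```shell
# count_quoted FILE: print the total number of single-quoted words in FILE.
# The function name is hypothetical; only POSIX grep -o style matching is used.
count_quoted() {
    # The pattern consumes both quotes, so text sitting between two
    # separate quoted words on the same line is never matched.
    # -o prints each match on its own line, so wc -l counts matches.
    grep -o "'[^']*'" "$1" | wc -l
}
```

For the sample file above, where every line holds exactly one quoted word, this gives the same result as grep -c.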
Here is an awk script that can be used to extract each single-quoted word and count its frequency:
awk '{for (i=1; i<=NF; i++) {if ($i ~ /^'"'.*'"'/ ) cnt[$i]++;}}
END {for (a in cnt) {b=a; gsub(/'"'"'/, "", b); print b, cnt[a]}}' infile
cat infile
Unable to find latest released revision of 'CONTRIB_046572'
Unable to find latest released revision of 'CONTRIB_046578'
Unable to find latest released revision of 'CONTRIB_046579'
Unable to find latest released revision of 'CONTRIB_046570'
Unable to find latest released revision of 'CONTRIB_046579'
Unable to find latest released revision of 'CONTRIB_046572'
Unable to find latest released revision of 'CONTRIB_046579'
Output:
awk '{for (i=1; i<=NF; i++) {if ($i ~ /^'"'.*'"'/ ) cnt[$i]++;}}
END {for (a in cnt) {b=a; gsub(/'"'"'/, "", b); print b, cnt[a]}}' infile
CONTRIB_046579 3
CONTRIB_046578 1
CONTRIB_046570 1
CONTRIB_046572 2
All you need is a very simple awk script to count the occurrences of whatever is between the quotes:
awk -F\' '{c[$2]++} END{for (w in c) print w,c[w]}' file
Using @anubhava's test input file:
$ cat file
Unable to find latest released revision of 'CONTRIB_046572'
Unable to find latest released revision of 'CONTRIB_046578'
Unable to find latest released revision of 'CONTRIB_046579'
Unable to find latest released revision of 'CONTRIB_046570'
Unable to find latest released revision of 'CONTRIB_046579'
Unable to find latest released revision of 'CONTRIB_046572'
Unable to find latest released revision of 'CONTRIB_046579'
$
$ awk -F\' '{c[$2]++} END{for (w in c) print w,c[w]}' file
CONTRIB_046578 1
CONTRIB_046579 3
CONTRIB_046570 1
CONTRIB_046572 2
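With -F\' a line that contains no quote has only one field, so its empty $2 would be counted under an empty key. A sketch that skips such lines and sorts the result by descending count, assuming at most one quoted word per line; the function name quoted_freq is only illustrative:

```shell
# quoted_freq FILE: print "count word" pairs, most frequent first.
# The function name is hypothetical; it assumes one quoted word per line.
quoted_freq() {
    # NF >= 3 means the line holds at least two single quotes,
    # so $2 really is the text between a pair of quotes.
    awk -F"'" 'NF >= 3 { c[$2]++ } END { for (w in c) print c[w], w }' "$1" |
        sort -k1,1nr -k2    # numeric count descending, then word ascending
}
```

Called as quoted_freq file, this should list CONTRIB_046579 first for the input above, since it occurs three times.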
Assuming the following input file:
$ cat test.txt
Unable to find latest released revision of 'CONTRIB_046578'.
Unable to find latest released revision of 'CONTRIB_046572'.
Unable to find latest released revision of 'CONTRIB_046579'.
Unable to find latest released revision of 'CONTRIB_046570'.
Unable to find latest released revision of 'CONTRIB_046572'.
Unable to find latest released revision of 'CONTRIB_046578'.
Shell pipeline to filter and count the words:
$ sed "s/.*'\(.*\)'.*/\1/" test.txt | sort | uniq -c
1 CONTRIB_046570
2 CONTRIB_046572
2 CONTRIB_046578
1 CONTRIB_046579
sed -E "s/.*'(.*)'.*/\1/" myfile.txt
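A sed substitution without -n prints any line it cannot match unchanged, so stray lines would end up in the counts. A sketch that prints only the extracted words, assuming one quoted word per line (with several, the greedy leading .* keeps the last one); the function name extract_quoted is only illustrative:

```shell
# extract_quoted FILE: print the single-quoted word from each line of FILE.
# The function name is hypothetical; it assumes one quoted word per line.
extract_quoted() {
    # -n suppresses the default output; the trailing p flag prints a line
    # only when the substitution succeeded, dropping lines without quotes.
    sed -n "s/.*'\([^']*\)'.*/\1/p" "$1"
}
```

The result can be piped into sort | uniq -c as before, for example extract_quoted test.txt | sort | uniq -c.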