linux - Display lines that are duplicated across two different files

Question

I have two files and want to display the lines that appear in both of them. I tried this, but it does not work:

cat id1.txt | while read id; do grep "$id" id2.txt; done

I am wondering whether there is another way to display the duplicated lines. Both of my files contain a list of IDs. Thank you.

score 20 · Accepted Answer

Are the files sorted? Can you sort them?

If they are sorted:

comm -12 id1.txt id2.txt
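
Whether comm can be used directly depends on the "are they sorted?" question above. A minimal sketch of one way to check, assuming POSIX `sort -c` (which exits non-zero when the input is out of order) and hypothetical sample files created in a scratch directory; the `is_sorted` helper is made up for illustration:

```shell
# Hypothetical sample data in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 101 205 309 > "$workdir/id1.txt"   # already in order
printf '%s\n' 309 101 777 > "$workdir/id2.txt"   # out of order

# is_sorted is an illustrative helper, not part of any tool.
# sort -c only checks the order; it does not rewrite the file.
is_sorted() {
    LC_ALL=C sort -c "$1" 2>/dev/null
}

if is_sorted "$workdir/id1.txt"; then status1=sorted; else status1=unsorted; fi
if is_sorted "$workdir/id2.txt"; then status2=sorted; else status2=unsorted; fi

echo "id1.txt: $status1"
echo "id2.txt: $status2"
```

The check here uses the C locale; comm expects its input in the same collation order that was used for sorting, so the same locale should be in effect for both steps.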

If they are not sorted but you are using bash 4.x:

comm -12 <(sort id1.txt) <(sort id2.txt)

If you do not have bash 4.x and "process substitution", there is a solution that uses temporary files.

You can also use grep -F:

grep -F -f id1.txt id2.txt

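This searches id2.txt for the strings listed in id1.txt. The only problem here is making sure that an ID such as 1 does not match every ID that contains a 1 somewhere. The -w or -x options, available in some versions of grep, will work here.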
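
The substring problem can be seen on a small example. This is a sketch with made-up IDs in a scratch directory, assuming a grep that supports `-x` (whole-line matching):

```shell
# Hypothetical sample data: the ID "1" is a substring of "10" and "21".
workdir=$(mktemp -d)
printf '%s\n' 1 42 > "$workdir/id1.txt"
printf '%s\n' 10 21 42 1 > "$workdir/id2.txt"

# Fixed-string matching anywhere in the line: "1" also hits 10 and 21.
loose=$(grep -F -f "$workdir/id1.txt" "$workdir/id2.txt")

# Adding -x requires the whole line to equal one of the patterns.
exact=$(grep -F -x -f "$workdir/id1.txt" "$workdir/id2.txt")

printf 'without -x:\n%s\n' "$loose"
printf 'with -x:\n%s\n' "$exact"
```

Without `-x` all four lines of id2.txt are printed; with `-x` only 42 and 1 remain. The output follows the order of id2.txt, so no sorting is needed.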

score 12 · Accepted Answer

If by detecting duplicates you mean printing the lines that are present in both files (or duplicated within one file), you can use uniq:

$ cat file1 file2 | sort | uniq -d
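
As noted, this pipeline also reports lines repeated inside a single file. If only lines present in both files are wanted, one possible variation is to de-duplicate each file first with `sort -u`. A sketch with hypothetical sample data:

```shell
# Hypothetical sample data: "a" is repeated inside file1 only.
workdir=$(mktemp -d)
printf '%s\n' a a b > "$workdir/file1"
printf '%s\n' b c > "$workdir/file2"

# The pipeline from the answer: reports "a" as well as "b".
naive=$(cat "$workdir/file1" "$workdir/file2" | sort | uniq -d)

# Variation: remove duplicates within each file before combining them,
# so a line can only appear twice if it occurs in both files.
sort -u "$workdir/file1" > "$workdir/file1.uniq"
sort -u "$workdir/file2" > "$workdir/file2.uniq"
strict=$(sort "$workdir/file1.uniq" "$workdir/file2.uniq" | uniq -d)

printf 'naive:\n%s\n' "$naive"
printf 'strict:\n%s\n' "$strict"
```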

score 2 · Accepted Answer

You can use the comm command instead:

sort id1.txt > id1.txt.sorted
sort id2.txt > id2.txt.sorted
comm -12 id1.txt.sorted id2.txt.sorted

If you want to do it in one command:

comm -12 <(sort id1.txt) <(sort id2.txt)

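Arguments to comm:

The -1 argument suppresses lines that are unique to the first file.
The -2 argument suppresses lines that are unique to the second file.
The -3 argument suppresses lines that are common to both files.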
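
The three flags can be combined to select any one of comm's columns. A short sketch on hypothetical, already sorted sample files:

```shell
# Hypothetical sample data, already in sorted order.
workdir=$(mktemp -d)
printf '%s\n' 101 205 309 > "$workdir/id1.txt"
printf '%s\n' 101 309 777 > "$workdir/id2.txt"

# Suppress columns 2 and 3: lines only in the first file.
only_first=$(comm -23 "$workdir/id1.txt" "$workdir/id2.txt")

# Suppress columns 1 and 3: lines only in the second file.
only_second=$(comm -13 "$workdir/id1.txt" "$workdir/id2.txt")

# Suppress columns 1 and 2: lines common to both files.
both=$(comm -12 "$workdir/id1.txt" "$workdir/id2.txt")

echo "only in id1.txt: $only_first"
echo "only in id2.txt: $only_second"
printf 'in both:\n%s\n' "$both"
```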

score 1 · Accepted Answer

Using awk can save time, since the files do not need to be sorted first:

awk 'FNR==NR{lines[$0]=1;next} $0 in lines' id1.txt id2.txt

# Explanation
FNR==NR       # check whether the per-file record number (FNR) equals the overall NR,
              # which is only true while reading the first file
lines[$0]=1   # store the line in an array (used as a dictionary):
              # the key is the line from the first file, the value is 1
next          # skip the remaining rules and go on to the next input line
$0 in lines   # for the second file, check whether the line
              # is a key in the dictionary
              # if it is, $0 is printed
              # the {print} is omitted here because printing
              # is awk's default action when the condition is true
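
To see how this behaves on unsorted input, here is a sketch that runs the same awk program on hypothetical sample files. It illustrates two properties: the output follows the order of the second file, and a line repeated in the second file is printed each time it occurs.

```shell
# Hypothetical sample data: neither file is sorted, and 205 appears twice in id2.txt.
workdir=$(mktemp -d)
printf '%s\n' 309 101 205 > "$workdir/id1.txt"
printf '%s\n' 777 205 309 205 > "$workdir/id2.txt"

# First file: remember every line as an array key.
# Second file: print the lines that were seen in the first file.
common=$(awk 'FNR==NR { lines[$0] = 1; next } $0 in lines' \
    "$workdir/id1.txt" "$workdir/id2.txt")

printf '%s\n' "$common"
```

If each common ID should be listed only once, the result could be piped through `sort -u`.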
