linux - Comparing two files in linux terminal

Question

There are two files called "a.txt" and "b.txt"; both contain a list of words. Now I want to check which words are extra in "a.txt" and are not in "b.txt".

I need an efficient algorithm as I need to compare two dictionaries.

score 392 · Accepted Answer

If you have vim installed, try this:

vimdiff file1 file2

or

vim -d file1 file2

You will find it fantastic.

score 80 · Accepted Answer

Sort them and use comm:

comm -23 <(sort a.txt) <(sort b.txt)

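comm compares (sorted) input files and by default outputs three columns: lines that are unique to a, lines that are unique to b, and lines that are present in both. By specifying -1, -2 and/or -3 you can suppress the corresponding output. Therefore comm -23 a b lists only the entries unique to a. I use the <(...) syntax to sort the files on the fly; if they are already sorted, you don't need this.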
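As a minimal sketch of the column suppression described above, the following builds two small sample files in a temporary directory (the file contents and variable names are made up for illustration) and prints only the lines unique to the first one. It sorts into temporary files instead of using <(...), so it should also work in shells without process substitution, and it sets LC_ALL=C so that sort and comm agree on the ordering.

```shell
#!/bin/sh
# Hypothetical sample data, not taken from the question.
tmpdir=$(mktemp -d)
printf '%s\n' pear apple cherry banana > "$tmpdir/a.txt"
printf '%s\n' banana apple > "$tmpdir/b.txt"

# comm expects both inputs to be sorted in the same collation order.
LC_ALL=C sort "$tmpdir/a.txt" > "$tmpdir/a.sorted"
LC_ALL=C sort "$tmpdir/b.txt" > "$tmpdir/b.sorted"

# -2 suppresses lines unique to b, -3 suppresses lines common to both,
# which leaves only the lines unique to a.
only_in_a=$(LC_ALL=C comm -23 "$tmpdir/a.sorted" "$tmpdir/b.sorted")
printf '%s\n' "$only_in_a"

rm -r "$tmpdir"
```

With this sample data the command prints cherry and pear, one per line.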

score 39 · Accepted Answer

If you prefer the diff output style of git diff, you can use it with the --no-index flag to compare files that are not in a git repository:

git diff --no-index a.txt b.txt

Using a couple of files with around 200k file name strings in each, I benchmarked this approach against some of the other answers here (with the built-in time command):

git diff --no-index a.txt b.txt
# ~1.2s

comm -23 <(sort a.txt) <(sort b.txt)
# ~0.2s

diff a.txt b.txt
# ~2.6s

sdiff a.txt b.txt
# ~2.7s

vimdiff a.txt b.txt
# ~3.2s

comm seems to be the fastest by far, while git diff --no-index appears to be the fastest approach for diff-style output.

Update 2018-03-25: You can actually omit the --no-index flag unless you are inside a git repository and want to compare untracked files within that repository. From the man pages:

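This form is to compare the given two paths on the filesystem. You can omit the --no-index option when running the command in a working tree controlled by Git and at least one of the paths points outside the working tree, or when running the command outside a working tree controlled by Git.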

score 35 · Accepted Answer

Try sdiff (man sdiff):

sdiff -s file1 file2

Answered 2014-12-27T12:22:17.140

score 34 · Accepted Answer

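You can use the diff tool in Linux to compare two files. The --changed-group-format and --unchanged-group-format options let you filter the output down to the data you need.

Each of these options accepts one of the following three values to select the relevant group:

'%<' get lines from FILE1
'%>' get lines from FILE2
'' (empty string) remove lines from both files.

Example: diff --changed-group-format="%<" --unchanged-group-format="" file1.txt file2.txt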

[root@vmoracle11 tmp]# cat file1.txt 
test one
test two
test three
test four
test eight
[root@vmoracle11 tmp]# cat file2.txt 
test one
test three
test nine
[root@vmoracle11 tmp]# diff --changed-group-format='%<' --unchanged-group-format='' file1.txt file2.txt 
test two
test four
test eight
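The same options can be pointed the other way. A sketch, assuming GNU diff (the group-format options are GNU extensions) and recreating the sample files from the session above in a temporary directory: '%>' keeps the lines that appear only in the second file.

```shell
#!/bin/sh
# Recreate the sample files shown above in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'test one' 'test two' 'test three' 'test four' 'test eight' > "$tmpdir/file1.txt"
printf '%s\n' 'test one' 'test three' 'test nine' > "$tmpdir/file2.txt"

# diff exits with status 1 when the files differ; "|| true" keeps that
# from aborting a script that runs under "set -e".
only_in_file2=$(diff --changed-group-format='%>' --unchanged-group-format='' \
    "$tmpdir/file1.txt" "$tmpdir/file2.txt" || true)
printf '%s\n' "$only_in_file2"

rm -r "$tmpdir"
```

For these two files the only line printed is test nine.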

score 9 · Accepted Answer

You can also use colordiff, which displays the output of diff in color.

About vimdiff: it lets you compare files over SSH, for example:

vimdiff /var/log/secure scp://192.168.1.25/var/log/secure

Extracted from: http://www.sysadmit.com/2016/05/linux-diferencias-entre-dos-archivos.html

score 6 · Accepted Answer

Also, do not forget about mcdiff, the internal diff viewer of GNU Midnight Commander.

For example:

mcdiff file1 file2

Enjoy!

score 4 · Accepted Answer

Use comm -13 (requires sorted files):

$ cat file1
one
two
three

$ cat file2
one
two
three
four

$ comm -13 <(sort file1) <(sort file2)
four

score 1 · Accepted Answer

Here is my solution for this:

mkdir -p ~/temp
mkdir -p ~/results
cp /usr/share/dict/american-english ~/temp/american-english-dictionary
cp /usr/share/dict/british-english ~/temp/british-english-dictionary
cat ~/temp/american-english-dictionary | wc -l > ~/results/count-american-english-dictionary
cat ~/temp/british-english-dictionary | wc -l > ~/results/count-british-english-dictionary
grep -Fxf ~/temp/american-english-dictionary ~/temp/british-english-dictionary > ~/results/common-english
grep -Fxvf ~/results/common-english ~/temp/american-english-dictionary > ~/results/unique-american-english
grep -Fxvf ~/results/common-english ~/temp/british-english-dictionary > ~/results/unique-british-english
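If only the one-directional answer from the question is needed, the intermediate list of common words is not strictly required. A minimal sketch with made-up sample files: -F treats the patterns as fixed strings, -x matches whole lines only, -v inverts the match, and -f reads the patterns from a file.

```shell
#!/bin/sh
# Hypothetical sample word lists, not the dictionaries used above.
tmpdir=$(mktemp -d)
printf '%s\n' colour color centre > "$tmpdir/a.txt"
printf '%s\n' color center > "$tmpdir/b.txt"

# Print the lines of a.txt that do not occur as whole lines in b.txt.
# Neither file needs to be sorted, and the order of a.txt is preserved.
extra_in_a=$(grep -Fxvf "$tmpdir/b.txt" "$tmpdir/a.txt")
printf '%s\n' "$extra_in_a"

rm -r "$tmpdir"
```

Here the output is colour followed by centre. Note that grep returns a non-zero exit status when no line is selected, which matters in scripts that stop on errors.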

score 1 · Accepted Answer

You can also use:

sdiff file1 file2

to show the differences side by side within your terminal!

score 0 · Accepted Answer

diff a.txt b.txt | grep '^<'

Then you can pipe it to cut for a clean output:

diff a.txt b.txt | grep '^<' | cut -c 3-

score 0 · Accepted Answer

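For me, the best answer to the question was to use diff, which is included with Linux: diff a.txt b.txt | grep '<'

The < marker (it looks like a megaphone pointing from a.txt toward b.txt :)) flags the lines that a.txt has and b.txt lacks.

diff a.txt b.txt | grep '>' gives you what is not in a.txt, but that was not the question :)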
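One caveat with the diff-based pipelines: diff compares line sequences, so a word that is present in both files but at a different position can still be reported with a < marker. A sketch with made-up files that sorts both inputs first, so the result reflects membership rather than order:

```shell
#!/bin/sh
# Hypothetical sample data: b.txt holds three of a.txt's words in another order.
tmpdir=$(mktemp -d)
printf '%s\n' one two three four > "$tmpdir/a.txt"
printf '%s\n' three two one > "$tmpdir/b.txt"

# Sort both files so that identical words line up for diff.
sort "$tmpdir/a.txt" > "$tmpdir/a.sorted"
sort "$tmpdir/b.txt" > "$tmpdir/b.sorted"

# Keep only the lines diff marks as coming from the first file,
# then strip the leading "< " marker (characters 1 and 2).
extra=$(diff "$tmpdir/a.sorted" "$tmpdir/b.sorted" | grep '^<' | cut -c 3-)
printf '%s\n' "$extra"

rm -r "$tmpdir"
```

With this data the pipeline prints only four.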

score -1 · Accepted Answer

Use awk for it. Test files:

$ cat a.txt
one
two
three
four
four
$ cat b.txt
three
two
one

awk:

$ awk '
NR==FNR {                    # process b.txt  or the first file
    seen[$0]                 # hash words to hash seen
    next                     # next word in b.txt
}                            # process a.txt  or all files after the first
!($0 in seen)' b.txt a.txt   # if word is not hashed to seen, output it

Duplicates are output:

four
four

To avoid duplicates, add each newly met word in a.txt to the seen hash:

$ awk '
NR==FNR {
    seen[$0]
    next
}
!($0 in seen) {              # if word is not hashed to seen
    seen[$0]                 # hash unseen a.txt words to seen to avoid duplicates 
    print                    # and output it
}' b.txt a.txt

Output:

four

If the word lists are comma-separated, like this:

$ cat a.txt
four,four,three,three,two,one
five,six
$ cat b.txt
one,two,three

you have to do a couple of extra laps (the for loops):

awk -F, '                    # comma-separated input
NR==FNR {
    for(i=1;i<=NF;i++)       # loop all comma-separated fields
        seen[$i]
    next
}
{
    for(i=1;i<=NF;i++)
        if(!($i in seen)) {
             seen[$i]        # this time we buffer output (below):
             buffer=buffer (buffer==""?"":",") $i
        }
    if(buffer!="") {         # output unempty buffers after each record in a.txt
        print buffer
        buffer=""
    }
}' b.txt a.txt

And the output this time:

four
five,six
