regex - How to get the intersecting lines between two .csv files?

Question

How can I get the intersecting lines between two .csv files? My question may be hard to follow, so here is an example.

I have two .csv files:

+----+-------+----+      +----+-------+----+
| A  | B     | C  |      | A  | B     | C  |
+----+-------+----+      +----+-------+----+
| 1  | ant   | 14 |*     | 6  | fan   | 12 |
| 2  | bird  | 11 |      | 7  | gun   | 55 |*
| 3  | cat   | 21 |*     | 8  | horse | 21 |*
| 4  | dog   | 55 |*     | 9  | ice   | 15 |
| 5  | egg   | 99 |      | 10 | jar   | 14 |*
+----+-------+----+      +----+-------+----+
      Table 1                  Table 2

So when I filter Table 2 using Table 1, I want output like this:

+----+-------+----+
| A  | B     | C  |
+----+-------+----+
| 7  | gun   | 55 |*
| 8  | horse | 21 |*
| 10 | jar   | 14 |*
+----+-------+----+
      Table 3

Yes, I want to filter Table 2 using the last column of Table 1.

How can I do this kind of filtering, with any tool?

score 1 · Accepted Answer

This might work for you (GNU sed):

sed -r 's/(\S+\s?){3}/\/(^\\S+\\s){2}\1$\/p/' file1.csv | sed -nrf - file2.csv

That is for files delimited by spaces or tabs.

For comma-separated files:

sed -r 's/([^,]+,?){3}/\/(^[^,]+,){2}\1$\/p/' file1.csv | sed -nrf - file2.csv

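This works by generating a sed script from the first table: each line of file1.csv becomes a command that prints any line whose third field matches that line's third field. The second sed invocation then runs that generated script against the second table.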
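
The same two-stage idea (derive a filter from the first table, then apply it to the second) can be sketched in Python. This is only an illustration, not a translation of the sed command: it assumes comma-separated rows with exactly three fields and no quoting, and the names `build_filter` and `filter_rows` are made up for this example.

```python
import re


def build_filter(table1_lines):
    """Build one regex matching any row whose third field occurs in table 1."""
    # Collect the third field of every non-empty row (assumes 3 plain fields).
    keys = {line.rstrip("\n").split(",")[2] for line in table1_lines if line.strip()}
    alternatives = "|".join(re.escape(key) for key in sorted(keys))
    # Two leading fields, then one of the collected keys, then end of line.
    return re.compile(rf"^(?:[^,]+,){{2}}(?:{alternatives})$")


def filter_rows(table1_lines, table2_lines):
    """Return the rows of table 2 whose third field also appears in table 1."""
    pattern = build_filter(table1_lines)
    stripped = (line.rstrip("\n") for line in table2_lines)
    return [line for line in stripped if pattern.match(line)]


table1 = ["1,ant,14", "2,bird,11", "3,cat,21", "4,dog,55", "5,egg,99"]
table2 = ["6,fan,12", "7,gun,55", "8,horse,21", "9,ice,15", "10,jar,14"]
print(filter_rows(table1, table2))
# ['7,gun,55', '8,horse,21', '10,jar,14']
```

Building a single alternation keeps the second pass to one regex match per row; for files with quoted fields a real CSV parser would be a safer choice than a regex.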

score 1 · Accepted Answer

Following the comments above, here is what I did:

Create table T1 (A INT, B VARCHAR(100), C INT);

Create table T2 (A INT, B VARCHAR(100), C INT);

Insert into T1 Values (1, 'ant',14);
Insert into T1 Values (2, 'bird',11);
Insert into T1 Values (3, 'cat',21);
Insert into T1 Values (4, 'dog',55);
Insert into T1 Values (5, 'egg',99);

Insert into T2 Values (6, 'fan',12);
Insert into T2 Values (7, 'gun',55);
Insert into T2 Values (8, 'horse',21);
Insert into T2 Values (9, 'ice',15);
Insert into T2 Values (10, 'jar',14);

I don't know whether you already have the data in tables, but there are tools for importing csv files into a database.

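If you are going to run the query often, creating an index on column C of each table (the column used in the join) will speed it up. For this simple case I did not create an index.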

Here is the select you need to get the result:

select * from t2,t1 where t2.c = t1.c order by t2.a
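
To try the join without a database server, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. SQLite is an assumption here (the answer itself targets SQL Server), the function name `intersect_on_c` is hypothetical, and the query selects only the T2 columns so the result has the shape of Table 3.

```python
import sqlite3


def intersect_on_c(rows1, rows2):
    """Return the rows of T2 whose C value also occurs in T1, ordered by A."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE T1 (A INT, B VARCHAR(100), C INT)")
    conn.execute("CREATE TABLE T2 (A INT, B VARCHAR(100), C INT)")
    conn.executemany("INSERT INTO T1 VALUES (?, ?, ?)", rows1)
    conn.executemany("INSERT INTO T2 VALUES (?, ?, ?)", rows2)
    result = conn.execute(
        "SELECT t2.a, t2.b, t2.c FROM t2, t1 WHERE t2.c = t1.c ORDER BY t2.a"
    ).fetchall()
    conn.close()
    return result


t1_rows = [(1, "ant", 14), (2, "bird", 11), (3, "cat", 21), (4, "dog", 55), (5, "egg", 99)]
t2_rows = [(6, "fan", 12), (7, "gun", 55), (8, "horse", 21), (9, "ice", 15), (10, "jar", 14)]
print(intersect_on_c(t1_rows, t2_rows))
# [(7, 'gun', 55), (8, 'horse', 21), (10, 'jar', 14)]
```

Note that a plain join returns a T2 row once per matching T1 row, so if T1 contained the same C value twice, the corresponding T2 rows would appear twice.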

If you are happy with the result, you can put it into a table like this (SQL Server):

SELECT T2.A, T2.B , T2.C INTO TEST FROM t2,t1 where t2.c = t1.c order by t2.a

I hope this is what you want...

score 0 · Accepted Answer

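Here is a Perl script that does what you need.

It works by scanning the first file and keeping the values of the third column in memory. It then scans the second file and, for each line read, compares the third-column value against the values held in memory, printing the line if there is a match.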

#!/usr/bin/perl
use warnings;
use strict;
use 5.010;

my %seen;

open my $file1_fh, '<', 'file1.txt' 
    or die "Can't open file1.txt $!";

while (<$file1_fh>) {
    chomp;
    $seen{ (split)[2] } = 1; # assumes lines are delimited by whitespace
}

close $file1_fh;

open my $file2_fh, '<', 'file2.txt'
    or die "Can't open file2.txt $!";


while (<$file2_fh>) {
    chomp;
    my $third_column_value = (split)[2]; # assumes lines are delimited by whitespace
    say if $seen{ $third_column_value };
}

close $file2_fh;

__END__

#OUTPUT
7 gun 55
8 horse 21
10 jar 14
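
Since the script above assumes whitespace-delimited lines, here is a rough Python sketch of the same two-pass approach for comma-separated input, using the standard `csv` module. The function name `filter_by_third_column` is made up for this example, and `io.StringIO` stands in for the real files so the snippet is self-contained; with actual files you would pass objects opened with `open(path, newline="")` instead.

```python
import csv
import io


def filter_by_third_column(file1, file2):
    """Return rows of file2 whose third column value also appears in file1."""
    # Pass 1: remember every third-column value seen in the first file.
    seen = {row[2] for row in csv.reader(file1) if len(row) >= 3}
    # Pass 2: keep rows of the second file whose third column was seen.
    return [row for row in csv.reader(file2) if len(row) >= 3 and row[2] in seen]


file1 = io.StringIO("1,ant,14\n2,bird,11\n3,cat,21\n4,dog,55\n5,egg,99\n")
file2 = io.StringIO("6,fan,12\n7,gun,55\n8,horse,21\n9,ice,15\n10,jar,14\n")
for row in filter_by_third_column(file1, file2):
    print(",".join(row))
# 7,gun,55
# 8,horse,21
# 10,jar,14
```

Values are compared as strings, so `14` and `014` would not be treated as equal; rows with fewer than three columns are skipped.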
