perl - Perl - Merging two files with multiple keys

Question

#!/usr/bin/env perl

use strict;
use warnings;
use autodie;

#constants
use constant {
    FILE_A  => '/home/test/input_a.csv',
    FILE_B  => '/home/test/input_b.csv',
    FILE_C  => '/home/test/output.csv',
};

my %a_hash;

my @a_array;
my @b_array;
my @c_array;

open my $c_fh, ">", FILE_C;
open my $a_fh, "<", FILE_A;

while ( my $line = <$a_fh> ) {
    chomp $line;
    $line = /^(.+?);/; 
    $a_hash{$1} = 1; 
    print $c_fh $line . "\n"; #populate a.csv into c.csv
}
close $a_fh;

open my $b_fh, "<", FILE_B;
#reading b.txt
while ( my $line = <$b_fh> ) {
    chomp $line;
    #your suggestion added
    $line = /^(.+?);/;
    if ( not exists $a_hash{$1} ) {
        print $c_fh $line . "\n"; #populate b.csv into c.csv
    }
}
close $b_fh;
close $c_fh;

Error messages:

Use of uninitialized value $_ in pattern match (m//) at ./test.pl line 34, <$b_fh> line 1.
Use of uninitialized value $1 in exists at ./test.pl line 35, <$b_fh> line 1.
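Both warnings point at the same bug: $line = /^(.+?);/ assigns the result of matching the pattern against $_ (which the while (my $line = <$b_fh>) loop never sets) instead of matching against $line. Because that match fails, $1 is also undefined when exists is called. A pattern is bound to a specific variable with =~. The sketch below uses a made-up sample line to contrast the two forms; it is an illustration only, not the asker's full script.

```perl
use strict;
use warnings;

my $line = "key1;value1";    # hypothetical sample line

# BUG (as in the question): "=" assigns the result of matching /.../
# against $_, not against $line. With $_ undefined, Perl warns
# "Use of uninitialized value $_ in pattern match" and the match fails.
my $buggy = $line;
{
    no warnings 'uninitialized';   # silence the warning for this demo only
    local $_;                      # $_ is undefined, as in the while loop
    $buggy = /^(.+?);/;            # matches $_, then overwrites $buggy with ''
}

# FIX: "=~" binds the match to $line; $1 then holds the first field.
my $key;
if ( $line =~ /^(.+?);/ ) {
    $key = $1;
}

print "buggy: '$buggy'\n";
print "key:   '$key'\n";
```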

score 0 · Accepted Answer

A hash is a great way to index a file. You said:

I need all the lines from a.txt in c.txt. But when selecting lines from b.txt for c.txt, I first need to check a.txt: if a line already exists in a.txt, I don't need to consider that b.txt line when writing to c.txt [the output].
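That requirement maps directly onto a hash lookup: record every key from a.txt in a hash, then keep a b.txt line only if its key is not already there. Here is a minimal in-memory sketch, assuming the key is the first semicolon-separated field; the sample lines and the first_field helper are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the lines of a.txt and b.txt
my @a_lines = ( "1;apple", "2;banana" );
my @b_lines = ( "2;BANANA", "3;cherry" );

# Assumption: the key is everything before the first semicolon.
sub first_field {
    my ($line) = @_;
    my ($key) = split /;/, $line, 2;
    return $key;
}

# Index a.txt: every key seen in a.txt becomes a hash key.
my %seen_in_a = map {; first_field($_) => 1 } @a_lines;

# The output gets every a.txt line, plus only those b.txt lines whose
# key does not already appear in a.txt.
my @output = ( @a_lines, grep { !exists $seen_in_a{ first_field($_) } } @b_lines );

print "$_\n" for @output;
```

The lookup with exists is a constant-time hash probe, so b.txt is filtered in a single pass no matter how large a.txt is.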

Your requirement means you only need to index the lines of a.txt. You also don't say how you want the files merged into c.txt. Do you read one line from each file in turn? Is the final output supposed to be sorted?
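The two merge orders mentioned there differ only in how the arrays are combined. A sketch with hypothetical sample arrays: interleaving takes one line from each file in turn, while the second option concatenates the arrays and sorts them in plain string order.

```perl
use strict;
use warnings;

# Hypothetical arrays standing in for the (filtered) contents of a.txt and b.txt
my @a_array = ( "1;apple", "4;date" );
my @b_array = ( "3;cherry", "2;banana", "5;elderberry" );

# Option 1: interleave, taking one line from each file in turn.
# When one array runs out, the rest of the other is appended.
my @interleaved;
my $max = @a_array > @b_array ? scalar @a_array : scalar @b_array;
for my $i ( 0 .. $max - 1 ) {
    push @interleaved, $a_array[$i] if $i < @a_array;
    push @interleaved, $b_array[$i] if $i < @b_array;
}

# Option 2: concatenate and sort (default string comparison).
my @sorted = sort @a_array, @b_array;

print "interleaved: @interleaved\n";
print "sorted:      @sorted\n";
```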

Also, what do you mean by a matching line? Do you mean the whole line matches, or only the part up to the first semicolon?
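The answer to that question decides what goes into the hash. The sketch below, with hypothetical sample lines and a hypothetical first_field helper, shows how the two interpretations filter the same b.txt lines differently.

```perl
use strict;
use warnings;

# Hypothetical sample lines
my @a_lines = ( "1;apple;red" );
my @b_lines = ( "1;apple;red", "1;apple;green", "2;banana;yellow" );

# Key for interpretation B: everything before the first semicolon.
sub first_field {
    my ($line) = @_;
    my ($key) = split /;/, $line, 2;
    return $key;
}

# Interpretation A: a b.txt line matches only if the whole line occurs in a.txt.
my %whole_index = map { $_ => 1 } @a_lines;
my @kept_whole  = grep { !exists $whole_index{$_} } @b_lines;

# Interpretation B: it matches if its first field occurs in a.txt.
my %first_index = map {; first_field($_) => 1 } @a_lines;
my @kept_first  = grep { !exists $first_index{ first_field($_) } } @b_lines;

print "whole-line key:  @kept_whole\n";
print "first-field key: @kept_first\n";
```

With a whole-line key, "1;apple;green" survives because it differs from the a.txt line; with a first-field key it is dropped because its key 1 is already in a.txt.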

To keep things flexible, I'll read all the lines into separate arrays, so you can organize things from there however you like.

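Read a.txt into an array and a hash, using the hash as an index.
Read b.txt into another array, but skip any line whose first field is already in the hash index.
Read c.txt into yet another array.
You can then do whatever you want with these arrays and merge the lines however you like.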

Here's the program:

#! /usr/bin/env perl

# Preliminary stuff. The first two are always a must
use strict;
use warnings;
use autodie;    # open/close failures now die automatically; no need to test them

# This is how Perl defines constants. It's not perfect:
# unlike variables, constants don't interpolate easily in
# strings. But this is what is native to Perl; optional
# modules like "Readonly" do a better job.
use constant {
    FILE_A  => 'a.txt',
    FILE_B  => 'b.txt',
    FILE_C  => 'c.txt',
};

# I'll use this for indexing
my %a_hash;

# I'll put the file contents in these three arrays
my @a_array;
my @b_array;
my @c_array;

open my $a_fh, "<", FILE_A;

# I'm reading each line of FILE_A. As I read it,
# I'll get the first field and put that as an index
# to my hash

while ( my $line = <$a_fh> ) {
    chomp $line;
    # Capture the first field (everything before the first semicolon)
    my ($key) = $line =~ /^(.+?);/;
    $a_hash{$key} = 1 if defined $key;   # Use the first field as the index into my hash
    push @a_array, $line;                # This adds the line to the end of the array
}
close $a_fh;

# I'll do the same for FILE_B as I did for FILE_A
# I'll go through line by line and push them into @b_array.
# One slight difference. I'll pull out the first field in
# my line, and see if it exists in my %a_hash where I indexed
# the lines in FILE_A. If that first field does not exist in my
# %a_hash index, I'll push the line into my @b_array

open my $b_fh, "<", FILE_B;
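while ( my $line = <$b_fh> ) {
    chomp $line;                      # Keep @b_array consistent with @a_array
    my ($key) = $line =~ /^(.+?);/;   # First field, or undef if there is no semicolon
    # Keep the line unless its first field was already indexed from FILE_A
    if ( not defined $key or not exists $a_hash{$key} ) {
        push @b_array, $line;
    }
}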
close $b_fh;

# Now, I'll toss all the lines in FILE_C into @c_array
# I can do a bit of a shortcut because I don't process
# the lines. I'll just put the whole file into @c_array
# in one fell swoop. I can use "chomp" to remove the NL
# from the end of each item of @c_array in a single line.

open my $c_fh, "<", FILE_C;
@c_array = <$c_fh>;
chomp @c_array;
close $c_fh;

# At this point, @a_array contains the entire contents of FILE_A
# in the order of that file. @c_array also contains all the lines in
# FILE_C in the order of that file. @b_array is a bit different: it
# contains the lines in FILE_B except for those lines whose first
# column was already in FILE_A.
# 
# You don't specify exactly what you want done at this point. Do
# you want to combine @a_array with @b_array? Here's how we can do
# that:

my @combined_array = sort (@a_array, @b_array);

You now have three arrays representing the three files, each in the same order as its file.

@a_array and @c_array contain all the lines that were in a.txt and c.txt, respectively. @b_array contains all the lines from b.txt whose first field did not appear in a.txt.

You can now take these three arrays and merge them however you like.
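Since the question's goal is to write the merged lines to an output file, here is one possible way to do that final step. Two details from the question's code matter here: the output file must be opened for writing with ">" (not "<"), and print takes no comma after the filehandle. This sketch writes to an in-memory filehandle so it runs without touching the disk; the write_lines helper and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical merged result, e.g. the sorted @combined_array from the program
my @combined_array = ( "1;apple", "2;banana", "3;cherry" );

# Print one line per element. There is no comma between the filehandle
# and the list; the {$fh} block form makes the filehandle unambiguous.
sub write_lines {
    my ( $fh, @lines ) = @_;
    print {$fh} "$_\n" for @lines;
    return;
}

# For a real file you would open it for *writing*, e.g.
#   open my $c_fh, '>', FILE_C;   # with "use autodie" in effect
# Here an in-memory filehandle stands in for the output file.
open my $out, '>', \my $buffer or die "Cannot open in-memory handle: $!";
write_lines( $out, @combined_array );
close $out or die "Cannot close in-memory handle: $!";

print $buffer;
```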
