perl - How can I retrieve sets of two lines from one file by referencing the first lines in another file?

Question

I have two files:

$ cat file1
Index1 annotation1
abcd
Index2 annotation2
efgh
Index3 annotation3
hijk
Index4 annotation4
lmno
Index5 annotation5
pqrs
…

$ cat file2
Index1
Index3
Index5

What I want is the list of lines from file1 whose label appears in file2, each followed by the line that comes after it, like this:

Index1 annotation1
abcd
Index3 annotation3
hijk
Index5 annotation5
pqrs

My current solution is to use grep with its --file flag:

grep -A 1 --file="file2" file1 | awk '!/--/'

But I was wondering whether there is a more elegant solution. My current solution takes a long time when the files are huge.

score 2 · Accepted Answer

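I suggest reading file1 and building an index of where in the file each label appears. The labels of the data you need can then be read from file2, and the index consulted to find where to read the corresponding information.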

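This program shows the principle. It isn't clear how you distinguish the labels from the rest of the text. I have assumed they all start with Index, which is probably wrong, but if you need help adapting this to your real data then please ask again.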

use strict;
use warnings;

@ARGV = qw/ file1.txt file2.txt / unless @ARGV;
my ($file1, $file2) = @ARGV;

my %index;

open my $f1, '<', $file1 or die qq(Unable to open "$file1": $!);
my $pos = tell $f1;
while (<$f1>) {
  $index{$1} = $pos if /^(Index\S+)/;
  $pos = tell $f1;
}

open my $f2, '<', $file2 or die qq(Unable to open "$file2": $!);
while (<$f2>) {
  next unless /^(Index\S+)/ and defined($pos = $index{$1});
  seek $f1, $pos, 0;
  print scalar <$f1>, scalar <$f1>;
}

Output

Index1 annotation1
abcd
Index3 annotation3
hijk
Index5 annotation5
pqrs
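
If the labels in the real data do not start with Index, one possible adaptation is sketched below. It assumes (and this is only an assumption about the data) that file1 consists of strict two-line records, so the first whitespace-delimited token of every odd-numbered line is the label. The sub names build_offset_index and fetch_records are made up for illustration, and the demo uses an in-memory filehandle rather than real files.

```perl
use strict;
use warnings;

# Build a label => byte-offset index.
# Assumption: strict two-line records (label line, then one data line).
sub build_offset_index {
    my ($fh) = @_;
    my %index;
    while (1) {
        my $pos   = tell $fh;           # offset of the label line
        my $label = <$fh>;
        last unless defined $label;
        my ($key) = $label =~ /^(\S+)/; # first token is the label
        $index{$key} = $pos if defined $key;
        my $data = <$fh>;               # skip this record's data line
        last unless defined $data;
    }
    return \%index;
}

# Return the two-line records for the wanted labels, in the order given.
# Labels missing from the index are silently skipped.
sub fetch_records {
    my ($fh, $index, @wanted) = @_;
    my @out;
    for my $key (@wanted) {
        my $pos = $index->{$key};
        next unless defined $pos;
        seek $fh, $pos, 0;
        my $label = <$fh>;
        my $data  = <$fh>;
        push @out, $label;
        push @out, $data if defined $data;
    }
    return @out;
}

# Demo on in-memory data shaped like the question's file1.
my $file1 = "Index1 annotation1\nabcd\nIndex2 annotation2\nefgh\n"
          . "Index3 annotation3\nhijk\n";
open my $fh, '<', \$file1 or die "Cannot open in-memory file: $!";
my $index = build_offset_index($fh);
print fetch_records($fh, $index, qw(Index1 Index3));
```

As with the program above, the records come out in the order the labels are requested, not the order they appear in file1.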

score 2 · Accepted Answer

#!/usr/bin/env perl

use strict; use warnings;
use autodie;

my %to_index;

my ($annotations_file, $index_file) = @ARGV;

open my $index, '<', $index_file;

while (my $line = <$index>) {
    next unless $line =~ /\S/;
    chomp $line;
    $to_index{ $line } = undef;
}

close $index;

open my $annotations, '<', $annotations_file;

while (my $line = <$annotations>) {
    next unless $line =~ /\S/;
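    my ($keyword) = ($line =~ /^(\S+)/);
    # Skip lines that begin with whitespace and so have no leading keyword.
    next unless defined $keyword;
    if (exists $to_index{ $keyword }) {
        print $line;
        # Print the following line too, unless the file ends here.
        my $next = <$annotations>;
        print $next if defined $next;
    }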
}

close $annotations;
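
The same single-pass idea can be sketched as two small subs, which makes the matching rule easier to see. The names read_wanted and filter_pairs are made up for illustration, and the Index10 line in the demo is hypothetical data that is not in the question. It is there to show that a hash lookup on the whole label matches exactly, whereas a grep pattern such as Index1 can also match inside Index10.

```perl
use strict;
use warnings;

# Read the wanted labels (first token of each non-blank line) into a hash.
sub read_wanted {
    my ($fh) = @_;
    my %wanted;
    while (my $line = <$fh>) {
        next unless $line =~ /(\S+)/;   # skip blank lines
        $wanted{$1} = 1;
    }
    return \%wanted;
}

# Stream the annotations once; collect each wanted label line together
# with the line that follows it (if the file does not end there).
sub filter_pairs {
    my ($fh, $wanted) = @_;
    my @out;
    while (my $line = <$fh>) {
        next unless $line =~ /^(\S+)/ && $wanted->{$1};
        push @out, $line;
        my $next = <$fh>;
        push @out, $next if defined $next;
    }
    return @out;
}

# Demo on in-memory data; Index10 is a hypothetical extra label.
my $labels = "Index1\n";
my $annots = "Index1 annotation1\nabcd\nIndex10 annotation10\nwxyz\n";
open my $label_fh, '<', \$labels or die "Cannot open labels: $!";
open my $annot_fh, '<', \$annots or die "Cannot open annotations: $!";
print filter_pairs($annot_fh, read_wanted($label_fh));
```

Only the small label file is held in memory here, and the output follows the order of file1. For very large inputs it would probably be better to print inside the loop rather than collect the lines in an array.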
