regex - How to push the unique elements read from a file into an array using a regex (Perl)

Question

This is my file:

  heaven
  heavenly
  heavenns
  abc
  heavenns
  heavennly

According to my code, only heavenns and heavennly should be pushed into @myarr, and each of them should go into the array only once. How can I do that?

my $regx = "heavenn\+";
my $tmp=$regx;

$tmp=~ s/[\\]//g;

$regx=$tmp;
print("\nNow regex:", $regx);

my $file  = "myfilename.txt";

my @myarr;
open my $fh, "<", $file;  
while ( my $line = <$fh> ) {
    if ($line =~ /$regx/) {
        print $line;
        push(@myarr, $line);
    }
}

print ("\nMylist:", @myarr); #printing 2 times heavenns and heavennly

score 1 · Accepted Answer

This is Perl, so there's more than one way to do it (TMTOWTDI). Here's one of them:

#!/usr/bin/env perl
use strict;
use warnings;

my $regex = "heavenn+";
my $rx = qr/$regex/;
print "Regex: $regex\n";

my $file  = "myfilename.txt";
my %list;
my @myarr;
open my $fh, "<", $file or die "Failed to open $file: $!";

while ( my $line = <$fh> )
{
    if ($line =~ $rx)
    {
        print $line;
        $list{$line}++;
    }
}

push @myarr, sort keys %list;

print "Mylist: @myarr\n";

Sample output:

Regex: heavenn+
heavenns
heavenns
heavennly
Mylist: heavennly
 heavenns

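The sort isn't necessary (but it does present the data in a sane order). You could instead add an item to the array when the count in $list{$line} is 0. You could chomp the input lines to remove the newline. And so on.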
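The two alternatives mentioned above (push on the first sighting, and chomp the input) might be combined roughly as follows. This is only a sketch: the helper name unique_matches and the in-memory @data list are assumptions made for illustration, standing in for the lines read from the file.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: returns the matching lines, each once, in first-seen order.
sub unique_matches {
    my ($rx, @lines) = @_;
    my (%count, @unique);
    for my $line (@lines) {
        chomp $line;                                  # drop the trailing newline
        next unless $line =~ $rx;                     # skip lines that do not match
        push @unique, $line if $count{$line}++ == 0;  # push on the first sighting only
    }
    return @unique;
}

# Stand-in for the contents of myfilename.txt.
my @data  = ("heaven\n", "heavenly\n", "heavenns\n", "abc\n", "heavenns\n", "heavennly\n");
my @myarr = unique_matches(qr/heavenn+/, @data);
print "Mylist: @myarr\n";
```

Because nothing is sorted, the array keeps the order in which the lines first appeared in the input.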

What if I want to push only a specific word? For example, suppose my file is 1. "heavenns hello" 2. "heavenns hi", "3.heavennly good". What should I do to print only "heavenns" and "heavennly"?

Then you have to arrange to capture only the word, which means refining the regex. Assuming you want heavenn at the start of the word and don't care which alphabetic characters follow it, then:

#!/usr/bin/env perl
use strict;
use warnings;

my $regex = '\b(heavenn[A-Za-z]*)\b';  # Single quotes necessary!
my $rx = qr/$regex/;
print "Regex: $regex\n";

my $file  = "myfilename.txt";
my %list;
my @myarr;
open my $fh, "<", $file or die "Failed to open $file: $!";

while ( my $line = <$fh> )
{
    if ($line =~ $rx)
    {
        print $line;
        $list{$1}++;
    }
}

push @myarr, sort keys %list;

print "Mylist: @myarr\n";

Data file:

1. "heavenns hello"
2. "heavenns hi",
"3.heavennly good". What to d
heaven
heavenly
heavenns
abc
heavenns
heavennly

Output:

Regex: \b(heavenn[A-Za-z]*)\b
1. "heavenns hello"
2. "heavenns hi",
"3.heavennly good". What to d
heavenns
heavenns
heavennly
Mylist: heavennly heavenns

Note that the names in the list no longer include the newline.
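The "Single quotes necessary!" comment in the script deserves a short illustration. In a double-quoted Perl string, \b is a backspace character, so the pattern would look for literal backspaces instead of word boundaries. The helper name capture_word below is hypothetical and exists only for this sketch.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: returns the first captured word, or undef if there is no match.
sub capture_word {
    my ($pattern, $text) = @_;
    return $text =~ /$pattern/ ? $1 : undef;
}

my $single = '\b(heavenn[A-Za-z]*)\b';   # \b reaches the regex engine as a word boundary
my $double = "\b(heavenn[A-Za-z]*)\b";   # "\b" is a backspace character in a double-quoted string

for my $text ('1. "heavenns hello"', '"3.heavennly good"') {
    my $word = capture_word($single, $text);
    print "captured: $word\n";
}

my $word = capture_word($double, 'heavenns');
print "double-quoted pattern: ", (defined $word ? $word : "no match"), "\n";
```

The single-quoted pattern captures heavenns and heavennly, while the double-quoted one finds nothing because the test string contains no backspace characters.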

After chat

This version takes the regex from the command line. The script is invoked as:

perl script.pl -p 'regex' [file ...]

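It reads from standard input if no files are named on the command line (much better than using a fixed input file name, by a large margin). It looks for multiple occurrences of the specified regex on each line. The regex may be preceded or followed (or both) by "word characters" as matched by \w.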

#!/usr/bin/env perl
use strict;
use warnings;
use Getopt::Std;

my %opts;
getopts('p:', \%opts) or die "Usage: $0 [-p 'regex']\n";

my $regex_base = 'heavenn';
#$regex_base = $ARGV[0] if defined $ARGV[0];
$regex_base = $opts{p} if defined $opts{p};

my $regex = '\b(\w*' . ${regex_base} . '\w*)\b';
my $rx = qr/$regex/;
print "Regex: $regex (compiled form: $rx)\n";

my %list;
my @myarr;

while (my $line = <>)
{
    while ($line =~ m/$rx/g)
    {
        print $line;
        $list{$1}++;
        #$line =~ s///;
    }
}

push @myarr, sort keys %list;

print "Matched words: @myarr\n";

Given the input file:

1. "heavenns hello"
2. "heavenns hi",
"3.heavennly good". What to d
Good heavennsy! What a heavennnly output from an equally heavennnnly input!
An unheavenly host.  Good heavens! It heaves to like a yacht!
heaven
Is it heavens
heavenly
heavenns
abc
heavenns
heavennly

you can get output such as:

$ perl script.pl -p 'e\w*?ly' myfilename.txt
Regex: \b(\w*e\w*?ly\w*)\b (compiled form: (?^:\b(\w*e\w*?ly\w*)\b))
"3.heavennly good". What to d
Good heavennsy! What a heavennnly output from an equally heavennnnly input!
Good heavennsy! What a heavennnly output from an equally heavennnnly input!
Good heavennsy! What a heavennnly output from an equally heavennnnly input!
An unheavenly host.  Good heavens! It heaves to like a yacht!
heavenly
heavennly
Matched words: equally heavenly heavennly heavennnly heavennnnly unheavenly
$ perl script.pl myfilename.txt
Regex: \b(\w*heavenn\w*)\b (compiled form: (?^:\b(\w*heavenn\w*)\b))
1. "heavenns hello"
2. "heavenns hi",
"3.heavennly good". What to d
Good heavennsy! What a heavennnly output from an equally heavennnnly input!
Good heavennsy! What a heavennnly output from an equally heavennnnly input!
Good heavennsy! What a heavennnly output from an equally heavennnnly input!
heavenns
heavenns
heavennly
Matched words: heavennly heavennnly heavennnnly heavenns heavennsy
$
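As a variation on the inner while loop, a match with /g in list context returns every capture on a line at once. The sketch below assumes a hypothetical helper words_matching and keeps the words in first-seen order rather than sorting them. As in the script above, the base pattern is interpolated as a regex fragment.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: every distinct word containing $base, in first-seen order.
sub words_matching {
    my ($base, @lines) = @_;
    my $rx = qr/\b(\w*$base\w*)\b/;
    my (%seen, @words);
    for my $line (@lines) {
        # In list context, m//g returns the capture from every match on the line.
        push @words, grep { !$seen{$_}++ } $line =~ /$rx/g;
    }
    return @words;
}

my @lines = (
    "Good heavennsy! What a heavennnly output from an equally heavennnnly input!\n",
    "heavenns\n",
    "heavennsy\n",
);
my @words = words_matching('heavenn', @lines);
print "Matched words: @words\n";
```

Apply sort to the result if the alphabetical order of the original output is wanted.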

score 1 · Accepted Answer

For a given value of $_, !$seen{$_}++ is true only the first time it is evaluated.

my $regx = qr/heavenn/;

my @matches;
my %seen;
while (<>) {
   chomp;
   push(@matches, $_) if /$regx/ && !$seen{$_}++;
}
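To see why the idiom works, it may help to look at the value of the expression for each element in turn. The post-increment returns the old count (undef the first time, then 1, 2, and so on), and the negation turns that into true only for the first sighting. The helper name first_time_flags is made up for this sketch.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: returns 1 for the first sighting of each word, 0 afterwards.
sub first_time_flags {
    my @words = @_;
    my %seen;
    # $seen{$_}++ yields the old count; ! makes that true only when it was undef/0.
    return map { !$seen{$_}++ ? 1 : 0 } @words;
}

my @words = qw(heavenns heavennly heavenns);
my @flags = first_time_flags(@words);
for my $i (0 .. $#words) {
    print "$words[$i]: ", ($flags[$i] ? "first time" : "seen before"), "\n";
}
```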

score 0 · Accepted Answer

If you want to push only the first occurrence of a word, you can add the following to the loop, after the regex:

# Assumes "my %seen;" is declared outside the loop.
next if $seen{$line}++;

Other approaches to uniqueness: How do I print unique elements in Perl array?
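One such approach, sketched here under the assumption that List::Util 1.45 or later is installed (that is the release that added uniq), is to filter first and deduplicate afterwards. The helper name unique_matching is hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(uniq);   # uniq needs List::Util 1.45 or later

# Hypothetical helper: matching lines without duplicates, in first-seen order.
sub unique_matching {
    my ($rx, @lines) = @_;
    chomp @lines;                            # remove trailing newlines
    return uniq grep { $_ =~ $rx } @lines;   # filter, then drop repeats
}

my @lines   = ("heaven\n", "heavenns\n", "abc\n", "heavenns\n", "heavennly\n");
my @matches = unique_matching(qr/heavenn/, @lines);
print "Matches: @matches\n";
```

On an older List::Util, the %seen idiom shown earlier does the same job without the dependency.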
