perl - Using the indexes of a sorted list to sort and index another list in Perl

Question

Suppose I have a list that holds words and another list that holds the confidences associated with those words.

my @list = ("word1", "word2", "word3", "word4");
my @confidences = (0.1, 0.9, 0.3, 0.6);

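I want to obtain a second pair of lists containing the elements of @list whose confidence is higher than 0.4, in sorted order, together with their corresponding confidences. How can I do that in Perl? (That is, by using the list of indexes that sorts one list to sort the other.)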

For the example above, the output would be:

my @sorted_and_thresholded_list = ("word2", "word4");
my @sorted_and_thresholded_confidences = (0.9, 0.6);

The entries in @list may not be unique (i.e. the sort should be stable).
The sort should be in descending order.

score 5 · Accepted Answer

When you are dealing with parallel arrays, you need to work with indexes.

my @sorted_and_thresholded_indexes =
    sort { $confidences[$b] <=> $confidences[$a] }
     grep $confidences[$_] > 0.4,
      0..$#confidences;

my @sorted_and_thresholded_list =
   @list[ @sorted_and_thresholded_indexes ];
my @sorted_and_thresholded_confidences =
   @confidences[ @sorted_and_thresholded_indexes ];
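
The question asks for a stable sort. As a minimal sketch, stability can be made explicit by breaking ties on the original index, so the result does not depend on the sort algorithm in use. The helper name sorted_thresholded_indexes and the sample data with a repeated word are assumptions made for illustration; they are not part of the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer): returns the indexes whose
# confidence exceeds $threshold, ordered by descending confidence.
# Ties fall back to ascending index, which keeps the original order.
sub sorted_thresholded_indexes {
    my ($confidences, $threshold) = @_;
    return
        sort { $confidences->[$b] <=> $confidences->[$a] or $a <=> $b }
        grep { $confidences->[$_] > $threshold }
        0 .. $#$confidences;
}

# Illustrative data: "word2" appears twice, and two entries tie at 0.6.
my @list        = ("word1", "word2", "word3", "word2");
my @confidences = (0.6, 0.9, 0.3, 0.6);

my @idx   = sorted_thresholded_indexes(\@confidences, 0.4);
my @words = @list[@idx];          # array slice in sorted order
my @confs = @confidences[@idx];   # same indexes, so the pairs stay aligned

print "@words\n";   # word2 word1 word2
print "@confs\n";   # 0.9 0.6 0.6
```

Because both slices use the same index list, the words and confidences stay paired even when words repeat.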

score 3 · Accepted Answer

Using pairwise and part from List::MoreUtils:

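use Data::Dumper;
use List::MoreUtils qw(pairwise part);
my @list = ("word1", "word2", "word3", "word4");
my @confidences = (0.1, 0.9, 0.3, 0.6);

# pairwise yields (word, confidence) for pairs above the threshold and undef
# otherwise; grep drops the undefs; part splits the flat list alternately
# into words (partition 0) and confidences (partition 1).
# Note: this filters only; it does not sort by confidence.
my $i = 0;
my @ret = part { $i++ % 2 }
          grep { defined }
          pairwise { $b > .4 ? ($a, $b) : undef } @list, @confidences;

print Dumper @ret;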

Output:

$VAR1 = [
          'word2',
          'word4'
        ];
$VAR2 = [
          '0.9',
          '0.6'
        ];

score 1 · Accepted Answer

If you are sure there are no duplicate words, I think it would probably be easier to use a hash for this task. For example:

my %hash = ( "word1" => 0.1,
             "word2" => 0.9,
             "word3" => 0.3,
             "word4" => 0.6
           );

You can then iterate over the keys of the hash and pick out only those that match the criterion:

foreach my $key (keys %hash) {
    if ($hash{$key} > 0.4) {
        print "$key\n";    # one matching word per line
    }
}
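
The loop above visits keys in hash order, which is unspecified, so on its own it does not produce the descending order the question asks for. The following is a rough sketch of one way to add the ordering, assuming all words are unique; the alphabetical tie-break is an assumption added here to make the output deterministic.

```perl
use strict;
use warnings;

my %hash = (word1 => 0.1, word2 => 0.9, word3 => 0.3, word4 => 0.6);

# Keep the keys whose confidence exceeds 0.4, then order them by descending
# confidence. Ties are broken alphabetically (an illustrative choice).
my @words = sort { $hash{$b} <=> $hash{$a} or $a cmp $b }
            grep { $hash{$_} > 0.4 }
            keys %hash;
my @confs = @hash{@words};   # hash slice, in the same order as @words

print "$_ => $hash{$_}\n" for @words;
```

With the sample data this prints word2 followed by word4.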

score 1 · Accepted Answer

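ikegami has already given my first solution, which uses indexes, but there is also the option of combining the arrays into a two-dimensional array (*). The benefit is that all the data is collected in a single data structure, which makes it easy to manipulate.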

use strict;
use warnings;
use Data::Dumper;

my @list = ("word1", "word2", "word3", "word4");
my @conf = (0.1, 0.9, 0.3, 0.6);
my @comb;

for (0 .. $#list) {                       # create two-dimensional array
    push @comb, [ $list[$_], $conf[$_] ];
}

my @all = sort { $b->[1] <=> $a->[1] }    # sort according to conf
          grep { $_->[1] > 0.4 } @comb;   # conf limit

my @list_done = map $_->[0], @all;        # break the lists apart again
my @conf_done = map $_->[1], @all;

print Dumper \@all, \@list_done, \@conf_done;

Output:

$VAR1 = [
          [
            'word2',
            '0.9'
          ],
          [
            'word4',
            '0.6'
          ]
        ];
$VAR2 = [
          'word2',
          'word4'
        ];
$VAR3 = [
          '0.9',
          '0.6'
        ];

(*) = Using a hash is also an option, assuming 1) the original order is not important and 2) all words are unique. However, unless fast lookup is a concern, there is no harm in using an array.

score 1 · Accepted Answer

my @list = ("word1", "word2", "word3", "word4");
my @confidences = (0.1, 0.9, 0.3, 0.6);

my @result = map { $list[$_] }
              sort { $confidences[$b] <=> $confidences[$a] }
                 grep { $confidences[$_] > 0.4 } (0..$#confidences);
