perl - Removing rows that contain only NAN from a large dataset

Question

I have a large dataset (12,000 rows x 14 columns). The first four rows look like this:

x1  y1  0.02    NAN NAN NAN NAN NAN NAN 0.004   NAN NAN NAN NAN
x2  y2  NAN 0.003   NAN 10  NAN 0.03    NAN 0.004   NAN NAN NAN NAN
x3  y3  NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN
x4  y4  NAN 0.004   NAN NAN NAN NAN 10  NAN NAN 30  NAN 0.004

I need to delete the rows that contain only "NAN" in columns 3 to 14, and then output the rest of the dataset. I wrote the following code:

#!usr/bin/perl

use warnings;
use strict;
use diagnostics;

open(IN, "<", "file1.txt") or die "Can't open file for reading:$!";

open(OUT, ">", "file2.txt") or die "Can't open file for writing:$!";

my $header = <IN>;
print OUT $header;

my $at_line = 0;

my $col3;
my $col4;
my $col5;
my $col6;
my $col7;
my $col8;
my $col9;
my $col10;
my $col11;
my $col13;
my $col14;
my $col15;

while (<IN>){
chomp;
my @sections = split(/\t/);

$col3 = $sections[2];
$col4 = $sections[3];;
$col5 = $sections[4];
$col6 = $sections[5];
$col7 = $sections[6];
$col8 = $sections[7];
$col9 = $sections[8];
$col10 = $sections[9];
$col11 = $sections[10];
$col13 = $sections[11];
$col14 = $sections[12];
$col15 = $sections[13];

if ($col3 eq "NAN" && $col4 eq "NAN" && $col5 eq "NAN" && $col6 eq "NAN" && $col7 eq "NAN" && $col8 eq "NAN" && $col9 eq "NAN" && $col10 eq "NAN" 
&& $col11 eq "NAN" && $col12 eq "NAN" && $col13 eq "NAN" && $col14 eq "NAN" && $col5 eq "NAN"){
    $at_line = $.;
    }   
    else {
        print OUT "$_\n";
    }
}

close(IN);
close(OUT);

When I run this code, I get the following error:

Use of uninitialized value $col3 in string eq at filter.pl
    line 46, <IN> line 2 (#1)

How can I make this program work? Thanks.

score 4 · Accepted Answer

Zaid's one-liner is the best solution for this particular case. More generally, instead of declaring that many scalars, the usual pattern is more like:

my @required_columns = (split /\s+/)[2..13];

It looks like you are getting the error because you are splitting on tabs when your dataset is separated by spaces. Remember that split takes a regex, not a string.
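
Put together, the slice and the whitespace split might look like the sketch below. It is only an illustration under stated assumptions: the helper name all_nan is hypothetical, the four sample rows from the question stand in for file1.txt, and rows with fewer than 14 fields are simply kept.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true when columns 3..14 (indices 2..13) are all "NAN".
sub all_nan {
    my ($line) = @_;
    # Split on runs of whitespace, so tab- and space-separated rows both work.
    my @required_columns = (split /\s+/, $line)[2 .. 13];
    # Count the fields that are exactly "NAN"; missing fields are not counted.
    my $nan_count = grep { defined $_ && $_ eq 'NAN' } @required_columns;
    return $nan_count == 12;
}

# Sample rows copied from the question (stand-in for reading file1.txt).
my @rows = (
    "x1  y1  0.02    NAN NAN NAN NAN NAN NAN 0.004   NAN NAN NAN NAN",
    "x2  y2  NAN 0.003   NAN 10  NAN 0.03    NAN 0.004   NAN NAN NAN NAN",
    "x3  y3  NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN",
    "x4  y4  NAN 0.004   NAN NAN NAN NAN 10  NAN NAN 30  NAN 0.004",
);

# Print every row except those whose data columns are all NAN.
for my $row (@rows) {
    print "$row\n" unless all_nan($row);
}
```

With the sample rows, only the x3 row is dropped. Counting the "NAN" fields instead of chaining twelve eq tests also sidesteps the uninitialized-value warning when a row turns out to be shorter than expected.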

score 4 · Accepted Answer

One-liner:

$ perl -lane 'print if join("", @F[2..13]) ne "NAN" x 12' <file1.txt >file2.txt
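
For readers unfamiliar with the switches: -n wraps the code in a loop over the input lines, -a autosplits each line on whitespace into @F, -l strips the newline on input and adds it back on print, and -e supplies the code. A rough long-hand equivalent might look like the sketch below; the in-memory sample data is an assumption that replaces the file1.txt redirect so the example is self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample rows from the question, used here instead of file1.txt (assumption).
my $data = <<'END';
x1  y1  0.02    NAN NAN NAN NAN NAN NAN 0.004   NAN NAN NAN NAN
x2  y2  NAN 0.003   NAN 10  NAN 0.03    NAN 0.004   NAN NAN NAN NAN
x3  y3  NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN
x4  y4  NAN 0.004   NAN NAN NAN NAN 10  NAN NAN 30  NAN 0.004
END

# Read the sample text through an in-memory filehandle.
open my $in, '<', \$data or die "Can't open in-memory data: $!";

my @kept;
while (my $line = <$in>) {        # -n: loop over every input line
    chomp $line;                  # -l: strip the newline on input
    my @F = split ' ', $line;     # -a: autosplit on whitespace into @F
    # Keep the row unless fields 3..14 join up to twelve "NAN"s in a row.
    push @kept, $line if join("", @F[2 .. 13]) ne "NAN" x 12;
}
close $in;

print "$_\n" for @kept;           # -l: newline added back on output
```

The comparison works because "NAN" x 12 repeats the string twelve times, which is exactly what the joined slice looks like when every data column is NAN.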

answered 2013-09-09T10:13:13.193
