perl - Extracting columns of interest from a dataset with Perl

Question

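I have two files. file1 (sample.txt) contains a list of sample IDs (about 1000). These sample IDs are column names in file2 (sampleValue.txt), which is a 30000*1500 data matrix. I am interested in the values of every row for 1000 of the 1500 columns, for example columns 1,2,5,6,70,71,75,100,112,114. There is no pattern to the columns. Below is what I am doing, and I would like to know how I can improve it. Here is my code: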

## Opening first file
open my $IN, "sample.txt" or die $!;
my $header = <$IN>;

while(<$IN>){
chomp $_;
my @line = split('\t', $_);
$sampleID{$line[0]} = 1; ## Sample ID
}
close($IN);
print "Total number of sample ID: ", scalar(keys %sampleID),"\n"; ## 1000 columns

## Sample Value Data
open $IN, "sampleValue.txt" or die $!;

## Columns are sample names from file1
$header = <$IN>;
chomp $header; ## strip the newline so the last column name can match
my @samples = split("\t", $header);
print "Total samples: ",scalar(@samples),"\n"; ## 1500

## loop for all the samples ids or the columns I am interested in
for(my $i = 1; $i <= $#samples; $i++){ ## start at 1 because element 0 is the header of the first column
my $sample = $samples[$i];
$sampleValue{$sample} = $i if (exists $sampleID{$sample});
}

my $col = "";  
foreach my $key (keys %sampleValue){
$col = $sampleValue{$key}.",".$col;
}
chop($col);
print $col,"\n"; ## string of all the columns I am interested in

The reason I run the loop above is that I do not want to look up the columns of interest through the hash while reading the file line by line.

## Reading the sample Value file row by row
while(<$IN>){
chomp $_;
print $_,"\n";
my @line = split("\t", $_);
@line = @line[$col]; ## error since it is string type
print @line,"\n";
}

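I get an error on the line @line = @line[$col]; because $col is a string, not a number. However, it works if I write @line[1,2,5,6,70,71,75,100,112,114]. So my question is: is there an easy way to convert the string $col into a comma-separated list of numbers, or is there a better way to get the columns of interest?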
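One possible direction, offered as a minimal sketch rather than a definitive answer: an array slice accepts a list of indices, so the column numbers can be kept in an array instead of being joined into a string. The sample IDs, the header, and the rows below are hypothetical in-memory stand-ins for sample.txt and sampleValue.txt, and the names @wanted_ids, @rows, @cols and @extracted are introduced here for illustration only.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the contents of sample.txt and sampleValue.txt.
my @wanted_ids = qw(S2 S4 S5);
my @rows = (
    "gene\tS1\tS2\tS3\tS4\tS5",
    "g1\t10\t11\t12\t13\t14",
    "g2\t20\t21\t22\t23\t24",
);

# Lookup hash of the sample IDs of interest.
my %sampleID = map { $_ => 1 } @wanted_ids;

# Split the header once and keep the numeric index of every wanted column.
# Index 0 is skipped because it holds the label of the row-name column.
my @header = split /\t/, shift @rows;
my @cols   = grep { exists $sampleID{ $header[$_] } } 1 .. $#header;

# Slice each row with the index array; no hash lookup happens per row.
my @extracted;
for my $row (@rows) {
    my @line = split /\t/, $row, -1;    # -1 keeps trailing empty fields
    push @extracted, join("\t", @line[0, @cols]);
}

# Print the reduced header followed by the reduced rows.
print "$_\n" for join("\t", @header[0, @cols]), @extracted;
```

If the comma-joined string already exists, splitting it back into a list should also work, for example @line[ split /,/, $col ]. With real files, each line (including the header) would need a chomp before the split so that the last column name does not carry a trailing newline.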
