
I want to read a file and create one array for each chain (M, N, O, ...) it contains.

Here is part of the file:

SEQRES   1 M  312  ALA ALA ASP PRO LYS LEU LEU LYS ALA ALA ALA GLU ALA
SEQRES   2 M  312  SER TYR ALA PHE ALA LYS GLU VAL ASP TRP ASN ASN GLY
SEQRES   3 M  312  ILE PHE LEU GLN ALA PRO GLY LYS LEU GLN PRO LEU GLU
SEQRES   4 M  312  ALA LEU LYS ALA ILE ASP LYS MET ILE VAL MET GLY ALA
SEQRES   5 M  213  SER PHE ASN ARG ASN

SEQRES   1 N  312  ASP GLU ILE GLY ASP ALA ALA LYS LYS LEU GLY ASP ALA
SEQRES   2 N  312  SER TYR ALA PHE ALA LYS GLU VAL ASP TRP ASN ASN GLY
SEQRES   3 N  312  ILE PHE LEU GLN ALA PRO GLY LYS LEU GLN PRO LEU GLU
SEQRES   4 N  312  ALA LEU LYS ALA ILE ASP LYS MET ILE VAL MET GLY ALA
SEQRES   5 N  312  ALA ALA ASP PRO LYS LEU LEU LYS ALA ALA ALA GLU ALA
SEQRES   6 N  312  VAL THR SER ARG ALA ASP TRP ASP ASN VAL

SEQRES   1 O  312  HIS HIS LYS ALA ILE GLY SER ILE SER GLY PRO ASN GLY
SEQRES   2 O  312  SER TYR ALA PHE ALA LYS GLU VAL ASP TRP ASN ASN GLY
SEQRES   3 O  312  ILE PHE LEU GLN ALA PRO GLY LYS LEU GLN PRO LEU GLU
SEQRES   4 O  312  ALA LEU LYS ALA ILE ASP LYS MET ILE VAL

This is my code:

my @seq;
my $string="";
my @seqFile;
my $file=<>;
open(FILE, "$file");
while (my $line=<FILE>){
    if ($line =~ /^SEQRES/) {
        chomp $line;
        push @seq, [split (/\s+/, $line)] ;
    }
}
close(FILE);
for my $i (0..$#seq) {
    my $ob =$seq[$i][2];
    if ($seq[$i][2] eq $ob ){
        for (my $j=4;$j<=$#{$seq[$i]};$j++) {
            my $temp= $seq[$i][$j];
            $string .= $temp;
        }
        $ob = $seq[$i][2];
        last;
    }
    push @seqFile, $ob;
    push @seqFile, $string;
    $string = ''; #string needs to be empty to store new lines
}

For the sample above I want three arrays: M(:) ALAALAASP...., N(:) ASPGLU.., O(:) HISHISLYS.. ..

I managed to put all the SEQRES lines into a single string, but that is not what I want.

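I think I need to put an if(){} somewhere to check whether M <=> N and N <=> O differ, and then save the string and start a new string and array. But then it keeps accumulating the same string $#seq times, or, if I move the } by one position, nothing is saved and I get error messages. How do I do this?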


2 Answers


Do you not see a problem here?

my $ob =$seq[$i][2];
if ($seq[$i][2] ne $ob ){

This is analogous to:

my $x = "this";
if ($x ne "this") {

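How could the if condition ever be true? (The code as posted above uses eq instead, which has the opposite problem: the condition is always true, so it still never detects a change of chain.)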
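For comparison, here is a minimal sketch of the check the question was reaching for, written so the comparison can actually fail: the previous chain ID lives in a variable declared outside the loop and is compared with the third field of the current record. It assumes @seq holds split SEQRES lines as in the question's code; the short sample rows and the $prev name are hypothetical stand-ins, not part of the original.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the question's @seq: each element is one
# SEQRES line split on whitespace (fields 0..3 are SEQRES, serial, chain, length).
my @seq = map { [ split ' ', $_ ] } (
    'SEQRES   1 M  312  ALA ALA ASP',
    'SEQRES   2 M  312  SER TYR',
    'SEQRES   1 N  312  ASP GLU',
    'SEQRES   1 O  312  HIS HIS LYS',
);

my @seqFile;
my $prev;            # chain ID of the previous record; declared outside the loop
my $string = '';

for my $row (@seq) {
    my $ob = $row->[2];
    # The chain ID changed: save the finished chain before starting the next one.
    if (defined $prev and $ob ne $prev) {
        push @seqFile, $prev, $string;
        $string = '';
    }
    $string .= join '', @{$row}[4 .. $#$row];   # residues start at field 4
    $prev = $ob;
}
# The last chain is never followed by a change, so save it after the loop.
push @seqFile, $prev, $string if defined $prev;

print join(' ', @seqFile), "\n";
```

The key point is the final push after the loop: a "flush on change" design always leaves the last group pending when the input runs out.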

A better approach would be to use a hash of arrays keyed on the chain ID (M, N, or O, the value you are assigning to $ob):

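open(my $fh, '<', $file) or die "Cannot open $file: $!";   # lexical handle; global barewords like FILE are discouraged
my %hash_of_arrays;
while (<$fh>) {
    next unless /^SEQRES/;       # skip blank lines and any other records
    my @data = split;            # split $_ on whitespace
    push @{$hash_of_arrays{$data[2]}}, join('', @data[4..$#data]);   # key on chain ID, store this line's residues
}
close($fh);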

I'm fairly sure that is close to what you are trying to do. The second argument to push joins a slice of @data, fields 4 onward, which are the residues.

Note that if @{$hash_of_arrays{$data[2]}} does not exist yet, it will be created automatically via autovivification: http://en.wikipedia.org/wiki/Autovivification
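A minimal sketch of what the hash of arrays holds once filled, and how to collapse each chain's per-line fragments into one string. The input lines are a hypothetical excerpt, and %chains is an illustrative name, not something from the answer.

```perl
use strict;
use warnings;

my %hash_of_arrays;
my @lines = (                    # hypothetical excerpt of the data file
    'SEQRES   1 M  312  ALA ALA ASP',
    'SEQRES   2 M  312  SER TYR',
    'SEQRES   1 N  312  ASP GLU',
);

for (@lines) {
    my @data = split;            # split $_ on whitespace
    # No array exists yet for a new chain ID; push autovivifies one.
    push @{ $hash_of_arrays{$data[2]} }, join('', @data[4 .. $#data]);
}

# Each key maps to an array of per-line fragments; join them per chain.
my %chains = map { $_ => join('', @{ $hash_of_arrays{$_} }) } keys %hash_of_arrays;

for my $id (sort keys %chains) {
    print "$id: $chains{$id}\n";
}
```

Hash keys come back in no particular order, hence the sort when printing.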

answered 2012-05-28T11:44:03.403

I think this program does what you need.

Instead of watching for changes in the value of the third field, I have written it so that a blank line or the end of the file marks the end of a chain.

use strict;
use warnings;

my $file = 'seq.txt';

open my $fh, '<', $file or die $!;

my @seqFile;
my $string;
my $ob;

while (<$fh>) {
  if (/^SEQRES/) {           
    my @data = split;
    $string .= join '', @data[4..$#data];
    $ob = $data[2];
  }
  if (eof($fh) or not /\S/) {
    push @seqFile, $ob, $string;
    $ob = $string = undef;
  }
}

use Data::Dumper;
print Dumper \@seqFile;

Output:

$VAR1 = [
          'M',
          'ALAALAASPPROLYSLEULEULYSALAALAALAGLUALASERTYRALAPHEALALYSGLUVALASPTRPASNASNGLYILEPHELEUGLNALAPROGLYLYSLEUGLNPROLEUGLUALALEULYSALAILEASPLYSMETILEVALMETGLYALASERPHEASNARGASN',
          'N',
          'ASPGLUILEGLYASPALAALALYSLYSLEUGLYASPALASERTYRALAPHEALALYSGLUVALASPTRPASNASNGLYILEPHELEUGLNALAPROGLYLYSLEUGLNPROLEUGLUALALEULYSALAILEASPLYSMETILEVALMETGLYALAALAALAASPPROLYSLEULEULYSALAALAALAGLUALAVALTHRSERARGALAASPTRPASPASNVAL',
          'O',
          'HISHISLYSALAILEGLYSERILESERGLYPROASNGLYSERTYRALAPHEALALYSGLUVALASPTRPASNASNGLYILEPHELEUGLNALAPROGLYLYSLEUGLNPROLEUGLUALALEULYSALAILEASPLYSMETILEVAL'
        ];

Edit

Now that I know the data file has no blank lines to delineate the chains, my original solution won't work.

This alternative checks the sequence number in the second field of the records, and starts a new chain when that number is 1. The accumulated chain must also be saved whenever a new chain starts and also at the end of the file after the read loop exits.

The output from this program is identical to that shown above.

use strict;
use warnings;

my $file = 'seq.txt';

open my $fh, '<', $file or die $!;

my @seqFile;
my $chain;
my $ob;

while (<$fh>) {

  next unless /^SEQRES/;

  my @data = split;
  if ($data[1] == 1) {
    push @seqFile, $ob, $chain if $chain;
    $ob = $chain = undef;
  }
  $chain .= join '', @data[4..$#data];
  $ob = $data[2];
}

push @seqFile, $ob, $chain if $chain;

use Data::Dumper;
print Dumper \@seqFile;
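Because @seqFile alternates chain ID and sequence, it can be assigned straight into a hash to look a chain up by its ID, which is closer to the "one array per chain" the question asks for. This is a sketch assuming chain IDs are unique (a repeated ID would overwrite the earlier sequence); the shortened sequences and the %chain and @order names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical shortened result in the same ID, sequence, ID, sequence layout.
my @seqFile = ('M', 'ALAALAASP', 'N', 'ASPGLUILE', 'O', 'HISHISLYS');

# An even-length flat list assigns pairwise into a hash (keys must be unique).
my %chain = @seqFile;

# Hash order is not file order, so keep the IDs (even indices) if order matters.
my @order = @seqFile[grep { $_ % 2 == 0 } 0 .. $#seqFile];

print "$_ => $chain{$_}\n" for @order;
```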
answered 2012-05-28T15:50:08.363