Hello everyone. I have some data from the blat tool, which gives alignment output like the following.

contig30
chromosome 1
000000001 gctctgc.tctggggacgctcgcagcgctcggcgcctggcccag 000000043
>>>>>>>>> ||||||| |||||||||||||||||||||||||||||||||||| >>>>>>>>>
123368567 gctctgcatctggggacgctcgcagcgctcggcgcctggcccag 123368610

000000044 tttctttgacaatgtctaccgttcatgaaattctgtgcaagctcagcttg 000000093
>>>>>>>>> |||||||||||||||||||||||||||||||||||||||||||||||||| >>>>>>>>>
123380141 tttctttgacaatgtctaccgttcatgaaattctgtgcaagctcagcttg 123380190

contig35
chromosome 1
000000001 gctctgc.tctggggacgctcgcagcgctcggcgcctggcccag 000000043
>>>>>>>>> ||||||| |||||||||||||||||||||||||||||||||||| >>>>>>>>>
123368567 gctctgcatctggggacgctcgcagcgctcggcgcctggcccag 123368610

I have a text file containing this data.

What I want to do is print the output in the following form:

contig 30 chromosome 1 000000001-123368567
contig 30 chromosome 1 000000002-123368568
contig 30 chromosome 1 000000003-123368569

...
up to
contig 30 chromosome 1 000000093-123380190

The same goes for the next entry. The input text file contains multiple entries of this type.

2 answers

You may be looking for something like this:

#!/usr/bin/env perl

use strict;
use warnings;
use utf8;

my $content = do {
    local $/;
    <DATA>
};

while (
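    $content =~ /
                        (contig)(30)\n      # matches contig 30 only; replace 30 with \d+ for any contig
                        (chromosome\ 1)\n   # the space is escaped because x ignores plain whitespace
                        (\d+).*\n           # query start coordinate
                        .*\n                # filler line
                        (\d+).*\n           # target start coordinate

                    /gmx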
  )
{
    print $1, " ", $2, " ", $3, " ", $4, "-", $5, "\n";
}

__DATA__
contig30
chromosome 1
000000001 gctctgc.tctggggacgctcgcagcgctcggcgcctggcccag 000000043
>>>>>>>>> ||||||| |||||||||||||||||||||||||||||||||||| >>>>>>>>>
123368567 gctctgcatctggggacgctcgcagcgctcggcgcctggcccag 123368610

000000044 tttctttgacaatgtctaccgttcatgaaattctgtgcaagctcagcttg 000000093
>>>>>>>>> |||||||||||||||||||||||||||||||||||||||||||||||||| >>>>>>>>>
123380141 tttctttgacaatgtctaccgttcatgaaattctgtgcaagctcagcttg 123380190

contig35
chromosome 1
000000001 gctctgc.tctggggacgctcgcagcgctcggcgcctggcccag 000000043
>>>>>>>>> ||||||| |||||||||||||||||||||||||||||||||||| >>>>>>>>>
123368567 gctctgcatctggggacgctcgcagcgctcggcgcctggcccag 123368610

contig30
chromosome 1
000000002 gctctgc.tctggggacgctcgcagcgctcggcgcctggcccag 000000043
>>>>>>>>> ||||||| |||||||||||||||||||||||||||||||||||| >>>>>>>>>
123368568 gctctgcatctggggacgctcgcagcgctcggcgcctggcccag 123368610

The important thing here is that DATA is slurped into $content: the whole content of the file, including all the newlines, is stored in $content.

With the file held in a single variable, you can run one regex across several lines. The m modifier makes ^ and $ match at the start and end of each line; this pattern has no such anchors, so m is harmless but not required. The x modifier is there to improve legibility: it lets the pattern be laid out with whitespace so that its structure is visible.
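
As a hedged variation on the same slurp-and-match idea, the sketch below replaces the hard-coded contig and chromosome numbers with \d+ so that every record header is reported. The sub name first_coordinates and the inline $sample string are illustrative assumptions, not part of the answer above. Like the original pattern, it only reads the first alignment block after each header.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Return one "contig N chromosome M query-target" string per record header.
# Only the first alignment block after each header is inspected.
sub first_coordinates {
    my ($content) = @_;
    my @found;
    while (
        $content =~ /
            contig(\d+)\n          # contig number
            chromosome\ (\d+)\n    # chromosome number, space escaped under x
            (\d+)\ .*\n            # query start coordinate
            .*\n                   # filler line
            (\d+)\ .*\n            # target start coordinate
        /gx
    ) {
        push @found, "contig $1 chromosome $2 $3-$4";
    }
    return @found;
}

# Sample records copied from the question.
my $sample = <<'END';
contig30
chromosome 1
000000001 gctctgc.tctggggacgctcgcagcgctcggcgcctggcccag 000000043
>>>>>>>>> ||||||| |||||||||||||||||||||||||||||||||||| >>>>>>>>>
123368567 gctctgcatctggggacgctcgcagcgctcggcgcctggcccag 123368610

contig35
chromosome 1
000000001 gctctgc.tctggggacgctcgcagcgctcggcgcctggcccag 000000043
>>>>>>>>> ||||||| |||||||||||||||||||||||||||||||||||| >>>>>>>>>
123368567 gctctgcatctggggacgctcgcagcgctcggcgcctggcccag 123368610
END

print "$_\n" for first_coordinates($sample);
```

The g modifier is what lets the while loop step from one match to the next. No m modifier is used here because the pattern contains no ^ or $ anchors.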

answered 2013-08-02T08:46:35.687

This seems to do the job:

#!/usr/bin/env perl
use strict;
use warnings;

my $contig;
my $chromo;

while (<>)
{
    chomp;
    if (/^contig(\d+)/)
    {
        $contig = $1;
    }
    elsif (/^chromosome (\d+)/)
    {
        $chromo = $1;
    }
    elsif (/^(\d+) [acgt.]+ (\d+)/)
    {
        my $b1 = $1;
        my $e1 = $2;
        my $junk = <>;
        my $line = <>;
        next unless $junk =~ m/^[<>]+ [ |]+ [<>]+$/; # See other question
        my($b2, $e2) = $line =~ m/^(\d+) [acgt.]+ (\d+)/;
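        # Note: $i < $e1 - $b1 stops one position short of $e1 (use <= to include it),
        # and a '.' gap in either sequence is not accounted for.
        for (my $i = 0; $i < $e1 - $b1; $i++)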
        {
            printf "contig %d chromosome %d %.9d-%.9d\n", $contig, $chromo, $b1+$i, $b2+$i;
        }
    }
    # else blank?  Ignore it, anyway.
}

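The data in the other question had < rather than > at the start of the lines, so I used the pattern [<>] to match the start and end of the filler line. The middle section of the filler line in this question also contains a blank.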

Example output:

contig 30 chromosome 1 000000001-123368567
contig 30 chromosome 1 000000002-123368568
contig 30 chromosome 1 000000003-123368569
...
contig 30 chromosome 1 000000040-123368606
contig 30 chromosome 1 000000041-123368607
contig 30 chromosome 1 000000042-123368608
contig 30 chromosome 1 000000044-123380141
contig 30 chromosome 1 000000045-123380142
contig 30 chromosome 1 000000046-123380143
...
contig 35 chromosome 1 000000001-123368567
contig 35 chromosome 1 000000002-123368568
contig 35 chromosome 1 000000003-123368569
...
contig 35 chromosome 1 000000040-123368606
contig 35 chromosome 1 000000041-123368607
contig 35 chromosome 1 000000042-123368608
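
The asker's desired output ends at 000000093-123380190, which suggests that the end coordinate should be included and that the '.' in the query line is a gap: the first block has 44 columns, but the query spans only 43 positions. Below is a minimal sketch under that assumption. The helper names pair_block and expand_alignment are hypothetical, and treating '.' as a gap that consumes no coordinate is an inference from the sample data, not documented blat behavior.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Pair query and target coordinates column by column for one alignment block.
# Assumption: '.' is a gap and consumes no coordinate on its own side.
sub pair_block {
    my ($q_start, $q_seq, $t_start, $t_seq) = @_;
    my ($q, $t) = ($q_start + 0, $t_start + 0);    # force numeric coordinates
    my $cols = length($q_seq) < length($t_seq) ? length($q_seq) : length($t_seq);
    my @pairs;
    for my $i (0 .. $cols - 1) {
        my $q_base = substr($q_seq, $i, 1) ne '.';
        my $t_base = substr($t_seq, $i, 1) ne '.';
        push @pairs, sprintf('%09d-%09d', $q, $t) if $q_base && $t_base;
        $q++ if $q_base;
        $t++ if $t_base;
    }
    return @pairs;
}

# Walk the text line by line, remembering the current contig and chromosome.
sub expand_alignment {
    my ($text) = @_;
    my @lines = split /\n/, $text;
    my ($contig, $chromo, @out);
    while (defined(my $line = shift @lines)) {
        if ($line =~ /^contig(\d+)/) {
            $contig = $1;
        }
        elsif ($line =~ /^chromosome (\d+)/) {
            $chromo = $1;
        }
        elsif ($line =~ /^(\d+) ([acgt.]+) \d+/) {
            my ($q_start, $q_seq) = ($1, $2);
            shift @lines;                          # skip the filler line
            my $target = shift @lines;
            next unless defined $target;
            my ($t_start, $t_seq) = $target =~ /^(\d+) ([acgt.]+) \d+/;
            next unless defined $t_start;
            push @out, "contig $contig chromosome $chromo $_"
                for pair_block($q_start, $q_seq, $t_start, $t_seq);
        }
    }
    return @out;
}

# The contig30 record copied from the question.
my $sample = <<'END';
contig30
chromosome 1
000000001 gctctgc.tctggggacgctcgcagcgctcggcgcctggcccag 000000043
>>>>>>>>> ||||||| |||||||||||||||||||||||||||||||||||| >>>>>>>>>
123368567 gctctgcatctggggacgctcgcagcgctcggcgcctggcccag 123368610

000000044 tttctttgacaatgtctaccgttcatgaaattctgtgcaagctcagcttg 000000093
>>>>>>>>> |||||||||||||||||||||||||||||||||||||||||||||||||| >>>>>>>>>
123380141 tttctttgacaatgtctaccgttcatgaaattctgtgcaagctcagcttg 123380190
END

my @result = expand_alignment($sample);
print scalar(@result), " lines\n";
print "$result[0]\n$result[-1]\n";
```

With this sample the sketch produces 93 lines, running from 000000001-123368567 to 000000093-123380190. Query position 000000008 pairs with 123368575, because the target base opposite the gap has no query partner.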
answered 2013-08-02T09:32:23.190