perl - Perl program to mimic RNA synthesis

Question

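I'm looking for suggestions on how to approach my Perl programming homework, which is to write an RNA synthesis program. I've summarized the program's outline below. I'm looking for feedback on the blocks listed below, which are numbered for easy reference. I've read up to chapter 6 of Elements of Programming with Perl by Andrew Johnson (great book). I've also read the perlfunc and perlop pod pages, but I'm not sure where to start.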

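Description of the program: the program needs to read an input file from the command line, transcribe it into RNA, and then translate the RNA into a sequence of uppercase one-letter amino acid names.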

1. Accept the file specified on the command line
   - here I will use the <> operator
2. Check to make sure the file only contains acgt or die
```
if ( <> ne [acgt] ) { die "usage: file must only contain nucleotides \n"; }  
```
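3. Transcribe the DNA to RNA (every A replaced by U, T replaced by A, C replaced by G, G replaced by C)
   - not sure how to do this
4. Take this transcription and break it into 3-character "codons" starting at the first occurrence of "AUG"
   - not sure, but I'm thinking this is where I will start a %hash variable?
5. Take the 3-character "codons" and give them a single-letter symbol (an uppercase one-letter amino acid name)
   - assign a key a value using (there are 70 possibilities here, so I'm not sure where to store them or how to access them)
6. If a gap is encountered, a new line is started and the process is repeated
   - not sure, but we can assume that gaps are multiples of three
7. Am I approaching this the right way? Is there a Perl function I'm overlooking that could simplify the main program?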

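Notes

It needs to be a self-contained program, with the codon names and symbols stored in the program itself.

Whenever the program reads a codon that has no symbol, this is a gap in the RNA. The program should start a new output line and resume at the next occurrence of "AUG". For simplicity, you can assume that gaps are always multiples of three.

Before I spend more time on research, I want to make sure I'm taking the right approach. Thanks for reading and sharing your expertise!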

score 5 · Accepted Answer

1. here I will use the <> operator

OK, so your plan is to read the file one line at a time. Remember to chomp each line as you go. Otherwise your sequence will end up containing newline characters.
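Here is a minimal sketch of a read-and-chomp loop. `read_sequence` is a hypothetical helper name. The in-memory filehandle stands in for the real input file. In the actual program you would loop with `while (my $line = <>)` instead.

```perl
use strict;
use warnings;

# Hypothetical helper: read every line from a filehandle, chomp it,
# and concatenate the lines into one sequence string.
sub read_sequence {
    my ($fh) = @_;
    my $sequence = '';
    while (my $line = <$fh>) {
        chomp $line;              # strip the trailing "\n" before appending
        $sequence .= $line;
    }
    return $sequence;
}

# Demo: an in-memory filehandle stands in for the input file.
my $data = "atgc\ncgta\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
print read_sequence($fh), "\n";   # prints "atgccgta"
close $fh;
```

Without the `chomp`, the result would be `"atgc\ncgta\n"`, and the newlines would later end up inside codons.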

2. Check to make sure the file only contains acgt or die

```
if ( <> ne [acgt] ) { die "usage: file must only contain nucleotides \n"; }
```

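In a while loop, the <> operator puts the line it reads into the special variable $_ unless you assign it explicitly (my $line = <>).

The code above reads one line from the file and then throws it away. You need to save that line.

Also, the ne operator compares two strings, not a string and a regular expression. Here you want either the !~ operator, or the =~ operator with the negated character class [^acgt]. If you want a case-insensitive test, look into the i flag for regex matches.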
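Here is a small sketch of the negated-character-class test described above. `is_valid_dna` is a hypothetical name, and the sketch assumes upper- and lower-case bases are both acceptable, hence the `i` flag.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the line contains only a, c, g or t
# (case-insensitive). The negated class [^acgt] matches any other
# character, so a match means the line is invalid and !~ returns false.
sub is_valid_dna {
    my ($line) = @_;
    return $line !~ /[^acgt]/i;
}

for my $line ('acgtACGT', 'acgx') {
    print "$line: ", (is_valid_dna($line) ? 'ok' : 'invalid'), "\n";
}
# prints "acgtACGT: ok" then "acgx: invalid"
```

Inside the real read loop, you would write `die "usage: file must only contain nucleotides\n" unless is_valid_dna($line);`. This check must run after `chomp`, because the newline would otherwise count as an invalid character.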

3. Transcribe the DNA to RNA (Every A replaced by U, T replaced by A, C replaced by G, G replaced by C).

As GWW said, check your biology. T->U is the only step in transcription. The tr (transliteration) operator will help here.
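Here is a sketch of the `tr` approach. It assumes the input file holds the coding (sense) strand, so transcription is just T -> U. It also upper-cases the result so it can match codon keys such as `AUG`. `dna_to_rna` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: transcribe coding-strand DNA to RNA.
# uc() normalises the case first, then tr/// swaps every T for U
# in one pass; no other base changes.
sub dna_to_rna {
    my ($dna) = @_;
    my $rna = uc $dna;
    $rna =~ tr/T/U/;
    return $rna;
}

print dna_to_rna('atggcttaa'), "\n";   # prints "AUGGCUUAA"
```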

4. Take this transcription & break it into 3 character 'codons' starting at the first occurrence of "AUG"

not sure but I'm thinking this is where I will start a %hash variable?

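I would use a buffer here. Define a scalar outside the while(<>) loop, and use index to look for "AUG". If it isn't found, put the last two bases into that scalar (you can use substr $line, -2, 2 for this). On the next iteration of the loop, append the new line to those two bases (using .=) and test for "AUG" again. Once you get a hit, you know where it is, so you can mark the spot and start translating.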
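Here is a sketch of the buffer idea, under a few assumptions. `find_start` is a hypothetical helper. The lines are passed in as a list instead of coming from `<>`, and they are assumed to be already chomped and transcribed. The length guard keeps `substr` away from strings shorter than two bases.

```perl
use strict;
use warnings;

# Hypothetical helper: find the first "AUG" across a series of lines,
# even when the codon is split over a line break. Returns the sequence
# from that AUG to the end of the current line, or undef if none is found.
sub find_start {
    my @lines = @_;
    my $buffer = '';                      # lives outside the loop
    for my $line (@lines) {
        $buffer .= $line;                 # leftover bases + new line
        my $pos = index $buffer, 'AUG';
        return substr($buffer, $pos) if $pos >= 0;

        # No hit: keep only the last two bases, in case "AUG" straddles
        # the boundary with the next line.
        $buffer = substr($buffer, -2, 2) if length($buffer) > 2;
    }
    return;
}

print find_start('CCCA', 'UGGCU'), "\n";   # prints "AUGGCU"
```

In the example, "A" ends the first line and "UG" starts the second. The two carried-over bases ("CA") let `index` find the split start codon.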

5. Take the 3 character "codons" and give them a single letter Symbol (an uppercase one-letter amino acid name)

Assign a key a value using (there are 70 possibilities here so I'm not sure where to store or how to access)

Again, as GWW said, build a hash table:

```
%codons = ( AUG => 'M', ... );
```

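Then you can, for example, use split to turn the current line into an array of bases. Take three elements at a time to build each codon, and look up the correct amino-acid code in the hash table.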
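Here is one way to fill and use the hash without typing every pair by hand. There are 4^3 = 64 possible codons. This sketch generates them from the standard genetic code (NCBI translation table 1) and leaves the three stop codons out, so those codons have no symbol. `translate_line` is a hypothetical helper. Stopping at the first codon without a symbol is a simplifying assumption for this sketch.

```perl
use strict;
use warnings;

# Standard genetic code (NCBI translation table 1), generated from a
# compact string rather than typed out as 64 key/value pairs. Codons are
# enumerated with bases in the order U, C, A, G at each position; '*'
# marks a stop codon, which is left out so it has no symbol in the hash.
my @order = qw(U C A G);
my $table = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %codons;
my $index = 0;
for my $first (@order) {
    for my $second (@order) {
        for my $third (@order) {
            my $symbol = substr $table, $index++, 1;
            $codons{"$first$second$third"} = $symbol unless $symbol eq '*';
        }
    }
}

# Hypothetical helper: split an RNA line into single bases, take them
# three at a time, and look each codon up in %codons. Translation stops
# at the first codon with no symbol.
sub translate_line {
    my ($rna) = @_;
    my @bases = split //, $rna;
    my $protein = '';
    while (@bases >= 3) {
        my $codon = join '', splice(@bases, 0, 3);
        last unless exists $codons{$codon};
        $protein .= $codons{$codon};
    }
    return $protein;
}

print translate_line('AUGGCUUGG'), "\n";   # prints "MAW"
```

Generating the table from a string keeps the program self-contained, as the assignment requires. Writing out the 61 explicit `CODON => 'X'` pairs would work just as well.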

6. If a gap is encountered a new line is started and process is repeated

not sure but we can assume that gaps are multiples of three.

See above. You can test for a gap with exists $codons{$current_codon}.
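Here is a sketch of the gap rule built around `exists`. It rests on three assumptions. First, the input is already transcribed RNA in one string. Second, the only codons without a symbol are the three stop codons. Third, because gaps are multiples of three, the reading frame set by the first AUG is kept. `translate_with_gaps` is a hypothetical helper, and each returned element is one output line.

```perl
use strict;
use warnings;

# Standard genetic code (NCBI translation table 1) from a compact string;
# bases run U, C, A, G at each position and '*' (stop) gets no entry.
my %codons;
{
    my @order = qw(U C A G);
    my $table = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
    my $index = 0;
    for my $first (@order) {
        for my $second (@order) {
            for my $third (@order) {
                my $symbol = substr $table, $index++, 1;
                $codons{"$first$second$third"} = $symbol unless $symbol eq '*';
            }
        }
    }
}

# Hypothetical helper: walk the RNA codon by codon, staying in frame from
# the first AUG. A codon with no symbol is a gap: it ends the current
# output line, and nothing is emitted until the next in-frame AUG.
sub translate_with_gaps {
    my ($rna) = @_;
    my $start = index $rna, 'AUG';
    return () if $start < 0;

    my @lines;
    my $current;                        # undef while waiting for an AUG
    for (my $pos = $start; $pos + 3 <= length($rna); $pos += 3) {
        my $codon = substr $rna, $pos, 3;
        $current = '' if !defined $current && $codon eq 'AUG';
        next unless defined $current;

        if (exists $codons{$codon}) {
            $current .= $codons{$codon};
        }
        else {                          # gap: this codon has no symbol
            push @lines, $current;
            undef $current;
        }
    }
    push @lines, $current if defined $current;
    return @lines;
}

print "$_\n" for translate_with_gaps('AUGGCUUAAGGGAUGUGG');   # prints "MA" then "MW"
```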

7. Am I approaching this the right way? Is there a Perl function that I'm overlooking that can simplify the main program?

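Looking at the above, it seems too complicated. I've built a couple of building blocks, the subroutines read_codon and translate. I think they would help a lot with the program's logic.

I know this is homework, but I think this might help you get a feel for other possible approaches.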

```
use warnings; use strict;
use feature 'state';


# read_codon works by using the new state feature in Perl 5.10.
# Both @buffer and $handle represent 'state' for this function;
# together they abstract reading codons away from processing the file
# line by line.
# Both are initialized the first time read_codon is called.
# Since $handle is a state variable, the file is opened only once and
# the current file handle position is never reset. Similarly, @buffer
# always holds whatever was left over from the previous call.
# While @buffer contains fewer than 3 bp, we read a new line, remove
# the "\n" character, split it and push the resulting list onto the
# end of @buffer.
# If we hit EOF on $handle, we have exhausted the file (and every
# complete codon in @buffer), so we return undef.
# Otherwise we take the first 3 bp of @buffer, join them into a string,
# transcribe it and return it.

sub read_codon {
    my ($file) = @_;

    state @buffer;
    state $handle;
    # Open only on the first call; reopening on every call would
    # restart reading from the top of the file.
    unless ($handle) {
        open $handle, '<', $file or die "Cannot open $file: $!";
    }

    while (@buffer < 3) {
        my $new_line = <$handle>;
        return unless defined $new_line;    # EOF
        chomp $new_line;
        push @buffer, split //, $new_line;
    }

    return transcribe(
                       join '',
                       shift @buffer,
                       shift @buffer,
                       shift @buffer
                     );
}

sub transcribe {
    my ($codon) = @_;
    $codon = uc $codon;    # input may be lower-case acgt; hash keys are upper-case
    $codon =~ tr/T/U/;
    return $codon;
}


# translate works by using the new state feature in Perl 5.10.
# The $TRANSLATE state is initialized to 0 only once, the first time
# the sub is called.
# As codons are passed in, the sub updates the state according to
# start and stop codons.
# If the current state is 'translating', the sub returns the matching
# amino acid from the %codes table, if any; otherwise it returns undef.
# This gives the caller a simple way to decide whether it should print
# an amino acid or not.
# %codes could also be a state variable, but since it is not really
# 'state', it is initialized once in a block visible from the sub but
# separate from the rest of the program, because it is private to the
# sub. The block must run before translate is first called.

{
    my %codes = (
        AUG => 'M',
        # ...
    );

    sub translate {
        my ($codon) = @_ or return;

        state $TRANSLATE = 0;

        $TRANSLATE = 1 if $codon =~ m/AUG/i;
        $TRANSLATE = 0 if $codon =~ m/U(AA|GA|AG)/i;

        # A trailing "return ... if $TRANSLATE" would return the false
        # condition (0) instead of undef, so return explicitly.
        return unless $TRANSLATE;
        return $codes{$codon};
    }
}
```

score 3 · Accepted Answer

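I can give you a few hints on some of your points.

I think your first goal should be to parse the file character by character and make sure each character is valid. Next, group the characters into sets of three nucleotides. Then move on to your other goals.

I think your biology is a little off, too. When you transcribe DNA to RNA, you need to think about which strand is involved. You may not need to "complement" the bases during the transcription step.

2. You should check this as you parse the file character by character.

3. This can be done with a loop and a few if statements, or with a hash.

4. This could probably be done with a counter as you read the file character by character, since you need to insert a space every three characters.

5. This is a good place to use a hash based on the amino acid codon table.

6. You need to look for gap characters as you parse the file. This seems to contradict your requirement #2, since your program says the text can only contain ATGC.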
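Here is a sketch combining hints 2 and 4. It walks the sequence one character at a time, validates each base, and uses a counter to insert a space after every third base. `group_codons` is a hypothetical name. The sketch assumes the whole sequence is already in one string, with newlines removed.

```perl
use strict;
use warnings;

# Hypothetical helper: walk the sequence one character at a time,
# validate each base, and insert a space after every third base.
sub group_codons {
    my ($seq) = @_;
    my ($out, $count) = ('', 0);
    for my $char (split //, $seq) {
        die "usage: file must only contain nucleotides\n"
            unless $char =~ /^[acgt]$/i;
        $out .= $char;
        $count++;
        $out .= ' ' if $count % 3 == 0;   # counter marks codon boundaries
    }
    $out =~ s/ $//;                       # drop the trailing space, if any
    return $out;
}

print group_codons('atggcttaa'), "\n";   # prints "atg gct taa"
```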

There are lots of Perl functions that could make this easier, and there are Perl modules such as BioPerl. However, I think using some of them might defeat the purpose of the assignment.

score 1 · Accepted Answer

Take a look at BioPerl and browse its source modules for pointers on how to do it.
