r - Finding the transcription start site with biomaRt

Question

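I am using biomaRt in R to query the Ensembl hsapiens database for human genes. I use the getBM function to retrieve the name, start position, and stop position of every gene, but I cannot find the right attribute for retrieving the TSS (transcription start site). Is that because it is assumed to be the same as seqType = c("3utr", "5utr")?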

score 7 · Accepted Answer

You can get the complete list of queryable attributes as a data frame with listAttributes. After that, you just search it for the attribute you need.

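library(biomaRt)

# Connect to the human gene dataset in the Ensembl mart
mart <- useDataset("hsapiens_gene_ensembl", useMart("ensembl"))
# Data frame of every attribute that getBM can return
att <- listAttributes(mart)
# Keep only the attribute names that mention "transcript"
grep("transcript", att$name, value=TRUE)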

This prints a fairly long list that begins like this:

 [1] "ensembl_transcript_id"                                        
 [2] "transcript_start"                                             
 [3] "transcript_end"                                               
 [4] "external_transcript_id"                                       
 [5] "transcript_db_name"                                           
 [6] "transcript_count"                                             
 [7] "transcript_biotype"                                           
 [8] "transcript_status"                                            
 [9] "clone_based_ensembl_transcript_name"                          
[10] "clone_based_vega_transcript_name"
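
The grep call above only looks at attribute names. As a minimal sketch, assuming the table returned by listAttributes has name and description columns, a small helper can search both at once. The helper name find_attributes and the att_demo table are invented here for illustration; att_demo is a stand-in so the example runs without a network connection.

```r
# Return the rows of an attribute table whose name or description matches
# a pattern (case-insensitive). `att` is expected to have the "name" and
# "description" columns of a listAttributes() result.
find_attributes <- function(att, pattern) {
  hit <- grepl(pattern, att$name, ignore.case = TRUE) |
    grepl(pattern, att$description, ignore.case = TRUE)
  att[hit, , drop = FALSE]
}

# Hypothetical stand-in for listAttributes(mart); the descriptions are
# illustrative, not copied from a live query.
att_demo <- data.frame(
  name = c("ensembl_gene_id", "transcript_start", "transcript_end", "strand"),
  description = c("Gene stable ID", "Transcript start (bp)",
                  "Transcript end (bp)", "Strand"),
  stringsAsFactors = FALSE
)

# Names of the demo attributes that mention "start"
find_attributes(att_demo, "start")$name
```

With a live connection, passing att in place of att_demo and searching for "start" or "tss" is a quick way to check whether your Ensembl release exposes a dedicated TSS attribute.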

You can then use those names in a query:

getBM(attributes=c("transcript_start", "transcript_end"),
      filters="hgnc_symbol", values="foxp2", mart=mart)

which returns

   transcript_start transcript_end
1         113726382      114330960
2         113726494      114271639
3         113726615      114330155
4         113728221      114066565
5         113728221      114271650
6         114054329      114330218
7         114055052      114139783
8         114055052      114333827
9         114055110      114330155
10        114055113      114330200
11        114055275      114269037
12        114055374      114285885
13        114055378      114330012
14        114066555      114294198
15        114066557      114271754
16        114066557      114282629
17        114066570      114294198
18        114055052      114333823
19        114268613      114329981
20        113726615      114310038

If you want every transcript of every gene, drop the filters and values arguments, but be aware that this retrieves a large amount of data.

score 0 · Accepted Answer

I believe "transcript_start" and "transcript_end" are the translation start and stop sites, and not necessarily the TSS (transcription start site).

Looking at the "start_position" and "end_position" attributes, these seem to give the TSS (start_position for genes on the + strand, end_position for genes on the - strand): on the + strand, start_position is always the smallest transcript_start among the gene's transcripts, and on the - strand, end_position is always the largest transcript_end.
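
The strand rule described here can be sketched as follows. This is only an illustration under stated assumptions: it assumes the query also requested the "strand" attribute and that strand is coded as 1 for + and -1 for -. The function name strand_aware_tss is invented here, and the input is a toy data frame rather than a live getBM result.

```r
# Pick the strand-aware start coordinate: the lower coordinate on the
# + strand (1), the higher coordinate on the - strand (-1).
strand_aware_tss <- function(start, end, strand) {
  stopifnot(all(strand %in% c(1, -1)))
  ifelse(strand == 1, start, end)
}

# Toy input shaped like a getBM() result with the attributes
# "hgnc_symbol", "start_position", "end_position" and "strand".
# The FOXP2 row reuses the smallest transcript_start and the largest
# transcript_end from the output in the first answer; its strand value is
# an assumption. "GENE_MINUS" and its coordinates are invented.
genes <- data.frame(
  hgnc_symbol    = c("FOXP2", "GENE_MINUS"),
  start_position = c(113726382, 5000),
  end_position   = c(114333827, 9000),
  strand         = c(1, -1),
  stringsAsFactors = FALSE
)

genes$tss <- strand_aware_tss(genes$start_position,
                              genes$end_position,
                              genes$strand)
genes
```

The same function could be applied per transcript by passing transcript_start and transcript_end instead of the gene-level columns, which would give one candidate TSS per transcript rather than one per gene.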
