r - Plotting overlapping positions from different files in R

Question

I have two large files, each with Start and End positions and two sample columns (numeric).

File 1:

Start  End  Sample1  Sample2
1         60      1       4
100       200     2       1
201       250     1       4
300       450     1       1

File 2:

Start  End  Sample1  Sample2
40         60      1       1
70        180      1       1
240       330      2       1
340       450      1       4
500       900      1       4
980       1200     2       1

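First, I want to take the Start and End positions from the first file and make a segment plot. For each position in the first file, the plot should also cover Start-20 to End+20.

Next, I want to take the overlapping Start and End positions from the second file and draw them on the plot above. This way there will be many plots, one for each Start and End position in the first file, and positions with no overlap are also plotted individually.

The color of each segment is based on the two sample numbers (for example, if a segment has 1 and 4 in either file its color is red, if it has 1 and 1 its color is green, and so on).

I would really appreciate it if someone could help me understand how to write a function for this in R.

Thanks in advance.

[Figure: drawing of the desired output] P.S. I have attached a drawing of the desired output. It shows only two of the results.

Below is the code I wrote, but it produces an error:

Error in match.names(clabs, names(xi)) : names do not match previous names

Also, I need the line segments from dataset1 to be red and the line segments from dataset2 to be green. How can I implement that in the code below?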

overlap_func <- function(dataset1,dataset2) {

for(i in 1:nrow(dataset1))
 {

 loop_start <- dataset1[i,"Start"]
 loop_end <- dataset1[i,"End"]
 p <- dataset2[,c(1,2)]   
 dataset1_pos <- data.frame(loop_start,loop_end)
 dataset2_filter <- p[p$Start >= (loop_start-(loop_start/2)) & p$End <= (loop_end+ (loop_end/2)), ]
 data_in_loop <- rbind(dataset1_pos,dataset2_filter)
 plot_function(data_in_loop,loop_start,loop_end)

 }
 }


plot_function <- function(loop_data,start,end){ 
 pos <- 1:nrow(loop_data)
 dat1 <- cbind(pos,loop_data)
 colnames(dat1) <- c("pos","start","end")
 pdf(file=paste0("path where plots are generated","_",start,"-",end,"_","overlap.pdf"))
 plot(dat1$pos, type = 'n', xlim = range(c(start-(start/2), end+(end/2))))
 segments(dat1$start, dat1$pos, dat1$end, dat1$pos)
 dev.off()
 }


df1 <- read.table(header=T, text="Start  End  Sample1  Sample2
1         60      1       4
100       200     2       1
201       250     1       4
300       450     1       1")

df2 <- read.table(header=T, text="Start  End  Sample1  Sample2
40         60      1       1
70        180      1       1
240       330      2       1
340       450      1       4
500       900      1       4
980       1200     2       1")

 overlap_func(df1,df2)

score 2 · Accepted Answer

Something like this?

df1 <- read.table(header=T, text="Start  End  Sample1  Sample2
1         60      1       4
100       200     2       1
201       250     1       4
300       450     1       1")

df2 <- read.table(header=T, text="Start  End  Sample1  Sample2
40         60      1       1
70        180      1       1
240       330      2       1
340       450      1       4
500       900      1       4
980       1200     2       1")

require(IRanges)
require(ggplot2)
require(plyr)

df1$id <- factor(1:nrow(df1))
ir2 <- IRanges(df2$Start, df2$End)
out <- ddply(df1, .(id), function(x) {
    ir1 <- IRanges(x$Start, x$End)
    o.idx <- as.data.frame(findOverlaps(ir1, ir2))$subjectHits
    df.out <- rbind(x[, 1:4], df2[o.idx, ])
    df.out$id1 <- x$id
    df.out$id2 <- seq_len(nrow(df.out))
    df.out
})
out$id1 <- factor(out$id1)
out$id2 <- factor(out$id2)
out$id3 <- factor(1:nrow(out))

p <- ggplot(out, aes(x = Start, y = id3 , colour = id2)) 
p <- p + geom_segment(aes(xend = End, yend = id3))  # y is inherited from ggplot(); 'ystart' is not a geom_segment aesthetic
p <- p + scale_colour_brewer(palette = "Set1")
p

[Figure: ggplot2 geom_segment plot, no facets]
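For readers who want to see what the overlap query is doing, the step handled by findOverlaps() above can be approximated in base R. This is only a sketch: `overlap_hits` is a hypothetical helper, not an IRanges function, and it assumes closed intervals [Start, End], so two ranges overlap when each one starts no later than the other ends.

```r
# Hypothetical helper (not part of IRanges): for each query interval,
# return the row indices of the subject intervals that overlap it.
# Assumes closed intervals [Start, End].
overlap_hits <- function(q_start, q_end, s_start, s_end) {
  lapply(seq_along(q_start), function(i) {
    which(s_start <= q_end[i] & s_end >= q_start[i])
  })
}

df1 <- read.table(header = TRUE, text = "Start  End  Sample1  Sample2
1         60      1       4
100       200     2       1
201       250     1       4
300       450     1       1")

df2 <- read.table(header = TRUE, text = "Start  End  Sample1  Sample2
40         60      1       1
70        180      1       1
240       330      2       1
340       450      1       4
500       900      1       4
980       1200     2       1")

# One list element per row of df1, holding the matching rows of df2
hits <- overlap_hits(df1$Start, df1$End, df2$Start, df2$End)
hits
```

On the sample data, the last row of df1 (300 to 450) matches two rows of df2, which is why its panel shows three segments. For large files IRanges should scale much better than this row-by-row loop.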

Edit: After seeing your updated drawing, isn't this what you want?

p + facet_wrap( ~ id1, scales="free")

[Figure: ggplot2 geom_segment plot, faceted by id1]
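The plots above colour segments by id2, whereas the question asks for colours based on the Sample1/Sample2 pair. A minimal sketch of such a mapping follows. The question only specifies red for (1, 4) and green for (1, 1); the blue for (2, 1) and the grey fallback are assumptions added for illustration, and `pair_colours` and `colour_for` are hypothetical names.

```r
# Colour lookup keyed by "Sample1-Sample2".
# red and green come from the question; blue is an assumed placeholder.
pair_colours <- c("1-4" = "red", "1-1" = "green", "2-1" = "blue")

# Hypothetical helper: map each sample pair to a colour name,
# falling back to grey for pairs that are not in the lookup.
colour_for <- function(sample1, sample2,
                       palette = pair_colours, fallback = "grey50") {
  key <- paste(sample1, sample2, sep = "-")
  cols <- unname(palette[key])
  cols[is.na(cols)] <- fallback
  cols
}

df2 <- read.table(header = TRUE, text = "Start  End  Sample1  Sample2
40         60      1       1
70        180      1       1
240       330      2       1
340       450      1       4
500       900      1       4
980       1200     2       1")

df2$colour <- colour_for(df2$Sample1, df2$Sample2)
df2
```

With ggplot2, one option would be to map `colour = colour` inside aes() and add scale_colour_identity(), so that the stored colour names are used literally instead of going through a palette.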

Edit: To save each plot of the facet to a separate file, split the data by id1 and generate a plot each time.

d_ply(out, .(id1), function(ww) {
    p <- ggplot(ww, aes(x = Start, y = id3 , colour = id2)) 
    p <- p + geom_segment(aes(xend = End, yend = id3))  # y is inherited from ggplot()
    p <- p + scale_colour_brewer(palette = "Set1")
    fn <- paste0("~/Downloads/id", as.numeric(as.character(ww$id1[1])), ".pdf")
    ggsave(fn, p)
})

Set the path in fn accordingly.
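As for the error in the question: rbind() on data frames matches columns by name, and dataset1_pos has the columns loop_start and loop_end while dataset2_filter has Start and End, which is what "names do not match previous names" is complaining about. A base-graphics sketch that keeps the original approach, gives both pieces the same column names, and colours dataset1 red and dataset2 green might look like the following. The 20-unit padding is taken from the question's description; using an overlap test against the padded window (rather than the containment test in the original code), the names `build_window` and `plot_window`, the `col` column, and the output file names are all assumptions.

```r
# Hypothetical helper: one row of dataset1 plus the dataset2 intervals
# that overlap the padded window [Start - pad, End + pad].
build_window <- function(row1, dataset2, pad = 20) {
  lo <- row1$Start - pad
  hi <- row1$End + pad
  keep <- dataset2$Start <= hi & dataset2$End >= lo
  rbind(
    # same column names in both pieces, so rbind() can match them
    data.frame(Start = row1$Start, End = row1$End, col = "red",
               stringsAsFactors = FALSE),
    data.frame(Start = dataset2$Start[keep], End = dataset2$End[keep],
               col = rep("green", sum(keep)), stringsAsFactors = FALSE)
  )
}

# Hypothetical helper: draw one horizontal segment per row into a PDF.
plot_window <- function(win, xlim, file) {
  pdf(file = file)
  on.exit(dev.off())
  pos <- seq_len(nrow(win))
  plot(NA, type = "n", xlim = xlim, ylim = c(0.5, nrow(win) + 0.5),
       xlab = "Position", ylab = "Segment")
  segments(win$Start, pos, win$End, pos, col = win$col, lwd = 2)
  invisible(file)
}

df1 <- read.table(header = TRUE, text = "Start  End  Sample1  Sample2
1         60      1       4
100       200     2       1
201       250     1       4
300       450     1       1")

df2 <- read.table(header = TRUE, text = "Start  End  Sample1  Sample2
40         60      1       1
70        180      1       1
240       330      2       1
340       450      1       4
500       900      1       4
980       1200     2       1")

# One PDF per row of df1; tempdir() is used only to keep the sketch self-contained
for (i in seq_len(nrow(df1))) {
  win <- build_window(df1[i, ], df2)
  out_file <- file.path(tempdir(),
                        paste0("overlap_", df1$Start[i], "-", df1$End[i], ".pdf"))
  plot_window(win, xlim = c(df1$Start[i] - 20, df1$End[i] + 20), file = out_file)
}
```

Replace tempdir() with the directory where the plots should be written. The first row of each window is always the dataset1 segment, so it is drawn at the bottom of the plot in red.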

score 1 · Accepted Answer

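I tried to solve this problem using the lattice package. Specifically, I use the shingle function to get a picture of the intervals to compare. I was hoping I could merge the two shingles, but I couldn't. So, once the first plot was made, I computed the overlaps using the IRanges package (as in the solution above). The idea is to end up with a dotplot.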

## Read the input data
dat <- read.table(text = 'Start  End  Sample1  Sample2
1         60      1       4
100       200     2       1
201       250     1       4
300       450     1       1', header = T) 

dat1 <- read.table(text = 'Start  End  Sample1  Sample2
40         60      1       1
70        180      1       1
240       330      2       1
340       450      1       4
500       900      1       4
980       1200     2       1', header = T) 


library(lattice)   # provides shingle() and its plot method
## Create the 2 shingles
dat.sh <- shingle(x = dat[,3], intervals = dat[,c(1,2)])
dat1.sh <- shingle(x = dat1[,3], intervals = dat1[,c(1,2)])
## Compute the max value so both plots share the same x range
max.value <- max(c(dat$End,dat1$End))
## Plot the 2 series in different colors
p1 <- plot(dat.sh, xlim = c(0,max.value), col = 'red')
p2 <- plot(dat1.sh, xlim = c(0,max.value), col = 'green')
library(gridExtra)
grid.arrange(p1,p2)

This is a quick way to compare intervals.


This looks fine, but I can't take shingles any further because the two cannot be merged into the same plot. So I use the IRanges package to compute the overlaps.

library(IRanges)
library(lattice)                            # provides dotplot()
rang1 <- IRanges(start = dat[,1], end = dat[,2])
rang2 <- IRanges(start = dat1[,1], end = dat1[,2])
dat.plot <- dat1                            # start from dat1 (the second file)
dat.plot$group <- 'origin'
dat.plot$id <- rownames(dat1)               ## add an id for each row
rang.o <- findOverlaps(rang2, rang1)        # query = dat1, subject = dat
dat.o <- dat1[queryHits(rang.o), ]          ## construct the overlaps data.frame
dat.o$id <- subjectHits(rang.o)             ## id of the overlapped row in dat
dat.o$group <- 'overlap'
dat.plot <- rbind(dat.plot, dat.o)          ## union of all
## 'col' is not a column of dat.plot, so group by 'group' instead
dotplot(id ~ End - Start | group, data = dat.plot,
        groups = group, type = c("p", "h"))

