r - Return the rows above and below specific rows of an R data frame

Question

Consider an arbitrary data frame:

            col1   col2    col3   col4
row.name11    A     23      x       y
row.name12    A     29      x       y
row.name13    B     17      x       y
row.name14    A     77      x       y

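I have a list of row names that I want to return from this data frame. Say the list contains row.name12 and row.name13. I can easily return these rows from the data frame. However, I also want to return the 4 rows above and the 4 rows below them, i.e. I want to return row.name8 through row.name17. I think of it as similar to grep -A -B in the shell.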

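A possible solution: is there a way to get the row number from a row name? If I had the row numbers, I could simply subtract 4 from them and add 4 to them and return those rows.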

Note: the row names here are just examples. The actual row names could be anything, such as RED, BLUE, BLACK, etc.
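A minimal sketch of the lookup the question asks about, assuming a hypothetical 20-row data frame whose rows are named row.name1 to row.name20: base R's match() (or which() with ==) turns names into row numbers, and the window is then widened by 4 on each side. With 20 rows the window stays in range, so no bounds check is done here; near the first or last row the indices would need clamping.

```r
# Hypothetical data: 20 rows named row.name1 ... row.name20
df <- data.frame(value = 1:20, row.names = paste0("row.name", 1:20))

# Row numbers from row names: match() returns one position per name
idx <- match(c("row.name12", "row.name13"), rownames(df))   # 12 13
which(rownames(df) == "row.name12")                          # 12 as well

# Four rows above the first hit through four rows below the last hit
res <- df[(min(idx) - 4):(max(idx) + 4), , drop = FALSE]
rownames(res)                                                # row.name8 ... row.name17
```

This treats the hits as one contiguous block; when the names are far apart, a separate window per hit is needed.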

score 10 · Accepted Answer

Try this:

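extract.with.context <- function(x, rows, after = 0, before = 0) {

  # positions of the rows whose names are in `rows`
  match.idx  <- which(rownames(x) %in% rows)
  # offsets relative to each match, e.g. -1, 0, 1, 2, 3
  span       <- seq(from = -before, to = after)
  # every match position plus every offset
  extend.idx <- c(outer(match.idx, span, `+`))
  # drop positions that fall outside the data frame
  extend.idx <- Filter(function(i) i > 0 & i <= nrow(x), extend.idx)
  # remove overlaps and restore the original row order
  extend.idx <- sort(unique(extend.idx))

  return(x[extend.idx, , drop = FALSE])
}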

dat <- data.frame(x = 1:26, row.names = letters)
extract.with.context(dat, c("a", "b", "j", "y"), after = 3, before = 1)
#    x
# a  1
# b  2
# c  3
# d  4
# e  5
# i  9
# j 10
# k 11
# l 12
# m 13
# x 24
# y 25
# z 26
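To see what the outer() step computes, here is the same arithmetic for two of the matches from the call above (a = 1 and j = 10, to keep it short) with before = 1 and after = 3. outer() builds a matrix whose row i holds match.idx[i] plus each offset; c() flattens it before filtering. The bound 26 is hard-coded here as nrow() of the letters example.

```r
match.idx <- c(1, 10)
span <- seq(from = -1, to = 3)                  # -1 0 1 2 3
m <- outer(match.idx, span, `+`)                # row 1: 0..4, row 2: 9..13
idx <- c(m)                                     # flattened column by column
idx <- sort(unique(idx[idx > 0 & idx <= 26]))   # 26 = nrow of the letters example
idx                                             # 1 2 3 4 9 10 11 12 13
```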

score 6 · Accepted Answer

Perhaps a combination of which() and %in% will help you:

dat[which(rownames(dat) %in% c("row.name13")) + c(-1, 1), ]
#            col1 col2 col3 col4
# row.name12    A   29    x    y
# row.name14    A   77    x    y

Above, which() identifies which row name in "dat" is "row.name13", and + c(-1, 1) tells R to return the row before and the row after it. If you want to include the matched row itself, you can do something like + c(-1:1).

To get a range of rows, switch the comma to a colon:

dat[which(rownames(dat) %in% c("row.name13")) + c(-1:1), ]
#            col1 col2 col3 col4
# row.name12    A   29    x    y
# row.name13    B   17    x    y
# row.name14    A   77    x    y
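One caveat before reusing this with several names: arithmetic between two vectors is recycled element by element, so which(...) + c(-1, 1) no longer gives a window around each hit once there is more than one hit. (With + c(-1:1) and two hits, the lengths 2 and 3 don't divide evenly, so R also warns that the longer object length is not a multiple of the shorter one.) A small check using positions 2 and 3, i.e. row.name12 and row.name13 in the question's data; outer() is one way to combine every hit with every offset.

```r
idx <- c(2L, 3L)                 # e.g. positions of row.name12 and row.name13

idx + c(-1L, 1L)                 # 1 4: pairs 2 with -1 and 3 with +1 only

# Every hit combined with every offset, then overlaps removed
all.idx <- sort(unique(as.vector(outer(idx, -1:1, `+`))))
all.idx                          # 1 2 3 4
```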

Update

Matching a list is a little trickier, but without giving it much thought, here is one possibility:

myRows <- c("row.name12", "row.name13")
(rowRanges <- lapply(which(rownames(dat) %in% myRows), function(x) x + c(-1:1)))
# [[1]]
# [1] 1 2 3
# 
# [[2]]
# [1] 2 3 4
#
lapply(rowRanges, function(x) dat[x, ])
# [[1]]
#            col1 col2 col3 col4
# row.name11    A   23    x    y
# row.name12    A   29    x    y
# row.name13    B   17    x    y
# 
# [[2]]
#            col1 col2 col3 col4
# row.name12    A   29    x    y
# row.name13    B   17    x    y
# row.name14    A   77    x    y

This outputs a list of data.frames, which can be handy since you may have duplicated rows (as in this example).
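If a single data frame without the duplicated rows is preferred instead, one option (a sketch using the question's four-row data) is to pool the index vectors in rowRanges, drop duplicates and out-of-range positions, and subset once.

```r
# The question's example data
dat <- data.frame(col1 = c("A", "A", "B", "A"), col2 = c(23, 29, 17, 77),
                  col3 = "x", col4 = "y",
                  row.names = paste0("row.name", 11:14))

myRows <- c("row.name12", "row.name13")
rowRanges <- lapply(which(rownames(dat) %in% myRows), function(x) x + c(-1:1))

# Pool all windows, remove overlaps, keep only valid row numbers
keep <- sort(unique(unlist(rowRanges)))
keep <- keep[keep >= 1 & keep <= nrow(dat)]
dat[keep, ]                      # row.name11 ... row.name14, each once
```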

Update 2: Use `grep` where it is more appropriate

Here is a variation on your question that is less convenient to solve with the which()...%in% approach.

set.seed(1)
dat1 <- data.frame(ID = 1:25, V1 = sample(100, 25, replace = TRUE))
rownames(dat1) <- paste("rowname", sample(apply(combn(LETTERS[1:4], 2), 
                                               2, paste, collapse = ""), 
                                         25, replace = TRUE), 
                       sprintf("%02d", 1:25), sep = ".")
head(dat1)
#               ID V1
# rowname.AD.01  1 27
# rowname.AB.02  2 38
# rowname.AD.03  3 58
# rowname.CD.04  4 91
# rowname.AD.05  5 21
# rowname.AD.06  6 90
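The row names above are built from every two-letter pair of A to D plus a zero-padded counter. The deterministic pieces can be checked on their own (the random sample() step is left out, and the fixed pair "AB" is chosen only for illustration):

```r
# All two-letter combinations of A-D, pasted column by column
pairs <- apply(combn(LETTERS[1:4], 2), 2, paste, collapse = "")
pairs                                         # "AB" "AC" "AD" "BC" "BD" "CD"

suffix <- sprintf("%02d", 1:25)               # "01" "02" ... "25"

# One name, assembled the same way as in the code above
paste("rowname", "AB", suffix[2], sep = ".")  # "rowname.AB.02"
```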

Now suppose you want to identify the rows containing AB and AC, but you don't have a list of the numeric suffixes.

Here is a small function you can use in such a scenario. It borrows a little from @Spacedman to make sure the returned rows are within the range of the data (as suggested by @flodel).

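getMyRows <- function(data, matches, range) {
  # for every row whose name matches one of the patterns, build the
  # window of positions `range` around it
  rowMatches <- lapply(unlist(lapply(matches, function(x)
    grep(x, rownames(data)))), function(y) y + range)
  # keep only positions inside the data
  rowMatches <- lapply(rowMatches, function(x) x[x > 0 & x <= nrow(data)])
  # one data frame per match
  lapply(rowMatches, function(x) data[x, ])
}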

You can use it as follows (though I won't print the results here): first specify the dataset, then the patterns to match, then the range (in this example, 3 rows before and 4 rows after).

getMyRows(dat1, c("AB", "AC"), -3:4)

Applied to the earlier example of matching row.name12 and row.name13, you could use it as getMyRows(dat, c(12, 13), -1:1).
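Because grep() searches for the pattern anywhere in the name as a regular expression, the call above works on the question's four rows, but the pattern 12 would also match a hypothetical row named row.name112. A sketch running the call on the question's data, plus an anchored pattern (assuming the names end in the number) that avoids the partial match:

```r
getMyRows <- function(data, matches, range) {
  rowMatches <- lapply(unlist(lapply(matches, function(x)
    grep(x, rownames(data)))), function(y) y + range)
  rowMatches <- lapply(rowMatches, function(x) x[x > 0 & x <= nrow(data)])
  lapply(rowMatches, function(x) data[x, ])
}

# The question's example data (two columns are enough here)
dat <- data.frame(col1 = c("A", "A", "B", "A"), col2 = c(23, 29, 17, 77),
                  row.names = paste0("row.name", 11:14))
res <- getMyRows(dat, c(12, 13), -1:1)
lapply(res, rownames)   # row.name11-13 and row.name12-14

# Unanchored vs anchored patterns on hypothetical names
nm <- c("row.name12", "row.name112")
grep("12", nm)          # 1 2
grep("name12$", nm)     # 1
```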

You could also modify the function to make it more general (for example, to match against a column instead of the row names).
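A possible generalisation along those lines, as a sketch: getMyRowsCol is a hypothetical name, not part of the original answer. It matches the patterns against a named column instead of the row names and clips each window to the data in the same way.

```r
# Hypothetical variant of getMyRows: match patterns against data[[column]]
getMyRowsCol <- function(data, column, matches, range) {
  hits <- unlist(lapply(matches, function(p) grep(p, data[[column]])))
  lapply(hits, function(i) {
    idx <- i + range
    data[idx[idx > 0 & idx <= nrow(data)], , drop = FALSE]
  })
}

dat <- data.frame(col1 = letters[1:10], v = 1:10)
res <- getMyRowsCol(dat, "col1", c("^c$", "^j$"), -1:1)
res[[1]]$v   # 2 3 4
res[[2]]$v   # 9 10 (row 11 does not exist)
```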

score 3 · Accepted Answer

Create some sample data:

> dat=data.frame(col1=letters,col2=sample(26),col3=sample(letters))
> dat
   col1 col2 col3
1     a   26    x
2     b   12    i
3     c   15    v
...

Set up a target vector (note that I chose an edge case and overlapping cases) and find the matching rows:

> target=c("a","e","g","s")
> match = which(dat$col1 %in% target)

Create a sequence from -2 to +2 around each match (adjust as needed) and merge them:

> getThese = unique(as.vector(mapply(seq,match-2,match+2)))
> getThese
 [1] -1  0  1  2  3  4  5  6  7  8  9 17 18 19 20 21

Fix the edge cases:

> getThese = getThese[getThese > 0 & getThese <= nrow(dat)]
> dat[getThese,]
   col1 col2 col3
1     a   26    x
2     b   12    i
3     c   15    v
4     d   22    d
5     e    2    j
6     f    9    l
7     g    1    w
8     h   21    n
9     i   17    p
17    q   18    a
18    r   10    m
19    s   24    o
20    t   13    e
21    u    3    k
>

Recall that the targets were a, e, g, and s. You now have those rows plus two rows above and two rows below each, with no duplicates.

If you are working with row names, just build "match" from them instead; I was using a column.
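For the row-name case, a sketch using a hypothetical data frame with letters as row names. The hits are stored in hits rather than match, which avoids masking base R's match() function.

```r
dat <- data.frame(v = 1:26, row.names = letters)   # hypothetical data
target <- c("a", "e", "g", "s")

hits <- which(rownames(dat) %in% target)           # 1 5 7 19
getThese <- unique(as.vector(mapply(seq, hits - 2, hits + 2)))
getThese <- getThese[getThese > 0 & getThese <= nrow(dat)]
dat[getThese, , drop = FALSE]                      # rows a-i and q-u
```

Every window here has the same length, so mapply() simplifies its result to a matrix; if the windows could differ in length, mapply() would return a list and unlist() would be the safer choice than as.vector().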

If this were my problem, I would write many more tests using the testthat package.

score 0 · Accepted Answer

I would simply do the following:

dat[(grep("row.name12",row.names(dat))-4):(grep("row.name13",row.names(dat))+4),]

grep("row.name12",row.names(dat)) gives the row number of the row named "row.name12", so

(grep("row.name12",row.names(dat))-4):(grep("row.name13",row.names(dat))+4)

gives the sequence of row numbers from the 4th row before the row named "row.name12" to the 4th row after the row named "row.name13".
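One caveat: near the top or bottom of the data these bounds can fall outside the data frame. In the question's four-row example, row.name12 is row 2, so the range would start at -2, and R refuses to mix negative and positive subscripts. A sketch that clamps the bounds, swapping grep() for match() so the exact name is looked up rather than a regular expression:

```r
# The question's example data (two columns are enough here)
dat <- data.frame(col1 = c("A", "A", "B", "A"), col2 = c(23, 29, 17, 77),
                  row.names = paste0("row.name", 11:14))

first <- match("row.name12", rownames(dat))   # 2
last  <- match("row.name13", rownames(dat))   # 3

from <- max(1, first - 4)                     # 2 - 4 = -2 is clamped to 1
to   <- min(nrow(dat), last + 4)              # 3 + 4 = 7 is clamped to 4
dat[from:to, ]
```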
