r - Stata の「注文」コマンドに相当する R 関数はありますか?

Question

R の「オーダー」は、Stata の「ソート」のようです。たとえば、データセットは次のとおりです (変数名のみがリストされています)。

v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 v12 v13 v14 v15 v16 v17 v18

ここに私が期待する出力があります：

v1 v2 v3 v4 v5 v7 v8 v9 v10 v11 v12 v17 v18 v13 v14 v15 v6 v16

R では、次の 2 つの方法があります。

data <- data[,c(1:5,7:12,17:18,13:15,6,16)]

また

names <- c("v1", "v2", "v3", "v4", "v5", "v7", "v8", "v9", "v10", "v11", "v12",  "v17", "v18", "v13", "v14", "v15", "v6", "v16")
data <- data[names]

Stata で同じ出力を得るには、次の 2 行を実行します。

order v17 v18, before(v13)
order v6 v16, last

上記の理想的なデータでは、処理したい変数の位置を知ることができます。しかし、ほとんどの実際のケースでは、「年齢」「性別」などの位置インジケーターのない変数があり、1 つのデータセットに 50 を超える変数がある場合があります。そうすれば、Stata での「順序」の利点がより明白になります。変数の正確な場所を知る必要はなく、名前を入力するだけです。

order age, after(gender)

この問題に対処するための R の基本関数はありますか、それともパッケージを入手できますか? 前もって感謝します。

tweetinfo <- data.frame(uid=1:50, mid=2:51, annotations=3:52, bmiddle_pic=4:53, created_at=5:54, favorited=6:55, geo=7:56, in_reply_to_screen_name=8:57, in_reply_to_status_id=9:58, in_reply_to_user_id=10:59, original_pic=11:60, reTweetId=12:61, reUserId=13:62, source=14:63, thumbnail_pic=15:64, truncated=16:65)
noretweetinfo <- data.frame(uid=21:50, mid=22:51, annotations=23:52, bmiddle_pic=24:53, created_at=25:54, favorited=26:55, geo=27:56, in_reply_to_screen_name=28:57, in_reply_to_status_id=29:58, in_reply_to_user_id=30:59, original_pic=31:60, reTweetId=32:61, reUserId=33:62, source=34:63, thumbnail_pic=35:64, truncated=36:65)
retweetinfo <- data.frame(uid=41:50, mid=42:51, annotations=43:52, bmiddle_pic=44:53, created_at=45:54, deleted=46:55, favorited=47:56, geo=48:57, in_reply_to_screen_name=49:58, in_reply_to_status_id=50:59, in_reply_to_user_id=51:60, original_pic=52:61, source=53:62, thumbnail_pic=54:63, truncated=55:64)
tweetinfo$type <- "ti"
noretweetinfo$type <- "nr"
retweetinfo$type <- "rt"
gtinfo <- rbind(tweetinfo, noretweetinfo)
gtinfo$deleted=""
gtinfo <- gtinfo[,c(1:16,18,17)]
retweetinfo <- transform(retweetinfo, reTweetId="", reUserId="")
retweetinfo <- retweetinfo[,c(1:5,7:12,17:18,13:15,6,16)]
gtinfo <- rbind(gtinfo, retweetinfo)
write.table(gtinfo, file="C:/gtinfo.txt", row.names=F, col.names=T, sep="\t", quote=F)
# rm(list=ls(all=T))

score 3 · Accepted Answer

私はさまざまなことを先延ばしにして実験しているので、ここに私が作り上げた機能があります。最終的に、それは以下に依存しappendます:

moveme <- function(invec, movecommand) {
  movecommand <- lapply(strsplit(strsplit(movecommand, ";")[[1]], ",|\\s+"), 
                        function(x) x[x != ""])
  movelist <- lapply(movecommand, function(x) {
    Where <- x[which(x %in% c("before", "after", "first", "last")):length(x)]
    ToMove <- setdiff(x, Where)
    list(ToMove, Where)
  })
  myVec <- invec
  for (i in seq_along(movelist)) {
    temp <- setdiff(myVec, movelist[[i]][[1]])
    A <- movelist[[i]][[2]][1]
    if (A %in% c("before", "after")) {
      ba <- movelist[[i]][[2]][2]
      if (A == "before") {
        after <- match(ba, temp)-1
      } else if (A == "after") {
        after <- match(ba, temp)
      }    
    } else if (A == "first") {
      after <- 0
    } else if (A == "last") {
      after <- length(myVec)
    }
    myVec <- append(temp, values = movelist[[i]][[1]], after = after)
  }
  myVec
}

データセットの名前を表すサンプルデータを次に示します。

x <- paste0("v", 1:18)

「v3」の前に「v17」と「v18」、最後に「v6」と「v16」、最初に「v5」が必要だったと想像してください。

moveme(x, "v17, v18 before v3; v6, v16 last; v5 first")
#  [1] "v5"  "v1"  "v2"  "v17" "v18" "v3"  "v4"  "v7"  "v8"  "v9"  "v10" "v11" "v12"
# [14] "v13" "v14" "v15" "v6"  "v16"

したがって、data.frame名前付きの「df」の明白な使用法は次のようになります。

df[moveme(names(df), "how you want to move the columns")]

そして、data.table名前付きの「DT」の場合（@mnelが指摘しているように、メモリ効率が高くなります）：

setcolorder(DT, moveme(names(DT), "how you want to move the columns"))

複合移動はセミコロンで指定されることに注意してください。

認識される移動は次のとおりです。

before(指定された列を別の名前付き列の前に移動します)
after(指定された列を別の名前付き列の後に移動します)
first(指定された列を最初の位置に移動します)
last(指定された列を最後の位置に移動します)

score 2 · Accepted Answer

私はあなたの問題を理解します。私は今提供するコードを持っています:

move <- function(data,variable,before) {
  m <- data[variable]
  r <- data[names(data)!=variable]
  i <- match(before,names(data))
  pre <- r[1:i-1]
  post <- r[i:length(names(r))]
  cbind(pre,m,post)
}

# Example.
library(MASS)
data(painters)
str(painters)

# Move 'Expression' variable before 'Drawing' variable.
new <- move(painters,"Expression","Drawing")
View(new)

score 2 · Accepted Answer

これを行う独自の関数を作成できます。

以下は、stata と同様の構文を使用して、列名の新しい順序を示します。

where4つの可能性を持つ名前付きリストです
- list(last = T)
- list(first = T)
- list(before = x)x問題の変数名はどこですか
- list(after = x)x問題の変数名はどこですか
sorted = Tvar_list辞書順でソートします (コマンドのalphabeticとコマンドの組み合わせ)sequentialstata

data.frameこの関数は名前のみで機能します (オブジェクトをとして渡すdataと、並べ替えられた名前のリストが返されます)。

例えば

stata.order <- function(var_list, where, sorted = F, data) {
    all_names = names(data)
    # are all the variable names in
    check <- var_list %in% all_names
    if (any(!check)) {
        stop("Not all variables in var_list exist within  data")
    }
    if (names(where) == "before") {
        if (!(where %in% all_names)) {
            stop("before variable not in the data set")
        }
    }
    if (names(where) == "after") {
        if (!(where %in% all_names)) {
            stop("after variable not in the data set")
        }
    }

    if (sorted) {
        var_list <- sort(var_list)
    }
    where_in <- which(all_names %in% var_list)
    full_list <- seq_along(data)
    others <- full_list[-c(where_in)]

    .nwhere <- names(where)
    if (!(.nwhere %in% c("last", "first", "before", "after"))) {
        stop("where must be a list of a named element first, last, before or after")
    }

    do_what <- switch(names(where), last = length(others), first = 0, before = which(all_names[others] == 
        where) - 1, after = which(all_names[others] == where))

    new_order <- append(others, where_in, do_what)
    return(all_names[new_order])
}

tmp <- as.data.frame(matrix(1:100, ncol = 10))

stata.order(var_list = c("V2", "V5"), where = list(last = T), data = tmp)

##  [1] "V1"  "V3"  "V4"  "V6"  "V7"  "V8"  "V9"  "V10" "V2"  "V5" 

stata.order(var_list = c("V2", "V5"), where = list(first = T), data = tmp)

##  [1] "V2"  "V5"  "V1"  "V3"  "V4"  "V6"  "V7"  "V8"  "V9"  "V10"

stata.order(var_list = c("V2", "V5"), where = list(before = "V6"), data = tmp)

##  [1] "V1"  "V3"  "V4"  "V2"  "V5"  "V6"  "V7"  "V8"  "V9"  "V10"

stata.order(var_list = c("V2", "V5"), where = list(after = "V4"), data = tmp)

##  [1] "V1"  "V3"  "V4"  "V2"  "V5"  "V6"  "V7"  "V8"  "V9"  "V10"

# throws an error
stata.order(var_list = c("V2", "V5"), where = list(before = "v11"), data = tmp)

## Error: before variable not in the data set

メモリ効率的に並べ替えを行いたい場合は（参照によって、コピーせずに）使用しますdata.table

DT <- data.table(tmp)
# sets by reference, no copying
setcolorder(DT, stata.order(var_list = c("V2", "V5"), where = list(after = "V4"), 
    data = DT))

DT

##     V1 V3 V4 V2 V5 V6 V7 V8 V9 V10
##  1:  1 21 31 11 41 51 61 71 81  91
##  2:  2 22 32 12 42 52 62 72 82  92
##  3:  3 23 33 13 43 53 63 73 83  93
##  4:  4 24 34 14 44 54 64 74 84  94
##  5:  5 25 35 15 45 55 65 75 85  95
##  6:  6 26 36 16 46 56 66 76 86  96
##  7:  7 27 37 17 47 57 67 77 87  97
##  8:  8 28 38 18 48 58 68 78 88  98
##  9:  9 29 39 19 49 59 69 79 89  99
## 10: 10 30 40 20 50 60 70 80 90 100

score 0 · Accepted Answer

あなたが何をしたいのかは非常に不明確ですが、最初の文では、データセットを並べ替えたいと思います。

実際にはorder、順序付けられたシーケンスのインデックスを返す組み込み関数があります。これを探していますか？

> x <- c(3,2,1)

> order(x)
[1] 3 2 1

> x[order(x)]
[1] 1 2 3

score 0 · Accepted Answer

これにより、同じファイルが得られるはずです。

#snip
gtinfo <- rbind(tweetinfo, noretweetinfo)
gtinfo$deleted=""
retweetinfo <- transform(retweetinfo, reTweetId="", reUserId="")
gtinfo <- rbind(gtinfo, retweetinfo)
gtinfo <-gtinfo[,c(1:16,18,17)]
#snip

Strata の order 関数のような関数を R で実装することは可能ですが、あまり需要がないと思います。

r - Stata の「注文」コマンドに相当する R 関数はありますか?

6 に答える 6

Related

Reference