r - Remaining variables of a dataset

Question

I have a dataset of 150 numbers, from which I have sampled 100. How can I identify the remaining 50 (and put them into a new matrix)?

X <- runif(150)
Combined <- sample(X, 100)

score 2 · Accepted Answer

Create the sample as a separate vector of indices.

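# Sample 100 of the 150 positions rather than the values themselves
using <- sample(1:150, 100)

# All.Entries stands for the full vector of 150 values (X in the question)
Entries <- All.Entries[using]
Non.Entries <- All.Entries[-using]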
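
Applied to the data from the question, the same idea might look like the sketch below. The seed and the names `idx`, `Remaining`, and `Remaining.mat` are assumptions made for illustration; they do not appear in the question.

```r
set.seed(42)                      # assumed seed, only for reproducibility
X <- runif(150)                   # full data, as in the question

idx <- sample(seq_along(X), 100)  # sample positions, not values
Combined <- X[idx]                # the 100 sampled values
Remaining <- X[-idx]              # the 50 values that were not sampled

# Store the leftovers in a one-column matrix, as the question asks
Remaining.mat <- matrix(Remaining, ncol = 1)
dim(Remaining.mat)
# [1] 50  1
```

Because the split is made on positions, it does not matter whether X contains repeated values.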

score 0 · Accepted Answer

All the numbers:

x <- sample(10, 150, TRUE) # as an example

A random sample:

Combined <- sample(x,100)

The remaining numbers:

xs <- sort(x) # sort the values of x
tab <- table(match(Combined, xs))
Remaining <- xs[-unlist(mapply(function(x, y) seq(y, length = x),
                               tab, as.numeric(names(tab))))]

Note: this solution also works if x contains duplicate values.
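
One way to check that claim is to confirm that the sample and the leftovers together reproduce the original values, duplicates included. The sketch below repeats the steps above with an assumed seed and renames the anonymous function's arguments to `n` and `start` for readability.

```r
set.seed(1)                        # assumed seed, only for reproducibility
x <- sample(10, 150, TRUE)         # many duplicated values
Combined <- sample(x, 100)

xs <- sort(x)
# For each sampled value: position of its first occurrence in xs, and how often it was drawn
tab <- table(match(Combined, xs))
# Drop n consecutive copies of each value, starting at its first occurrence
Remaining <- xs[-unlist(mapply(function(n, start) seq(start, length.out = n),
                               tab, as.numeric(names(tab))))]

length(Remaining)
# [1] 50
identical(sort(c(Combined, Remaining)), xs)
# [1] TRUE
```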

score 0 · Accepted Answer

Updated based on your comment.

If Combined is a subset of X, you can find the elements of X that it does not contain with:

    X[ !(X %in% Combined) ]

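X %in% Combined gives a logical vector of the same length as X, with TRUE where the element is present in Combined and FALSE where it is not.

By way of explanation: this logical vector can be used as an index. X[ X %in% Combined ] gives everything in X that is also in Combined.

Since you want the opposite, negate the logical vector: X[ !(X %in% Combined) ] gives everything in X that is not in Combined.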
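
A tiny worked example may make the logical indexing easier to follow. The values here are made up for illustration.

```r
X <- c(11, 12, 13, 14, 15)     # toy data, assumed for illustration
Combined <- c(14, 11, 15)      # a "sample" of X

X %in% Combined
# [1]  TRUE FALSE FALSE  TRUE  TRUE
X[!(X %in% Combined)]
# [1] 12 13
```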

If X contains duplicates, you can filter by name instead (assuming unique names, of course).

X[ !(names(X) %in% names(Combined)) ] 

# or if sampling by rows
X[ !(rownames(X) %in% rownames(Combined)) ]

You can easily assign names to X:

names(X) <- 1:length(X)

# or for multi-dimensional
rownames(X)  <- 1:nrow(X)
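
To see why names help when values repeat, here is a small sketch with made-up data. `by.value` and `by.name` are hypothetical names used only for this comparison, and the seed is an assumption.

```r
X <- c(5, 5, 7, 7, 9)          # toy data with duplicates (assumed)
names(X) <- seq_along(X)       # unique names "1" to "5"

set.seed(3)                    # assumed seed, only for reproducibility
Combined <- sample(X, 3)       # subsetting keeps the names of the chosen elements

# Filtering by value also drops unsampled copies of any sampled value
by.value <- X[!(X %in% Combined)]

# Filtering by name drops exactly the sampled elements
by.name <- X[!(names(X) %in% names(Combined))]

length(by.name)
# [1] 2
```

Whatever the draw, `by.name` always holds the two elements that were not sampled, while `by.value` can come up short.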

See also the help documentation for:

?"%in%"  # note the quotes
?which
?match

Alternatively, you can sample the indices instead, and then use a minus sign, as in mat[-indices,] in the following example.

    # Create a sample matrix of 150 rows, 3 columns
    mat <- matrix(rnorm(450), ncol=3)

    # Take a sampling of indices to the rows
    indices <- sample(nrow(mat), 100, replace=F)

    # Splice the matrix
    mat.included <- mat[indices,]
    mat.leftover <- mat[-indices,]

    # Confirm everything is of proper size
    dim(mat)
    # [1] 150   3
    dim(mat.included)
    # [1] 100   3
    dim(mat.leftover)
    # [1] 50  3
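
A variant of the same split uses a logical mask instead of negative indices. This is a sketch rather than part of the answer above; `keep` is a hypothetical name and the seed is assumed. One reason to consider it: in R, negative indexing with an empty index vector selects nothing rather than everything, whereas a logical mask behaves as expected in that case.

```r
set.seed(7)                               # assumed seed, only for reproducibility
mat <- matrix(rnorm(450), ncol = 3)
indices <- sample(nrow(mat), 100, replace = FALSE)

keep <- seq_len(nrow(mat)) %in% indices   # TRUE for the sampled rows

# drop = FALSE keeps the result a matrix even if only one row is selected
mat.included <- mat[keep, , drop = FALSE]
mat.leftover <- mat[!keep, , drop = FALSE]

dim(mat.leftover)
# [1] 50  3
```

Note that `mat[keep, ]` returns the sampled rows in their original row order, not in the order they were drawn.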
