r - Subsetting panel data

Question

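I'm very new to R, so please let me know if this is asking too much. I'm trying to subset panel data in R into two categories: one with complete information for a variable, and one with incomplete information for a variable. My data looks like this: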

Person     Year Income Age Sex
    1      2003  1500   15  1
    1      2004  1700   16  1
    1      2005  2000   17  1
    2      2003  1400   25  0
    2      2004  1900   26  0
    2      2005  2000   27  0

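What I need to do is go through each column (except columns 1 and 2). If a variable is completely filled with data, return it to one data set; otherwise, put it in a different data set. A variable is defined by the ID in the first column and then by the column name; an example from the data above is person1Income. Here is my meta-code and an example of what it should do given the data above. Note: I refer to variables by ID and then column name. For example, the variable person1Income is the first three rows of column 3.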

for(each variable in all columns except 1 and 2 in data set) if (variable = FULL) { return to data set "completes" }
else {put in data set "incompletes"}
completes = person1Income, person2Income, person1Age, person2Age, person1Sex, person2Sex
incompletes = {empty because the above info is full}

I understand if nobody can answer this question completely, but any help would be appreciated. Also, if my goal is unclear, please let me know and I'll try to clarify.

tl;dr I can't describe it in one sentence yet, so... sorry.

Edit: A visualization of what I mean by complete and incomplete variables (screenshot).

score 0 · Accepted Answer

Let's assume this is in a data.frame named 'dfrm':

completes <- dfrm[ complete.cases(dfrm[-(1:2)]) ,]
incompletes <- dfrm[ !complete.cases(dfrm[-(1:2)]) ,]
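To see what these two lines do, here is a minimal sketch on the question's sample data with one Income value set to NA. The missing value is an assumption added purely for illustration. Note that the split is by row, so only the row containing the NA ends up in incompletes.

```r
# Question's sample data; the NA in row 2 is an assumption for illustration
dfrm <- data.frame(
  Person = c(1, 1, 1, 2, 2, 2),
  Year   = c(2003, 2004, 2005, 2003, 2004, 2005),
  Income = c(1500, NA, 2000, 1400, 1900, 2000),
  Age    = c(15, 16, 17, 25, 26, 27),
  Sex    = c(1, 1, 1, 0, 0, 0)
)

# TRUE for rows that have no NA outside the two id columns
ok <- complete.cases(dfrm[-(1:2)])

completes   <- dfrm[ok, ]
incompletes <- dfrm[!ok, ]

nrow(completes)    # 5
nrow(incompletes)  # 1
```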

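Thanks to @WojciechSobala for noticing my missing paren. To identify whether a missing value is in a particular column, you could build a list. A list of IDs is simple. Identifying which columns have missing values is also fairly easy, but I don't understand what "the value in that column corresponding to the id variable" would mean, since they would all be NA. For the identification step, you could use: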

apply(incompletes, 1, function(x) c(x[1], x[2], which(is.na(x[-(1:2)]))))
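As an alternative sketch of the identification step (a different technique from the apply call above, not the answer's own code), which(..., arr.ind = TRUE) reports the row and column of each missing cell, and those positions can be mapped back to Person, Year and column name. The sample data with a single NA and the names vars, pos and missing_cells are assumptions for illustration.

```r
# Sample data from the question; the single NA is an assumption
dfrm <- data.frame(
  Person = c(1, 1, 1, 2, 2, 2),
  Year   = c(2003, 2004, 2005, 2003, 2004, 2005),
  Income = c(1500, NA, 2000, 1400, 1900, 2000),
  Age    = c(15, 16, 17, 25, 26, 27),
  Sex    = c(1, 1, 1, 0, 0, 0)
)

# Drop the two id columns, then locate every NA cell
vars <- dfrm[-(1:2)]
pos  <- which(is.na(vars), arr.ind = TRUE)  # matrix with columns "row" and "col"

# One row per missing cell: who, when, and which column
missing_cells <- data.frame(
  Person   = dfrm$Person[pos[, "row"]],
  Year     = dfrm$Year[pos[, "row"]],
  variable = names(vars)[pos[, "col"]]
)
missing_cells
```

This returns a plain data frame rather than a matrix of mixed values, which may be easier to filter or merge later.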

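Now I see what you are asking for. I don't have a full solution yet, but here are some R functions that may be useful when you want to enumerate and work with the categories formed by cross-classifying the values of two columns.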

dat <- structure(list(Person = c(1L, 1L, 1L, 2L, 2L, 2L), Year = c(2003L, 
2004L, 2005L, 2003L, 2004L, 2005L), Income = c(1500L, NA, 2000L, 
1400L, 1900L, 2000L), Age = c(15L, 16L, 17L, 25L, 26L, 27L), 
    Sex = c(1L, 1L, 1L, 0L, 0L, 0L)), .Names = c("Person", "Year", 
"Income", "Age", "Sex"), row.names = c(NA, -6L), class = "data.frame")

completes <-  lapply( split(dat[ , 3:5], dat$Person), function(x)  sapply(x, function(y) { if( all( !is.na(y)) ) { y } else { NA} })  )

$`1`
$`1`$Income
[1] NA

$`1`$Age
[1] 15 16 17

$`1`$Sex
[1] 1 1 1


$`2`
     Income Age Sex
[1,]   1400  25   0
[2,]   1900  26   0
[3,]   2000  27   0

 incompletes <- lapply( split(dat[ , 3:5], dat$Person), function(x)  sapply(x, function(y) { if( !all( !is.na(y)) ) { y } else { NA} }) )

$`1`
$`1`$Income
[1] 1500   NA 2000

$`1`$Age
[1] NA

$`1`$Sex
[1] NA


$`2`
Income    Age    Sex 
    NA     NA     NA
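The two nested results above are awkward to work with, because sapply returns a list for one person and a matrix for the other. A more compact sketch is to compute one logical flag per person and variable and then build the person1Income-style names from it. The data is the same dat as above, repeated so the block runs on its own; the naming scheme follows the question, and the object names is_full and labels are my own.

```r
dat <- data.frame(
  Person = c(1L, 1L, 1L, 2L, 2L, 2L),
  Year   = c(2003L, 2004L, 2005L, 2003L, 2004L, 2005L),
  Income = c(1500L, NA, 2000L, 1400L, 1900L, 2000L),
  Age    = c(15L, 16L, 17L, 25L, 26L, 27L),
  Sex    = c(1L, 1L, 1L, 0L, 0L, 0L)
)

# Rows = variables, columns = persons; TRUE when no value is missing
is_full <- sapply(split(dat[3:5], dat$Person),
                  function(x) colSums(is.na(x)) == 0)

# person<ID><Column> labels laid out in the same shape as is_full
labels <- outer(rownames(is_full), colnames(is_full),
                function(v, p) paste0("person", p, v))

completes   <- labels[is_full]
incompletes <- labels[!is_full]

completes
incompletes
```

With these character vectors in hand, the values themselves can be looked up from dat by person and column whenever they are needed.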

score 0 · Accepted Answer

Using your picture, here's a stab at what you want. It's long-winded, and others may have a more elegant way of doing it, but it gets the job done:

library("reshape2")

con <- textConnection("Person Year Income Age Sex
  1      2003  1500   15  1
  1      2004  1700   16  1
  1      2005  2000   17  1
  2      2003  1400   25  0
  2      2004  1900   NA  0
  2      2005  2000   27  0
  3      2003  NA   25  0
  3      2004  1900   NA  0
  3      2005  2000   27  0")
pnls <- read.table(con, header=TRUE)

# reformat table for easier processing
pnls2 <- melt(pnls, id=c("Person"))
# and select those rows that relate to values
# of income and age
pnls2 <- subset(pnls2,
              variable == "Income" | variable == "Age")

# create column of names in desired format (e.g Person1Age etc)
pnls2$name <- paste("Person", pnls2$Person, pnls2$variable, sep="")

# Collect full set of unique names
name.set <- unique(pnls2$name)
# find the incomplete set
incomplete <- unique( pnls2$name[ is.na(pnls2$value) ]) 
# then find the complement of the incomplete set
complete <- setdiff(name.set, incomplete) 

# These two now contain list of complete and incomplete variables
complete
incomplete

If you are not familiar with melting and the reshape2 package, I recommend running this line by line and examining the value of pnls2 at each stage to see how it works.

Edit: Adding code to compile the values, as requested by @bstockton. I'm sure there are far more appropriate R idioms for doing this, but again, it works in the absence of a better answer.

# use these lists of complete and incomplete variable names
# as keys to collect lists of values for each variable name
compile <- function(keys) {
    holder = list()
    for (n in keys) {
        holder[[ n ]] <- subset(pnls2, pnls2$name == n)[,3]
    }
    return( as.data.frame(holder) )
}

complete.recs <- compile(complete)
incomplete.recs <- compile(incomplete)
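If reshape2 is not available, the same result can be sketched in base R. This is an alternative under stated assumptions, not the code above: it assumes a balanced panel (every person has the same number of years, so the value vectors have equal length), and the names vars, values and has_na are introduced here for illustration.

```r
# Same sample data as above, read with base R only
pnls <- read.table(text = "Person Year Income Age Sex
1 2003 1500 15 1
1 2004 1700 16 1
1 2005 2000 17 1
2 2003 1400 25 0
2 2004 1900 NA 0
2 2005 2000 27 0
3 2003 NA 25 0
3 2004 1900 NA 0
3 2005 2000 27 0", header = TRUE)

vars <- c("Income", "Age")

# One vector of values per person/variable pair, named Person<ID><Column>
values <- list()
for (v in vars) {
  for (p in unique(pnls$Person)) {
    values[[paste0("Person", p, v)]] <- pnls[pnls$Person == p, v]
  }
}

# Flag every pair that contains at least one NA
has_na <- vapply(values, anyNA, logical(1))

complete.recs   <- as.data.frame(values[!has_na])
incomplete.recs <- as.data.frame(values[has_na])
```

For an unbalanced panel, as.data.frame would fail on vectors of different lengths, so keeping values as a list would be the safer choice there.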
