r - その後の観測（国年）間の値の差を取得するにはどうすればよいですか？

Question

たとえば、10年間で5か国のスコアが次のようになっているとします。

mydata<-1:3
mydata<-expand.grid(
country=c('A', 'B', 'C', 'D', 'E'),
year=c('1980','1981','1982','1983','1984','1985','1986','1987','1988','1989'))
mydata$score=sapply(runif(50,0,2), function(x) {round(x,4)})

library(reshape)
mydata<-reshape(mydata, v.names="score", idvar="year", timevar="country", direction="wide")

> head(mydata)
   year score.A score.B score.C score.D score.E
1  1980  1.0538  1.6921  1.3165  1.7434  1.9687
6  1981  1.4773  1.6479  0.3135  0.6172  0.7704
11 1982  0.8748  1.3704  0.2788  1.6306  1.7237
16 1983  1.1224  1.1340  1.7684  1.3352  0.4317
21 1984  1.5496  1.8706  1.4641  0.5313  0.8590
26 1985  1.7715  1.8953  0.6230  0.3580  1.6313

ここで、次の年のスコアが前年のスコアと+/- 0.5異なる場合は1であり、これが当てはまらない場合は0である新しい変数「期間」を作成したいと思います。5カ国すべてでそうしたいと思います。そして、期間= 1の国年を特定し、この情報を表に表示できれば素晴らしいと思います。

> head(mydata)
   year score.A score.B score.C score.D score.E  period.A  period.B ...
1  1980  1.0538  1.6921  1.3165  1.7434  1.9687   NA         NA
6  1981  1.4773  1.6479  0.3135  0.6172  0.7704   0          ....
11 1982  0.8748  1.3704  0.2788  1.6306  1.7237   1
16 1983  1.1224  1.1340  1.7684  1.3352  0.4317   0
21 1984  1.5496  1.8706  1.4641  0.5313  0.8590   0
26 1985  1.7715  1.8953  0.6230  0.3580  1.6313   0

これが多すぎないことを願っています。で試してみdistましたlibrary(proxy)が、関数を行全体ではなく観測のペアに制限する方法がわかりません。どうもありがとう！！

score 3 · Accepted Answer

これはdiffとを使用しlapplyます:

score.cols  <- grep("score", colnames(mydata), value=TRUE)
period.cols <- gsub("score", "period", score.cols)
compute.period <- function(x)as.integer(c(NA, abs(diff(x)) > 0.5))
cbind(mydata, `names<-`(lapply(mydata[score.cols], compute.period), period.cols))

編集：おそらく正しいデータ構造で作業していないことが（今朝投稿された他の質問で）より明らかになります。代わりに、生の (再形成される前の) データで作業を行うことをお勧めします。

period.fun <- function(x)as.integer(c(NA, abs(diff(x) > 0.5)))
mydata <- within(mydata, period <- ave(score, country, FUN = period.fun))

mydataそうして初めて、最終的な形にするために形を変えます。

score 3 · Accepted Answer

まず、set.seed()再現可能にするために使用してデータを作成します。

set.seed(1014)
mydata <- expand.grid(
  country = c('A', 'B', 'C', 'D', 'E'),
  year = 1980:1989
)
mydata$score <- round(runif(50, 0, 2), 4)
head(mydata)

#>   country year  score
#> 1       A 1980 0.1615
#> 2       B 1980 1.6687
#> 3       C 1980 1.2015
#> 4       D 1980 0.3144
#> 5       E 1980 0.0148
#> 6       A 1981 0.9328

次に、dplyr を使用して国に分割し、各値を前の値と比較します。

library(dplyr)
out <- mydata %.% 
  group_by(country) %.%
  mutate(big_diff = abs(score - lag(score)) > 0.5)

out %.% 
  arrange(country, year) %.% 
  head(10)

#> Source: local data frame [10 x 4]
#> Groups: country
#> 
#>    country year  score big_diff
#> 1        A 1980 0.1615       NA
#> 2        A 1981 0.9328     TRUE
#> 3        A 1982 1.7492     TRUE
#> 4        A 1983 0.3913     TRUE
#> 5        A 1984 0.5798    FALSE
#> 6        A 1985 1.4830     TRUE
#> 7        A 1986 0.0625     TRUE
#> 8        A 1987 0.8643     TRUE
#> 9        A 1988 1.3603    FALSE
#> 10       A 1989 1.5312    FALSE

この後big_diff()、数値に強制し、 reshape を使用して列に移動countryできますが、将来の手順で作業するのが難しくなるため、おそらくそうしません。詳細については、きちんとしたデータを参照してください。

score 2 · Accepted Answer

library(stringr)
periods <- function(mydata) {
# pull out columns with score in the title
score_columns <- mydata[, str_detect(names(mydata), "score")]
# make a copy to store the periods
period_columns <- score_columns
# rename the columns in periods
names(period_columns) <- str_replace_all(names(period_columns), "score", "periods")

for ( i in 1:length(score_columns))
    { 
        offset <- c(NA,score_columns[2:length(score_columns[,i])-1,i])
            # if the diff is > 0.5, return 1 else return 0.
        period_columns[, i] <- ifelse(offset - score_columns[,i]>0.5, 1, 0)
    }

 return(cbind(data,period_columns))
}

# Then simply call the function on your data. It should work with variable number
# of score columns.

> periods(mydata)
   year score.A score.B score.C score.D score.E periods.A
1  1980  1.8251  1.3168  0.9264  1.4921  0.9870        NA
6  1981  0.7603  1.7270  0.0324  1.8332  0.7147         1
11 1982  1.5245  0.6904  1.1699  0.5918  0.3029         0
16 1983  0.5280  0.2333  1.4395  1.2145  0.7273         1
21 1984  1.8739  1.8420  0.9940  0.2886  1.5975         0
26 1985  1.8794  0.7352  1.1665  0.9859  1.1301         0
31 1986  1.8002  0.3546  0.3885  1.9985  1.7183         0
36 1987  1.7985  1.0536  1.8445  0.8573  1.9307         0
41 1988  1.8444  0.6644  1.4765  0.2586  0.5531         0
46 1989  0.7342  0.4921  0.5816  0.8954  0.9359         1
   periods.B periods.C periods.D periods.E
1         NA        NA        NA        NA
6          0         1         0         0
11         1         0         1         0
16         0         0         0         0
21         0         0         1         0
26         1         0         0         0
31         0         1         0         0
36         0         0         1         0
41         0         0         1         1
46         0         1         0         0

score 1 · Accepted Answer

次のように1行で実行できますdplyr。

library(dplyr)
df2<-mydata%.%group_by(country)%.%mutate(period = c(NA, as.numeric(abs(diff(score))>0.5)))

次に、それを再形成できますdcast

library(reshape2)
dcast(df2,year~country)

結果：

   year  A  B  C  D  E
1  1980 NA NA NA NA NA
2  1981  0  1  0  0  1
3  1982  1  0  1  1  0
4  1983  1  0  1  0  1
5  1984  1  1  0  1  1
6  1985  1  0  1  0  0
7  1986  1  1  1  0  1
8  1987  1  1  0  0  0
9  1988  0  0  1  1  1
10 1989  0  0  0  0  0

r - その後の観測（国年）間の値の差を取得するにはどうすればよいですか？

4 に答える 4

Related

Reference