r - Find difference of categorical data in r

Question

I am new to this website and to the R language, and this is my first question here :)

I am analyzing a data set of salaries of people in the US across different years and different states (30 in total, labeled 1, 2, ..., 30). The starting year is the same for every state (1970), but the ending year varies (from 1990 to 2000). For each state, I want to find the difference between the salary in the ending year and the salary in the starting year. I wrote the following, but it does not work:

for (i in 1:30) {
  salarygrowth <- function(salary[state == "i", time == max(1990:2000, na.rm=FALSE)], salary[state == "i", time == 1970]) { 
  salary[state == "i", time == max(1990:2000, na.rm=FALSE)] - salary[state == "i", time == 1970]}
}

How could I fix and improve it so that I get the desired salary growth for each state over the years provided? Thanks so much in advance!
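For reference, here is a minimal base-R sketch of how a loop like the one above could be made to work. The attempt has a few problems. state == "i" compares against the literal string "i" rather than the loop variable. max(1990:2000) is always 2000, not each state's own last year. Defining a function inside the loop never actually calls it. And salary[cond, cond] uses matrix-style indexing on what is a single column. The sketch assumes the data sit in a data frame called df1 with columns time, state and salary. The toy df1 below uses only rows shown in the question. The names states, growth and result are hypothetical.

```r
# Hypothetical toy data: only rows that appear in the question's sample
df1 <- data.frame(
  time   = c(1970, 1995, 1970, 1997, 1970, 1992),
  state  = c(1, 1, 2, 2, 3, 3),
  salary = c(27890, 58934, 26783, 67998, 21349, 56212)
)

states <- sort(unique(df1$state))
growth <- numeric(length(states))   # one result slot per state

for (k in seq_along(states)) {
  # Compare against the loop value states[k], not the string "i"
  rows <- df1[df1$state == states[k], ]
  # Salary in this state's own last year and in its first year (1970)
  end_salary   <- rows$salary[which.max(rows$time)]
  start_salary <- rows$salary[which.min(rows$time)]
  growth[k] <- end_salary - start_salary
}

result <- data.frame(state = states, salarygrowth = growth)
result
```

With the toy rows, this gives growth values of 31044, 41215 and 34863 for states 1, 2 and 3. The grouped approaches in the answer below avoid the explicit loop.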

As requested, here is some of the data:

  time      state       salary
  1970        1         27890
  1971        1         28800
  1972        1         31257
  1973        1         32846
              ...
  1995        1         58934
  1970        2         26783
  1971        2         28987
              ...
  1997        2         67998
  1970        3         21349
              ...
  1992        3         56212
              ...
  2000        30        67876

score 2 · Accepted Answer

This can be done with a group-by aggregation. One option is dplyr. Group by "state" and take the difference between the "salary" at the maximum "time" and the "salary" at the minimum "time".

library(dplyr)
df1 %>%
  group_by(state) %>%
  summarise(salary = salary[which.max(time)]- salary[which.min(time)])
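The same which.max/which.min rule can be sketched in base R without dplyr, if that package is not available. The sketch uses split() to get one piece per state and vapply() to compute the difference for each piece. The toy df1 again uses only rows shown in the question, and its rows are deliberately unsorted to show that this rule does not depend on row order. The name growth is hypothetical.

```r
# Hypothetical toy data from the question's sample, deliberately unsorted
df1 <- data.frame(
  time   = c(1995, 1970, 1970, 1997, 1992, 1970),
  state  = c(1, 1, 2, 2, 3, 3),
  salary = c(58934, 27890, 26783, 67998, 56212, 21349)
)

# split() makes one sub-data-frame per state; vapply() applies
# "salary at the latest time minus salary at the earliest time" to each
growth <- vapply(
  split(df1, df1$state),
  function(d) d$salary[which.max(d$time)] - d$salary[which.min(d$time)],
  numeric(1)
)
growth
```

The result is a numeric vector named by state ("1", "2", "3"). It can be turned into a data frame with data.frame(state = names(growth), salary = unname(growth)).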

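Another option, in case the "time" column is not ordered, is to sort by it with arrange, then use first and last to extract the first and last values of "salary" and take the difference.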

df1 %>%
   group_by(state) %>%
   arrange(time) %>%
   summarise(salary=last(salary)- first(salary))

Or, using data.table, convert the "data.frame" to a "data.table" (setDT(df1)), group by "state" with the rows ordered by "time" (order(time)), and take the difference between the last (.N) and first (1L) "salary".

library(data.table)
setDT(df1)[order(time), list(salary=salary[.N]- salary[1L]), by = state]

Or, if "time" and "state" are already ordered, use duplicated on the "state" column to get a logical index, extract the "salary" values with it, and take the difference.

 salary <- with(df1, salary[!duplicated(state, fromLast=TRUE)]-
                     salary[!duplicated(state)])
 data.frame(state=unique(df1$state), salary)
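To make the duplicated trick concrete, the sketch below prints which rows the two logical indices select. !duplicated(state) is TRUE at the first row of each state. !duplicated(state, fromLast = TRUE) is TRUE at the last row. Because this only works when rows are sorted by state and then time, the sketch sorts defensively with order() first. The toy df1 uses only rows shown in the question. The names first_row, last_row and res are hypothetical.

```r
# Hypothetical toy data: rows taken from the question's sample
df1 <- data.frame(
  time   = c(1970, 1971, 1972, 1973, 1995, 1970, 1971, 1997, 1970, 1992),
  state  = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3),
  salary = c(27890, 28800, 31257, 32846, 58934,
             26783, 28987, 67998, 21349, 56212)
)

# The trick relies on rows being sorted by state, then by time
df1 <- df1[order(df1$state, df1$time), ]

first_row <- !duplicated(df1$state)                   # first row of each state
last_row  <- !duplicated(df1$state, fromLast = TRUE)  # last row of each state

which(first_row)  # rows 1, 6, 9
which(last_row)   # rows 5, 8, 10

res <- data.frame(state  = unique(df1$state),
                  salary = df1$salary[last_row] - df1$salary[first_row])
res
```

The middle years (for example 1971 to 1973 for state 1) are FALSE in both indices, so only each state's first and last salaries enter the subtraction.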
