I am new to R and am trying to work on a data frame read from a csv file (see the code below). It contains hospital data with 46 columns and 4706 rows (one of those columns being 'State'). I made a table showing the count of rows for each value in the State column, so in essence the table shows each state and the number of hospitals in that state. Now I want to subset the data frame and create a new one without the entries for states that have fewer than 20 hospitals.

How do I count the occurrences of values in the State column and then remove the rows for values whose count is less than 20? Maybe I am supposed to use the table() function, remove the undesired data and put that into a new data frame using something like lapply(), but I'm not sure due to my lack of experience in programming with R.

Any help will be much appreciated. I have seen other examples on this site of removing rows that have certain column values, but not one that does so based on the count of a particular column value.

> outcome <- read.csv("outcome-of-care-measures.csv", colClasses = "character")    
> hospital_nos <- table(outcome$State)    
> hospital_nos

 AK  AL  AR  AZ  CA  CO  CT  DC  DE  FL  GA  GU  HI  IA  ID  IL  IN  KS  KY  LA  MA  MD  ME  MI 
 17  98  77  77 341  72  32   8   6 180 132   1  19 109  30 179 124 118  96 114  68  45  37 134 
 MN  MO  MS  MT  NC  ND  NE  NH  NJ  NM  NV  NY  OH  OK  OR  PA  PR  RI  SC  SD  TN  TX  UT  VA 
133 108  83  54 112  36  90  26  65  40  28 185 170 126  59 175  51  12  63  48 116 370  42  87 
 VI  VT  WA  WI  WV  WY 
  2  15  88 125  54  29 

1 Answer

Here is one way to do it. Start with the following data frame:

df <- data.frame(x=c(1:10), y=c("a","a","a","b","b","b","c","d","d","e"))

If you want to keep only the rows whose value in df$y occurs more than 2 times, you can do:

tab <- table(df$y)
df[df$y %in% names(tab)[tab>2],]

This gives:

  x y
1 1 a
2 2 a
3 3 a
4 4 b
5 5 b
6 6 b
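
Applied to the question's data, the same idea might look like the sketch below. Since the csv file is not available here, `outcome` is a toy stand-in built from five of the state counts shown in the question's table; `keep_states` and `outcome_20plus` are illustrative names, and `>= 20` is one reading of "not less than 20 hospitals".

```r
# Toy stand-in for the question's `outcome` data frame (assumption: only the
# State column, with counts copied from the question's table for five states).
outcome <- data.frame(
  State = rep(c("AK", "AL", "DC", "HI", "NH"), times = c(17, 98, 8, 19, 26)),
  stringsAsFactors = FALSE
)

# Count the rows per state, as in the question.
hospital_nos <- table(outcome$State)

# Names of the states that have at least 20 rows.
keep_states <- names(hospital_nos)[hospital_nos >= 20]

# Keep only the rows belonging to those states; drop = FALSE keeps the
# result a data frame even though the toy data has a single column.
outcome_20plus <- outcome[outcome$State %in% keep_states, , drop = FALSE]

table(outcome_20plus$State)  # only AL (98) and NH (26) remain
```

With the real file, the first statement would be replaced by the `read.csv()` call from the question and the remaining lines should carry over unchanged.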

And here is a one-line solution using the plyr package:

library(plyr)
ddply(df, "y", function(d) {if(nrow(d)>2) d else NULL})

answered 2013-10-16T19:51:37.283
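
If installing a package is not an option, a base R alternative is to compute each row's group size with `ave()` and filter on that. This is a sketch of a different technique from the two above, not part of the original answer; `group_size` and `df_kept` are illustrative names.

```r
# Same example data frame as above.
df <- data.frame(x = 1:10, y = c("a", "a", "a", "b", "b", "b", "c", "d", "d", "e"))

# For every row, the number of rows that share its y value.
group_size <- ave(seq_along(df$y), df$y, FUN = length)

# Keep the rows whose y value occurs more than 2 times.
df_kept <- df[group_size > 2, ]
df_kept
```
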