r - Summary statistics of a numeric variable by two factor variables (what are these SAS commands in R?)

Question

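I'm new to R; I'm used to SAS. I have a dataset with many variables, three of which are age, sex, and agegroup. I'm trying to produce summary statistics (mean, median, Q1-Q3, sd) of the variable age by the variables sex and agegroup. That is, summary statistics of age for females (sex=0) in agegroup 1, then in agegroup 2, and so on, and the same for males (sex=1).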

In SAS, I would use:

proc univariate data=mydata;  
var age;  
class agegroup;  
class sex;  
run;

What would this be in R?

Also, what is the R equivalent of SAS's npar1way? For example:

proc npar1way data=mydata;  
where minutes ne 9;  
var minutes;  
class sex;  
run;

Here 9 is the code for a missing value, which is why minutes is required not to equal 9. How do I do this in R?

score 2 · Accepted Answer

# In R, missing values are denoted by "NA" instead of the number 9.

# save this data in a text file 
age agegroup sex
1 agegroup1 male
2 agegroup2 female
3 agegroup3 male
5 agegroup1 female
7 agegroup2 male
8 agegroup3 female
1 agegroup3 male
2 agegroup2 female
3 agegroup1 male

# Set the working directory to the location of the data file using the function 
setwd("PATH OF THE DIRECTORY")

data <- read.table("data", header=TRUE, sep=" ")
data
data$sex <- factor(data$sex, levels = c('male', 'female'), ordered=TRUE)
data$agegroup <- factor(data$agegroup, levels = c('agegroup1', 'agegroup2', 'agegroup3'), ordered=TRUE)

# Know the structure of your data
str(data)

# Summary of the data
summary(data)

# Std. Dev. of the variable "age"
std.dev.age <- sd(data$age)
std.dev.age

# Summary of three variables in a table form
table(data)

# Plot a dodged bar chart with age ~ sex + agegroup
library("ggplot2")

ggplot(data = data, aes(x = sex, y = age, ymin = 0, ymax = 8, fill = agegroup)) +
  geom_bar(position = "dodge", stat = "identity", width = 0.50) +
  scale_fill_manual(values = c("red", "green", "blue")) +
  labs(x = "", y = "age(years)", fill = " ")
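
The note about NA at the top of this answer also covers the `where minutes ne 9` part of the question. Here is a minimal sketch, assuming a hypothetical data frame `mydata` with columns `minutes` and `sex` in which 9 codes a missing value (the values are made up for illustration). It recodes 9 to NA and then runs a Wilcoxon rank-sum test, one of the tests PROC NPAR1WAY reports for a two-level class variable.

```r
# Hypothetical data: 9 is the code for "missing" in minutes
mydata <- data.frame(
  minutes = c(3, 9, 5, 7, 2, 9, 6, 4, 8, 1),
  sex     = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)
)

# Recode the sentinel value 9 to NA, R's missing-value marker
mydata$minutes[mydata$minutes == 9] <- NA

# Treat sex as a grouping factor
mydata$sex <- factor(mydata$sex)

# Wilcoxon rank-sum test of minutes between the two sexes;
# observations with missing minutes are left out of the test
res <- wilcox.test(minutes ~ sex, data = mydata)
res

# Kruskal-Wallis test on the same data
kw <- kruskal.test(minutes ~ sex, data = mydata)
kw
```

If recoding is not wanted, filtering with `subset(mydata, minutes != 9)` before the test should have the same effect as the SAS `where` statement.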

score 2 · Accepted Answer

You can use the aggregate function in R to split the data into subsets, compute summary statistics for each subset, and return the result in a convenient form.

# create some data
> age <- runif(100, 20, 60)
> sex <- sample(c(0, 1), 100, replace = TRUE)
> agegroup <- sample(1:3, 100, replace = TRUE)

You can then compute the quantiles of the subsets grouped by sex and agegroup.

> aggregate(x=age, by=list(sex=sex, agegroup=agegroup), FUN="quantile")
  sex agegroup     x.0%    x.25%    x.50%    x.75%   x.100%
1   0        1 26.70523 31.75807 37.09244 46.49449 59.77582
2   1        1 20.68903 34.49182 45.66960 48.69480 54.90620
3   0        2 20.22123 33.22948 40.57074 47.32490 58.85273
4   1        2 23.50579 31.38165 35.69254 45.13376 50.68572
5   0        3 23.46469 29.72909 42.53047 46.93867 58.30279
6   1        3 20.64256 27.22600 39.70127 48.66251 59.61565

Or compute the mean:

> aggregate(x=age, by=list(sex=sex, agegroup=agegroup), FUN="mean")
  sex agegroup        x
1   0        1 39.95470
2   1        1 41.53341
3   0        2 40.53606
4   1        2 37.32189
5   0        3 40.68784
6   1        3 38.74829

The same applies to the standard deviation, the variance, or any other statistic you want to compute for each subset.
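
As a sketch of that last point, `FUN` can also be a function that returns several statistics at once, which gets closer to the PROC UNIVARIATE output asked for in the question. The helper name `age_stats` and the data frame name `d` are made up for this example, and the data are simulated the same way as above.

```r
set.seed(1)  # only so the simulated data are reproducible
age      <- runif(100, 20, 60)
sex      <- sample(c(0, 1), 100, replace = TRUE)
agegroup <- sample(1:3, 100, replace = TRUE)
d <- data.frame(age, sex, agegroup)

# Hypothetical helper: mean, median, Q1, Q3 and SD in one named vector
age_stats <- function(x) {
  c(mean   = mean(x),
    median = median(x),
    Q1     = unname(quantile(x, 0.25)),
    Q3     = unname(quantile(x, 0.75)),
    sd     = sd(x))
}

# Formula interface: age summarised within each sex/agegroup combination
res <- aggregate(age ~ sex + agegroup, data = d, FUN = age_stats)
res
```

The statistics come back as a single matrix column (`res$age`); `do.call(data.frame, res)` is a common way to flatten it into ordinary columns.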

score 1 · Accepted Answer

# make some test data
age <- runif(100, 20, 60)
sex <- sample(c(0, 1), 100, replace = T)
agegroup <- sample(1:3, 100, replace = T)
test <- data.frame(age,sex,agegroup)

# define a new summary function to include the SD as well
# otherwise you will just get mean,median,min,max,Q1-Q3.
newsummary <- function(x) {c(summary(x),SD=sd(x))}

# get the summary stats by each agegroup/sex combo
by(test$age,test[c("sex","agegroup")],newsummary)

The result looks like this, printed in a list-like format:

> by(test$age,test[c("sex","agegroup")],newsummary)
sex: 0
agegroup: 1
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.       SD 
22.07000 27.72000 38.36000 38.41000 48.02000 54.93000 11.50681 
------------------------------------------------------------ 
sex: 1
agegroup: 1
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.       SD 
24.36000 38.20000 44.96000 44.55000 52.95000 58.03000 10.70105 
------------------------------------------------------------ 
sex: 0
agegroup: 2
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.       SD 
21.52000 28.54000 36.75000 38.52000 49.45000 57.12000 12.26674 
------------------------------------------------------------ 
sex: 1
agegroup: 2
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.      SD 
20.0900 26.9900 31.7700 35.9800 44.6200 57.3500 11.9548 
------------------------------------------------------------ 
sex: 0
agegroup: 3
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.      SD 
20.5100 30.4300 39.6300 39.4100 47.4100 57.6000 11.9816 
------------------------------------------------------------ 
sex: 1
agegroup: 3
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.       SD 
20.04000 25.01000 36.03000 37.58000 47.81000 59.65000 13.14822
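
If a single table is easier to work with than the list-style printout above, the same `newsummary` function can be applied to groups produced by `split()`. This is only a sketch of an alternative, not something the answer itself proposes; `groups` and `tab` are names chosen for the example.

```r
# Same kind of simulated test data as in the answer
set.seed(1)  # only so the simulated data are reproducible
age      <- runif(100, 20, 60)
sex      <- sample(c(0, 1), 100, replace = TRUE)
agegroup <- sample(1:3, 100, replace = TRUE)
test <- data.frame(age, sex, agegroup)

# Summary function from the answer: summary() plus the SD
newsummary <- function(x) {c(summary(x), SD = sd(x))}

# Split age into one vector per sex/agegroup combination;
# the list names take the form "<sex>.<agegroup>", e.g. "0.1"
groups <- split(test$age, list(test$sex, test$agegroup), drop = TRUE)

# One row per group, one column per statistic
tab <- t(sapply(groups, newsummary))
tab
```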
