r - R dplyr: summarise multiple functions over selected variables

Question

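I have a dataset that I want to summarise by mean, but I also want to calculate the max for just one of the variables.

Let me start with an example of what I would like to achieve: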

iris %>%
  group_by(Species) %>%
  filter(Sepal.Length > 5) %>%
  summarise_at(vars(Sepal.Length:Petal.Width), funs(mean))

which gives the following result:

# A tibble: 3 × 5
     Species Sepal.Length Sepal.Width Petal.Length Petal.Width
      <fctr>        <dbl>       <dbl>        <dbl>       <dbl>
1     setosa          5.8         4.4          1.9         0.5
2 versicolor          7.0         3.4          5.1         1.8
3  virginica          7.9         3.8          6.9         2.5

Is there an easy way to add, for example, max(Petal.Width) to the summarise?

So far I have tried the following:

iris %>%
  group_by(Species) %>%
  filter(Sepal.Length > 5) %>%
  summarise_at(vars(Sepal.Length:Petal.Width), funs(mean)) %>%
  mutate(Max.Petal.Width = max(iris$Petal.Width))

But with this approach I lose both the group_by and the filter from the code above, and I get the wrong results.
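A minimal sketch of why this happens, assuming a current dplyr (the name `wrong` is only illustrative, and the means are passed as a bare function rather than through funs()): `iris$Petal.Width` reaches back to the original data frame, so the pipeline's grouping and filtering never apply to it.

```r
library(dplyr)

wrong <- iris %>%
  group_by(Species) %>%
  filter(Sepal.Length > 5) %>%
  summarise_at(vars(Sepal.Length:Petal.Width), mean) %>%
  # iris$Petal.Width is the full, ungrouped, unfiltered column,
  # so every species row receives the same overall maximum.
  mutate(Max.Petal.Width = max(iris$Petal.Width))

# A single distinct value across all three species rows.
unique(wrong$Max.Petal.Width)
```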

The only solution I have managed to get working is the following:

iris %>%
  group_by(Species) %>%
  filter(Sepal.Length > 5) %>%
  summarise_at(vars(Sepal.Length:Petal.Width), funs(mean, max)) %>%
  select(Species:Petal.Width_mean,Petal.Width_max) %>% 
  rename(Max.Petal.Width = Petal.Width_max) %>%
  rename_(.dots = setNames(names(.), gsub("_.*$","",names(.))))

This is a bit convoluted and takes a lot of typing just to add one column with a different summary.

Thank you

score 1 · Accepted Answer

I was looking for something similar and tried the following. It works better than the suggested solutions and is easier to read.

iris %>%
  group_by(Species) %>%
  filter(Sepal.Length > 5) %>%
  summarise(MeanSepalLength = mean(Sepal.Length),
            MeanSepalWidth = mean(Sepal.Width),
            MeanPetalLength = mean(Petal.Length),
            MeanPetalWidth = mean(Petal.Width),
            MaxPetalWidth = max(Petal.Width))

# A tibble: 3 x 6
Species    MeanSepalLength MeanSepalWidth MeanPetalLength MeanPetalWidth MaxPetalWidth
<fct>                <dbl>          <dbl>           <dbl>          <dbl>         <dbl>
1 setosa                5.01           3.43            1.46          0.246           0.6
2 versicolor            5.94           2.77            4.26          1.33            1.8
3 virginica             6.59           2.97            5.55          2.03            2.5

In the summarise() call, you define the column names and specify which column to summarise inside the function of your choice.
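If typing out one line per mean becomes tedious, the same idea can be sketched with across(), assuming dplyr 1.0.0 or later (which this answer itself does not require). The `Mean.{.col}` naming pattern and the variable `named` are illustrative choices. Because the means are written to new column names, `Petal.Width` still refers to the raw group values when the maximum is computed.

```r
library(dplyr)

named <- iris %>%
  group_by(Species) %>%
  filter(Sepal.Length > 5) %>%
  summarise(
    # Means of the four measurement columns, stored under new names
    # so the original columns are not overwritten.
    across(Sepal.Length:Petal.Width, mean, .names = "Mean.{.col}"),
    # Petal.Width is still the raw per-group data at this point.
    Max.Petal.Width = max(Petal.Width)
  )

names(named)
```

Giving the means their own names matters here: summarise() evaluates its arguments in order, so a mean stored back under `Petal.Width` would be what a later `max(Petal.Width)` sees.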

score 1 · Accepted Answer

If you want to do everything in dplyr (which may be easier to remember), you can take advantage of the new function across, available from dplyr 1.0.0.

iris %>%
  group_by(Species) %>%
  filter(Sepal.Length > 5) %>% 
  summarize(across(Sepal.Length:Petal.Width, mean)) %>% 
  cbind(iris %>% 
          group_by(Species) %>% 
          summarize(across(Petal.Width, max)) %>% 
          select(-Species)
  )

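The only problem is combining two calculations on the same column, Petal.Width, of the grouped data. I had to do the grouping again, nested inside cbind. Note that the nested pipeline does not repeat the filter, so the maximum is taken over all rows of each species. This returns the following result: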

     Species Sepal.Length Sepal.Width Petal.Length Petal.Width Petal.Width
1     setosa     5.313636    3.713636     1.509091   0.2772727         0.6
2 versicolor     5.997872    2.804255     4.317021   1.3468085         1.8
3  virginica     6.622449    2.983673     5.573469   2.0326531         2.5

If the task called for only one calculation on the column Petal.Width rather than two, this could be written elegantly as follows:

iris %>%
  group_by(Species) %>%
  filter(Sepal.Length > 5) %>% 
  summarize(
    across(Sepal.Length:Petal.Length, mean),
    across(Petal.Width, max)
  )
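For the two-calculation case, one possible sketch that avoids the second grouping and the cbind is to pass a named list of functions to across() for Petal.Width. This assumes dplyr 1.0.0 or later; the variable name `combined` is illustrative, and the `_mean` / `_max` suffixes come from across()'s default naming for a named list of functions.

```r
library(dplyr)

combined <- iris %>%
  group_by(Species) %>%
  filter(Sepal.Length > 5) %>%
  summarize(
    # One function: column names are kept as they are.
    across(Sepal.Length:Petal.Length, mean),
    # Two functions on the same column: the results are named
    # Petal.Width_mean and Petal.Width_max.
    across(Petal.Width, list(mean = mean, max = max))
  )

names(combined)
```

Here the filter applies to both the mean and the maximum, since everything is computed in a single pipeline.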
