You need to look at the base R function cut(). Before venturing any further, also note the last line of this answer.
> set.seed(42)
> cut(runif(50), 6)
[1] (0.825,0.99] (0.825,0.99] (0.167,0.332] (0.825,0.99]
[5] (0.496,0.661] (0.496,0.661] (0.661,0.825] (0.00296,0.167]
[9] (0.496,0.661] (0.661,0.825] (0.332,0.496] (0.661,0.825]
[13] (0.825,0.99] (0.167,0.332] (0.332,0.496] (0.825,0.99]
[17] (0.825,0.99] (0.00296,0.167] (0.332,0.496] (0.496,0.661]
[21] (0.825,0.99] (0.00296,0.167] (0.825,0.99] (0.825,0.99]
[25] (0.00296,0.167] (0.496,0.661] (0.332,0.496] (0.825,0.99]
[29] (0.332,0.496] (0.825,0.99] (0.661,0.825] (0.661,0.825]
[33] (0.332,0.496] (0.661,0.825] (0.00296,0.167] (0.825,0.99]
[37] (0.00296,0.167] (0.167,0.332] (0.825,0.99] (0.496,0.661]
[41] (0.332,0.496] (0.332,0.496] (0.00296,0.167] (0.825,0.99]
[45] (0.332,0.496] (0.825,0.99] (0.825,0.99] (0.496,0.661]
[49] (0.825,0.99] (0.496,0.661]
6 Levels: (0.00296,0.167] (0.167,0.332] (0.332,0.496] ... (0.825,0.99]
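cut() returns a factor indicating which of the groups (6 in this case) each observation falls into. It simply splits the range of the data into 6 intervals of equal width, after moving the outer limits out by 0.1% of the range so that the minimum and maximum both fall inside an interval. Read ?cut for details of what happens at the ends of the intervals.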
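To make those end-point rules concrete, here is a minimal sketch using hypothetical break points (edges) and test values (vals) placed exactly on the breaks; none of this comes from the question. By default cut() builds intervals that are open on the left and closed on the right, include.lowest = TRUE also closes the first interval, right = FALSE flips the intervals to closed-left, and labels = FALSE returns integer group codes instead of a factor.

```r
# Hypothetical break points, and values placed exactly on those breaks
edges <- c(0, 0.5, 1)
vals  <- c(0, 0.5, 1)

# Default intervals are (a, b]: the value 0 falls outside them all and becomes NA
cut(vals, edges)

# include.lowest = TRUE also closes the first interval on the left, giving [0,0.5]
cut(vals, edges, include.lowest = TRUE)

# right = FALSE gives intervals [a, b); now the top value 1 is the one left as NA
cut(vals, edges, right = FALSE)

# labels = FALSE returns plain integer codes rather than a factor
cut(vals, edges, include.lowest = TRUE, labels = FALSE)
```

Which convention you want depends on how values lying exactly on a break should be treated in your data.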
Your code fails because the object returned by hist() is a list that contains much more than the data split into groups:
> foo <- hist(runif(50), breaks = 6, plot = FALSE)
> str(foo)
List of 7
$ breaks : num [1:6] 0 0.2 0.4 0.6 0.8 1
$ counts : int [1:5] 12 13 7 13 5
$ intensities: num [1:5] 1.2 1.3 0.7 1.3 0.5
$ density : num [1:5] 1.2 1.3 0.7 1.3 0.5
$ mids : num [1:5] 0.1 0.3 0.5 0.7 0.9
$ xname : chr "runif(50)"
$ equidist : logi TRUE
- attr(*, "class")= chr "histogram"
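So you can't convert this to a factor; R doesn't know how to. Note also that hist() does not return the data classified into 6 groups; it returns other information that is useful for drawing the histogram. The number given to breaks is only a suggestion, which is why there are 5 bins here rather than 6. Note too that, unlike cut(), hist() produces pretty breaks. If you want those pretty breaks, you can reproduce what hist() does like this: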
> set.seed(42)
> x <- runif(50)
> brks <- pretty(range(x), n = 6, min.n = 1)
> cut(x, breaks = brks)
[1] (0.8,1] (0.8,1] (0.2,0.4] (0.8,1] (0.6,0.8] (0.4,0.6] (0.6,0.8]
[8] (0,0.2] (0.6,0.8] (0.6,0.8] (0.4,0.6] (0.6,0.8] (0.8,1] (0.2,0.4]
[15] (0.4,0.6] (0.8,1] (0.8,1] (0,0.2] (0.4,0.6] (0.4,0.6] (0.8,1]
[22] (0,0.2] (0.8,1] (0.8,1] (0,0.2] (0.4,0.6] (0.2,0.4] (0.8,1]
[29] (0.4,0.6] (0.8,1] (0.6,0.8] (0.8,1] (0.2,0.4] (0.6,0.8] (0,0.2]
[36] (0.8,1] (0,0.2] (0.2,0.4] (0.8,1] (0.6,0.8] (0.2,0.4] (0.4,0.6]
[43] (0,0.2] (0.8,1] (0.4,0.6] (0.8,1] (0.8,1] (0.6,0.8] (0.8,1]
[50] (0.6,0.8]
Levels: (0,0.2] (0.2,0.4] (0.4,0.6] (0.6,0.8] (0.8,1]
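As a sanity check, the sketch below compares the tabulated cut() groups with the counts that hist() computes itself. It assumes hist()'s defaults of right-closed intervals with include.lowest = TRUE, so the same settings are passed to cut(); with continuous random data like this no value falls exactly on a break, so the two should agree.

```r
set.seed(42)
x <- runif(50)

# Let hist() compute its bins and counts without drawing anything
h <- hist(x, breaks = 6, plot = FALSE)

# The same pretty breaks that hist() derives from a single number
brks <- pretty(range(x), n = 6, min.n = 1)

# hist() counts right-closed intervals with include.lowest = TRUE by default,
# so give cut() the same settings before tabulating the groups
tab <- table(cut(x, breaks = brks, include.lowest = TRUE))

h$breaks        # should match brks
h$counts        # per-bin counts from hist()
as.vector(tab)  # per-group counts from cut()
```

The advantage of the cut() route is that you keep the per-observation group membership, which the histogram object does not store.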
But you should ask yourself why you need to discretise your data like this, and whether doing so makes sense at all.