I am trying to evaluate attributes for feature selection. I applied information.gain, gain.ratio, and chi.squared, but some attributes get a score of NaN or 0.0000000.
> weights <- information.gain(Team1.Result~., df)
> print(weights)
attr_importance
Sr..No. 0.000000000
Matchid 0.000000000
Team2 0.171564805
Margin 0.344871508
Toss 0.004552660
Bat 0.006355032
Ground 0.324758562
Date 0.674699370
Team1.BatRate 0.000000000
Team1.Bat_SR 0.000000000
Team1.BowlRate 0.144960767
Team1.Bowl_SR 0.000000000
Team2.BatRate 0.000000000
Team2.Bat_SR 0.000000000
Team2.BowlRate 0.161264860
Team2.Bowl_SR 0.161264860
The gain ratio output is:
> weights <- gain.ratio(Team1.Result~., df)
> print(weights)
attr_importance
Sr..No. NaN
Matchid NaN
Team2 0.075884914
Margin 0.107668123
Toss 0.006675310
Bat 0.009171368
Ground 0.133481349
Date 0.175239871
Team1.BatRate NaN
Team1.Bat_SR NaN
Team1.BowlRate 0.266415653
Team1.Bowl_SR NaN
Team2.BatRate NaN
Team2.Bat_SR NaN
Team2.BowlRate 0.283865166
Team2.Bowl_SR 0.283865166
The chi-squared output is:
> res <- chi.squared(Team1.Result~., df)
> res
attr_importance
Sr..No. 0.0000000
Matchid 0.0000000
Team2 0.5168656
Margin 0.7149496
Toss 0.0951519
Bat 0.1125653
Ground 0.7022298
Date 1.0000000
Team1.BatRate 0.0000000
Team1.Bat_SR 0.0000000
Team1.BowlRate 0.4553474
Team1.Bowl_SR 0.0000000
Team2.BatRate 0.0000000
Team2.Bat_SR 0.0000000
Team2.BowlRate 0.4823412
Team2.Bowl_SR 0.4823412
Here are a few records to show what the data looks like (I wanted to add an image, but the site does not allow it):
Sr. No.,Matchid,Team2,Margin,BR,Toss,Bat,Ground,Date,Team1.BatRate,Team1.Bat_SR,Team1.BowlRate,Team1.Bowl_SR,Team2.BatRate,Team2.Bat_SR,Team2.BowlRate,Team2.Bowl_SR,Team1.Result
1,533280,New Zealand,13 runs,NA,1,1,Pallekele,23-Sep-12,18.96866667,114.3413333,20.67066667,15.27333333,17.10866667,111.3693333,13.97666667,12.14666667,1
2,533283,Bangladesh,8 wickets,8,0,2,Pallekele,25-Sep-12,14.41333333,111.9113333,23.82466667,17.00666667,17.10866667,111.3693333,13.97666667,12.14666667,1
3,533286,South Africa,2 wickets,2,0,2,Colombo (RPS),28-Sep-12,17.10866667,111.3693333,13.97666667,12.14666667,21.862,116.5413333,21.29266667,15.46,1
4,533291,India,8 wickets,18,1,1,Colombo (RPS),30-Sep-12,22.37,104.772,25.52333333,19.29333333,17.10866667,111.3693333,13.97666667,12.14666667,0
5,533294,Australia,32 runs,NA,0,1,Colombo (RPS),2-Oct-12,18.36066667,114.2273333,22.80333333,18.42,17.10866667,111.3693333,13.97666667,12.14666667,1
6,533296,Sri Lanka,16 runs,NA,0,2,Colombo (RPS),4-Oct-12,17.10866667,111.3693333,13.97666667,12.14666667,15.936,100.616,15.75333333,13.16,0
7,562438,Sri Lanka,23 runs,NA,1,1,Hambantota,3-Jun-12,14.425,98.111875,11.86875,10.33125,17.51142857,105.8635714,16.23214286,12.87857143,1
This does not look right to me. Is it acceptable to get NaN as a result? Also, can an attribute really score exactly 1, as Date does for chi-squared?
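
For reference, here is a minimal base-R sketch of how a NaN gain ratio and a chi-squared score of 1 can arise. It does not call FSelector and the toy vectors are made up, not taken from the cricket data. It assumes that gain ratio is information gain divided by the attribute's own entropy, and that the chi-squared score is Cramér's V, as the FSelector documentation describes for chi.squared. Whether the numeric columns above actually collapse to a single bin during discretization is also an assumption. The helper names (entropy, info_gain, gain_ratio, cramers_v) are hypothetical.

```r
# Toy data, invented for illustration only (not the cricket data set).
y        <- factor(c(1, 1, 1, 0, 1, 0, 1))  # class label
constant <- factor(rep("all", 7))           # one level only, e.g. a single bin after discretization
unique_v <- factor(seq_along(y))            # a distinct value in every row, like an ID or a date

# Shannon entropy (natural log) of a factor.
entropy <- function(x) {
  p <- table(x) / length(x)
  p <- p[p > 0]
  -sum(p * log(p))
}

# Information gain H(y) + H(x) - H(x, y), and gain ratio IG / H(x).
info_gain <- function(x, y) {
  entropy(y) + entropy(x) - entropy(interaction(x, y, drop = TRUE))
}
gain_ratio <- function(x, y) info_gain(x, y) / entropy(x)

# Cramer's V computed from the Pearson chi-squared statistic.
cramers_v <- function(x, y) {
  tab <- table(x, y)
  k   <- min(nrow(tab), ncol(tab)) - 1
  if (k == 0) return(0)  # assumption: a one-level attribute is scored as no association
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  unname(sqrt(chi2 / (length(x) * k)))
}

print(info_gain(constant, y))   # gain of an attribute with a single level
print(gain_ratio(constant, y))  # 0 / 0 for the same attribute
print(cramers_v(unique_v, y))   # attribute that is unique per row
```

In this sketch the one-level attribute has zero information gain and zero entropy, so its gain ratio is 0/0, which R reports as NaN. The attribute with a distinct value in every row determines the class perfectly within the sample, so its Cramér's V comes out as 1. If the same thing is happening in my data, a score of 1 for Date would reflect the fact that every date is unique rather than real predictive power.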