r - How can I summarise this data table with dplyr, run a chisq.test (or similar) on the results, and loop it all into one tidy function?

Question

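This question was embedded in another question I asked here, but since it goes beyond what I wanted to know in that initial inquiry, I thought it might deserve a separate thread.

I have been trying to come up with a solution to this problem with dplyr, based on the answers I received here and here and the functions written by Khashaa and Jaap.

Using the solutions provided (especially the one from Jaap), I have been able to summarise the raw data I received into a matrix-like data table.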

dput(SO_Example_v1)
structure(list(Type = structure(c(3L, 1L, 2L), .Label = c("Community", 
"Contaminant", "Healthcare"), class = "factor"), hosp1_WoundAssocType = c(464L, 
285L, 24L), hosp1_BloodAssocType = c(73L, 40L, 26L), hosp1_UrineAssocType = c(75L, 
37L, 18L), hosp1_RespAssocType = c(137L, 77L, 2L), hosp1_CathAssocType = c(80L, 
34L, 24L), hosp2_WoundAssocType = c(171L, 115L, 17L), hosp2_BloodAssocType = c(127L, 
62L, 12L), hosp2_UrineAssocType = c(50L, 29L, 6L), hosp2_RespAssocType = c(135L, 
142L, 6L), hosp2_CathAssocType = c(95L, 24L, 12L)), .Names = c("Type", 
"hosp1_WoundAssocType", "hosp1_BloodAssocType", "hosp1_UrineAssocType", 
"hosp1_RespAssocType", "hosp1_CathAssocType", "hosp2_WoundAssocType", 
"hosp2_BloodAssocType", "hosp2_UrineAssocType", "hosp2_RespAssocType", 
"hosp2_CathAssocType"), class = "data.frame", row.names = c(NA, 
-3L))

It looks like this:

require(dplyr)
df <- tbl_df(SO_Example_v1)
head(df)
         Type hosp1_WoundAssocType hosp1_BloodAssocType hosp1_UrineAssocType
1  Healthcare                  464                   73                   75
2   Community                  285                   40                   37
3 Contaminant                   24                   26                   18
Variables not shown: hosp1_RespAssocType (int), hosp1_CathAssocType (int), hosp2_WoundAssocType
  (int), hosp2_BloodAssocType (int), hosp2_UrineAssocType (int), hosp2_RespAssocType (int),
  hosp2_CathAssocType (int)

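The Type column is the type of bacteria, and the following columns represent where they were cultured. The numbers are how many times each type of bacteria was detected.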
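Each count column name encodes both the hospital and the culture site, so the two can be pulled apart with a regular expression. A small sketch, assuming every column follows the hospN_SiteAssocType pattern seen above; the variable names cols, hospital and site are only illustrative:

```r
# A few of the column names from the data above.
cols <- c("hosp1_WoundAssocType", "hosp1_BloodAssocType", "hosp2_RespAssocType")

# Everything before the first underscore is the hospital.
hospital <- sub("_.*$", "", cols)

# The part between the hospital prefix and "AssocType" is the culture site.
site <- sub("^hosp[0-9]+_(.*)AssocType$", "\\1", cols)

print(data.frame(column = cols, hospital = hospital, site = site))
```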

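I know what my final table should look like, but until now I have been building it step by step with dplyr for each comparison and variable. I would appreciate an answer to this here on SO.

Example of the final table: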

                                                 Wound
Type                            n Hospital 1 (%)      n Hospital 2 (%)  p-val
Healthcare associated bacteria     464 (60.0)            171 (56.4)     0.28
Community associated bacteria      285 (36.9)            115 (38.0)     0.74
Contaminants                       24 (3.1)              17 (5.6)       0.05

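The first grouping variable, "Wound", is then replaced by "Urine", "Respiratory", and so on. There is also a final column named "All/Total", which holds the total number of times each "Type" row was found, summed across sites and compared between hospital 1 and hospital 2.

What I have done so far is very tedious: everything is calculated "by hand", and I then enter all the results into the table manually, like this: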

### Wound cultures & healthcare associated (extracted manually)
# hosp1 464 (yes), 309 (no), 773 wound isolates in total; (% = 464 / 773 * 100)
# hosp2 171 (yes), 132 (no), 303 wound isolates in total; (% = 171 / 303 * 100)

### Then the chisq.test of my contingency table
chisq.test(cbind(c(464,309),c(171,132)),correct=FALSE)
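The hand calculation above can be reproduced from the counts themselves. The following is a minimal base R sketch that uses only the wound counts; the helper name one_vs_rest is hypothetical and not part of any package:

```r
# Wound isolate counts per bacteria type, copied from the data above.
hosp1 <- c(Healthcare = 464, Community = 285, Contaminant = 24)
hosp2 <- c(Healthcare = 171, Community = 115, Contaminant = 17)

# Hypothetical helper: 2x2 table of one type ("yes") versus all others ("no").
one_vs_rest <- function(type, h1, h2) {
  rbind(yes = c(hosp1 = h1[[type]], hosp2 = h2[[type]]),
        no  = c(hosp1 = sum(h1) - h1[[type]], hosp2 = sum(h2) - h2[[type]]))
}

tab <- one_vs_rest("Healthcare", hosp1, hosp2)

# Share of each hospital's wound isolates that fall in the "yes" row.
pct <- round(100 * tab["yes", ] / colSums(tab), 1)

# Same test as the manual call, without the continuity correction.
res <- chisq.test(tab, correct = FALSE)
print(tab)
print(pct)
print(res$p.value)
```

Here tab holds the same four numbers as the manual cbind() call, and pct gives the 60.0 and 56.4 shown in the example table.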

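I understand that running dplyr pipes on the raw data.frame will not give me the exact format of my desired table, but is there at least a way to automate all the steps here and put the results into a final table that I can export as a .csv file and then finish by editing the columns, etc.?

Any help is greatly appreciated.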

score 3 · Accepted Answer

It's ugly, but it works (Sam in the comments says this whole issue should probably be addressed by getting your data into a clean format before analysing it, but anyway):

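# Pair each hosp1 column with the hosp2 column in the same position
Map(
  function(x,y) {
    out <- cbind(x,y)
    # row 1 (Healthcare) versus the sum of rows 2 and 3 (all other types)
    final <- rbind(out[1,],colSums(out[2:3,]))
    chisq.test(final,correct=FALSE)
  },
  SO_Example_v1[grepl("^hosp1",names(SO_Example_v1))],
  SO_Example_v1[grepl("^hosp2",names(SO_Example_v1))]
)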

#$hosp1_WoundAssocType
#
#        Pearson's Chi-squared test
#
#data:  final
#X-squared = 1.16, df = 1, p-value = 0.2815
# etc etc...

This matches the intended result:

chisq.test(cbind(c(464,309),c(171,132)),correct=FALSE)
#
#        Pearson's Chi-squared test
# 
#data:  cbind(c(464, 309), c(171, 132))
#X-squared = 1.16, df = 1, p-value = 0.2815
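To get closer to the table asked for in the question, the same idea can be looped over every site and every Type row and collected into one data frame for export. This is a rough base R sketch, not the code from the answer above. It rebuilds the data compactly, assumes the hosp1 and hosp2 columns list the sites in the same order, and adds an "All" pseudo-site for the totals. The names test_row, results and the output file name are made up for illustration:

```r
# Compact rebuild of SO_Example_v1 (same counts as the dput() output).
SO_Example_v1 <- data.frame(
  Type = c("Healthcare", "Community", "Contaminant"),
  hosp1_WoundAssocType = c(464L, 285L, 24L),
  hosp1_BloodAssocType = c(73L, 40L, 26L),
  hosp1_UrineAssocType = c(75L, 37L, 18L),
  hosp1_RespAssocType  = c(137L, 77L, 2L),
  hosp1_CathAssocType  = c(80L, 34L, 24L),
  hosp2_WoundAssocType = c(171L, 115L, 17L),
  hosp2_BloodAssocType = c(127L, 62L, 12L),
  hosp2_UrineAssocType = c(50L, 29L, 6L),
  hosp2_RespAssocType  = c(135L, 142L, 6L),
  hosp2_CathAssocType  = c(95L, 24L, 12L)
)

h1 <- SO_Example_v1[grepl("^hosp1", names(SO_Example_v1))]
h2 <- SO_Example_v1[grepl("^hosp2", names(SO_Example_v1))]

# "All" pseudo-site: totals across sites within each hospital.
h1$hosp1_AllAssocType <- rowSums(h1)
h2$hosp2_AllAssocType <- rowSums(h2)

# Hypothetical helper: test row i against the sum of the remaining rows.
test_row <- function(i, x, y) {
  tab <- rbind(c(x[i], y[i]), c(sum(x[-i]), sum(y[-i])))
  data.frame(
    Type  = SO_Example_v1$Type[i],
    hosp1 = sprintf("%.0f (%.1f)", x[i], 100 * x[i] / sum(x)),
    hosp2 = sprintf("%.0f (%.1f)", y[i], 100 * y[i] / sum(y)),
    p_val = chisq.test(tab, correct = FALSE)$p.value
  )
}

# One block of rows per site, stacked into a single long table.
results <- do.call(rbind, Map(function(x, y, nm) {
  rows <- lapply(seq_along(x), test_row, x = x, y = y)
  cbind(Site = sub("^hosp1_(.*)AssocType$", "\\1", nm), do.call(rbind, rows))
}, h1, h2, names(h1)))
rownames(results) <- NULL

# Export for final hand-editing; the file name is arbitrary.
out_file <- file.path(tempdir(), "hospital_comparison.csv")
write.csv(results, out_file, row.names = FALSE)
print(results)
```

Some cells are small (for example the respiratory contaminants), so chisq.test() warns that the approximation may be incorrect for those rows; fisher.test() may be worth considering there.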
