3

Infopath .XML フォームを R にインポートし、データフレームに変換する最良の方法は何ですか? Excel で Infopath .XML ファイルを開くと、データ フレームの行と列が正しく表示されます。

XMLパッケージを使用してRで試したことは次のとおりです。

  1. XMLファイルを解析するためにxmlParse()を使用しました
  2. xmlToDataFrame() を使用して、解析された XML ファイルをデータ フレームに変換しようとしました。

ただし、手順 2 で、次のエラーが表示されます。

Error in `[<-.data.frame`(`*tmp*`, i, names(nodes[[i]]), value = c("touch your head13011000",  : 
  duplicate subscripts for columns

ただし、XML ファイルを Excel で開くと、列が重複しているようには見えません。この XML ファイルを Infopath から R のデータ フレームに変換するにはどうすればよいですか? 予想される列は次のようになります (Excel に表示されるとおり)。

TCID, DateCoded, tcAge, T1_B3, T1_B2, T1_B1, T1_B0, T1_A3, T1_A2, T1_A1, T1_A0, T1_DelayTotal, T2_A3, T2_A2, T2_A1, T2_A, T2_B3, T2_B2, T2_B1, T2_B0, T2_DelayTotal, Coder, notes_t1, note_t2, bachildpres30, baparpres30, bapassptgo, bapassptnogo, bamissgame, P1_B3, P1_B2, P1_B1, P1_B0, P1_A3, P1_A2, P1_A1, P1_A0, P1_DelayTotal, P1_action, P1_go-nogo, P1_score, P1_delay, P1_trial, P1_Ecommand, P1_imitation, P1_restraint, P1_ruleswitch, P1_trials, P1_gotrials, P1_nogotrials, T1_gotrials, T1_nogotrials, T1_trials, T2_gotrials, T2_nogotrials, T2_trials, P1_notplay, T1_trial, T1_go-nogo, T1_score, T1_delay, T1_action, T2_trial, T2_go-nogo, T2_score, T2_delay, T2_action

XML ファイルに複数回出現する変数については、長い形式 (つまり、同じ変数に対して複数の行) でデータ フレームに配置したいと考えています。私は XML ファイルを扱った経験があまりないため、ご協力をお願いいたします。

xmlParse を使用すると、R で解析された XML ファイルは次のようになります。

<my:myFields lang="en-us" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint" xmlns:aml="http://schemas.microsoft.com/aml/2001/core" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:st="urn:schemas-microsoft-com:office:smarttags" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:my="http://schemas.microsoft.com/office/infopath/2003/myXSD/2009-07-01T18:12:59" xmlns:xd="http://schemas.microsoft.com/office/infopath/2003">
 <my:SPSS>
  <my:TCID>10</my:TCID>
  <my:DateCoded>2013-04-01</my:DateCoded>
  <my:tcAge>30</my:tcAge>
  <my:T1_B3>6</my:T1_B3>
  <my:T1_B2>0</my:T1_B2>
  <my:T1_B1>0</my:T1_B1>
  <my:T1_B0>0</my:T1_B0>
  <my:T1_A3>0</my:T1_A3>
  <my:T1_A2>0</my:T1_A2>
  <my:T1_A1>1</my:T1_A1>
  <my:T1_A0>5</my:T1_A0>
  <my:T1_DelayTotal>1</my:T1_DelayTotal>
  <my:T2_A3 nil="true"/>
  <my:T2_A2 nil="true"/>
  <my:T2_A1 nil="true"/>
  <my:T2_A0 nil="true"/>
  <my:T2_B3 nil="true"/>
  <my:T2_B2 nil="true"/>
  <my:T2_B1 nil="true"/>
  <my:T2_B0 nil="true"/>
  <my:T2_DelayTotal nil="true"/>
  <my:Coder>Name</my:Coder>
 </my:SPSS>
 <my:notes_t1/>
 <my:note_t2/>
 <my:bachildpres30>0</my:bachildpres30>
 <my:baparpres30>0</my:baparpres30>
 <my:bapassptgo>1</my:bapassptgo>
 <my:bapassptnogo>0</my:bapassptnogo>
 <my:bamissgame>0</my:bamissgame>
 <my:P1_B3>4</my:P1_B3>
 <my:P1_B2>0</my:P1_B2>
 <my:P1_B1>0</my:P1_B1>
 <my:P1_B0>1</my:P1_B0>
 <my:P1_A3>0</my:P1_A3>
 <my:P1_A2>0</my:P1_A2>
 <my:P1_A1>1</my:P1_A1>
 <my:P1_A0>3</my:P1_A0>
 <my:P1_DelayTotal>0</my:P1_DelayTotal>
 <my:group2>
  <my:group3>
   <my:P1_action>touch your head</my:P1_action>
   <my:P1_go-nogo>1</my:P1_go-nogo>
   <my:P1_score>3</my:P1_score>
   <my:P1_delay>0</my:P1_delay>
   <my:P1_trial>1</my:P1_trial>
   <my:P1_Ecommand>1</my:P1_Ecommand>
   <my:P1_imitation>0</my:P1_imitation>
   <my:P1_restraint>0</my:P1_restraint>
   <my:P1_ruleswitch>0</my:P1_ruleswitch>
  </my:group3>
  <my:group3>
   <my:P1_action>touch your nose</my:P1_action>
   <my:P1_go-nogo>1</my:P1_go-nogo>
   <my:P1_score>3</my:P1_score>
   <my:P1_delay>0</my:P1_delay>
   <my:P1_trial>2</my:P1_trial>
   <my:P1_Ecommand>1</my:P1_Ecommand>
   <my:P1_imitation>0</my:P1_imitation>
   <my:P1_restraint>0</my:P1_restraint>
   <my:P1_ruleswitch>0</my:P1_ruleswitch>
  </my:group3>
  <my:group3>
   <my:P1_action>touch your tummy</my:P1_action>
   <my:P1_go-nogo>1</my:P1_go-nogo>
   <my:P1_score>3</my:P1_score>
   <my:P1_delay>0</my:P1_delay>
   <my:P1_trial>3</my:P1_trial>
   <my:P1_Ecommand>1</my:P1_Ecommand>
   <my:P1_imitation>0</my:P1_imitation>
   <my:P1_restraint>0</my:P1_restraint>
   <my:P1_ruleswitch>0</my:P1_ruleswitch>
  </my:group3>
  <my:group3>
   <my:P1_action>touch your head</my:P1_action>
   <my:P1_go-nogo>1</my:P1_go-nogo>
   <my:P1_score>0</my:P1_score>
   <my:P1_delay>0</my:P1_delay>
   <my:P1_trial>4</my:P1_trial>
   <my:P1_Ecommand>0</my:P1_Ecommand>
   <my:P1_imitation>0</my:P1_imitation>
   <my:P1_restraint>0</my:P1_restraint>
   <my:P1_ruleswitch>0</my:P1_ruleswitch>
  </my:group3>
  <my:group3>
   <my:P1_action>touch your head</my:P1_action>
   <my:P1_go-nogo>1</my:P1_go-nogo>
   <my:P1_score>3</my:P1_score>
   <my:P1_delay>0</my:P1_delay>
   <my:P1_trial>5</my:P1_trial>
   <my:P1_Ecommand>0</my:P1_Ecommand>
   <my:P1_imitation>0</my:P1_imitation>
   <my:P1_restraint>0</my:P1_restraint>
   <my:P1_ruleswitch>0</my:P1_ruleswitch>
  </my:group3>
  <my:group3>
   <my:P1_action>touch your nose</my:P1_action>
   <my:P1_go-nogo>1</my:P1_go-nogo>
   <my:P1_score>3</my:P1_score>
   <my:P1_delay>0</my:P1_delay>
   <my:P1_trial>6</my:P1_trial>
   <my:P1_Ecommand>0</my:P1_Ecommand>
   <my:P1_imitation>0</my:P1_imitation>
   <my:P1_restraint>0</my:P1_restraint>
   <my:P1_ruleswitch>0</my:P1_ruleswitch>
  </my:group3>
  <my:group3>
   <my:P1_action>clap your hands</my:P1_action>
   <my:P1_go-nogo>1</my:P1_go-nogo>
   <my:P1_score>3</my:P1_score>
   <my:P1_delay>0</my:P1_delay>
   <my:P1_trial>7</my:P1_trial>
   <my:P1_Ecommand>0</my:P1_Ecommand>
   <my:P1_imitation>0</my:P1_imitation>
   <my:P1_restraint>0</my:P1_restraint>
   <my:P1_ruleswitch>0</my:P1_ruleswitch>
  </my:group3>
  <my:group3>
   <my:P1_action>touch your nose</my:P1_action>
   <my:P1_go-nogo>0</my:P1_go-nogo>
   <my:P1_score>0</my:P1_score>
   <my:P1_delay>0</my:P1_delay>
   <my:P1_trial>8</my:P1_trial>
   <my:P1_Ecommand>0</my:P1_Ecommand>
   <my:P1_imitation>0</my:P1_imitation>
   <my:P1_restraint>0</my:P1_restraint>
   <my:P1_ruleswitch>0</my:P1_ruleswitch>
  </my:group3>
  <my:group3>
   <my:P1_action>touch your ears</my:P1_action>
   <my:P1_go-nogo>0</my:P1_go-nogo>
   <my:P1_score>0</my:P1_score>
   <my:P1_delay>0</my:P1_delay>
   <my:P1_trial>9</my:P1_trial>
   <my:P1_Ecommand>0</my:P1_Ecommand>
   <my:P1_imitation>0</my:P1_imitation>
   <my:P1_restraint>0</my:P1_restraint>
   <my:P1_ruleswitch>0</my:P1_ruleswitch>
  </my:group3>
  <my:group3>
   <my:P1_action>touch your tummy</my:P1_action>
   <my:P1_go-nogo>0</my:P1_go-nogo>
   <my:P1_score>0</my:P1_score>
   <my:P1_delay>0</my:P1_delay>
   <my:P1_trial>10</my:P1_trial>
   <my:P1_Ecommand>0</my:P1_Ecommand>
   <my:P1_imitation>0</my:P1_imitation>
   <my:P1_restraint>0</my:P1_restraint>
   <my:P1_ruleswitch>0</my:P1_ruleswitch>
  </my:group3>
  <my:group3>
   <my:P1_action>touch your eyes</my:P1_action>
   <my:P1_go-nogo>0</my:P1_go-nogo>
   <my:P1_score>1</my:P1_score>
   <my:P1_delay>0</my:P1_delay>
   <my:P1_trial>11</my:P1_trial>
   <my:P1_Ecommand>0</my:P1_Ecommand>
   <my:P1_imitation>0</my:P1_imitation>
   <my:P1_restraint>1</my:P1_restraint>
   <my:P1_ruleswitch>0</my:P1_ruleswitch>
  </my:group3>
  <my:group3>
   <my:P1_action>touch your eyes</my:P1_action>
   <my:P1_go-nogo>1</my:P1_go-nogo>
   <my:P1_score>3</my:P1_score>
   <my:P1_delay>0</my:P1_delay>
   <my:P1_trial>12</my:P1_trial>
   <my:P1_Ecommand>0</my:P1_Ecommand>
   <my:P1_imitation>0</my:P1_imitation>
   <my:P1_restraint>0</my:P1_restraint>
   <my:P1_ruleswitch>0</my:P1_ruleswitch>
  </my:group3>
 </my:group2>
 <my:P1_trials>9</my:P1_trials>
 <my:P1_gotrials>5</my:P1_gotrials>
 <my:P1_nogotrials>4</my:P1_nogotrials>
 <my:T1_gotrials>6</my:T1_gotrials>
 <my:T1_nogotrials>6</my:T1_nogotrials>
 <my:T1_trials>12</my:T1_trials>
 <my:T2_gotrials>0</my:T2_gotrials>
 <my:T2_nogotrials>0</my:T2_nogotrials>
 <my:T2_trials>0</my:T2_trials>
 <my:P1_notplay/>
 <my:group4>
  <my:group5>
   <my:T1_trial>1</my:T1_trial>
   <my:T1_go-nogo>1</my:T1_go-nogo>
   <my:T1_score>3</my:T1_score>
   <my:T1_delay>1</my:T1_delay>
   <my:T1_action>Touch your tongue</my:T1_action>
  </my:group5>
  <my:group5>
   <my:T1_trial>2</my:T1_trial>
   <my:T1_go-nogo>1</my:T1_go-nogo>
   <my:T1_score>3</my:T1_score>
   <my:T1_delay>0</my:T1_delay>
   <my:T1_action>Touch your teeth</my:T1_action>
  </my:group5>
  <my:group5>
   <my:T1_trial>3</my:T1_trial>
   <my:T1_go-nogo>0</my:T1_go-nogo>
   <my:T1_score>0</my:T1_score>
   <my:T1_delay>0</my:T1_delay>
   <my:T1_action>Touch your ear</my:T1_action>
  </my:group5>
  <my:group5>
   <my:T1_trial>4</my:T1_trial>
   <my:T1_go-nogo>1</my:T1_go-nogo>
   <my:T1_score>3</my:T1_score>
   <my:T1_delay>0</my:T1_delay>
   <my:T1_action>Clap your hands</my:T1_action>
  </my:group5>
  <my:group5>
   <my:T1_trial>5</my:T1_trial>
   <my:T1_go-nogo>0</my:T1_go-nogo>
   <my:T1_score>0</my:T1_score>
   <my:T1_delay>0</my:T1_delay>
   <my:T1_action>Clap your hands</my:T1_action>
  </my:group5>
  <my:group5>
   <my:T1_trial>6</my:T1_trial>
   <my:T1_go-nogo>0</my:T1_go-nogo>
   <my:T1_score>0</my:T1_score>
   <my:T1_delay>0</my:T1_delay>
   <my:T1_action>Touch your eyes</my:T1_action>
  </my:group5>
  <my:group5>
   <my:T1_trial>7</my:T1_trial>
   <my:T1_go-nogo>0</my:T1_go-nogo>
   <my:T1_score>0</my:T1_score>
   <my:T1_delay>0</my:T1_delay>
   <my:T1_action>Touch your feet</my:T1_action>
  </my:group5>
  <my:group5>
   <my:T1_trial>8</my:T1_trial>
   <my:T1_go-nogo>1</my:T1_go-nogo>
   <my:T1_score>3</my:T1_score>
   <my:T1_delay>0</my:T1_delay>
   <my:T1_action>Touch your nose</my:T1_action>
  </my:group5>
  <my:group5>
   <my:T1_trial>9</my:T1_trial>
   <my:T1_go-nogo>0</my:T1_go-nogo>
   <my:T1_score>1</my:T1_score>
   <my:T1_delay>0</my:T1_delay>
   <my:T1_action>Touch your nose</my:T1_action>
  </my:group5>
  <my:group5>
   <my:T1_trial>10</my:T1_trial>
   <my:T1_go-nogo>1</my:T1_go-nogo>
   <my:T1_score>3</my:T1_score>
   <my:T1_delay>0</my:T1_delay>
   <my:T1_action>Touch your tummy</my:T1_action>
  </my:group5>
  <my:group5>
   <my:T1_trial>11</my:T1_trial>
   <my:T1_go-nogo>0</my:T1_go-nogo>
   <my:T1_score>0</my:T1_score>
   <my:T1_delay>0</my:T1_delay>
   <my:T1_action>Wave your hand</my:T1_action>
  </my:group5>
  <my:group5>
   <my:T1_trial>12</my:T1_trial>
   <my:T1_go-nogo>1</my:T1_go-nogo>
   <my:T1_score>3</my:T1_score>
   <my:T1_delay>0</my:T1_delay>
   <my:T1_action>Touch your head</my:T1_action>
  </my:group5>
 </my:group4>
 <my:group6>
  <my:group7>
   <my:T2_trial>1</my:T2_trial>
   <my:T2_go-nogo>0</my:T2_go-nogo>
   <my:T2_score/>
   <my:T2_delay>0</my:T2_delay>
   <my:T2_action>Touch your tongue</my:T2_action>
  </my:group7>
  <my:group7>
   <my:T2_trial>2</my:T2_trial>
   <my:T2_go-nogo>0</my:T2_go-nogo>
   <my:T2_score/>
   <my:T2_delay>0</my:T2_delay>
   <my:T2_action>Touch your teeth</my:T2_action>
  </my:group7>
  <my:group7>
   <my:T2_trial>3</my:T2_trial>
   <my:T2_go-nogo>1</my:T2_go-nogo>
   <my:T2_score/>
   <my:T2_delay>0</my:T2_delay>
   <my:T2_action>Touch your ear</my:T2_action>
  </my:group7>
  <my:group7>
   <my:T2_trial>4</my:T2_trial>
   <my:T2_go-nogo>0</my:T2_go-nogo>
   <my:T2_score/>
   <my:T2_delay>0</my:T2_delay>
   <my:T2_action>Clap your hands</my:T2_action>
  </my:group7>
  <my:group7>
   <my:T2_trial>5</my:T2_trial>
   <my:T2_go-nogo>1</my:T2_go-nogo>
   <my:T2_score/>
   <my:T2_delay>0</my:T2_delay>
   <my:T2_action>Clap your hands</my:T2_action>
  </my:group7>
  <my:group7>
   <my:T2_trial>6</my:T2_trial>
   <my:T2_go-nogo>1</my:T2_go-nogo>
   <my:T2_score/>
   <my:T2_delay>0</my:T2_delay>
   <my:T2_action>Touch your eyes</my:T2_action>
  </my:group7>
  <my:group7>
   <my:T2_trial>7</my:T2_trial>
   <my:T2_go-nogo>1</my:T2_go-nogo>
   <my:T2_score/>
   <my:T2_delay>0</my:T2_delay>
   <my:T2_action>Touch your feet</my:T2_action>
  </my:group7>
  <my:group7>
   <my:T2_trial>8</my:T2_trial>
   <my:T2_go-nogo>0</my:T2_go-nogo>
   <my:T2_score/>
   <my:T2_delay>0</my:T2_delay>
   <my:T2_action>Touch your nose</my:T2_action>
  </my:group7>
  <my:group7>
   <my:T2_trial>9</my:T2_trial>
   <my:T2_go-nogo>1</my:T2_go-nogo>
   <my:T2_score/>
   <my:T2_delay>0</my:T2_delay>
   <my:T2_action>Touch your nose</my:T2_action>
  </my:group7>
  <my:group7>
   <my:T2_trial>10</my:T2_trial>
   <my:T2_go-nogo>0</my:T2_go-nogo>
   <my:T2_score/>
   <my:T2_delay>0</my:T2_delay>
   <my:T2_action>Touch your tummy</my:T2_action>
  </my:group7>
  <my:group7>
   <my:T2_trial>11</my:T2_trial>
   <my:T2_go-nogo>1</my:T2_go-nogo>
   <my:T2_score/>
   <my:T2_delay>0</my:T2_delay>
   <my:T2_action>Wave your hand</my:T2_action>
  </my:group7>
  <my:group7>
   <my:T2_trial>12</my:T2_trial>
   <my:T2_go-nogo>0</my:T2_go-nogo>
   <my:T2_score/>
   <my:T2_delay>0</my:T2_delay>
   <my:T2_action>Touch your head</my:T2_action>
  </my:group7>
 </my:group6>
</my:myFields>
4

1 に答える 1

6

In my experience, xmlToDataFrame only works when the XML is already structured in a very consistent way. The data you are working with is structured in many different ways:

# Assuming you've already read your data into a character vector called `xml_file    
xml_file <- xmlParse(xml_file)
xml_file <- xmlToList(xml_file)

stack(sapply(xml_file, length))
   values           ind
1      22          SPSS
2       0      notes_t1
3       0       note_t2
4       1 bachildpres30
5       1   baparpres30
6       1    bapassptgo
7       1  bapassptnogo
8       1    bamissgame
9       1         P1_B3
10      1         P1_B2
11      1         P1_B1
12      1         P1_B0
13      1         P1_A3
14      1         P1_A2
15      1         P1_A1
16      1         P1_A0
17      1 P1_DelayTotal
18     12        group2
19      1     P1_trials
20      1   P1_gotrials
21      1 P1_nogotrials
22      1   T1_gotrials
23      1 T1_nogotrials
24      1     T1_trials
25      1   T2_gotrials
26      1 T2_nogotrials
27      1     T2_trials
28      0    P1_notplay
29     12        group4
30     12        group6
31      1        .attrs

So most of your nodes contain a single value. A few are empty. The "SPSS" node contains 22 values, all with different names, and "group2", "group4", and "group6" all contain 12 nodes, each of which contains multiple values, but whose values are similar across nodes. When I looked at what Excel did when I imported the file, it stacked the 12-node components on top of each other, then strung all 22 of the "SPSS" components together with all of the single-value nodes and repeated that string for as many rows as had been created from stacking the 12-node components, and then bound the two pieces together in columns.

To do that, separate out the long string from the 12-node chunks:

xml_file_singles <- xml_file[sapply(xml_file, length) != 12]
xml_file_singles[sapply(xml_file_singles, length) == 0] <- NA
xml_file_singles <- unlist(xml_file_singles)

xml_file_multiples <- xml_file[sapply(xml_file, length) == 12]

Now take the 12-node chunks and turn each chunk into a data frame:

xml_file_multiples <- lapply(1:length(xml_file_multiples), function(i) {

  x <- lapply(xml_file_multiples[[i]], function(y) {
    data.frame(as.list(unlist(y)), stringsAsFactors = FALSE)})
  x <- do.call("rbind", x)
  cbind("group" = names(xml_file_multiples)[i], x)
})

Now use the plyr package's rbind.fill function to put all the new data frames together:

require(plyr)

xml_file_multiples <- do.call("rbind.fill", xml_file_multiples)

Now cbind your long string of values to your bound dataframes:

xml_final <- cbind(as.list(xml_file_singles), xml_file_multiples, 
  stringsAsFactors = FALSE)

This approach, like Excel's, introduces a whole lot of NAs, because the column names for your different 12-node chunks all differ slightly. If you did this before calling rbind.fill:

xml_file_multiples <- lapply(1:length(xml_file_multiples), function(i) {

  x <- lapply(xml_file_multiples[[i]], function(y) {
    data.frame(as.list(unlist(y)), stringsAsFactors = FALSE)})
  x <- do.call("rbind", x)
  x <- cbind("group" = names(xml_file_multiples)[i], x)
  colnames(x) <- gsub("^\\w\\d_", "", colnames(x))
  x
})

You would generate fewer NA's because you would generate fewer redundant columns, but then you would have to rely on the values in the "group" column to keep track of which rows had originally appeared in which node.

于 2013-05-16T21:24:49.883 に答える