r - R: duplicate 'row.names' are not allowed

Question

Here is my problem in R:

mtable <- read.table(paste(".folder_1362704682.4574","/groups.txt",sep=""),sep="\t",comment.char='',skip=0, header=TRUE, fill=TRUE,check.names=FALSE)

The folder part of the paste() call normally comes from a variable; it is hard-coded here for debugging purposes.

I always get this message:

Error in read.table(paste(".frunc_1362704682.4574", "/groups.txt", sep = ""),  :
  duplicate 'row.names' are not allowed

But when I look at the file, the header looks like this:

root_node_name  node_name       node_id #genes_in_root_node     #genes_in_node  #genes_with_variable=1_in_root_node     #genes_with_variable=1_in_node  raw_p_underrepresentation_of_variable=1 raw_p_overrepresentation_of_variable=1  FWER_underrepresentation        FWER_overrepresentation FDR_underrepresentation FDR_overrepresentation

I can't see any duplicates.. :( I read in another discussion that I should try:

mtable <- read.table(paste(".frunc_1362704682.4574","/groups.txt",sep=""),sep="\t",comment.char='',skip=0, header=TRUE, fill=TRUE,check.names=FALSE,row.names=NULL)

This works, but afterwards all the column headings are shifted one column to the right:

> head(mtable, n=1)
           row.names                            root_node_name  node_name
1 molecular_function trans-hexaprenyltranstransferase activity GO:0000010
  node_id #genes_in_root_node #genes_in_node
1   17668                   2           2419
  #genes_with_variable=1_in_root_node #genes_with_variable=1_in_node
1                                   0                        0.74491
  raw_p_underrepresentation_of_variable=1
1                                       1
  raw_p_overrepresentation_of_variable=1 FWER_underrepresentation
1                                      1                        1
  FWER_overrepresentation FDR_underrepresentation FDR_overrepresentation
1

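Any idea how to get this right? :(

Edit:

As the commenters said, this is mainly a problem with the rows.. I was silly to think it came from the header. But I don't want to name the rows; it should simply read them in... oO that can't be so hard, can it?

File contents: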

molecular_function      trans-hexaprenyltranstransferase activity       GO:0000010      17668   2       2419    0       0.74491 1       1       1       -1      -1
molecular_function      single-stranded DNA specific endodeoxyribonuclease activity     GO:0000014      17668   5       2419    0       0.478885        1       1       1       -1      -1
molecular_function      lactase activity        GO:0000016      17668   1       2419    0       0.863086        1       1       1       -1      -1
molecular_function      alpha-1,3-mannosyltransferase activity  GO:0000033      17668   3       2419    0       0.64291 1       1       1       -1      -1
molecular_function      tRNA binding    GO:0000049      17668   27      2419    7       0.975698        0.0663832       1       1       -1      -1
molecular_function      fatty-acyl-CoA binding  GO:0000062      17668   20      2419    6       0.986407        0.0460431       1       1       -1      -1
molecular_function      L-ornithine transmembrane transporter activity  GO:0000064      17668   1       2419    0       0.863086        1       1       1       -1      -1
molecular_function      S-adenosylmethionine transmembrane transporter activity GO:0000095      17668   1       2419    0       0.863086        1       1       1       -1      -1

score 11 · Accepted Answer

According to the R documentation for read.table:

If there is a header and the first row contains one fewer field 
than the number of columns, the first column in the input is used
for the row names. Otherwise if row.names is missing, the rows are numbered.

...so I suspect your header row has one fewer field than the data rows have columns, which makes read.table() pick the first column (which contains many copies of molecular_function) as the row names.
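
To make this concrete, here is a minimal sketch that reproduces the error. The three-column header and the two data rows are made up for illustration (they are not the asker's real file), and each data row is assumed to end with a trailing tab, so it has one more field than the header.

```r
# Hypothetical input: 3 header fields, but 4 fields per data row
# because every data row ends with a trailing tab.
lines <- c(
  "root_node_name\tnode_name\tnode_id",
  "molecular_function\tlactase activity\tGO:0000016\t",
  "molecular_function\ttRNA binding\tGO:0000049\t"
)

# With one fewer header field than data fields, read.table() takes the
# first column as row names. Its values repeat, so the call fails.
res <- tryCatch(
  read.table(text = lines, sep = "\t", header = TRUE,
             comment.char = "", check.names = FALSE),
  error = function(e) conditionMessage(e)
)
print(res)
```

If the diagnosis is right, res holds the error message from the question rather than a data frame.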

score 1 · Accepted Answer

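The answer here by @adrianoesch (https://stackoverflow.com/a/22408965/2236315 ) should help.

If you open the file in a text editor, you will see that the number of header fields is smaller than the number of columns below the header row. In my case, the dataset was missing a "," at the end of the last header field.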
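
The same check can probably be done from R with count.fields(), which reports the number of fields on each line. A rough sketch on made-up lines (the column names and values are assumptions, not the real file):

```r
# Hypothetical tab-separated input: the header has 3 fields,
# the data rows have 4 because of a trailing tab.
lines <- c(
  "root_node_name\tnode_name\tnode_id",
  "molecular_function\tlactase activity\tGO:0000016\t",
  "molecular_function\ttRNA binding\tGO:0000049\t"
)

con <- textConnection(lines)
# quote = "" and comment.char = "" so that quotes and '#' in the data
# are counted as ordinary characters.
n_fields <- count.fields(con, sep = "\t", quote = "", comment.char = "")
close(con)

header_fields <- n_fields[1]
data_fields <- unique(n_fields[-1])
print(header_fields)
print(data_fields)
```

For a file on disk, count.fields() also accepts the file path directly instead of a connection. If the header count is exactly one less than the data count, read.table() will use the first column as row names.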

score 0 · Accepted Answer

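I had the same problem. The cause was a ton of tab-only blank lines at the bottom of the text file, so the row names were all identical (i.e. blank) for those rows. It happened because I had converted the file from Excel.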
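
One possible way to deal with such rows without editing the file is to drop whitespace-only lines before parsing. A minimal sketch, assuming a made-up three-column table whose first column is meant to be the row names:

```r
# Hypothetical export with two tab-only rows at the bottom.
lines <- c(
  "id\tname\tvalue",
  "1\talpha\t0.5",
  "2\tbeta\t0.7",
  "\t\t",
  "\t\t"
)

# Keep only lines that contain at least one non-whitespace character.
keep <- grepl("[^[:space:]]", lines)

# row.names = 1 is now safe: the blank rows that all shared the
# empty row name "" are gone.
df <- read.table(text = lines[keep], sep = "\t", header = TRUE,
                 row.names = 1, comment.char = "")
print(df)
```

With a real file, lines would come from readLines() on the file path instead of being typed in.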

score 0 · Accepted Answer

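I was working with automatically generated data files in which one column is empty (apart from the header). I don't want to edit each file by hand (and risk messing it up). The best workaround I found was in question #4066607: include "row.names = NULL" in the arguments.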

DF<-read.csv(file, ..... , row.names=NULL)

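This is not perfect, but it lets the file load. Unlike the behavior described in other answers (where an extra column of row numbers is forced in), the original first column ends up labeled "row.names" and all headers are shifted one column to the right. You still get all of the data.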
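
If the shifted headers are a problem, they can be moved back after loading. This is only a sketch on made-up data, and it assumes the extra field comes from a trailing separator, so the last column is entirely empty; trailing_empty is a placeholder name I introduce here.

```r
# Hypothetical input: 3 header fields, 4 fields per data row (trailing tab).
lines <- c(
  "root_node_name\tnode_name\tnode_id",
  "molecular_function\tlactase activity\tGO:0000016\t",
  "molecular_function\ttRNA binding\tGO:0000049\t"
)

DF <- read.table(text = lines, sep = "\t", header = TRUE,
                 comment.char = "", check.names = FALSE, row.names = NULL)

# The names are now "row.names", "root_node_name", ...: shift them one
# position to the left and give the last column a placeholder name.
names(DF) <- c(names(DF)[-1], "trailing_empty")

# Drop the last column only if it really contains no data.
if (all(is.na(DF$trailing_empty))) {
  DF$trailing_empty <- NULL
}
print(names(DF))
```

Checking that the last column is all NA before dropping it guards against files where the extra field actually carries values.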
