r - Read binary data in R instead of unpack in Python

Question

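I have been learning Python for 8 months and am a newbie to R. I have a binary file, and in Python I can read the binary data and turn it into a list (in Python, an array is a list).
The data file (named test) is located at:
https://www.box.com/s/0g3qg2lqgmr7y7fk5aut
The structure is as follows: every 4 bytes is an integer, so to read it with unpack in Python: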

import struct

# Each record is eight 4-byte integers (32 bytes).
result = []
with open('test', 'rb') as datafile:
    data = datafile.read(32)
    while data:
        result.append(list(struct.unpack('iiiiiiii', data)))
        data = datafile.read(32)

How can I read the binary data in R?

With the help of Paul Hiemstra, I completed the code in R:

datafile="test"
totalsize=file.info(datafile)$size
lines=totalsize/32
data=readBin("test",integer(),n=totalsize,size=4,endian="little")
result=data.frame(matrix(data,nrow=lines,ncol=8,byrow=TRUE))
colnames(result)=c("date","x1","x2","x3","x4","x5","x6","x7")

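There is still a problem I would like to solve. Here, n = totalsize reads all the data in one go. If the data is huge, there is not enough memory to hold it. How do I express "read the data from 1001 to 2000"? n = 1000 means reading data 1 to 1000, and n = 2000 means reading data 1 to 2000, but what about reading data 1001 to 2000? Is there a file pointer in R? Once the first 1000 values have been read, the file pointer is at position 1000. How can I then use the command readBin("test", integer(), n = 1000, size = 4, endian = "little") to read the data from 1001 to 2000?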

score 6 · Accepted Answer

Googling turns up the following link as the first hit: "R read binary file". The bottom line is to use the readBin function, which in your case would look something like this:

file2read = file("test", "rb")
number_of_integers_in_file = 128
spam = readBin(file2read, integer(), number_of_integers_in_file, size = 4)
close(file2read)
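
To tie this back to the layout in the question (eight 4-byte integers per record), here is a minimal sketch that reads the values and reshapes them into one row per record. The temporary file and the names path, values and records are made up for illustration. Note that the Python format string 'iiiiiiii' has no byte-order prefix, so struct uses the machine's native order, and readBin likewise defaults to .Platform$endian; spelling out endian = "little", as the question's own R code does, is an assumption about how the file was written.

```r
# Hypothetical sample file: 4 records of 8 integers each (32 bytes per record).
path <- tempfile()
out <- file(path, "wb")
writeBin(as.integer(1:32), out, size = 4, endian = "little")
close(out)

# Read all 32 integers back with an explicit size and byte order.
con <- file(path, "rb")
values <- readBin(con, integer(), n = 32, size = 4, endian = "little")
close(con)

# One row per 32-byte record, filled row by row like the Python loop.
records <- matrix(values, ncol = 8, byrow = TRUE)
```

Each row of records then corresponds to one list appended in the Python loop.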

If you do not know the number of integers in the file, you can do several things. First, create an example file:

# Create a binary file that we can read
l = as.integer(1:10)
file2write = file("/tmp/test", "wb")
writeBin(l, file2write)
close(file2write)

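One strategy is to overestimate the number of integers to read; readBin will only return the number that are actually there. Take care not to make n too large, because a vector of size n is preallocated.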

file2read = file("/tmp/test", "rb")
l_read = readBin(file2read, integer(), n = 100)
close(file2read)
all.equal(l, l_read)
[1] TRUE

Alternatively, if you know the size of the numbers (e.g. 4 bytes), you can calculate how many are present using the following function:

number_of_numbers = function(path, size = 4) {
  # If path is a file connection, extract file name
  if(inherits(path, "file")) path = summary(path)[["description"]]
  return(file.info(path)[["size"]] / size)
}
number_of_numbers("/tmp/test")
[1] 10

In action:

file2read = file("/tmp/test", "rb")
l_read2 = readBin(file2read, integer(), n = number_of_numbers(file2read))
close(file2read)
all.equal(l, l_read2)   
[1] TRUE

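If the amount of data is too large to fit into memory, I would suggest reading it in chunks. This can be done with successive calls to readBin on the same open connection, for example: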

first_1000 = readBin(con, integer(), n = 1000)
next_1000 = readBin(con, integer(), n = 1000)
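
The two calls above generalise to a loop that keeps reading until the file is exhausted. The following is only a sketch: the temporary sample file, the chunk size of 1000 and the running sum (a stand-in for whatever per-chunk processing you need) are assumptions made for illustration. It relies on readBin returning a zero-length vector once nothing is left to read.

```r
# Hypothetical sample file: 2500 four-byte integers.
path <- tempfile()
out <- file(path, "wb")
writeBin(as.integer(1:2500), out, size = 4)
close(out)

chunk_size <- 1000  # integers per readBin call (an arbitrary choice)
total <- 0          # running sum, standing in for real per-chunk work
n_chunks <- 0

con <- file(path, "rb")
repeat {
  chunk <- readBin(con, integer(), n = chunk_size, size = 4)
  # A zero-length result means the end of the file has been reached.
  if (length(chunk) == 0) break
  total <- total + sum(as.numeric(chunk))
  n_chunks <- n_chunks + 1
}
close(con)
```

Only one chunk is held in memory at a time, so peak memory use is governed by chunk_size rather than by the size of the file.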

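If you want to skip part of the data file, say the first 1000 numbers, use the seek function. This is much faster than reading 1000 numbers, discarding them, and then reading the next 1000 numbers. For example: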

# Skip the first thousand 4 byte integers
seek(con, where = 4*1000)
next_1000 = readBin(con, integer(), n = 1000)
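
Putting seek and readBin together for the record layout in the question, a range of records could be read roughly as follows. This is a sketch under assumptions: read_records is a hypothetical helper, not part of R, and it assumes fixed-size records of eight little-endian 4-byte integers. The offset passed to seek is in bytes, counted from 0 at the start of the file, so record indices have to be converted to byte offsets first.

```r
# Hypothetical helper: read records `first` to `last` (1-based) from a file of
# fixed-size records, each holding `ncol` four-byte integers.
read_records <- function(path, first, last, ncol = 8, size = 4) {
  con <- file(path, "rb")
  on.exit(close(con))
  # seek() takes a byte offset measured from the start of the file.
  seek(con, where = (first - 1) * ncol * size)
  values <- readBin(con, integer(), n = (last - first + 1) * ncol,
                    size = size, endian = "little")
  matrix(values, ncol = ncol, byrow = TRUE)
}

# Hypothetical sample file: 10 records of 8 integers each.
path <- tempfile()
out <- file(path, "wb")
writeBin(as.integer(1:80), out, size = 4, endian = "little")
close(out)

rows_3_to_4 <- read_records(path, first = 3, last = 4)
```

Here rows_3_to_4 holds only the third and fourth records; the bytes before them are skipped rather than read.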
