r - Download a zipped data file, extract it, and import the data using R

Question

@EZGraphs on Twitter writes: "Lots of online csvs are zipped. Is there a way to download, unzip the archive, and load the data to a data.frame using R? #Rstats"

I was also trying to do this today, but ended up just downloading the zip file manually.

I tried something like:

fileName <- "http://www.newcl.org/data/zipfiles/a1.zip"
con1 <- unz(fileName, filename="a1.dat", open = "r")

but I feel like I'm a long way off. Any thoughts?

score 197 · Accepted Answer

Zip archives are actually more of a "filesystem" with content metadata and so on; see help(unzip) for details. So to do what you sketch out above, you need to:

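1. Create a temporary file name (e.g. with tempfile())
2. Use download.file() to fetch the archive into the temporary file
3. Use unz() to open a connection to the target file inside the temporary file
4. Remove the temporary file with unlink()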

In code (thanks for the basic example, but this is simpler), that looks like:

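temp <- tempfile()
# mode = "wb" keeps the binary archive intact on Windows
download.file("http://www.newcl.org/data/zipfiles/a1.zip", temp, mode = "wb")
# unz() opens a connection to a1.dat inside the archive; read.table() reads from it
data <- read.table(unz(temp, "a1.dat"))
# delete the temporary archive
unlink(temp)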
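The same four steps can be wrapped in a small helper. This is a sketch, not part of the original answer: the name read_zipped_table is hypothetical, on.exit() is used so the temporary archive is removed even if reading fails, and mode = "wb" stops download.file() from altering the binary archive on Windows.

```r
# Hypothetical helper: download a zip archive, read one member, clean up.
read_zipped_table <- function(url, member, ...) {
  temp <- tempfile(fileext = ".zip")
  # Remove the downloaded archive however the function exits
  on.exit(unlink(temp), add = TRUE)
  # Binary mode so the archive is not altered on Windows
  download.file(url, temp, mode = "wb", quiet = TRUE)
  # unz() opens a read connection to a single member of the archive;
  # read.table() opens it, reads it, and closes it again
  read.table(unz(temp, member), ...)
}
```

Usage would be `data <- read_zipped_table("http://www.newcl.org/data/zipfiles/a1.zip", "a1.dat")`; any extra arguments (header, sep, and so on) are passed through to read.table().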

Compressed (.z), gzipped (.gz), or bzip2ed (.bz2) files are just a single file, and those you can read directly from a connection. So get the data provider to use one of those instead :)
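To illustrate the point about single-file compression, here is a local round trip (the temporary file and its contents are made up for the example): a CSV is written through a gzip connection and read back with no separate decompression step.

```r
# Write a small CSV through a gzip-compressing connection
gz_path <- tempfile(fileext = ".csv.gz")
con <- gzfile(gz_path, "w")
write.csv(data.frame(id = 1:3, value = c(10, 20, 30)), con, row.names = FALSE)
close(con)

# Read it straight back: read.csv() accepts a gzfile() connection...
df_gz <- read.csv(gzfile(gz_path))

# ...and a plain path also works, because file() detects gzip when reading
df_path <- read.csv(gz_path)
```

Nothing comparable exists for .zip, because a zip archive can hold several files and the reader has to be told which member to open.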

score 29 · Accepted Answer

For the record, I tried translating Dirk's answer into code :-P

temp <- tempfile()
download.file("http://www.newcl.org/data/zipfiles/a1.zip", temp, mode = "wb")
con <- unz(temp, "a1.dat")
# scan() returns a flat numeric vector; refill it row by row into 4 columns
data <- matrix(scan(con), ncol = 4, byrow = TRUE)
unlink(temp)
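scan() returns every whitespace-separated number as one flat vector, and matrix(..., byrow = TRUE) rebuilds the rows. The sketch below shows that reshaping on inline text standing in for a1.dat, assuming four columns as in the answer above.

```r
# Inline text standing in for a whitespace-separated, four-column file
txt <- "1 2 3 4\n5 6 7 8\n9 10 11 12"

# scan() reads all numbers into a single numeric vector (length 12 here)
values <- scan(text = txt, quiet = TRUE)

# Refill row by row to recover the original 3 x 4 layout
m <- matrix(values, ncol = 4, byrow = TRUE)

# Convert to a data frame if column-wise access is needed
df <- as.data.frame(m)
```

The trade-off is that the column count must be known in advance; read.table() infers it from the lines instead.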

score 12 · Accepted Answer

On a Mac (and I assume Linux)...

If the zip archive contains only a single file, you can use the command-line tool funzip in combination with fread from the data.table package:

library(data.table)
# cmd= tells fread() to run the shell command and read its output
dt <- fread(cmd = "curl http://www.newcl.org/data/zipfiles/a1.zip | funzip")

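If the archive contains multiple files, you can use tar instead to extract a specific file to stdout. This relies on tar being able to read zip archives, which bsdtar (the default tar on macOS) can but GNU tar, the usual default on Linux, cannot: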

dt <- fread(cmd = "curl http://www.newcl.org/data/zipfiles/a1.zip | tar -xf- --to-stdout '*a1.dat'")
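If data.table is not available, base R's pipe() connection can read the output of the same kind of shell pipeline. This sketch assumes curl and funzip are on the PATH and that the archive holds a single whitespace-separated file; the helper names zip_pipe_cmd and read_remote_zip are hypothetical.

```r
# Build the shell pipeline; shQuote() protects special characters in the URL
zip_pipe_cmd <- function(url) {
  sprintf("curl -sL %s | funzip", shQuote(url))
}

# Read the first member of a remote zip archive through a pipe() connection
read_remote_zip <- function(url, ...) {
  read.table(pipe(zip_pipe_cmd(url)), ...)
}
```

Usage would be `data <- read_remote_zip("http://www.newcl.org/data/zipfiles/a1.zip")`. Like the fread() version, nothing is written to disk, but it inherits the same dependence on Unix tools.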

score 4 · Accepted Answer

Try this code. It works for me:

unzip(zipfile="<directory and filename>",
      exdir="<directory where the content will be extracted>")

Example:

unzip(zipfile="./data/Data.zip",exdir="./data")
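unzip() invisibly returns the paths of the files it extracted, and its files argument restricts extraction to named members, so the result can go straight to a reader. A sketch with the hypothetical helper name extract_and_read; exdir defaults here to a fresh temporary directory, which unzip() creates if needed.

```r
# Hypothetical helper: extract one member of a local zip file and read it
extract_and_read <- function(zipfile, member, exdir = tempfile("unzipped_")) {
  # unzip() extracts only 'member' and returns the extracted path(s) invisibly
  paths <- unzip(zipfile = zipfile, files = member, exdir = exdir)
  read.table(paths[1])
}
```

For the example above this would be something like `extract_and_read("./data/Data.zip", "<file inside the archive>")`, with the member name left as a placeholder.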

score 0 · Accepted Answer

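The rio package is very well suited to this: its import() function uses the file extension to determine the file type, so it works with many different kinds of files. I also used unzip() to list the file names inside the zip file, so you don't have to specify them manually.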

library(rio)

# use the R session's temporary directory (tempdir() returns it; it does not create a new one)
td <- tempdir()

# create a temporary file
tf <- tempfile(tmpdir=td, fileext=".zip")

# download file from internet into temporary location
download.file("http://download.companieshouse.gov.uk/BasicCompanyData-part1.zip", tf)

# list zip archive
file_names <- unzip(tf, list=TRUE)

# extract files from zip file
unzip(tf, exdir=td, overwrite=TRUE)

# use when zip file has only one file
data <- import(file.path(td, file_names$Name[1]))

# use when zip file has multiple files
data_multiple <- lapply(file_names$Name, function(x) import(file.path(td, x)))

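# delete the downloaded zip and the extracted files; tempdir() itself belongs to the R session
# (and unlink() never removes a directory unless recursive = TRUE)
unlink(c(tf, file.path(td, file_names$Name)), recursive = TRUE)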
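Extraction to disk can also be skipped: unzip(list = TRUE) gives the member names, and unz() reads each one as a connection. This sketch assumes the members of interest are CSV files picked out by a hypothetical extension pattern, and uses base read.csv() in place of rio's import(); read_zip_members is a hypothetical name.

```r
# Hypothetical helper: read every .csv member of a local zip file without extracting it
read_zip_members <- function(zipfile, pattern = "\\.csv$") {
  listing <- unzip(zipfile, list = TRUE)  # data frame with Name, Length, Date
  members <- grep(pattern, listing$Name, value = TRUE)
  # unz() opens a read connection to one member; read.csv() opens and closes it
  out <- lapply(members, function(m) read.csv(unz(zipfile, m)))
  names(out) <- members
  out
}
```

Naming the list by member makes it easy to pick one result, e.g. `read_zip_members(tf)[["some_file.csv"]]` (the member name is illustrative).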
