The internal functions that guess column types can be told how many rows to scan. However, read_excel()
does not implement this (yet?).
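To see why the number of scanned rows matters, here is a minimal base R sketch. It is not readxl's actual guessing code: guess_type() is a hypothetical helper, and the cutoff of 100 rows is an arbitrary value chosen for illustration. It only shows how a guess based on the first rows can miss a value that appears further down.

```r
# Hypothetical helper, not part of readxl: guess a column's type
# from its first `n` values only.
guess_type <- function(values, n = length(values)) {
  head_values <- head(values, n)
  # Ignore blank cells, as they carry no type information
  non_blank <- head_values[!is.na(head_values) & head_values != ""]
  if (length(non_blank) == 0) {
    return("blank")
  }
  # If every non-blank value parses as a number, call the column numeric
  as_number <- suppressWarnings(as.numeric(non_blank))
  if (all(!is.na(as_number))) "numeric" else "text"
}

# A column whose first 100 cells look numeric, followed by one text cell
column <- c(as.character(1:100), "n/a")

print(guess_type(column, n = 100))  # scans only the first 100 cells
print(guess_type(column))           # scans all 101 cells
```

With the limited scan the text cell is never seen, so the column is guessed as numeric; scanning every cell gives text instead.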
The solution below is a rewrite of the original read_excel() function with an
n_max argument that defaults to all rows.
For lack of imagination, this extended function is named read_excel2.
To evaluate column types across all rows, simply replace read_excel
with read_excel2.
# Inspiration: https://github.com/hadley/readxl/blob/master/R/read_excel.R
# Rewrote read_excel() to read_excel2() with additional argument 'n_max' for number
# of rows to evaluate in function readxl:::xls_col_types and
# readxl:::xlsx_col_types()
# This is probably an unstable solution, since it calls internal functions from readxl.
# May or may not survive next update of readxl. Seems to work in version 0.1.0
library(readxl)
# n_max defaults to slightly more than the largest possible Excel sheet,
# so every row is scanned when guessing column types.
read_excel2 <- function(path, sheet = 1, col_names = TRUE, col_types = NULL,
                        na = "", skip = 0, n_max = 1050000L) {
  path <- readxl:::check_file(path)
  # Dispatch on the detected file format
  switch(readxl:::excel_format(path),
    xls = read_xls2(path, sheet, col_names, col_types, na, skip, n_max),
    xlsx = read_xlsx2(path, sheet, col_names, col_types, na, skip, n_max)
  )
}
# Note: the default is a literal value; a default of 'n_max = n_max' would
# refer to itself and fail if this function were called without n_max.
read_xls2 <- function(path, sheet = 1, col_names = TRUE, col_types = NULL,
                      na = "", skip = 0, n_max = 1050000L) {
  sheet <- readxl:::standardise_sheet(sheet, readxl:::xls_sheets(path))
  has_col_names <- isTRUE(col_names)
  if (has_col_names) {
    col_names <- readxl:::xls_col_names(path, sheet, nskip = skip)
  } else if (readxl:::isFALSE(col_names)) {
    col_names <- paste0("X", seq_along(readxl:::xls_col_names(path, sheet)))
  }
  if (is.null(col_types)) {
    # Guess column types from up to n_max rows
    col_types <- readxl:::xls_col_types(
      path, sheet, na = na, nskip = skip, has_col_names = has_col_names, n = n_max
    )
  }
  readxl:::xls_cols(path, sheet, col_names = col_names, col_types = col_types,
                    na = na, nskip = skip + has_col_names)
}
# Note: the default is a literal value; a default of 'n_max = n_max' would
# refer to itself and fail if this function were called without n_max.
read_xlsx2 <- function(path, sheet = 1L, col_names = TRUE, col_types = NULL,
                       na = "", skip = 0, n_max = 1050000L) {
  path <- readxl:::check_file(path)
  sheet <- readxl:::standardise_sheet(sheet, readxl:::xlsx_sheets(path))
  if (is.null(col_types)) {
    # Guess column types from up to n_max rows
    col_types <- readxl:::xlsx_col_types(
      path = path, sheet = sheet, na = na, nskip = skip + isTRUE(col_names), n = n_max
    )
  }
  readxl:::read_xlsx_(path, sheet, col_names = col_names, col_types = col_types,
                      na = na, nskip = skip)
}
This extended guessing may hurt performance. I have not tried it on really large datasets yet; I have only tried small data, enough to verify that the function works.