I'm trying to read a table from a SQLite database into an R data frame and have hit a problem that has me stumped. Here are the first three rows of the SQLite table I would like to import:

1|10|0|0|0|0|10|10|0|0|0|6|8|6|20000|30000|2012-02-29 21:27:07.239091|2012-02-29 21:28:24.815385|6|80.67.28.161|||||||||||||||||||||||||||||||33|13.4936||t|t|f||||||||||||||||||4|0|0|7|7|2
2|10|0|0|0|0|0|0|0|2|2|4|5|4|20000|30000|2012-02-29 22:00:30.618726|2012-02-29 22:04:09.629942|5|80.67.28.161|3|7||0|1|3|0|||4|3|4|5|5|5|5|4|5|4|4|0|0|0|0|0|9|9|9|9|9|||1|f|t|f|||||||||||||k|text|l|||-13|0|3|10||2
3|13|2|4|4|4|4|1|1|2|5|6|3|2|40000|10000|2012-03-01 09:07:52.310033|2012-03-01 09:21:13.097303|6|80.67.28.161|2|2||30|1|1|0|||4|2|1|6|8|3|5|6|6|7|6|||||||||||26|13.6336|4|f|t|f|t|f|f|f|f|||||||||some text||||10|1|1|3|2|3

What I'm interested in are columns 53 through 60, which, to save you the trouble of counting in the above, look like this:

|t|t|f||||||
|f|t|f||||||
|f|t|f|t|f|f|f|f|

As you can see, for the first two rows only the first three of those columns are not NULL, while for the third row all eight columns have values assigned to them.

Here's the SQLite table info for those columns:

sqlite> PRAGMA table_info(observations);
0|id|INTEGER|1||1
** snip **
53|understanding1|boolean|0||0
54|understanding2|boolean|0||0
55|understanding3|boolean|0||0
56|understanding4|boolean|0||0
57|understanding5|boolean|0||0
58|understanding6|boolean|0||0
59|understanding7|boolean|0||0
60|understanding8|boolean|0||0
** snip **

Now, when I try to read this into R, here's what those same columns end up becoming:

> library('RSQLite')
> con <- dbConnect("SQLite", dbname = 'db.sqlite3')
> obs <- dbReadTable(con,'observations')
> obs[1:3,names(obs) %in% paste0('understanding',1:8)]
  understanding1 understanding2 understanding3 understanding4 understanding5 understanding6 understanding7
1              t              t              f             NA             NA             NA             NA
2              f              t              f             NA             NA             NA             NA
3              f              t              f              0              0              0              0
  understanding8
1             NA
2             NA
3              0

As you can see, the first three columns contain the values 't' or 'f', but the other columns are NA where the corresponding values in the SQLite table are NULL and 0 where they are not, regardless of whether the value in the SQLite table is t or f. Needless to say, this is not the behavior I expected. The problem, I think, is that these columns are cast to the wrong type:

> sapply(obs[1:3,names(obs) %in% paste0('understanding',1:8)], class)
understanding1 understanding2 understanding3 understanding4 understanding5 understanding6 understanding7 
   "character"    "character"    "character"      "numeric"      "numeric"      "numeric"      "numeric" 
understanding8 
     "numeric" 

Could it be that RSQLite sets the first three columns to character because it sees t and f in the first row, but goes with numeric for the other columns because their value in the first row just happens to be NULL?

If this is indeed what's happening, is there any way to work around it and cast all of these columns to character (or, even better, logical)?

1 Answer

The following is a hack, but it works.

library(RSQLite)

# First make a copy of the DB and work with it instead of changing
# data in the original
original_file <- "db.sqlite3"
copy_file <- "db_copy.sqlite3"
file.copy(original_file, copy_file) # duplicate the file
con <- dbConnect(SQLite(), dbname = copy_file) # establish a connection to the copied DB

# Put together a query to replace all NULLs with the string 'NA' and run it
columns <- paste0('understanding', 1:15)
columns_query <- paste(paste0(columns, " = IfNull(", columns, ", 'NA')"), collapse = ", ")
query <- paste0("UPDATE observations SET ", columns_query)
dbExecute(con, query) # dbExecute() runs a statement that returns no rows

# Now that all columns have string values, RSQLite will infer the
# column type to be `character`
df <- dbReadTable(con, 'observations') # read the table
dbDisconnect(con) # close the connection before deleting the file
file.remove(copy_file) # delete the copy

# Replace all 'NA' strings with proper NAs
df[columns][df[columns] == 'NA'] <- NA
# Convert 't' to logical TRUE and 'f' to logical FALSE (NA stays NA)
df[columns] <- lapply(df[columns], function(x) x == "t")
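
A possible variation, sketched here as an assumption rather than something tested against the original database: the same IfNull replacement can be applied in a SELECT instead of an UPDATE, so the stored data is never modified and no copy of the file is needed. The in-memory table below is a hypothetical stand-in with only two of the understanding columns, and select_list is an illustrative name.

```r
library(DBI)

# Hypothetical stand-in for the real database: an in-memory table with
# two of the 'understanding' columns and a NULL in the first row
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbExecute(con, "CREATE TABLE observations (id INTEGER, understanding1 boolean, understanding2 boolean)")
dbExecute(con, "INSERT INTO observations VALUES (1, 't', NULL)")
dbExecute(con, "INSERT INTO observations VALUES (2, 'f', 't')")

columns <- paste0("understanding", 1:2)

# Replace NULLs in the result set only; the table itself is left untouched
select_list <- paste0("IfNull(", columns, ", 'NA') AS ", columns, collapse = ", ")
obs <- dbGetQuery(con, paste0("SELECT id, ", select_list, " FROM observations"))
dbDisconnect(con)

# Turn the 'NA' placeholder into a real NA, then 't'/'f' into TRUE/FALSE
obs[columns] <- lapply(obs[columns], function(x) {
  x[x == "NA"] <- NA
  x == "t"
})
```

With the real table, every column that should be kept has to be listed in the SELECT (only id is listed here), since combining * with the aliases would return each understanding column twice.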
answered 2012-10-18T01:13:33.860