I'm trying to read a table from a SQLite database into an R data frame and have hit a problem that has me stumped. Here are the first three entries in the SQLite table I would like to import:
1|10|0|0|0|0|10|10|0|0|0|6|8|6|20000|30000|2012-02-29 21:27:07.239091|2012-02-29 21:28:24.815385|6|80.67.28.161|||||||||||||||||||||||||||||||33|13.4936||t|t|f||||||||||||||||||4|0|0|7|7|2
2|10|0|0|0|0|0|0|0|2|2|4|5|4|20000|30000|2012-02-29 22:00:30.618726|2012-02-29 22:04:09.629942|5|80.67.28.161|3|7||0|1|3|0|||4|3|4|5|5|5|5|4|5|4|4|0|0|0|0|0|9|9|9|9|9|||1|f|t|f|||||||||||||k|text|l|||-13|0|3|10||2
3|13|2|4|4|4|4|1|1|2|5|6|3|2|40000|10000|2012-03-01 09:07:52.310033|2012-03-01 09:21:13.097303|6|80.67.28.161|2|2||30|1|1|0|||4|2|1|6|8|3|5|6|6|7|6|||||||||||26|13.6336|4|f|t|f|t|f|f|f|f|||||||||some text||||10|1|1|3|2|3
What I'm interested in are columns 53 through 60, which, to save you the trouble of counting in the above, look like this:
|t|t|f||||||
|f|t|f||||||
|f|t|f|t|f|f|f|f|
As you can see, for the first two entries only the first three of those columns are not NULL, while for the third entry all eight columns have values assigned to them.
Here's the SQLite table info for those columns:
sqlite> PRAGMA table_info(observations);
0|id|INTEGER|1||1
** snip **
53|understanding1|boolean|0||0
54|understanding2|boolean|0||0
55|understanding3|boolean|0||0
56|understanding4|boolean|0||0
57|understanding5|boolean|0||0
58|understanding6|boolean|0||0
59|understanding7|boolean|0||0
60|understanding8|boolean|0||0
** snip **
Now, when I try to read this into R, here's what those same columns end up becoming:
> library('RSQLite')
> con <- dbConnect("SQLite", dbname = 'db.sqlite3')
> obs <- dbReadTable(con,'observations')
> obs[1:3,names(obs) %in% paste0('understanding',1:8)]
understanding1 understanding2 understanding3 understanding4 understanding5 understanding6 understanding7
1 t t f NA NA NA NA
2 f t f NA NA NA NA
3 f t f 0 0 0 0
understanding8
1 NA
2 NA
3 0
As you can see, while the first three columns contain values that are either 't' or 'f', the other columns are NA where the corresponding values in the SQLite table are NULL and 0 where they are not, irrespective of whether the corresponding values in the SQLite table are t or f. Needless to say, this is not the behavior I expected. The problem is, I think, that these columns are typecast incorrectly:
> sapply(obs[1:3,names(obs) %in% paste0('understanding',1:8)], class)
understanding1 understanding2 understanding3 understanding4 understanding5 understanding6 understanding7
"character" "character" "character" "numeric" "numeric" "numeric" "numeric"
understanding8
"numeric"
Could it be that RSQLite sets the first three columns to the character type upon seeing t and f as values in the corresponding columns in the first entry, but goes with numeric for the others because in these columns the first entry just happens to be NULL?
If this is indeed what's happening, is there any way of working around this and casting all these columns into character (or, even better, logical)?
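To make the goal concrete, here is a rough sketch of the kind of workaround I have in mind. It is not a verified fix. It assumes the type is guessed from the first row, and that casting each column to TEXT in the query itself might sidestep the guess; I have not confirmed that this holds for every RSQLite version. The helper name tf_to_logical and the stand-in data frame obs_tf are made up for illustration, and the dbGetQuery call is left commented out so the snippet runs without a database.

```r
# Hypothetical helper: map SQLite 't'/'f' text flags to R logicals.
# NULLs arrive in R as NA and are kept as NA.
tf_to_logical <- function(x) {
  x <- as.character(x)
  out <- rep(NA, length(x))          # logical vector of NAs
  out[!is.na(x) & x == "t"] <- TRUE
  out[!is.na(x) & x == "f"] <- FALSE
  out
}

cols <- paste0("understanding", 1:8)

# Build a query that casts each flag column to TEXT explicitly.
# Assumption: this keeps the driver from choosing numeric for these columns.
sql <- paste0(
  "SELECT id, ",
  paste0("CAST(", cols, " AS TEXT) AS ", cols, collapse = ", "),
  " FROM observations"
)

# With a live connection this would be run as (not executed here):
# obs_tf <- dbGetQuery(con, sql)

# Stand-in for the query result, using values from the three rows above.
obs_tf <- data.frame(
  id = 1:3,
  understanding1 = c("t", "f", "f"),
  understanding4 = c(NA, NA, "t"),
  understanding5 = c(NA, NA, "f"),
  stringsAsFactors = FALSE
)

# Convert whichever flag columns are present to logical.
flag_cols <- intersect(cols, names(obs_tf))
obs_tf[flag_cols] <- lapply(obs_tf[flag_cols], tf_to_logical)
```

The conversion to logical has to happen on the text values: once a column has been read as numeric, every non-NULL value is already 0 and the t/f distinction is gone.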