I am using an older version of pandas (0.19) with Python 3.6.4 to run a large data processing script that can take a long time to run. One big time sink is I/O: reading and writing large .csv files (roughly 100-300 MB or larger in many cases) with pandas read_csv and to_csv.

While working on speeding up the script, I came across HDF5 (.h5) and found that pandas to_hdf and read_hdf are about 20x faster than the equivalent csv methods for my files. They work well in pandas (the write/read fidelity seems good to me), so switching would be a huge help.
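
For reference, here is a minimal sketch of the kind of comparison I mean. The sample data, the file names, the `"data"` key, and the `time_round_trip` helper are all made up for illustration, and `to_hdf` / `read_hdf` assume PyTables is installed. The timings on a small frame like this will not reflect the ~20x difference I see on my real files.

```python
import os
import tempfile
import time

import numpy as np
import pandas as pd


def time_round_trip(df, directory):
    """Write and re-read df as CSV and as HDF5; return both timings and both re-read frames."""
    csv_path = os.path.join(directory, "sample.csv")  # hypothetical file name
    h5_path = os.path.join(directory, "sample.h5")    # hypothetical file name

    # CSV round trip
    start = time.perf_counter()
    df.to_csv(csv_path, index=False)
    csv_df = pd.read_csv(csv_path)
    csv_seconds = time.perf_counter() - start

    # HDF5 round trip (pandas default "fixed" format; requires PyTables)
    start = time.perf_counter()
    df.to_hdf(h5_path, key="data", mode="w")
    h5_df = pd.read_hdf(h5_path, key="data")
    hdf_seconds = time.perf_counter() - start

    return csv_seconds, hdf_seconds, csv_df, h5_df


# Small stand-in for my real data: one numeric column and one string column.
sample = pd.DataFrame({
    "value": np.arange(1000, dtype="float64"),
    "label": ["row{}".format(i) for i in range(1000)],
})

with tempfile.TemporaryDirectory() as tmp:
    csv_seconds, hdf_seconds, csv_df, h5_df = time_round_trip(sample, tmp)

print("csv: {:.4f} s, hdf5: {:.4f} s".format(csv_seconds, hdf_seconds))
```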

However, I often visualize outputs in JMP, and I can't figure out how to easily open .h5 files there. When I try, I see multiple tables, and string columns often show up in their own table or are imported in a way where I can't find the actual strings.

Any suggestions?
