
I have a DataFrame that I want to convert to a NumPy array so I can plot its values. The DataFrame looks like this:

>>> df_ohlc
                        open       high        low      close
Date                                                           
2018-03-07 03:35:00  62.189999  62.189999  62.169998  62.180000
2018-03-07 03:36:00  62.180000  62.180000  62.160000  62.180000
2018-03-07 03:37:00  62.169998  62.220001  62.169998  62.209999
2018-03-07 03:38:00  62.220001  62.220001  62.189999  62.200001
...
[480 rows x 4 columns]

>>> df_ohlc.index
DatetimeIndex(['2018-03-07 03:35:00', '2018-03-07 03:36:00',
            '2018-03-07 03:37:00', '2018-03-07 03:38:00',
            '2018-03-07 03:39:00', '2018-03-07 03:40:00',
            '2018-03-07 03:41:00', '2018-03-07 03:42:00',
            '2018-03-07 03:43:00', '2018-03-07 03:44:00',
            ...
            '2018-03-07 11:25:00', '2018-03-07 11:26:00',
            '2018-03-07 11:27:00', '2018-03-07 11:28:00',
            '2018-03-07 11:29:00', '2018-03-07 11:30:00',
            '2018-03-07 11:31:00', '2018-03-07 11:32:00',
            '2018-03-07 11:33:00', '2018-03-07 11:34:00'],
            dtype='datetime64[ns]', name='Date', length=480, freq='T')

>>> df_ohlc.index[0]
Timestamp('2018-03-07 03:35:00', freq='T')  # why is this a Timestamp when the index reported dtype='datetime64[ns]' just above?

However, when I try to convert it, the type of the index (the Date column) changes from datetime64[ns] to Timestamp:

>>> df_ohlc.reset_index().values
array([[Timestamp('2018-03-07 03:35:00'), 62.189998626708984,
        62.189998626708984, 62.16999816894531, 62.18000030517578],
    [Timestamp('2018-03-07 03:36:00'), 62.18000030517578,
        62.18000030517578, 62.15999984741211, 62.18000030517578],
    [Timestamp('2018-03-07 03:37:00'), 62.16999816894531,
        62.220001220703125, 62.16999816894531, 62.209999084472656],
    ..., 
    [Timestamp('2018-03-07 11:32:00'), 61.939998626708984,
        61.95000076293945, 61.93000030517578, 61.93000030517578],
    [Timestamp('2018-03-07 11:33:00'), 61.93000030517578,
        61.939998626708984, 61.900001525878906, 61.90999984741211],
    [Timestamp('2018-03-07 11:34:00'), 61.90999984741211,
        61.91999816894531, 61.900001525878906, 61.91999816894531]], dtype=object)

Why does this happen, and how can I keep the type as datetime64?

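I also tried pulling the DataFrame's index out separately and concatenating it with the values afterwards, but that raises an error. I would like to know what I did wrong.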

>>> index_ohlc = np.array([ df_ohlc.index.values.astype('datetime64[s]'), ]).T

>>> index_ohlc.shape
(480, 1)

>>> value_ohlc = df_ohlc.values     

>>> value_ohlc.shape
(480, 4)

>>> type(index_ohlc)
<class 'numpy.ndarray'>

>>> type(value_ohlc)
<class 'numpy.ndarray'>

>>> new_array = np.concatenate( (index_ohlc, value_ohlc), axis = 1 )
Traceback (most recent call last):
File "<console>", line 1, in <module>
TypeError: invalid type promotion

1 Answer


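Try NumPy structured arrays. A plain ndarray has a single dtype, so mixing a datetime column with float columns forces dtype=object; a structured array gives each field its own dtype.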
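To see where the behaviour in the question comes from, a small sketch may help. It assumes a toy frame named `df` (hypothetical, three rows standing in for `df_ohlc`) and shows three things: the index is stored as datetime64 even though single elements come back as Timestamp, a mixed datetime/float frame falls back to object dtype, and `np.concatenate` fails because datetime64 and float64 have no common dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_ohlc: a DatetimeIndex plus float columns.
idx = pd.date_range("2018-03-07 03:35:00", periods=3, freq="min", name="Date")
df = pd.DataFrame({"open": [62.19, 62.18, 62.17],
                   "close": [62.18, 62.18, 62.21]}, index=idx)

# The index data is a datetime64 array; Timestamp is only the scalar
# wrapper pandas returns when one element is pulled out.
index_dtype = df.index.values.dtype
first_label = df.index[0]

# One ndarray, one dtype: datetime and float columns share no common
# dtype, so .values falls back to object and boxes dates as Timestamp.
mixed = df.reset_index().values

# np.concatenate needs a common dtype as well, so joining a datetime64
# block with a float64 block raises a TypeError.
index_col = df.index.values.astype("datetime64[s]").reshape(-1, 1)
try:
    np.concatenate((index_col, df.values), axis=1)
    concat_failed = False
except TypeError:
    concat_failed = True

print(index_dtype, type(first_label).__name__, mixed.dtype, concat_failed)
```

So nothing is being converted incorrectly; a homogeneous array simply cannot hold both kinds of column at once without falling back to Python objects.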

Demo

import numpy as np
import pandas as pd
from pandas import Timestamp

# Rebuild a small frame from the values shown in the question (object dtype).
df = pd.DataFrame(np.array([[Timestamp('2018-03-07 03:35:00'), 62.189998626708984,
        62.189998626708984, 62.16999816894531, 62.18000030517578],
    [Timestamp('2018-03-07 03:36:00'), 62.18000030517578,
        62.18000030517578, 62.15999984741211, 62.18000030517578],
    [Timestamp('2018-03-07 03:37:00'), 62.16999816894531,
        62.220001220703125, 62.16999816894531, 62.209999084472656]]))
# One named field per column, each with its own dtype.
dt = np.dtype([("Date", 'datetime64[ns]'), 
               ("f1", np.float64), 
               ("f2", np.float64), 
               ("f3", np.float64), 
               ("f4", np.float64)])
# A structured array is filled from tuples, one tuple per row.
arr = np.array([tuple(v) for v in df.values.tolist()], dtype=dt)

array([('2018-03-07T03:35:00.000000000', 62.18999863, 62.18999863, 62.16999817, 62.18000031),
       ('2018-03-07T03:36:00.000000000', 62.18000031, 62.18000031, 62.15999985, 62.18000031),
       ('2018-03-07T03:37:00.000000000', 62.16999817, 62.22000122, 62.16999817, 62.20999908)],
      dtype=[('Date', '<M8[ns]'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8'), ('f4', '<f8')])
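For a frame shaped like the one in the question, `DataFrame.to_records()` may be a shorter route to the same kind of array, since it carries the index along as a field. A minimal sketch, assuming a small frame `df` (hypothetical) in place of `df_ohlc`:

```python
import numpy as np
import pandas as pd

# Hypothetical three-row stand-in for df_ohlc, using values from the question.
idx = pd.date_range("2018-03-07 03:35:00", periods=3, freq="min", name="Date")
df = pd.DataFrame({"open":  [62.189999, 62.180000, 62.169998],
                   "high":  [62.189999, 62.180000, 62.220001],
                   "low":   [62.169998, 62.160000, 62.169998],
                   "close": [62.180000, 62.180000, 62.209999]}, index=idx)

# to_records() returns a NumPy record array; the named index becomes
# the first field and keeps its datetime64 dtype.
rec = df.to_records()

dates = rec["Date"]     # datetime64 values, one per row
closes = rec["close"]   # float64 values, one per row

print(rec.dtype.names)
```

Each field can then be handed to a plotting call on its own, for example `dates` as the x values and `closes` as the y values.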
Answered 2018-03-07T15:26:05.433