python - Create a DataArray from a Dict of 2D DataFrames/Arrays

Question

I'm trying to make the transition from Pandas to Xarray for N-dimensional DataArrays, to expand my repertoire.

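Realistically, I'm going to have various pd.DataFrames (in this case row = month, column = attribute) along a particular axis (patients in the mock example below) that I would like to merge (without using panels or a MultiIndex :), thank you). I want to convert them to xr.DataArrays so I can build dimensions on top of them. I made a mock dataset to give the gist of what I'm talking about.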

For this dataset, imagine 100 patients, 12 months, 10000 attributes, 3 replicates (per attribute), which would be a typical 4D dataset. Basically, I collapse the 3 replicates per attribute by taking the mean, so I end up with a 2D pd.DataFrame (row=months, col=attributes). This DataFrame is the value in my dictionary, and the patient it came from is the key (i.e. (patient_x : DataFrame_X)).

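I'm also going to include the roundabout way I did this with an np.ndarray placeholder, but it would be really convenient if I could generate an N-dimensional DataArray from a dictionary whose key is patient_x and whose value is DataFrame_X.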

How can I create an N-dimensional DataArray using Xarray from a dictionary of Pandas DataFrames?

import xarray as xr
import numpy as np
import pandas as pd

np.random.seed(1618033)

#Set dimensions
a,b,c,d = 100,12,10000,3 #100 patients, 12 months, 10000 attributes, 3 replicates

#Create labels
patients = ["patient_%d" % i for i in range(a)]
months = [j for j in range(b)]
attributes = ["attr_%d" % k for k in range(c)]
replicates = [l for l in range(d)]

coords = [patients,months,attributes]
dims = ["Patients","Months","Attributes"]

#Dict of DataFrames
D_patient_DF = dict()

for i, patient in enumerate(patients):
    A_placeholder = np.zeros((b,c))
    for j, month in enumerate(months):
        #Attribute x Replicates
        A_attrReplicates = np.random.random((c,d))
        #Collapse into 1D Vector
        V_attrExp = A_attrReplicates.mean(axis=1)
        #Fill array with row
        A_placeholder[j,:] = V_attrExp
    #Assign dataframe for every patient
    DF_data = pd.DataFrame(A_placeholder, index = months, columns = attributes)
    D_patient_DF[patient] = DF_data

xr.DataArray(D_patient_DF).dims
#() it's empty

D_patient_DF
#{'patient_0':       attr_0    attr_1    attr_2    attr_3    attr_4    attr_5    attr_6  \
# 0   0.445446  0.422018  0.343454  0.140700  0.567435  0.362194  0.563799   
# 1   0.440010  0.548535  0.810903  0.482867  0.469542  0.591939  0.579344   
# 2   0.645719  0.450773  0.386939  0.418496  0.508290  0.431033  0.622270   
# 3   0.555855  0.633393  0.555197  0.556342  0.489865  0.204200  0.823043   
# 4   0.916768  0.590534  0.597989  0.592359  0.484624  0.478347  0.507789   
# 5   0.847069  0.634923  0.591008  0.249107  0.655182  0.394640  0.579700   
# 6   0.700385  0.505331  0.377745  0.651936  0.334216  0.489728  0.282544   
# 7   0.777810  0.423889  0.414316  0.389318  0.565144  0.394320  0.511034   
# 8   0.440633  0.069643  0.675037  0.365963  0.647660  0.520047  0.539253   
# 9   0.333213  0.328315  0.662203  0.594030  0.790758  0.754032  0.602375   
# 10  0.470330  0.419496  0.171292  0.677439  0.683759  0.646363  0.465788   
# 11  0.758556  0.674664  0.801860  0.612087  0.567770  0.801514  0.179939
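
The roundabout np.ndarray placeholder route mentioned above is not shown in full, so the following is only a minimal sketch of what it might look like, assuming each DataFrame shares the same index and columns. It uses a much smaller stand-in dictionary (3 patients, 4 months, 5 attributes) so it runs quickly; the names A_3D and DA_data are illustrative and not from the original post.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Small stand-in for D_patient_DF (sizes are assumptions for illustration).
rng = np.random.default_rng(0)
patients = ["patient_%d" % i for i in range(3)]
months = list(range(4))
attributes = ["attr_%d" % k for k in range(5)]
D_patient_DF = {
    p: pd.DataFrame(rng.random((len(months), len(attributes))),
                    index=months, columns=attributes)
    for p in patients
}

# Fill a 3D placeholder with one 2D (months x attributes) slab per patient.
A_3D = np.zeros((len(patients), len(months), len(attributes)))
for i, patient in enumerate(patients):
    A_3D[i, :, :] = D_patient_DF[patient].values

# Attach labels to each axis.
DA_data = xr.DataArray(A_3D,
                       coords=[patients, months, attributes],
                       dims=["Patients", "Months", "Attributes"])
print(DA_data.dims)
```

This works, but the labels have to be tracked by hand alongside the raw array, which is what the question is trying to avoid.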

score 5 · Accepted Answer

From a dictionary of DataFrames, you can convert each value into a DataArray (adding dimension labels), load the results into a Dataset, and then convert that into a single DataArray:

variables = {k: xr.DataArray(v, dims=['month', 'attribute'])
             for k, v in D_patient_DF.items()}
combined = xr.Dataset(variables).to_array(dim='patient')
print(combined)

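Note, however, that the result is not necessarily in sorted order; it follows the iteration order of the dictionary, which is not guaranteed on Python versions before 3.7. If you need a specific order, use an OrderedDict instead (insert this after setting variables above):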

import collections
variables = collections.OrderedDict((k, variables[k]) for k in patients)

This outputs:

<xarray.DataArray (patient: 100, month: 12, attribute: 10000)>
array([[[ 0.61176399,  0.26172557,  0.74657302, ...,  0.43742111,
          0.47503291,  0.37263983],
        [ 0.34970732,  0.81527751,  0.53612895, ...,  0.68971198,
          0.68962168,  0.75103198],
        [ 0.71282751,  0.23143891,  0.28481889, ...,  0.52612376,
          0.56992843,  0.3483683 ],
        ...,
        [ 0.84627257,  0.5033482 ,  0.44116194, ...,  0.55020168,
          0.48151353,  0.36374339],
        [ 0.53336826,  0.59566147,  0.45269417, ...,  0.41951078,
          0.46815364,  0.44630235],
        [ 0.25720899,  0.18738289,  0.66639783, ...,  0.36149276,
          0.58865823,  0.33918553]],

       ...,

       [[ 0.42933273,  0.58642504,  0.38716496, ...,  0.45667285,
          0.72684589,  0.52335464],
        [ 0.34946576,  0.35821339,  0.33097093, ...,  0.59037927,
          0.30233665,  0.6515749 ],
        [ 0.63673498,  0.31022272,  0.65788374, ...,  0.47881873,
          0.67825066,  0.58704331],
        ...,
        [ 0.44822441,  0.502429  ,  0.50677081, ...,  0.4843405 ,
          0.84396521,  0.45460029],
        [ 0.61336348,  0.46338301,  0.60715273, ...,  0.48322379,
          0.66530209,  0.52204897],
        [ 0.47520639,  0.43490559,  0.27309414, ...,  0.35280585,
          0.30280485,  0.77537204]]])
Coordinates:
  * month      (month) int64 0 1 2 3 4 5 6 7 8 9 10 11
  * patient    (patient) <U10 'patient_80' 'patient_73' 'patient_79' ...
  * attribute  (attribute) object 'attr_0' 'attr_1' 'attr_2' 'attr_3' ...
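
As a smaller, self-contained illustration of the Dataset route above, here is a sketch on a toy dictionary (3 patients, 4 months, 5 attributes; these sizes and the name frames are assumptions, not from the original post). The dictionary is deliberately built in reverse order, and the result is then reordered by label with .sel, which is one possible alternative to building an OrderedDict up front.

```python
import numpy as np
import pandas as pd
import xarray as xr

rng = np.random.default_rng(0)
patients = ["patient_%d" % i for i in range(3)]
months = list(range(4))
attributes = ["attr_%d" % k for k in range(5)]

# Build the dict in reverse order to mimic an unsorted dictionary.
frames = {
    p: pd.DataFrame(rng.random((len(months), len(attributes))),
                    index=months, columns=attributes)
    for p in reversed(patients)
}

# One 2D DataArray per patient; index/columns become the coordinates.
variables = {k: xr.DataArray(v, dims=["month", "attribute"])
             for k, v in frames.items()}

# Each dict key becomes a label along the new 'patient' dimension.
combined = xr.Dataset(variables).to_array(dim="patient")

# Reorder along 'patient' by label so the order matches the patients list.
combined = combined.sel(patient=patients)
print(combined.dims, combined.shape)
```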

Alternatively, you could build a list of 2D DataArrays and then use concat:

patient_list = []
for i, patient in enumerate(patients):
    df = ...  # the 2D DataFrame for this patient (rows = months, columns = attributes)
    array = xr.DataArray(df, dims=['month', 'attribute'])
    patient_list.append(array)
combined = xr.concat(patient_list, dim=pd.Index(patients, name='patient'))

This gives the same result, and is probably the cleanest code.
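
For reference, a runnable version of the concat approach might look like the sketch below. It assumes the per-patient DataFrames already live in a dictionary (called frames here, a toy stand-in for D_patient_DF with 3 patients, 4 months and 5 attributes). Because the list is built by iterating over patients, the order along the new dimension follows that list.

```python
import numpy as np
import pandas as pd
import xarray as xr

rng = np.random.default_rng(0)
patients = ["patient_%d" % i for i in range(3)]
months = list(range(4))
attributes = ["attr_%d" % k for k in range(5)]
frames = {
    p: pd.DataFrame(rng.random((len(months), len(attributes))),
                    index=months, columns=attributes)
    for p in patients
}

# One 2D DataArray per patient, in the order given by the patients list.
patient_list = [xr.DataArray(frames[p], dims=["month", "attribute"])
                for p in patients]

# The named pandas Index supplies both the new dimension name and its labels.
combined = xr.concat(patient_list, dim=pd.Index(patients, name="patient"))

# Label-based selection returns one patient's 2D slab.
print(combined.sel(patient="patient_1").shape)
```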
