dataframe - Stuck importing a NetCDF file into a Pandas DataFrame

Question

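I have been working on this for a while as a beginner. Overall, I want to read in a NetCDF file and import multiple (~50) columns (and 17520 cases) into a Pandas DataFrame. At the moment I have it set up for a list of 4 variables, but I want to be able to extend that somehow. I have made a start, but any help on how to loop through this for 50 variables would be great. Using the code below for 4 variables does work. I know it's not pretty - still learning!

Another question: when I try to read the numpy arrays directly into a Pandas DataFrame, it does not work and instead creates a DataFrame that is 17520 columns wide. It should be the other way around (transposed). If I create a Series, it works fine. So to get around this I had to use the following lines. I don't even know why that works. Any suggestions for a better way (especially when it comes to 50 variables)?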

d={vnames[0] :vartemp[0], vnames[1] :vartemp[1], vnames[2] :vartemp[2], vnames[3] :vartemp[3]}
hs = pd.DataFrame(d,index=times)

The entire code is pasted below.

import pandas as pd
import datetime as dt
import xlrd
import numpy as np
import netCDF4


def excel_to_pydate(exceldate):
    datemode=0           # datemode: 0 for 1900-based, 1 for 1904-based
    pyear, pmonth, pday, phour, pminute, psecond = xlrd.xldate_as_tuple(exceldate, datemode)
    py_date = dt.datetime(pyear, pmonth, pday, phour, pminute, psecond)
    return(py_date)

def main():
    filename='HowardSprings_2010_L4.nc'
#Define a list of variables names we want from the netcdf file
    vnames = ['xlDateTime', 'Fa', 'Fh' ,'Fg']

# Open the NetCDF file
    nc = netCDF4.Dataset(filename) 

#Create some lists of size equal to length of vnames list.
    temp=list(xrange(len(vnames)))
    vartemp=list(xrange(len(vnames)))

#Enumerate the list and assign each NetCDF variable to an element in the lists.  
# First get the netcdf variable object assign to temp
# Then strip the data  from that and add to temporary variable (vartemp)
    for index, variable in enumerate(vnames):               
        temp[index]= nc.variables[variable]
        vartemp[index] = temp[index][:]   

# Now call the function to convert to datetime from excel. Assume datemode: 0
    times = [excel_to_pydate(elem) for elem in vartemp[0]]

#Dont know why I cant just pass a list of variables i.e. [vartemp[0], vartemp[1], vartemp[2]]
#But this is only thing that worked
#Create Pandas dataframe using times as index
    d={vnames[0] :vartemp[0], vnames[1] :vartemp[1], vnames[2] :vartemp[2], vnames[3] :vartemp[3]}
    theDataFrame = pd.DataFrame(d,index=times)

#Define missing data value and apply to DataFrame
    missing=-9999
    theDataFrame1=theDataFrame.replace({vnames[0] :missing, vnames[1] :missing, vnames[2] :missing, vnames[3] :missing},'NaN')

main()

score 1 · Accepted Answer

You can replace:

d = {vnames[0] :vartemp[0], ..., vnames[3]: vartemp[3]}
hs = pd.DataFrame(d, index=times)

with

hs = pd.DataFrame(np.array(vartemp[0:4]).T, columns=vnames[0:4], index=times)
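As a note on the shape issue raised in the question: pandas treats a list of equal-length 1-D arrays as a list of rows, which is where the 17520-column frame comes from. Below is a minimal sketch with synthetic data; the array values, the length of 5, and the half-hourly index are made up for illustration and are not taken from the NetCDF file.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the NetCDF variables (values are made up).
vnames = ['Fa', 'Fh', 'Fg']
vartemp = [np.arange(5.0), np.arange(5.0) * 2, np.arange(5.0) * 3]
# Hypothetical half-hourly index, used only for this illustration.
times = pd.date_range('2010-01-01', periods=5, freq='30min')

# A list of 1-D arrays is read row by row: 3 rows x 5 columns.
wide = pd.DataFrame(vartemp)

# Transposing the stacked array gives one column per variable.
by_transpose = pd.DataFrame(np.array(vartemp).T, columns=vnames, index=times)

# A dict maps each name to a column, so no transpose is needed,
# and zip() avoids writing out every key by hand.
by_dict = pd.DataFrame(dict(zip(vnames, vartemp)), index=times)
```

Both `by_transpose` and `by_dict` should have one row per time step and one column per variable name, and the `dict(zip(...))` form does not change when the list grows to 50 names.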

That said, pandas can read HDF5 directly, so perhaps the same is true for netCDF (which is based on HDF5)...
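Whether or not pandas can read the file directly, the netCDF4 route from the question can be written so that it scales to ~50 variables without indexing each one by hand. The following is only a sketch: `variables_to_frame` is a hypothetical helper, not part of pandas or netCDF4, and it assumes every requested variable is 1-D and of equal length. It is exercised here on a plain dict standing in for `nc.variables`.

```python
import numpy as np
import pandas as pd

MISSING = -9999  # missing-data marker used in the question's code


def variables_to_frame(variables, names, index=None, missing=MISSING):
    """Build a DataFrame with one column per name in `names`.

    `variables` is any mapping from name to an object that can be
    sliced with [:], such as the `variables` attribute of an open
    netCDF4.Dataset. Assumes each variable is 1-D (an assumption).
    """
    columns = {name: np.asarray(variables[name][:]) for name in names}
    frame = pd.DataFrame(columns, index=index)
    # Use a real NaN rather than the string 'NaN' so columns stay numeric.
    return frame.replace(missing, np.nan)


# Stand-in for nc.variables; the values are made up for illustration.
fake_variables = {
    'Fa': np.array([1.0, -9999.0, 3.0]),
    'Fh': np.array([4.0, 5.0, -9999.0]),
    'Fg': np.array([7.0, 8.0, 9.0]),
}
frame = variables_to_frame(fake_variables, ['Fa', 'Fh', 'Fg'])
```

With a real file, the first argument would be `nc.variables` from the question's `nc = netCDF4.Dataset(filename)`, and `index` would be the converted `times` list.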
