python - Python, indexes, and data frames using Pandas

Question

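I'm new to Python and have a question about indexes and data frames. I have three source files whose rows can be uniquely identified by three keys (district_code, district_name, district_type). I am validating the data in the source files against what is in the database. The code below only looks up dist_name and dist_code values that match between the source files and the database, and fills in one from the other depending on which value is available. I need to add another condition to the function, adding district_type as an index/key, so that districts are matched uniquely by comparing all three keys.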

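**Edit** To simplify this, I changed the logic of my approach. Concatenating the two keys district_code and district_type gives a unique identifier (dist_key). I changed the function below to reflect this, but I now get "KeyError: u'no item named dist_key'". I think this function is what generates the error, because the new unique identifier is only defined inside it. Being new to this language and to scripting, I don't know how to refer to the variable I need (dist_key) outside the function.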

if dfyearfound:
    df2['district_name']=df2['district_name'].str.strip()
    df2['district_code']=df2['district_code'].str.strip()
    df2['dist_key']=df2['dist_key'].str.strip()  # This line is causing the error
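
A minimal sketch of one likely cause, assuming df2 has no dist_key column at the point where it is stripped: selecting a column that does not exist raises KeyError, so the column has to be built first. The small df2 below is a hypothetical stand-in using the column names from the sample files, and the "_" separator is an assumption, added so that (code 1, type 15) and (code 11, type 5) do not collapse into the same key.

```python
import pandas as pd

# Hypothetical stand-in for df2, using the column names from the sample files.
df2 = pd.DataFrame({
    "district_code": [" 1", "1 ", "15"],
    "district_name": ["AITKIN ", " SAVAGE", "ST. FRANCIS"],
    "district_type_code": [1, 3, 1],
})

# Selecting df2["dist_key"] at this point would raise KeyError,
# because the column has not been created yet.
assert "dist_key" not in df2.columns

# Normalise both parts as stripped strings, then concatenate them.
code = df2["district_code"].astype(str).str.strip()
dtype = df2["district_type_code"].astype(str).str.strip()
df2["dist_key"] = code + "_" + dtype  # the "_" separator is an assumption

print(df2["dist_key"].tolist())  # ['1_1', '1_3', '15_1']
```

Because the column lives on the data frame rather than inside a function, any later code that receives df2 can read df2["dist_key"] directly.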

def addNamesCodes(testframe, districtnamedata, districtcodedata):
    """Correct missing data such as district names or district keys.

    testframe is a pandas dataframe. districtnamedata maps a district key to
    its district name; districtcodedata maps a district name to its key.
    Returns two lists: district names and district keys.
    """
    # lists of matched or corrected district names and keys
    districtnames = []
    districtkeys = []
    # complete non-matches
    fdistrictnames = []
    fdistrictkeys = []

    # fill empty values before converting names to strings, so that a
    # missing name becomes '' instead of the string 'nan'
    testframe['district_name'] = testframe['district_name'].fillna('').apply(str)
    testframe['district_code'] = testframe['district_code'].fillna('')
    testframe['dist_key'] = testframe['dist_key'].fillna('')
    testframe['dist_key'] = testframe['dist_key'] + testframe['district_code']

    # build the name and key lists row by row, in the same format as the
    # enrollment and teacher data
    for i in testframe.index:
        key = testframe['dist_key'][i]
        name = testframe['district_name'][i]
        # both district key and district name are present
        if key in districtnamedata and name in districtcodedata:
            # district key and district name are a match
            if districtnamedata[key] == name and districtcodedata[name] == key:
                districtnames.append(districtnamedata[key])
                districtkeys.append(districtcodedata[name])
            # potential wrong mappings: trust the key
            else:
                districtkeys.append(key)
                districtnames.append(districtnamedata[key])
        # only the district key is present
        elif key in districtnamedata:
            districtkeys.append(key)
            districtnames.append(districtnamedata[key])
        # only the district name is present
        elif name in districtcodedata:
            districtnames.append(name)
            districtkeys.append(districtcodedata[name])
        # complete non-matches
        else:
            fdistrictnames.append(name)
            fdistrictkeys.append(key)

    # extend the lists by the complete non-matches
    districtnames.extend(fdistrictnames)
    districtkeys.extend(fdistrictkeys)

    return districtnames, districtkeys
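
The two dictionaries passed to addNamesCodes are not shown in the question. Judging from how the function uses them, one maps a key to a name and the other maps a name back to a key. A rough sketch of how such lookups might be built follows; the reference frame and its key values are hypothetical.

```python
import pandas as pd

# Hypothetical reference table, for example pulled from the database.
# The key format ("code_type") is an assumption for illustration only.
reference = pd.DataFrame({
    "dist_key": ["1_1", "1_3", "15_1"],
    "district_name": ["AITKIN", "SAVAGE", "ST. FRANCIS"],
})

# key -> name, the direction the function reads from districtnamedata
districtnamedata = dict(zip(reference["dist_key"], reference["district_name"]))

# name -> key, the direction the function reads from districtcodedata
districtcodedata = dict(zip(reference["district_name"], reference["dist_key"]))

print(districtnamedata["1_3"])     # SAVAGE
print(districtcodedata["AITKIN"])  # 1_1
```

The name-to-key direction only works when each district name maps to a single key; if a name occurs with more than one key, later rows silently overwrite earlier ones in the dictionary.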

Sample sources: Enrollment ---

district_code   district_name   district_type_code  enroll_totals
1                 AITKIN               1                    122
1                 AITKIN               1                    123
1                 SAVAGE               3                    140
1                 SAVAGE               3                    780
15              ST. FRANCIS            1                    782
16              SPRING LAKE            1                    784

Finance ---

district_code   district_name   district_type_code          budget
1                 AITKIN               1                    122000
1                 AITKIN               1                    120003
1                 SAVAGE               3                    140000
1                 SAVAGE               3                    780000
15              ST. FRANCIS            1                    782000
16              SPRING LAKE            1                    784000

Teachers ---

district_code   district_name   district_type_code         Salary
1                 AITKIN               1                    50000
1                 AITKIN               1                    42000
1                 SAVAGE               3                    89000
1                 SAVAGE               3                    32000
15              ST. FRANCIS            1                    78000
16              SPRING LAKE            1                    58000
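
For comparing all three keys at once, one possible approach (not necessarily what the existing script should do) is an outer merge on the three key columns with pandas' indicator flag, which labels each key combination as present in both frames or in only one. In this sketch the source frame holds the distinct key combinations from the enrollment sample, while the database frame is hypothetical and its last row is invented to produce a mismatch.

```python
import pandas as pd

keys = ["district_code", "district_name", "district_type_code"]

# Distinct key combinations taken from the enrollment sample above.
source = pd.DataFrame({
    "district_code": [1, 1, 15, 16],
    "district_name": ["AITKIN", "SAVAGE", "ST. FRANCIS", "SPRING LAKE"],
    "district_type_code": [1, 3, 1, 1],
})

# Hypothetical database extract; the type code in the last row is
# deliberately different so that one district fails to match.
database = pd.DataFrame({
    "district_code": [1, 1, 15, 16],
    "district_name": ["AITKIN", "SAVAGE", "ST. FRANCIS", "SPRING LAKE"],
    "district_type_code": [1, 3, 1, 2],
})

# indicator=True adds a "_merge" column: "both", "left_only" or "right_only".
merged = source.merge(database, on=keys, how="outer", indicator=True)

# Rows that do not agree on all three keys.
mismatches = merged[merged["_merge"] != "both"]
print(mismatches)
```

Here SPRING LAKE appears twice in the mismatches, once as left_only (type 1, from the source) and once as right_only (type 2, from the database), which is the kind of disagreement a two-key comparison would miss.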
