python - Python struggling with data?

Question

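The basic idea of this program is to convert postcodes (the UK version of zip codes) into coordinates. So I have one file containing a large number of postcodes (plus other attached data such as house prices), and another file containing every UK postcode together with its corresponding coordinates.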

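I convert both of these into lists and then use a for loop inside a for loop to iterate over the postcodes in each file and compare them. If the postcode in file1 == the postcode in file2, the coordinates are taken and appended to the relevant row.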

My code does what I need it to. All of my tests output exactly what I want.

The only problem is that it only works on small batches of data (I tested it with .csv files of 100 rows or fewer, which produced a list of 100 inner lists).

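Now I want to apply the program to the whole dataset. I ran it once and nothing happened. I left the house, watched TV, and still nothing had happened. In IDLE I couldn't even stop the program. So I restarted and tried again, this time with a counter added so I could see whether the code was running. When I run the code, the counter starts climbing until it reaches 78902, the size of the dataset. Then it stops and does nothing. I can't do anything, not even close the window.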

Annoyingly, it can't even get through reading the CSV file, so I haven't been able to work with the data at all.

Here is the code where it gets stuck (the first part of the code):

    import csv

    #empty variable to put the list into
    lst = []
    # List function enables use for all files
    def create_list():

        #find the file
        file2 = input('enter filepath:')
        #read the file and iterate over it to append into the list
        with open(file2, 'r') as f:
            reader = csv.reader(f, delimiter=',')
            for row in reader:
                lst.append(row)
        return lst

So, does anyone know how to make the data more manageable?

Edit: here is my full code for anyone interested:

    from tkinter.filedialog import asksaveasfile
    import csv

    new_file = asksaveasfile()

    #empty variable to put the list into
    lst = []
    # List function enables use for all files
    def create_list():
        #find the file
        file2 = input('enter filepath:')
        #read the file and iterate over it to append into the list
        with open(file2, 'r') as f:
            reader = csv.reader(f, delimiter=',')
            for row in reader:
                lst.append(row)
        return lst


    def remove_space(lst):
        '''(lst)->None
        Removes the whitespace from the postcode in each inner list, in place,
        e.g. 'ac45 6nh' becomes 'ac456nh'.
        '''
        filetype = input('Is this a sale or crime?: ')
        num = 0
        #check the filetype to find the position of the postcodes
        if filetype == 'sale':
            num = 3
        #iterate over the postcode to add all characters but the space
        for line in range(len(lst)):
            pc = ''
            for char in lst[line][num]:
                if char != ' ':
                    pc = pc+char
            lst[line][num] = pc

    def write_new_file(lst, new_file):
        '''(lst)->.CSV file
        Takes a list and writes it into a .CSV file.
        '''
        writer = csv.writer(new_file, delimiter=',')
        writer.writerows(lst)
        new_file.close()


    #conversion function
    def find_coord(postcode):

        lst = create_list()
        #create python list for conversion comparison
        print(lst[0])
        #empty variables
        long = 0
        lat = 0
        #iterate over the list of postcodes, when the right postcode is found,
        # return the co-ordinates.
        for row in lst:
            if row[1] == postcode:
                long = row[2]
                lat = row[3]
        return str(long)+' '+str(lat)

    def find_all_coord(postcode, file):

        #empty variables
        long = 0
        lat = 0
        #iterate over the list of postcodes, when the right postcode is found,
        # return the co-ordinates.
        for row in file:
            if row[1] == postcode:
                long = row[2]
                lat = row[3]
        return str(long)+' '+str(lat)

    def convert_postcodes():
        '''
        Reads the postcode key file and the file to be converted, then
        appends 'long,lat' to every row whose postcode matches a key entry.
        '''
        #save the files into lists so that they can be used
        postcodes = []
        with open(input('enter postcode key filepath:'), 'r') as f:
            reader = csv.reader(f, delimiter=',')
            for row in reader:
                postcodes.append(row)
        file = []
        with open(input('enter filepath to be converted:'), 'r') as f:
            reader = csv.reader(f, delimiter=',')
            for row in reader:
                file.append(row)
        #here is the conversion code
        long = 0
        lat = 0
        matches = 0
        for row in range(len(file)):
            for line in range(len(postcodes)):
                if file[row][3] == postcodes[line][1]:
                    long = postcodes[line][2]
                    lat = postcodes[line][3]
                    file[row].append(str(long)+','+str(lat))
                    matches = matches+1
                    print(matches)
        final_file = asksaveasfile()
        write_new_file(file, final_file)

I call the functions individually from IDLE so that I can test them before letting the program run them itself.

score 3 · Accepted Answer

Your problem is that you go through every postcode in one file for every postcode in the other, which means an enormous number of comparisons.

You could try storing the data in a dict with the postcode as the key.
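A minimal sketch of that idea, assuming the column layout used in the question's code: in the key file the postcode is column 1 with longitude and latitude in columns 2 and 3, and in the 'sale' file the postcode is column 3. The names `normalise`, `build_coord_lookup`, `add_coords` and `convert_postcodes_with_dict` are hypothetical, introduced only for illustration. The key file is read once into a dict, so each data row costs one hash lookup instead of a pass over every key row.

```python
import csv


def normalise(postcode):
    """Hypothetical helper: drop spaces and upper-case, so 'ac45 6nh' matches 'AC456NH'."""
    return postcode.replace(' ', '').upper()


def build_coord_lookup(key_rows):
    """Map normalised postcode -> (long, lat).

    Assumes the question's key-file layout: row[1] = postcode,
    row[2] = long, row[3] = lat. A header row just becomes one unused entry.
    """
    lookup = {}
    for row in key_rows:
        lookup[normalise(row[1])] = (row[2], row[3])
    return lookup


def add_coords(data_rows, lookup, pc_col=3):
    """Append 'long,lat' to each row whose postcode is in the lookup; return the match count.

    pc_col=3 mirrors the question's 'sale' files (assumption).
    """
    matches = 0
    for row in data_rows:
        coords = lookup.get(normalise(row[pc_col]))  # one dict lookup, no inner loop
        if coords is not None:
            row.append(coords[0] + ',' + coords[1])
            matches += 1
    return matches


def convert_postcodes_with_dict(key_path, data_path, out_path):
    """Read the key file once, convert the data file, write the result to out_path."""
    with open(key_path, newline='') as f:
        lookup = build_coord_lookup(csv.reader(f))
    with open(data_path, newline='') as f:
        rows = list(csv.reader(f))
    matches = add_coords(rows, lookup)
    with open(out_path, 'w', newline='') as f:
        csv.writer(f).writerows(rows)
    return matches
```

Normalising the keys on both sides replaces the separate `remove_space` pass, since the lookup only works if both files spell each postcode identically.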

score 2 · Accepted Answer

Your code would be much more efficient if you used dict() instead of list(). The general algorithm:

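Read the data into two dictionaries, one for the coordinates and one for the other information, both keyed by postcode.
Iterate over the shorter of these dictionaries and, for each postcode, look up the item with the same postcode in the larger dictionary. Store the matched postcode and its coordinates somewhere.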
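The two steps above can be sketched as follows. `join_on_postcode` is a hypothetical name, and it assumes both dicts are already keyed by the same normalised postcode string. If several sales share one postcode, the info dict's values can simply be lists of rows; the join does not care what the values are.

```python
def join_on_postcode(info, coords):
    """Return {postcode: (info_value, coord_value)} for postcodes present in both dicts.

    Loops over whichever dict is smaller and probes the larger one,
    so the work is proportional to the smaller dict's size.
    """
    if len(info) <= len(coords):
        small, large, small_is_info = info, coords, True
    else:
        small, large, small_is_info = coords, info, False

    joined = {}
    for postcode, value in small.items():
        if postcode in large:  # average O(1) membership test on a dict
            other = large[postcode]
            # Keep the (info, coords) order regardless of which dict was looped over.
            joined[postcode] = (value, other) if small_is_info else (other, value)
    return joined
```

Looping over the smaller dict is an optimisation only; the result is the same either way, because each postcode is checked with a single lookup rather than a scan.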

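The issue is that looking up a key in a dict() takes O(1) time on average, whereas searching a list() is O(n) (which is practically the same as running another loop). With large data this makes a huge difference. In fact, you don't need the double loop at all.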
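To see the O(n) vs O(1) difference on your own machine, a rough sketch with `timeit` on synthetic data (the 'PC…' keys below are made up, not real postcodes) times a linear list scan against a dict lookup for the same key. The absolute numbers depend on the machine, so none are claimed here; what matters is that the list scan grows with the number of rows while the dict lookup does not.

```python
import timeit

N = 100_000
# Synthetic rows in the question's key-file shape: [id, postcode, long, lat]
rows = [[str(i), 'PC' + str(i), '0.0', '0.0'] for i in range(N)]
lookup = {row[1]: (row[2], row[3]) for row in rows}
target = 'PC' + str(N - 1)  # last row: worst case for a linear scan


def scan_list():
    """O(n): what the inner for loop does for every data row."""
    for row in rows:
        if row[1] == target:
            return row[2], row[3]
    return None


def probe_dict():
    """Average O(1): a single hash lookup."""
    return lookup.get(target)


t_list = timeit.timeit(scan_list, number=20)
t_dict = timeit.timeit(probe_dict, number=20)
print(f'list scan: {t_list:.4f}s   dict lookup: {t_dict:.6f}s')
```

In the question's nested loop that list scan is repeated once per data row, which is why 78902 rows against a full postcode key file effectively never finishes.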
