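python - String to int conversion is too slow

Question

I have a program that reads three strings per line from a 50000-line file and then does other work. Reading the file and converting the values to numbers accounts for 80% of the total run time.

Here is my code snippet: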

import time
file = open ('E:/temp/edges_big.txt').readlines()
start_time = time.time()
for line in file[1:]:
    label1, label2, edge = line.strip().split()
    # label1 = int(label1); label2 = int(label2); edge = float(edge)
    # Rest of the loop deleted
print ('processing file took ', time.time() - start_time, "seconds")

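The code above takes about 0.84 seconds. If I now uncomment the line

label1 = int(label1);label2 = int(label2);edge = float(edge)

the run time rises to about 3.42 seconds.

The input file has the following format: str1 str2 str3.

Are functions such as int() and float() slow? How can I optimize this?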

score 4 · Accepted Answer

If the file is in the OS cache, parsing it takes milliseconds on my machine:

name                                 time ratio comment
read_read                        145 usec  1.00 big.txt
read_readtxt                    2.07 msec 14.29 big.txt
read_readlines                  4.94 msec 34.11 big.txt
read_james_otigo                29.3 msec 201.88 big.txt
read_james_otigo_with_int_float 82.9 msec 571.70 big.txt
read_map_local                  93.1 msec 642.23 big.txt
read_map                        95.6 msec 659.57 big.txt
read_numpy_loadtxt               321 msec 2213.66 big.txt
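
The ratio column appears to be each time divided by the fastest entry (read_read). A minimal sketch of that arithmetic, using three rows from the table above; the dictionary name `timings_usec` is only illustrative, and small differences from the table are expected because the displayed times are rounded:

```python
# Times copied from the table above, converted to microseconds.
timings_usec = {
    "read_read": 145.0,
    "read_readlines": 4940.0,
    "read_james_otigo_with_int_float": 82900.0,
}

baseline = min(timings_usec.values())  # fastest entry, here read_read
ratios = {name: t / baseline for name, t in timings_usec.items()}

# Print slowest last, as in the table.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{name:32s} {ratio:8.2f}")
```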

The read_*() functions are:

def read_read(filename):
    with open(filename, 'rb') as file:
        data = file.read()

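def read_readtxt(filename):
    # Originally mode 'rU'; the 'U' flag was removed in Python 3.11,
    # where text mode already translates newlines by default.
    with open(filename, 'r') as file:
        text = file.read()

def read_readlines(filename):
    with open(filename, 'r') as file:  # originally 'rU', see above
        lines = file.readlines()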

def read_james_otigo(filename):
    file = open (filename).readlines()
    for line in file[1:]:
        label1, label2, edge = line.strip().split()

def read_james_otigo_with_int_float(filename):
    file = open (filename).readlines()
    for line in file[1:]:
        label1, label2, edge = line.strip().split()
        label1 = int(label1); label2 = int(label2); edge = float(edge)

def read_map(filename):
    with open(filename) as file:
        L = [(int(l1), int(l2), float(edge))
             for line in file
             for l1, l2, edge in [line.split()] if line.strip()]

def read_map_local(filename, _i=int, _f=float):
    with open(filename) as file:
        L = [(_i(l1), _i(l2), _f(edge))
             for line in file
             for l1, l2, edge in [line.split()] if line.strip()]

import numpy as np

def read_numpy_loadtxt(filename):
    # Use the filename argument rather than a hard-coded 'big.txt'.
    a = np.loadtxt(filename, dtype=[('label1', 'i'),
                                    ('label2', 'i'),
                                    ('edge', 'f')])

big.txt is generated with:

#!/usr/bin/env python
import numpy as np

n = 50000
a = np.random.random_integers(low=0, high=1<<10, size=2*n).reshape(-1, 2)
np.savetxt('big.txt', np.c_[a, np.random.rand(n)], fmt='%i %i %s')

It generates 50000 lines:

150 952 0.355243621018
582 98 0.227592557278
478 409 0.546382780254
46 879 0.177980983303
...
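
If NumPy is not available, a file of the same shape can be produced with the standard library alone. This is a rough sketch, not the script used for the timings above; the helper names `generate_lines` and `write_file` are made up for illustration, and the exact digits will differ from the NumPy output:

```python
import random

def generate_lines(n=50000, high=1 << 10, seed=None):
    """Return n lines of the form 'int int float', similar to big.txt."""
    rng = random.Random(seed)
    return [
        # randint() includes both endpoints; random() is in [0.0, 1.0).
        f"{rng.randint(0, high)} {rng.randint(0, high)} {rng.random()}\n"
        for _ in range(n)
    ]

def write_file(filename, lines):
    """Write the generated lines to filename."""
    with open(filename, "w") as f:
        f.writelines(lines)

# Show a handful of lines; call write_file('big.txt', generate_lines())
# to create the full 50000-line file.
print("".join(generate_lines(n=5, seed=0)), end="")
```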

To reproduce the results, download the code and run:

# write big.txt
python generate-file.py
# run benchmark
python read-array.py

score 3 · Accepted Answer

I can get almost the same timings as you. I think the problem was in my code that did the timing:

read_james_otigo                  40 msec big.txt
read_james_otigo_with_int_float  116 msec big.txt
read_map                         134 msec big.txt
read_map_local                   131 msec big.txt
read_numpy_loadtxt               400 msec big.txt
read_read                        488 usec big.txt
read_readlines                  9.24 msec big.txt
read_readtxt                    4.36 msec big.txt

name                                 time ratio comment
read_read                        488 usec  1.00 big.txt
read_readtxt                    4.36 msec  8.95 big.txt
read_readlines                  9.24 msec 18.95 big.txt
read_james_otigo                  40 msec 82.13 big.txt
read_james_otigo_with_int_float  116 msec 238.64 big.txt
read_map_local                   131 msec 268.05 big.txt
read_map                         134 msec 274.87 big.txt
read_numpy_loadtxt               400 msec 819.42 big.txt


read_james_otigo                39.4 msec big.txt
read_readtxt                    4.37 msec big.txt
read_readlines                  9.21 msec big.txt
read_map_local                   131 msec big.txt
read_james_otigo_with_int_float  116 msec big.txt
read_map                         134 msec big.txt
read_read                        487 usec big.txt
read_numpy_loadtxt               398 msec big.txt

name                                 time ratio comment
read_read                        487 usec  1.00 big.txt
read_readtxt                    4.37 msec  8.96 big.txt
read_readlines                  9.21 msec 18.90 big.txt
read_james_otigo                39.4 msec 80.81 big.txt
read_james_otigo_with_int_float  116 msec 238.51 big.txt
read_map_local                   131 msec 268.84 big.txt
read_map                         134 msec 275.11 big.txt
read_numpy_loadtxt               398 msec 816.71 big.txt
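
One way to reduce the chance of mistakes in hand-written timing code is to let the standard timeit module do the measuring. The sketch below is an assumption about how such a measurement could look, not the benchmark script used for the tables above; `parse_lines`, `best_time` and the synthetic `sample` data are illustrative names:

```python
import timeit

def parse_lines(lines):
    """Convert 'int int float' lines to tuples (the work being timed)."""
    return [(int(a), int(b), float(c)) for a, b, c in map(str.split, lines)]

def best_time(func, *args, repeat=5, number=3):
    """Best per-call time in seconds over several repeats."""
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=repeat, number=number)) / number

# Synthetic stand-in for the contents of big.txt (50000 lines).
sample = [f"{i % 1024} {(i * 7) % 1024} {i / 50000}\n" for i in range(50000)]
print(f"parse_lines: {best_time(parse_lines, sample) * 1e3:.1f} msec")
```

Taking the minimum over several repeats tends to filter out interference from other processes, which is why it is used here instead of a single run.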

score 1 · Accepted Answer

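I can't reproduce this at all.

I generated a 50000-line file with three random numbers (two ints, one float) on each line, separated by spaces.

I then ran the script on that file. The original script finishes in 0.05 seconds on my three-year-old PC, and the script with the line uncommented finishes in 0.15 seconds. Of course converting strings to int/float takes time, but certainly not on the order of seconds. Unless the target machine is a toaster running embedded Windows CE.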
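
A comparison along these lines can be sketched without touching the disk at all. This is an illustrative reconstruction under stated assumptions, not the exact script used for the numbers above: the lines are generated in memory with a fixed seed, and the function names `split_only` and `split_and_convert` are made up for the example.

```python
import random
import time

def split_only(lines):
    """The loop from the question with the conversion line commented out."""
    for line in lines:
        label1, label2, edge = line.split()

def split_and_convert(lines):
    """The same loop with the int()/float() conversions enabled."""
    for line in lines:
        label1, label2, edge = line.split()
        label1 = int(label1); label2 = int(label2); edge = float(edge)

# 50000 lines of 'int int float', generated in memory.
rng = random.Random(0)
lines = [
    f"{rng.randint(0, 1024)} {rng.randint(0, 1024)} {rng.random()}\n"
    for _ in range(50000)
]

for func in (split_only, split_and_convert):
    start = time.perf_counter()
    func(lines)
    print(f"{func.__name__}: {time.perf_counter() - start:.3f} seconds")
```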
