python - I need to speed up this Python script - I think StringIO is very slow

Question

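This works, but it's slow.

I wrote a custom .py that converts .gpx to .kml. It works the way I need it to, but it is very slow: for a small 477k .gpx, it writes a 207k .kml file and takes 198 seconds to complete! That's absurd, and I haven't even gotten to the meaty .gpx sizes yet.

My guess is that StringIO.StringIO(x) is what's so slow. Is there a way to speed it up?

Thanks in advance.

Here are just the key excerpts: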

f = open(fileName, "r")
x = f.read()
x = re.sub(r'\n', '', x, re.S) #remove any newline returns
name = re.search('<name>(.*)</name>', x, re.S)
print "Attachment name (as recorded from GPS device): " + name.group(1)

x = re.sub(r'<(.*)<trkseg>', '', x, re.S)  #strip header
x = x.replace("</trkseg></trk></gpx>", "")  #strip footer
x = x.replace("<trkpt", "\n<trkpt")  #make the file in lines
x = re.sub(r'<speed>(.*?)</speed>', '', x, re.S) #strip speed
x = re.sub(r'<extensions>(.*?)</extensions>', '', x, re.S) # strip out extensions

Then

#.kml header goes here
kmlTrack = """<?xml version="1.0" encoding="UTF-8"?><kml xmlns="http://www.ope......etc etc

Then

checkSumA = 0  # number of <when> entries written
checkSumB = 0  # number of <gx:coord> entries written

# first pass: one <when> per track point
buf = StringIO.StringIO(x)
for line in buf:
    timm = re.search('time>(.*?)</time', line, re.S)
    if timm is not None:
        kmlTrack += ("          <when>" + timm.group(1) + "</when>\n")
        checkSumA += 1

# second pass: one <gx:coord> per track point
buf = StringIO.StringIO(x)
for line in buf:
    lat = re.search('lat="(.*?)" lo', line, re.S)
    lon = re.search('lon="(.*?)"><ele>', line, re.S)
    ele = re.search('<ele>(.*?)</ele>', line, re.S)
    if lat is not None:
        kmlTrack += ("          <gx:coord>" + lon.group(1) + " " + lat.group(1) + " " + ele.group(1) + "</gx:coord>\n")
        checkSumB += 1

if checkSumA == checkSumB:
    #put a footer on
    kmlTrack += """     </gx:Track></Placemark></Document></kml>"""
else:
    print ("checksum error")
    return None

with open("outFile.kml", "a") as myfile:
    myfile.write(kmlTrack)
return ("successful .kml file-write completed in :" + str(c.seconds) + " seconds.")

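Again, this works, but it is very slow. If anyone knows how to speed this up, please let me know! Cheers

Update

Thanks for the suggestions, all. I'm new to Python and I appreciate hearing about profiling. I read up on it and added it to my script. Of the 209 seconds of total run time, 208 are spent on a single line. Here's a snip: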

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 ....

 4052    0.013    0.000    0.021    0.000 StringIO.py:139(readline)
 8104    0.004    0.000    0.004    0.000 StringIO.py:38(_complain_ifclosed)
    2    0.000    0.000    0.000    0.000 StringIO.py:54(__init__)
    2    0.000    0.000    0.000    0.000 StringIO.py:65(__iter__)
 4052    0.010    0.000    0.033    0.000 StringIO.py:68(next)
 8101    0.018    0.000    0.078    0.000 re.py:139(search)
    4    0.000    0.000  208.656   52.164 re.py:144(sub)
 8105    0.016    0.000    0.025    0.000 re.py:226(_compile)
   35    0.000    0.000    0.000    0.000 rpc.py:149(debug)
    5    0.000    0.000    0.010    0.002 rpc.py:208(remotecall)
 ......

There are 4 calls at 52 seconds per call. cProfile says it happens on line 144, but my script is only 94 lines long. How do I proceed from here? Many thanks.

score 3 · Accepted Answer

OK, thanks to all. cProfile showed that the time was going into a re.sub call, though I initially wasn't sure which one. With some trial and error, it didn't take long to isolate it. The solution was to change the re.sub pattern from 'greedy' to 'non-greedy'.
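The trial-and-error step can be made systematic by timing each substitution on its own. The following is a minimal sketch, assuming Python 3; the helper name `time_substitutions` and the labels passed to it are hypothetical and do not come from the original script.

```python
import re
import time


def time_substitutions(text, patterns):
    """Apply each (label, pattern) in turn and record how long each re.sub takes.

    Hypothetical helper for illustration: every match is replaced by an
    empty string, as the substitutions in the question do.
    """
    timings = []
    for label, pattern in patterns:
        start = time.perf_counter()
        # flags is passed by keyword; the fourth positional parameter is count
        text = re.sub(pattern, '', text, flags=re.S)
        timings.append((label, time.perf_counter() - start))
    return text, timings


sample = '<a>1</a><speed>5</speed><b>2</b>'
cleaned, timings = time_substitutions(sample, [('speed', r'<speed>.*?</speed>')])
for label, seconds in timings:
    print(label, seconds)
```

Running the real patterns through such a helper on the real file would show which of the four calls accounts for the time.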

So the old header strip call was

x = re.sub(r'<(.*)<trkseg>', '', x, re.S) #strip header

and it now becomes

x = re.sub(r'<?xml(.*?)<trkseg>', '', x, re.S) #strip header

which is REALLY fast.
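For readers adapting this fix, here is a minimal sketch of the same header strip in Python 3. It rests on several assumptions that are not in the answer: the sample `gpx` string is made up, the `?` after `<` is escaped so the pattern anchors on the literal `<?xml` declaration (unescaped, `<?` makes the `<` optional), `flags` is passed by keyword because the fourth positional parameter of `re.sub` is `count`, and `count=1` is added so that matching stops once the header is gone.

```python
import re

# Made-up, heavily shortened GPX text standing in for the real file.
gpx = (
    '<?xml version="1.0"?><gpx><trk><name>demo</name><trkseg>'
    '<trkpt lat="1.0" lon="2.0"><ele>3.0</ele><time>T0</time></trkpt>'
    '<trkpt lat="1.1" lon="2.1"><ele>3.1</ele><time>T1</time></trkpt>'
    '</trkseg></trk></gpx>'
)

# "<\?xml" matches the literal XML declaration; the lazy ".*?" stops at
# the first <trkseg> instead of running on to the last one.
HEADER = re.compile(r'<\?xml.*?<trkseg>', flags=re.S)

# count=1: replace only the first match (the header) and then stop.
body = HEADER.sub('', gpx, count=1)
print(body[:30])
```

With `count=1`, the call returns as soon as the header has been removed instead of searching the remaining track points for further matches.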

It now finishes even heavy .gpx conversions in zero seconds. What a difference a ? makes!
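The profile in the question attributes only a few hundredths of a second to StringIO, so the two loops were not the bottleneck. Still, as a sketch of an alternative, they could be folded into a single pass with `re.finditer`, collecting the output in lists and joining once at the end. This assumes Python 3, and it assumes every track point carries `lat`, `lon`, `<ele>` and `<time>` in that order, as the search patterns in the question suggest. The names `TRKPT` and `build_track` are hypothetical.

```python
import re

# One pattern per track point, built from the fragments the question searches for.
# Assumption: each point has lat, lon, <ele> and <time>, in that order.
TRKPT = re.compile(
    r'lat="(?P<lat>[^"]*)" lon="(?P<lon>[^"]*)"><ele>(?P<ele>.*?)</ele>'
    r'.*?<time>(?P<time>.*?)</time>',
    flags=re.S,
)


def build_track(body):
    """Return all <when> lines followed by all <gx:coord> lines."""
    whens, coords = [], []
    for m in TRKPT.finditer(body):
        whens.append("          <when>%s</when>\n" % m.group("time"))
        coords.append(
            "          <gx:coord>%s %s %s</gx:coord>\n"
            % (m.group("lon"), m.group("lat"), m.group("ele"))
        )
    # Joining once avoids growing one large string with repeated +=.
    return "".join(whens + coords)


sample_body = (
    '<trkpt lat="1.0" lon="2.0"><ele>3.0</ele><time>T0</time></trkpt>'
    '<trkpt lat="1.1" lon="2.1"><ele>3.1</ele><time>T1</time></trkpt>'
)
track = build_track(sample_body)
print(track)
```

Because each match yields both the timestamp and the coordinates, the two lists have the same length by construction, which covers what the checksum comparison in the question is checking.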
