python - Python で cProfiler の深さを増やして、より多くの関数を報告しますか?

Question

他の関数を呼び出す関数をプロファイリングしようとしています。次のようにプロファイラーを呼び出します。

from mymodule import foo
def start():
   # ...
   foo()

import cProfile as profile
profile.run('start()', output_file)
p = pstats.Stats(output_file)
print "name: "
print p.sort_stats('name')
print "all stats: "
p.print_stats()
print "cumulative (top 10): "
p.sort_stats('cumulative').print_stats(10)

プロファイラーは、サブ関数 foo() 呼び出しに分割するのではなく、mymodule の関数 "foo()" ですべての時間が費やされたと言っていることがわかりました。これは、私が見たいものです。これらの関数のパフォーマンスをプロファイラーに報告させるにはどうすればよいですか?

ありがとう。

score 0 · Accepted Answer

まず、アスカーの問題を再現できませんでした。プロファイラー (py2.7) は、呼び出された関数とメソッドに確実に分類されます。(py3.6 のドキュメントは同じように見えますが、私は py3 でテストしていません。) 私の推測では、累積時間でソートされた上位 10 個のリターンに制限することで、それらの最初の N 個は、呼び出される非常に高レベルの関数でした。最小限の時間で、呼び出された関数foo()はリストの一番下に落ちました。

テストのためにいくつかの大きな数字で遊ぶことにしました。ここに私のテストコードがあります:

# file: mymodule.py
import math

def foo(n = 5):
    for i in xrange(1,n):
        baz(i)
        bar(i ** i)

def bar(n):
    for i in xrange(1,n):
        e  = exp200(i)
        print "len e: ", len("{}".format(e))

def exp200(n):
    result = 1
    for i in xrange(200):
        result *= n
    return result

def baz(n):
    print "{}".format(n)

そしてインクルードファイル（アスカーのものと非常に似ています）：

# file: test.py

from mymodule import foo

def start():
   # ...
   foo(8)

OUTPUT_FILE = 'test.profile_info'

import pstats
import cProfile as profile

profile.run('start()', OUTPUT_FILE)
p = pstats.Stats(OUTPUT_FILE)
print "name: "
print p.sort_stats('name')
print "all stats: "
p.print_stats()
print "cumulative (top 10): "
p.sort_stats('cumulative').print_stats(10)
print "time (top 10): "
p.sort_stats('time').print_stats(10)

最後の行に注目してください。でソートされたビューを追加しました。これは、「サブ関数の呼び出しで行われた時間を除くtime」関数で費やされた合計時間です。このビューは、実際の作業を行っている関数を優先する傾向があり、最適化が必要な場合があるため、はるかに便利です。

アスカーが作業していた結果の一部 ( cumulative-sorted) は次のとおりです。

cumulative (top 10):
Thu Mar 24 21:26:32 2016    test.profile_info

         2620840 function calls in 76.039 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   76.039   76.039 <string>:1(<module>)
        1    0.000    0.000   76.039   76.039 test.py:5(start)
        1    0.000    0.000   76.039   76.039 /Users/jhazen/mymodule.py:4(foo)
        7   10.784    1.541   76.039   10.863 /Users/jhazen/mymodule.py:10(bar)
   873605   49.503    0.000   49.503    0.000 /Users/jhazen/mymodule.py:15(exp200)
   873612   15.634    0.000   15.634    0.000 {method 'format' of 'str' objects}
   873605    0.118    0.000    0.118    0.000 {len}
        7    0.000    0.000    0.000    0.000 /Users/jhazen/mymodule.py:21(baz)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

この表示の上位 3 つの関数が 1 回だけ呼び出されたことを確認してください。timeソートされたビューを見てみましょう:

time (top 10):
Thu Mar 24 21:26:32 2016    test.profile_info

         2620840 function calls in 76.039 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   873605   49.503    0.000   49.503    0.000 /Users/jhazen/mymodule.py:15(exp200)
   873612   15.634    0.000   15.634    0.000 {method 'format' of 'str' objects}
        7   10.784    1.541   76.039   10.863 /Users/jhazen/mymodule.py:10(bar)
   873605    0.118    0.000    0.118    0.000 {len}
        7    0.000    0.000    0.000    0.000 /Users/jhazen/mymodule.py:21(baz)
        1    0.000    0.000   76.039   76.039 /Users/jhazen/mymodule.py:4(foo)
        1    0.000    0.000   76.039   76.039 test.py:5(start)
        1    0.000    0.000   76.039   76.039 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

これで、最初のエントリは理にかなっています。乗算を繰り返して何かを 200 乗することは、明らかに「素朴な」戦略です。それを置き換えましょう：

def exp200(n):
    return n ** 200

そして結果：

time (top 10):
Thu Mar 24 21:32:18 2016    test.profile_info

         2620840 function calls in 30.646 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   873612   15.722    0.000   15.722    0.000 {method 'format' of 'str' objects}
        7    9.760    1.394   30.646    4.378 /Users/jhazen/mymodule.py:10(bar)
   873605    5.056    0.000    5.056    0.000 /Users/jhazen/mymodule.py:15(exp200)
   873605    0.108    0.000    0.108    0.000 {len}
        7    0.000    0.000    0.000    0.000 /Users/jhazen/mymodule.py:18(baz)
        1    0.000    0.000   30.646   30.646 /Users/jhazen/mymodule.py:4(foo)
        1    0.000    0.000   30.646   30.646 test.py:5(start)
        1    0.000    0.000   30.646   30.646 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

それは時間の素晴らしい改善です。今str.format()は私たちの最悪の犯罪者です。数値の長さを出力する行を追加しましたbar()。これは、最初の試行 (数値を計算して何もしない) が最適化され、それを回避しようとしたためです (数値を出力すると、非常に速くなりました)。 I/O でブロックされている可能性があるように見えたので、数値の長さを出力することに妥協しました。ねえ、それは base-10 のログです。それを試してみましょう：

def bar(n):
    for i in xrange(1,n):
        e  = exp200(i)
        print "log e: ", math.log10(e)

そして結果：

time (top 10):
Thu Mar 24 21:40:16 2016    test.profile_info

         1747235 function calls in 11.279 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        7    6.082    0.869   11.279    1.611 /Users/jhazen/mymodule.py:10(bar)
   873605    4.996    0.000    4.996    0.000 /Users/jhazen/mymodule.py:15(exp200)
   873605    0.201    0.000    0.201    0.000 {math.log10}
        7    0.000    0.000    0.000    0.000 /Users/jhazen/mymodule.py:18(baz)
        1    0.000    0.000   11.279   11.279 /Users/jhazen/mymodule.py:4(foo)
        7    0.000    0.000    0.000    0.000 {method 'format' of 'str' objects}
        1    0.000    0.000   11.279   11.279 test.py:5(start)
        1    0.000    0.000   11.279   11.279 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

うーん、 .bar()がなくても、str.format(). そのプリントを取り除きましょう：

def bar(n):
    z = 0
    for i in xrange(1,n):
        e  = exp200(i)
        z += math.log10(e)
    return z

そして結果：

time (top 10):
Thu Mar 24 21:45:24 2016    test.profile_info

         1747235 function calls in 5.031 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   873605    4.487    0.000    4.487    0.000 /Users/jhazen/mymodule.py:17(exp200)
        7    0.440    0.063    5.031    0.719 /Users/jhazen/mymodule.py:10(bar)
   873605    0.104    0.000    0.104    0.000 {math.log10}
        7    0.000    0.000    0.000    0.000 /Users/jhazen/mymodule.py:20(baz)
        1    0.000    0.000    5.031    5.031 /Users/jhazen/mymodule.py:4(foo)
        7    0.000    0.000    0.000    0.000 {method 'format' of 'str' objects}
        1    0.000    0.000    5.031    5.031 test.py:5(start)
        1    0.000    0.000    5.031    5.031 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

これで、実際の作業を行っているものが最も忙しい関数のように見えるので、最適化は完了したと思います。

それが役立つことを願っています!

score 0 · Accepted Answer

p.print_callees()メソッド呼び出しの階層的な内訳を取得する必要があります。出力は一目瞭然です。左側の列には、目的の関数が表示されます。たとえばfoo()、右側の列には、呼び出されたすべてのサブ関数と、それらの範囲内の合計時間と累積時間が表示されます。これらのサブコールの内訳も含まれています。

score -1 · Accepted Answer

同様の問題に直面した可能性があるため、ここで私の問題について説明します。私のプロファイリングコードは次のようになります。

def foobar():
    import cProfile
    pr = cProfile.Profile()
    pr.enable()
    for event in reader.events():
        baz()
        # and other things

    pr.disable()
    pr.dump_stats('result.prof')

そして、最終的なプロファイリング出力にはevents()call のみが含まれていました。そして、空のループのプロファイリングがあることに気付くのに、それほど時間はかかりませんでした。foobar()もちろん、クライアントコードからの呼び出しは複数回ありましたが、意味のあるプロファイリング結果は、空のループを使用した最後の 1 つの呼び出しによって上書きされていました。

python - Python で cProfiler の深さを増やして、より多くの関数を報告しますか?

3 に答える 3

Related

Reference