python - Optimizing list comprehensions

Question

I managed to convert 8 lines of code into just 2 lines.

The first list comprehension collects the folders, and the second collects the files that match specific filters.

hideTheseFolders=[".thumb",".mayaSwatches","RECYCLER","$AVG"]
fileFilters=["ma","jpg","png","mb",'iff','tga','tif']
newLst=[]
import os
locationTxt=r"E:\box\scripts"  # raw string, so "\b" is not read as a backspace escape
[newLst.append(each) for each in os.listdir(locationTxt)  if os.path.isdir(os.path.join(locationTxt,each)) and each not in hideTheseFolders]
[newLst.append(os.path.basename(os.path.join(locationTxt,each))) for nfile in fileFilters for each in os.listdir(locationTxt) if each.endswith(nfile)]

In the code above, the last two lines both look in the same directory, locationTxt. That suggests there may be a way to merge them. Any suggestions?

score 4 · Accepted Answer

List comprehensions are not an optimization technique. When the Python compiler sees a list comprehension, it breaks it down into a for loop. Look at bytecode 13 (FOR_ITER).

In [1]: from dis import dis

In [2]: code = "[i for i in xrange(100)]"

In [3]: dis(compile(code, '', 'single'))
  1           0 BUILD_LIST               0
              3 LOAD_NAME                0 (xrange)
              6 LOAD_CONST               0 (100)
              9 CALL_FUNCTION            1
             12 GET_ITER            
        >>   13 FOR_ITER                12 (to 28)
             16 STORE_NAME               1 (i)
             19 LOAD_NAME                1 (i)
             22 LIST_APPEND              2
             25 JUMP_ABSOLUTE           13
        >>   28 POP_TOP             
             29 LOAD_CONST               1 (None)
             32 RETURN_VALUE

You can see from the timings that a list comprehension is the same as a for loop. In this case the for loop actually runs slightly (but insignificantly) faster.

In [4]: %timeit l = [i for i in xrange(100)]
100000 loops, best of 3: 13.6 us per loop

In [5]: %%timeit l = []; app = l.append  # optimise out the attribute lookup for a fairer test
   ...: for i in xrange(100):
   ...:     app(i)
   ...: 
100000 loops, best of 3: 11.9 us per loop  #  insignificant difference. Run it yourself and you might get it the other way around
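The session above is Python 2 (xrange) inside IPython. On Python 3 the same comparison can be sketched with the standard timeit module. The function names here are made up for illustration, and the timings will differ from machine to machine, so treat the result as a rough indication only.

```python
import timeit


def with_comprehension():
    return [i for i in range(100)]


def with_loop():
    result = []
    append = result.append  # hoist the attribute lookup, as in the test above
    for i in range(100):
        append(i)
    return result


if __name__ == "__main__":
    for func in (with_comprehension, with_loop):
        seconds = timeit.timeit(func, number=10_000)
        print(f"{func.__name__}: {seconds:.4f} s for 10000 calls")
```

Both functions build the same list, so whatever difference shows up comes from how the loop is written, not from the work being done.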

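So you can write any list comprehension as a for loop with minimal performance impact (in practice there is usually a small difference because of the attribute lookup), and often with a considerable gain in readability. In particular, loops with side effects should not be written as list comprehensions. Nor should you use list comprehensions with more than about two for keywords, or ones that push a line beyond about 70 characters. These are not hard rules, just heuristics for writing readable code.

Don't get me wrong: list comprehensions are very useful, and are often clearer, simpler and more concise than the equivalent for-loop-and-append. But they shouldn't be abused like this.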
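To make the point about side effects concrete, here is a minimal sketch with made-up data. It shows what the comprehension in the question actually produces, next to the two clearer alternatives.

```python
numbers = [1, 2, 3]

# Anti-pattern: the comprehension is used only for its side effect.
collected = []
throwaway = [collected.append(n * 2) for n in numbers]
# list.append returns None, so throwaway is a list of three None values
# that is built and then never used.

# Clearer: an explicit loop when the goal is a side effect...
doubled_loop = []
for n in numbers:
    doubled_loop.append(n * 2)

# ...or a comprehension when the goal is the resulting list.
doubled = [n * 2 for n in numbers]
```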

score 4 · Accepted Answer

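First, you are abusing list comprehensions to hide a loop that appends inside it; you are actually throwing away the result of the list comprehension. Second, there is no need to cram as much as possible onto one line at the expense of readability.

If you want to use list comprehensions, and they are actually a pretty good idea when you are building a list by looping and filtering, consider this version: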

import os

ignore_dirs = set([".thumb",".mayaSwatches","RECYCLER","$AVG"])
extensions = ["ma", "jpg", "png", "mb", 'iff', 'tga', 'tif']
location = "E:\\box\\scripts"

filelist = [fname for fname in os.listdir(location)
                  if fname not in ignore_dirs
                  if os.path.isdir(os.path.join(location, fname))]
filelist += [os.path.basename(fname)
             for fname in os.listdir(location)
             if any(fname.endswith(ext) for ext in extensions)]

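Note that there are still two comprehensions, because you appear to be building a list made up of two logically different kinds of items. You could put a + between the two comprehensions instead of using the += statement, but there is no need to try to do it in a single expression.

(I took the liberty of renaming the variables to reflect what they represent.)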
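If the real goal is to read the directory only once, a plain loop is one way to do it. The following is a sketch rather than a drop-in replacement: list_entries and the sample file names are hypothetical, and unlike the comprehensions above it applies the extension test to non-directories only, so a folder can never be listed twice.

```python
import os
import tempfile


def list_entries(location, ignore_dirs, extensions):
    """Return wanted sub-folders and files of `location` in one pass."""
    suffixes = tuple(extensions)  # str.endswith accepts a tuple of suffixes
    entries = []
    for name in os.listdir(location):
        full_path = os.path.join(location, name)
        if os.path.isdir(full_path):
            if name not in ignore_dirs:
                entries.append(name)
        elif name.endswith(suffixes):
            entries.append(name)
    return entries


if __name__ == "__main__":
    # Throwaway directory so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        os.mkdir(os.path.join(tmp, "textures"))
        os.mkdir(os.path.join(tmp, ".thumb"))
        for filename in ("scene.ma", "notes.txt"):
            open(os.path.join(tmp, filename), "w").close()
        print(sorted(list_entries(tmp, {".thumb"}, ["ma", "jpg"])))
```

The loop is longer than two comprehensions, but each entry is examined once and the folder and file rules sit side by side.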

score 1 · Accepted Answer

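My main suggestion is to get a decent Python book and read it properly. Judging by your code, you don't understand how list comprehensions work, yet you still managed to cram 8 readable lines of code into 2 lines that are too long to follow.

You should write readable programs:

Newlines are your friends, use them
Spaces are your friends too
Lines should fit on the screen (<50 characters)
Put imports at the top of the file
Read a Python book

For reference, here is what your code could look like: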

import os

path = 'e:/box/scripts'

newLst = list()
for root,dirs,files in os.walk(path) :
    # add folders
    newLst.extend( [dir for dir in dirs if dir not in hideTheseFolders] )

    # add files (str.endswith takes a str or a tuple, not a list)
    newLst.extend( [file for file in files if file.lower().endswith(tuple(fileFilters))] )

    break    # don't descend into subfolders

# convert to the full path or whatever you need here
newLst = [os.path.join(path, file) for file in newLst]
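One detail worth checking in the file filter: str.endswith accepts a single string or a tuple of strings, but not a list, so a list of extensions such as fileFilters has to be converted first. A minimal sketch with made-up file names:

```python
file_filters = ["ma", "jpg", "png"]   # a list, as in the question
names = ["scene.MA", "photo.jpg", "readme.txt"]

try:
    "photo.jpg".endswith(file_filters)      # a list is rejected
except TypeError as error:
    print("list not accepted:", error)

suffixes = tuple(file_filters)              # a tuple is accepted
matches = [name for name in names if name.lower().endswith(suffixes)]
```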

score 0 · Accepted Answer

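Stick with the code that is more readable and avoid list comprehensions, or, if you do have to use a list comprehension, keep a backup reference to the readable version of the code.

What I have learned so far about list comprehensions is to write them so that anyone can follow them.

The main uses of comprehensions are:

Getting the results of an iterator (possibly with a filter) into a permanent list: files = [f for f in list_files() if f.endswith("mb")]
Converting between iterable types: example = "abcde"; letters = [x for x in example] # this is handy for data packed into strings!
Simple list processing: strings = [str(x) for x in list_of_numbers]
More complex list processing, with a lambda for readability: filter_func = lambda p, q: p > q; larger_than_last = [val for val in list_of_numbers if filter_func(val, 5)]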
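The four uses listed above can be put together as one runnable sketch. list_files and the sample values are hypothetical stand-ins, and the last case uses a named function instead of a lambda, which is a stylistic choice rather than a requirement.

```python
# 1. Collect (and filter) the results of an iterator into a list.
def list_files():
    # Hypothetical stand-in for whatever yields file names.
    yield from ["scene.mb", "notes.txt", "rig.mb"]

files = [f for f in list_files() if f.endswith("mb")]

# 2. Convert between iterable types.
example = "abcde"
letters = [x for x in example]

# 3. Simple per-item processing.
list_of_numbers = [3, 8, 5, 13]
strings = [str(x) for x in list_of_numbers]

# 4. Filtering through a named predicate.
def larger_than(value, threshold):
    return value > threshold

larger_than_five = [val for val in list_of_numbers if larger_than(val, 5)]
```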

Thanks, everyone, for your input.

Update: my research and troubleshooting led me to the exact answer.

filters = [[".thumb", ".mayaSwatches", "RECYCLER", "$AVG"], ["ma", "jpg", "png", "mb", 'iff', 'tga', 'tif']]
locationTxt = r"E:\box\scripts"
newLst = [each for each in os.listdir(locationTxt) if os.path.isdir(os.path.join(locationTxt, each)) and each not in filters[0]] + [each for each in os.listdir(locationTxt) if os.path.isfile(os.path.join(locationTxt, each)) and os.path.splitext(each)[-1][1:] in filters[1]]

However, as I said, sticking with readable code logic is the way to go!!!
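The update swaps the endswith test for os.path.splitext, and the two are not quite equivalent. The sketch below uses made-up names to show the difference; the lower() call is an addition for illustration and is not in the code above.

```python
import os

extensions = {"ma", "jpg", "png"}
names = ["scene.ma", "panorama", "photo.JPG"]

# Suffix test: also matches "panorama", which merely ends in "ma".
by_suffix = [n for n in names if n.endswith(tuple(extensions))]

# Extension test: compares only the text after the last dot.
by_extension = [n for n in names
                if os.path.splitext(n)[1][1:].lower() in extensions]
```

Comparing the real extension avoids accidental matches on names that just happen to end in the same letters.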
