python - Pythonic way to get a case-sensitive path?

Question

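I was wondering whether there is a faster way to implement a function in Python that returns the case-sensitive (correctly cased) path. One solution I came up with works on both Linux and Windows, but it has to iterate over os.listdir for every path component, which can be slow.

This solution works fine for applications and contexts where speed is not critical.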

import os

def correctPath(start, path):
    'Returns a unix-type case-sensitive path, works in windows and linux'
    start = unicode(start)
    path = unicode(path)
    b = ''  # corrected path built so far, relative to start
    if path[-1] == '/':
        path = path[:-1]
    parts = path.split('\\')
    d = start  # directory currently being searched
    c = 0  # number of components whose case was corrected
    for p in parts:
        listing = os.listdir(d)
        _ = None
        for l in listing:
            if p.lower() == l.lower():
                if p != l:
                    c += 1
                d = os.path.join(d, l)
                _ = os.path.join(b, l)
                break
        if not _:
            return None  # this path component was not found
        b = _

    return b, c  # (corrected path, number of corrections)

>>> correctPath('C:\\Windows', 'SYSTEM32\\CmD.EXe')
(u'System32\\cmd.exe', 2)

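However, this is not very fast when the context is gathering filenames from a large database of 50,000+ entries.

One approach would be to build a dict tree with one entry per directory. Match the directory parts of the path against the dict tree and, when a key miss occurs, run os.listdir to find the new directory and create its dict entry. Unused entries could then be removed, or a counter variable kept as a way of assigning a "lifetime" to each directory.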
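The caching idea described above could be sketched roughly as follows. This is only an illustration, not code from the question: the class name CaseCorrector and the listdir_calls counter are hypothetical, the code assumes Python 3 (where str is already Unicode), it treats both '/' and '\' as separators, and it leaves out the "lifetime" eviction entirely, so a cached listing can go stale if the directory changes on disk.

```python
import os

class CaseCorrector:
    """Resolve paths case-insensitively, caching one name map per directory."""

    def __init__(self, start):
        self.start = start
        self._listings = {}     # directory path -> {lowercased name: real name}
        self.listdir_calls = 0  # how many times os.listdir was actually called

    def _names(self, directory):
        names = self._listings.get(directory)
        if names is None:
            # Cache miss: list the directory once and remember the result.
            self.listdir_calls += 1
            names = {}
            for name in os.listdir(directory):
                # If two entries differ only by case, keep the first one listed.
                names.setdefault(name.lower(), name)
            self._listings[directory] = names
        return names

    def correct(self, path):
        """Return (corrected relative path, corrections), or None if not found."""
        parts = [p for p in path.replace('\\', '/').split('/') if p]
        current = self.start
        corrected = []
        corrections = 0
        for part in parts:
            try:
                names = self._names(current)
            except OSError:
                return None  # not a directory, or not readable
            real = names.get(part.lower())
            if real is None:
                return None  # this path component does not exist
            if real != part:
                corrections += 1
            corrected.append(real)
            current = os.path.join(current, real)
        result = os.path.join(*corrected) if corrected else ''
        return (result, corrections)
```

With a single CaseCorrector instance reused for all 50,000+ lookups, each directory would be listed at most once, at the cost of holding the listings in memory.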

score 2 · Accepted Answer

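The following is a slight rewrite of your own code with three modifications: it checks whether the filename is already correct before matching, it lowercases the listing before testing, and it uses index to find the relevant "true case" file.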

import os

def corrected_path(start, path):
    '''Returns a unix-type case-sensitive path, works in windows and linux'''
    start = unicode(start)
    path = unicode(path)
    corrected_path = ''
    if path[-1] == '/':
        path = path[:-1]
    parts = path.split('\\')
    cd = start
    corrections_count = 0

    for p in parts:
        if not os.path.exists(os.path.join(cd,p)): # Check it's not correct already
            listing = os.listdir(cd)

            cip = p.lower()
            cilisting = [l.lower() for l in listing]

            if cip in cilisting:
                l = listing[ cilisting.index(cip) ] # Get our real folder name
                cd = os.path.join(cd, l)
                corrected_path = os.path.join(corrected_path, l)
                corrections_count += 1
            else:
                return False # Error, this path element isn't found
        else:
            cd = os.path.join(cd, p)
            corrected_path = os.path.join(corrected_path, p)

    return corrected_path, corrections_count

I don't know whether this will be much faster, but there is a little less testing going on, and the "already correct" check at the beginning may help.

score 0 · Accepted Answer

An extended version that uses a case-insensitive cache to look up already-corrected paths:

import os,re

def corrected_paths(start, pathlist):
    ''' This wrapper function takes a list of paths to correct, which allows caching between them '''

    start = unicode(start)
    pathlist = [unicode(path[:-1]) if path[-1] == '/' else unicode(path) for path in pathlist ]

    # Use a dict as a cache, storing oldpath > newpath for first-pass replacement
    # with path keys from incorrect to corrected paths
    cache = dict() 
    corrected_path_list = []
    corrections_count = 0
    path_split = re.compile(r'[/\\]+')  # Split on runs of forward slashes or backslashes

    for path in pathlist:
        cd = start
        corrected_path = ''
        parts = path_split.split(path)

        # Pre-process against the cache
        for n,p in enumerate(parts):
            # We pass *parts to send through the contents of the list as a series of strings
            uncorrected_path= os.path.join( cd, *parts[0:len(parts)-n] ).lower() # Walk backwards
            if uncorrected_path in cache:
                # Move up the basepath to the latest matched position
                cd = os.path.join(cd, cache[uncorrected_path])
                parts = parts[len(parts)-n:] # Retrieve the unmatched segment
                break  # First hit; stop here, since we are walking backwards

        # Fallback to walking, from the base path cd point
        for n,p in enumerate(parts):

            if not os.path.exists(os.path.join(cd,p)): # Check it's not correct already
            # Alternative for Mac OS X, where the os.path.exists() test above matches case-insensitively:
            # if p not in os.listdir(cd):

                listing = os.listdir(cd)

                cip = p.lower()
                cilisting = [l.lower() for l in listing]

                if cip in cilisting:

                    l = listing[ cilisting.index(cip) ] # Get our real folder name
                    # Store the path correction in the cache for next iteration
                    cache[ os.path.join(cd,p).lower() ] = os.path.join(cd, l)
                    cd = os.path.join(cd, l)
                    corrections_count += 1

                else:
                    print("Error %s not in folder %s" % (cip, cilisting))
                    return False # Error, this path element isn't found

            else:
                cd = os.path.join(cd, p)

        corrected_path_list.append(cd)

    return corrected_path_list, corrections_count

In an example run against a set of paths, this greatly reduces the number of listdir calls (this obviously depends on how similar your paths are):

corrected_paths('/Users/', ['mxF793/ScRiPtS/meTApaTH','mxF793/ScRiPtS/meTApaTH/metapAth/html','mxF793/ScRiPtS/meTApaTH/metapAth/html/css','mxF793/ScRiPts/PuBfig'])
([u'/Users/mxf793/Scripts/metapath', u'/Users/mxf793/Scripts/metapath/metapath/html', u'/Users/mxf793/Scripts/metapath/metapath/html/css', u'/Users/mxf793/Scripts/pubfig'], 14)
([u'/Users/mxf793/Scripts/metapath', u'/Users/mxf793/Scripts/metapath/metapath/html', u'/Users/mxf793/Scripts/metapath/metapath/html/css', u'/Users/mxf793/Scripts/pubfig'], 5)

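While working on this I noticed that Python on Mac OS X reports path matches as if paths were case-insensitive, so the existence test always succeeds. In that case you can move the listdir call up to replace the existence check.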
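A minimal sketch of that listdir-based check, assuming Python 3. The helper name exists_exact_case is hypothetical and is not part of the code above; it relies only on os.listdir returning names as they are stored on disk, which holds on case-insensitive filesystems too.

```python
import os

def exists_exact_case(directory, name):
    """Return True only if `directory` has an entry spelled exactly `name`."""
    try:
        # os.listdir reports the stored spelling of each entry, so this
        # membership test is case-sensitive even when the filesystem is not.
        return name in os.listdir(directory)
    except OSError:
        # The directory is missing, unreadable, or not a directory at all.
        return False
```

Used in place of os.path.exists(os.path.join(cd, p)), this costs one listdir per component even when the case is already right, so it gives up the saving from the "already correct" shortcut on those filesystems.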
