python - Python-コード最適化ヘルプ-すべての辞書を検索-単語の有効なアナグラム

Question

私はこれを使用して問題を解決しました（ひどく非効率的な方法）：

def createList(word, wordList):
    #Made a set, because for some reason permutations was returning duplicates.  
    #Returns all permutations if they're in the wordList

    return set([''.join(item) for item in itertools.permutations(word) if ''.join(item) in wordList])

def main():

    a = open('C:\\Python32\\megalist.txt', 'r+')
    wordList = [line.strip() for line in a]
    maximum = 0
    length = 0
    maxwords = ""

    for words in wordList:
        permList = createList(words, wordList)
        length = len(permList)
        if length > maximum:
            maximum = length
            maxwords = permList
    print (maximum, maxwords)

辞書で有効なアナグラムが最も多い5文字の単語を見つけるのに10分ほどかかりました。文字の制約のない単語でこれを実行したいのですが、ばかげた時間がかかります。これを最適化する方法はありますか？

score 3 · Accepted Answer

以下は、小さめの辞書でも問題なく動作するようです。単語の文字を並べ替えることで、2 つの単語がアナグラムであるかどうかを簡単にテストできます。そこから、とにかく言葉を積み重ねていくだけです。最初の一致だけでなく、すべての一致を報告するようにこれを変更することは難しくありません。

文字数に制約を追加する必要がある場合、反復子を使用すると、一部の単語を除外する便利な方法になります。

def wordIterator(dictionaryFilename):
    with open(dictionaryFilename,'r') as f:
        for line in f:
            word = line.strip()
            yield word

def largestAnagram(words):
    import collections
    d = collections.defaultdict(list)
    for word in words:
        sortedWord = str(sorted(word))
        d[ hash(sortedWord) ].append(word)
    maxKey = max( d.keys(), key = lambda k : len(d[k]) )
    return d[maxKey]

iter = wordIterator( 'C:\\Python32\\megalist.txt' )
#iter = ( word for word in iter if len(word) == 5 )
print largestAnagram(iter)

編集：

コメントに応じて、hash(sortedWord), はスペース節約の最適化であり、この場合は時期尚早かもしれませんが、sortedWord を整数に戻します。これは、キーについてはあまり気にしないためです。アナグラム。sortedWordキーとして使用することも同様に有効でした。

keyキーワード引数 to をmax使用すると、述語に基づいてコレクション内の最大要素を見つけることができます。したがって、ステートメントmaxKey = max( d.keys(), key = lambda k : len(d[k]) )はクエリに答えるための簡潔な python 式であり、ディクショナリ内のキーが与えられた場合、どのキーが最大長の関連付けられた値を持っているのでしょうか? . そのコンテキストでのその呼び出しは、その関数が次のように定義されている場所maxとして (はるかに冗長に) 記述できた可能性があります。valueWithMaximumLength(d)

def valueWithMaximumLength( dictionary ):
    maxKey = None
    for k, v in dictionary.items():
        if not maxKey or len(dictionary[maxKey]) < len(v):
            maxKey = k
    return maxKey

score 2 · Accepted Answer

wordListである必要がありsetます。

リストのメンバーシップをテストするには、リストのすべての要素をスキャンして、生成した単語であるかどうかを確認する必要があります。セット内のメンバーシップのテストは、(平均して) セットのサイズに依存しません。

次の明らかな最適化は、単語をテストしたらwordList、からすべての順列を削除できることです。これは、でまったく同じセットが得られるためcreateListです。すべてがsets で行われる場合、これは非常に簡単な操作です。実際、バイナリマイナスを使用するだけです。

score 1 · Accepted Answer

すべての順列を実行する必要はありません。メモリと CPU の無駄です。したがって、まず、次のように、辞書をバイナリツリーに保持する必要があります。

   e.g. Dict = ['alex', 'noise', 'mother', 'xeal', 'apple', 'google', 'question']

   Step 1: find the "middle" word for the dictionary, e.g. "mother", because "m" 
           is somewhere in the middle of the English alphabet 
           (this step is not necessary, but it helps to balance the tree a bit)

   Step 2: Build the binary tree:

               mother 
              /      \
             /        \
         alex          noise
           \            /  \
            \          /    \
           apple   question xeal
             \
              \
             google

  Step 3: start looking for an anagram by permutations:
  alex: 1. "a"... (going deeper into binary tree, we find a word 'apple')
        1.1 # of, course we should omit apple, because len('apple') != len('alex')
            #  but that's enough for an example:
        2. Check if we can build word "pple" from "lex": ("a" is already reserved!)
           -- there is no "p" in "lex", so skipping, GOTO 1            
        ... 
        1. "l"... - nothing begins with "l"...
        1. "l"... - nothing begins with "e"...
        1. "x" - going deeper, "xeal" found
        2. Check if "eal" could be built from "ale" ("x" is already reserved)
           for letter in "eal":
               if not letter in "ale":
                  return False
           return True

それだけです:) このアルゴリズムははるかに高速に動作するはずです。

編集：

bintreesパッケージをチェックして、バイナリツリーの実装に時間を費やさないようにします。

python - Python-コード最適化ヘルプ-すべての辞書を検索-単語の有効なアナグラム

3 に答える 3

Related

Reference