python - Implementing a specific dynamic nested dictionary (autovivification)

Question

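I'm trying to implement a nested dictionary structure in a specific way. I'm reading in a long list of words. These words will eventually need to be searched often and efficiently, so I want to set up my dictionary as follows: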

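I'm trying to build a nested dictionary structure where the first key is the length of the word. Its value is a dict keyed by the first letter of the word, whose value is a dict keyed by the second letter, whose value is a dict keyed by the third letter, and so on.

If I read in "car", "can", and "joe",

I get

{3: {c: {a: {r: car, n: can}}, j: {o: {e: joe}}}}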

However, I need to do this for about 100,000 words, and they vary in length from 2 to 27 letters.

I have looked at "What is the best way to implement nested dictionaries?" and "Dynamic nested dictionaries",

but I haven't had any luck figuring this out.

I can certainly get the words out of my text file using

for word in text_file.read().split():

and I can break each word into characters using

for char in word:

or

for i in range(len(word)):
    word[i]

I just can't figure out how to build this structure. Any help would be greatly appreciated.

score 3 · Accepted Answer

Here's a short example of how to implement a trie using autovivification based on defaultdict. For every node that terminates a word, it stores an extra key, term, to indicate that.

from collections import defaultdict

# Autovivifying trie: accessing a missing key creates a new child trie.
trie = lambda: defaultdict(trie)

def add_word(root, s):
    # Walk down one node per character, creating nodes as needed.
    node = root
    for c in s:
        node = node[c]
    # Mark this node as the end of a complete word.
    node['term'] = True

def list_words(root, length, prefix=''):
    # Yield every stored word that is exactly `length` characters long.
    if not length:
        if 'term' in root:
            yield prefix
        return

    for k, v in root.items():
        if k != 'term':  # skip the end-of-word marker
            yield from list_words(v, length - 1, prefix + k)

WORDS = ['cars', 'car', 'can', 'joe']
root = trie()
for word in WORDS:
    add_word(root, word)

print('Length {}'.format(3))
print('\n'.join(list_words(root, 3)))
print('Length {}'.format(4))
print('\n'.join(list_words(root, 4)))

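Output (on Python 3.7+, where dictionaries preserve insertion order; older versions may list the words in a different order):

Length 3
car
can
joe
Length 4
cars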
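
Since the question asks for frequent, efficient lookups, a membership test on the same trie might look like the sketch below. The helper name contains_word is hypothetical and not part of the answer above; it assumes the same term marker convention. It tests for each child with `in` instead of indexing, because indexing a defaultdict would create the missing node as a side effect.

```python
from collections import defaultdict

# Same autovivifying trie and insertion routine as in the answer above.
trie = lambda: defaultdict(trie)

def add_word(root, s):
    node = root
    for c in s:
        node = node[c]
    node['term'] = True

def contains_word(root, s):
    """Hypothetical helper: return True if s was added to the trie."""
    node = root
    for c in s:
        # Membership test only; node[c] on a defaultdict would
        # silently create an empty child for a missing character.
        if c not in node:
            return False
        node = node[c]
    # The path exists; it is a stored word only if marked as terminal.
    return 'term' in node

root = trie()
for word in ['cars', 'car', 'can', 'joe']:
    add_word(root, word)

print(contains_word(root, 'car'))  # True
print(contains_word(root, 'ca'))   # False: a prefix, not a stored word
print(contains_word(root, 'cat'))  # False
```

Each lookup visits at most one node per character, so its cost depends on the word length and not on the number of words stored.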

score 1 · Accepted Answer

Here's a way to do it without using collections.defaultdict or creating your own custom subclass of dict, so the resulting dictionary is just an ordinary dict object.

import pprint

def _build_dict(wholeword, chars, dic):
    # Last character: store the complete word as the leaf value.
    if len(chars) == 1:
        dic[chars[0]] = wholeword
        return
    # Reuse the sub-dictionary for this character, or create a new one.
    new_dict = dic.setdefault(chars[0], {})
    _build_dict(wholeword, chars[1:], new_dict)

def build_dict(words):
    dic = {}
    for word in words:
        # The top level is keyed by word length.
        root = dic.setdefault(len(word), {})
        _build_dict(word, list(word), root)
    return dic

words = ['a', 'ox', 'car', 'can', 'joe']
data_dict = build_dict(words)
pprint.pprint(data_dict)

Output:

{1: {'a': 'a'},
 2: {'o': {'x': 'ox'}},
 3: {'c': {'a': {'n': 'can', 'r': 'car'}}, 'j': {'o': {'e': 'joe'}}}}

This is based on the recursive algorithm shown in a message titled "Building and Transvering multi-level dictionaries" in a python.org Python-list Archives thread.
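
For completeness, looking a word up in a plain nested dict of this shape could be sketched as follows. The function name lookup is hypothetical, and the dictionary literal simply repeats the output shown above so that the snippet runs on its own.

```python
def lookup(data, word):
    """Hypothetical helper: True if word is stored in the nested dict."""
    # Start in the sub-dictionary for this word length (None if absent).
    node = data.get(len(word))
    for ch in word:
        # Stop if we ran out of dictionaries or the character is missing.
        if not isinstance(node, dict) or ch not in node:
            return False
        node = node[ch]
    # After the last character the leaf should be the word itself.
    return node == word

# Same structure as the output printed above.
data_dict = {
    1: {'a': 'a'},
    2: {'o': {'x': 'ox'}},
    3: {'c': {'a': {'n': 'can', 'r': 'car'}}, 'j': {'o': {'e': 'joe'}}},
}

print(lookup(data_dict, 'car'))    # True
print(lookup(data_dict, 'cat'))    # False
print(lookup(data_dict, 'hello'))  # False: no words of length 5
```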
