python - Trie ツリーの解析中に単語ストロークをカウントする

Question

ここで説明されているキーボードのオートコンプリートの問題を解決しようとしています。問題は、いくつかの辞書とオートコンプリートのルールが与えられた場合に、単語に必要なキーストロークの数を計算することです。たとえば、辞書の場合:

data = ['hello', 'hell', 'heaven', 'goodbye']

次の結果が得られます (詳細については、上記のリンクを参照してください)。

{'hell': 2, 'heaven': 2, 'hello': 3, 'goodbye': 1}

簡単な説明: ユーザーがを入力するhと、eで始まるすべての単語は2 番目の文字hも持つため、オートコンプリートされます。eユーザーがを入力するlと、もう一方lが塗りつぶされ、という単語が 2 ストロークになりますhell。もちろん、helloもう 1 ストローク必要です。その他の例については、上記のリンクを参照してください。

私のTrieコードは次のとおりで、正常に動作します ( https://en.wikipedia.org/wiki/Trieから取得)。コードは、Stackルートからツリーを解析することです (以下の編集を参照)。

class Stack(object):
    def __init__(self, size):
        self.data = [None]*size
        self.i = 0
        self.size = size
        
    def pop(self):
        if self.i == 0:
            return None
        item = self.data[self.i - 1]
        self.i-= 1
        return item
    
    def push(self, item):
        if self.i >= self.size:
            return None
        self.data[self.i] = item
        self.i+= 1
        return item
        
    def __str__(self):
        s = '# Stack contents #\n'
        if self.i == 0:
            return
        for idx in range(self.i - 1, -1, -1):
            s+= str(self.data[idx]) + '\n'
        return s

class Trie(object):
    def __init__(self, value, children):
        self.value = value #char
        self.children = children #{key, trie}

class PrefixTree(object):
    def __init__(self, data):
        self.root = Trie(None, {})
        self.data = data
        
        for w in data:
            self.insert(w, w)
    
    def insert(self, string, value):
        node = self.root
        i = 0
        n = len(string)
        
        while i < n:
            if string[i] in node.children:
                node = node.children[string[i]]
                i = i + 1
            else:
                break
        
        while i < n:
            node.children[string[i]] = Trie(string[:i], {})
            node = node.children[string[i]]
            i = i + 1
            
        node.value = value
        
        
    def find(self, key):
        node = self.root
        for char in key:
            if char in node.children:
                node = node.children[char]
            else:
                return None
        return node

ストローク数を数える方法がわかりませんでした：

data = ['hello', 'hell', 'heaven', 'goodbye']
tree = PrefixTree(data)
strokes = {w:1 for w in tree.data} #at least 1 stroke is necessary

そして、ルートからツリーを解析するコードは次のとおりです。

stack = Stack(100)
stack.push((None, pf.root))
print 'Key\tChilds\tValue'
print '--'*25

strokes = {}
    
while stack.i > 0:
    key, curr = stack.pop()

    # if something:
         #update strokes

    print '%s\t%s\t%s' % (key, len(curr.children), curr.value)
    for key, node in curr.children.items():
        stack.push((key, node))
 
print strokes

どんなアイデアや建設的なコメントも役に立ちます、ありがとう!

編集

@SergiyKolesnikovによるすばらしい答え。への呼び出しを避けるためにできる小さな変更が 1 つありますendsWith()。Trie クラスにブール値フィールドを追加しました。

class Trie(object):
    def __init__(self, value, children, eow):
        self.value = value #char
        self.children = children #{key, trie}
        self.eow = eow # end of word

そして、insert() の最後に:

def insert(self, string, value):
#...
    node.value = value
    node.eow = True

curr.value.endswith('$'):次に、に置き換えるだけcurr.eowです。皆さん、ありがとうございました！

score 2 · Accepted Answer

あなたの例のトライは次のようになります

 ' '
|    \
H     G
|     |
E     O
| \   |
L  A  O
|  |  |
L$ V  D
|  |  |
O  E  B
   |  |
   N  Y
      |
      E

トライのどのノードがユーザーのキーストロークのマーカーとして表示されますか? このようなノードには、次の 2 つのタイプがあります。

ユーザーは複数の選択肢の中から選択する必要があるため、複数の子を持つ内部ノード。
$現在の単語が必要なものでない場合、ユーザーは次の文字を入力する必要があるため、単語の最後の文字を表すノード ( でマークされている) は葉ではありません。

トライを再帰的にトラバースしながら、単語の最後の文字に到達する前に、これらのマーカーノードがいくつ検出されたかをカウントします。このカウントは、単語に必要な画数です。

「地獄」という単語の場合、それは 2 つのマーカーノードです: ' 'and E(2 ストローク)。
「こんにちは」という単語の場合、3 つのマーカーノードです: ' '、E、L$(3 ストローク)。
等々...

実装で変更する必要があるもの:

2 番目の条件をチェックできるように、有効な単語の末尾をツリーでマークする必要があります。したがって、PrefixTree.insert()メソッドの最後の行を次のように変更します。

node.value = value

に

node.value = value + '$'

ここで、各スタックアイテム (スタックにプッシュされたトリプルの最後の値) のストロークカウンターと、カウンターを増加させるチェックを追加します。

stack = Stack(100)
stack.push((None, tree.root, 0)) # We start with stroke counter = 0
print('Key\tChilds\tValue')
print('--'*25)

strokes = {}

while stack.i > 0:
    key, curr, stroke_counter = stack.pop()

    if curr.value is not None and curr.value.endswith('$'):
        # The end of a valid word is reached. Save the word and the corresponding stroke counter.
        strokes[curr.value[:-1]] = stroke_counter

    if len(curr.children) > 1:
        # Condition 2 is true. Increase the stroke counter.
        stroke_counter += 1
    if curr.value is not None and curr.value.endswith('$') and len(curr.children) > 0:
        # Condition 1 is true. Increase the stroke counter.
        stroke_counter += 1

    print('%s\t%s\t%s' % (key, len(curr.children), curr.value))
    for key, node in curr.children.items():
        stack.push((key, node, stroke_counter)) # Save the stroke counter

print(strokes)

出力：

Key Childs  Value
--------------------------------------------------
None    2   None
h   1   
e   2   h
a   1   he
v   1   hea
e   1   heav
n   0   heaven$
l   1   he
l   1   hell$
o   0   hello$
g   1   
o   1   g
o   1   go
d   1   goo
b   1   good
y   1   goodb
e   0   goodbye$
{'heaven': 2, 'goodbye': 1, 'hell': 2, 'hello': 3}

score 1 · Accepted Answer

スタックを通過する間、各ノードのストロークカウンターを保持する必要があります。

None の場合は 0 から始まります。
現在のノードに 2 つ以上の子がある場合、子のカウンターは現在のカウンターより 1 つ多くなります。
現在の値が有効な単語で、少なくとも 1 つの子がある場合、子のカウンターは現在のカウンターよりも 1 大きくなります。

ドキュメントの目的で、ここに私のRubyの答えがあります：

class Node
  attr_reader :key, :children
  attr_writer :final
  def initialize(key, children = [])
    @key = key
    @children = children
    @final = false
  end

  def final?
    @final
  end
end

class Trie
  attr_reader :root
  def initialize
    @root = Node.new('')
  end

  def add(word)
    node = root
    word.each_char.each{|c|
      next_node = node.children.find{|child| child.key == c}
      if next_node then
        node = next_node
      else
        next_node = Node.new(c)
        node.children.push(next_node)
        node = next_node
      end
    }
    node.final = true
  end

  def count_strokes(node=root,word="",i=0)
    word=word+node.key
    strokes = {}
    if node.final? then
      strokes[word]=i
      if node.children.size>0 then
        i+=1
      end
    elsif node.children.size>1 then
      i+=1
    end
    node.children.each{|c|
      strokes.merge!(count_strokes(c, word, i))
    }
    strokes
  end
end

data = ['hello', 'hell', 'heaven', 'goodbye']
trie =  Trie.new
data.each do |word|
  trie.add(word)
end

# File.readlines('/usr/share/dict/british-english').each{|line|
#   trie.add line.strip
# }

puts trie.count_strokes
#=> {"hell"=>2, "hello"=>3, "heaven"=>2, "goodbye"=>1}

たった60行で、10万ワードで3秒もかかりません。

python - Trie ツリーの解析中に単語ストロークをカウントする

編集

2 に答える 2

Related

Reference