python - Using `.index()` with repeated letters

Question

I am writing a function that builds a dictionary of prefixes from a word, like this:

{'b': ['b', 'bi', 'bir', 'birt', 'birth', 'birthd', 'birthda', 'birthday'],
'bi': ['bi', 'bir', 'birt', 'birth', 'birthd', 'birthda', 'birthday'],
'birt': ['birt', 'birth', 'birthd', 'birthda', 'birthday'], 
'birthda': ['birthda', 'birthday'], 
'birthday': ['birthday'], 
'birth': ['birth', 'birthd', 'birthda', 'birthday'],
'birthd': ['birthd', 'birthda', 'birthday'], 
'bir': ['bir', 'birt', 'birth', 'birthd', 'birthda', 'birthday']}

This is what I have so far:

def add_prefixs(word, prefix_dict):
    lst=[]
    for letter in word:
        n=word.index(letter)
        if n==0:
            lst.append(word[0])
        else:
            lst.append(word[0:n])
    lst.append(word)
    lst.remove(lst[0])
    for elem in lst:
        b=lst.index(elem)
        prefix_dict[elem]=lst[b:]
    return prefix_dict

It works fine for a word like "birthday", but it breaks when a letter is repeated, for example "hello":

{'h': ['h', 'he', 'he', 'hell', 'hello'], 'hell': ['hell', 'hello'], 'hello': ['hello'], 'he': ['he', 'he', 'hell', 'hello']}

I know the cause is the index (Python picks the index where the letter first appears), but I don't know how to fix it. And yes, this is my homework; I'm trying to learn from you all :)
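To make the symptom concrete, here is a minimal sketch (not part of the original function) of what `.index()` reports for each letter of "hello" and which slices follow from it. The variable names `positions` and `slices` are only illustrative.

```python
word = "hello"

# str.index() always returns the position of the FIRST match,
# so both 'l' characters report index 2.
positions = [word.index(letter) for letter in word]
print(positions)  # [0, 1, 2, 2, 4]

# Slicing with those positions repeats 'he' and never produces 'hel'.
slices = [word[0:n] for n in positions]
print(slices)  # ['', 'h', 'he', 'he', 'hell']
```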

score 4 · Accepted Answer

You are already looping over the word; instead of calling `.index()`, keep a counter. Python makes that very easy: use the `enumerate()` function.

for n, letter in enumerate(word):
    if n==0:
        lst.append(word[0])
    else:
        lst.append(word[0:n])

However, you no longer use the `letter` variable, so loop over `range(len(word))` instead:

for n in range(len(word)):
    if n==0:
        lst.append(word[0])
    else:
        lst.append(word[0:n])

This can be simplified into a list comprehension:

lst = [word[0:max(n, 1)] for n in range(len(word))]

Note the `max()` there: instead of testing whether `n` is 0, we set a minimum value of 1 for the slice end.
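As a quick check of that claim, the sketch below compares the explicit `if n == 0` loop with the `max()` version for "hello". The names `with_test` and `with_max` are only for this illustration.

```python
word = "hello"

# Explicit test for n == 0, as in the loop version.
with_test = []
for n in range(len(word)):
    if n == 0:
        with_test.append(word[0])
    else:
        with_test.append(word[0:n])

# max(n, 1) clamps the slice end to at least 1, which replaces the test.
with_max = [word[0:max(n, 1)] for n in range(len(word))]

print(with_max)               # ['h', 'h', 'he', 'hel', 'hell']
print(with_test == with_max)  # True
```

Both versions still contain 'h' twice and lack the full word, which is what the next step cleans up.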

After that you remove the first entry again (because it is the same as the second one) and append the full word, so simply add 1 to the counter `n` instead:

lst = [word[0:n+1] for n in range(len(word))]
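The following sketch checks that the `n+1` slice gives the same list as the longer route (clamp with `max()`, append the full word, drop the duplicated first entry). The name `long_way` is only illustrative.

```python
word = "hello"

# Long way: clamp with max(), append the word, remove the duplicate first entry.
long_way = [word[0:max(n, 1)] for n in range(len(word))]
long_way.append(word)
long_way.remove(long_way[0])

# Short way: shift the slice end by one.
lst = [word[0:n + 1] for n in range(len(word))]

print(lst)              # ['h', 'he', 'hel', 'hell', 'hello']
print(lst == long_way)  # True
```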

In the second half of the function you can likewise use `enumerate()` instead of `.index()`:

for b, elem in enumerate(lst):
    prefix_dict[elem]=lst[b:]

The function is now much simpler. Note that there is no need to return `prefix_dict`, because you are modifying it in place.

def add_prefixs(word, prefix_dict):
    lst = [word[0:n+1] for n in range(len(word))]
    for b, elem in enumerate(lst):
        prefix_dict[elem]=lst[b:]
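A short usage sketch, assuming the function exactly as written above, shows the in-place behaviour with the word that caused trouble in the question:

```python
def add_prefixs(word, prefix_dict):
    lst = [word[0:n + 1] for n in range(len(word))]
    for b, elem in enumerate(lst):
        prefix_dict[elem] = lst[b:]


prefix_dict = {}
result = add_prefixs("hello", prefix_dict)

print(result)               # None, the dictionary is changed in place
print(sorted(prefix_dict))  # ['h', 'he', 'hel', 'hell', 'hello']
print(prefix_dict["hel"])   # ['hel', 'hell', 'hello']
```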

score 0 · Accepted Answer

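The solution becomes much easier to simplify if you think in terms of indices rather than letters. Normally in Python you loop over values, because the values are what matter. Here, though, we are really generating the prefixes of a string: the content does not matter, the position does.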

def prefixes(seq):
    for i in range(len(seq)):
        yield seq[:i+1]

segments = list(prefixes("birthday"))
print({segment: segments[start:] for start, segment in enumerate(segments)})

What we really need is each prefix of the word. This is one of the rare cases where looping over indices is a valid option, so that is what we do.

We then use a dictionary comprehension to pick the right group of "children" for each segment.

This gives us (whitespace added for clarity):

{
    'birt': ['birt', 'birth', 'birthd', 'birthda', 'birthday'], 
    'bir': ['bir', 'birt', 'birth', 'birthd', 'birthda', 'birthday'], 
    'birthday': ['birthday'], 
    'bi': ['bi', 'bir', 'birt', 'birth', 'birthd', 'birthda', 'birthday'], 
    'birthda': ['birthda', 'birthday'], 
    'b': ['b', 'bi', 'bir', 'birt', 'birth', 'birthd', 'birthda', 'birthday'], 
    'birthd': ['birthd', 'birthda', 'birthday'], 
    'birth': ['birth', 'birthd', 'birthda', 'birthday']
}
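As a sanity check, here is the same generator run on "hello", the word with the repeated letter. Since it only slices, it should also work on other sliceable sequences such as lists; the name `table` is only illustrative.

```python
def prefixes(seq):
    # Yield seq[:1], seq[:2], ... up to the full sequence.
    for i in range(len(seq)):
        yield seq[:i + 1]


segments = list(prefixes("hello"))
table = {segment: segments[start:] for start, segment in enumerate(segments)}

print(segments)     # ['h', 'he', 'hel', 'hell', 'hello']
print(table["he"])  # ['he', 'hel', 'hell', 'hello']

# Only slicing is used, so a list works too.
print(list(prefixes([1, 2, 3])))  # [[1], [1, 2], [1, 2, 3]]
```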

If you don't mind an extra loop, you can simplify this even further:

def prefixes(word):
    for i in range(len(word)):
        segment = word[:i+1]
        # Children are the prefixes of the whole word from this length upward.
        yield segment, [word[:j+1] for j in range(i, len(word))]

print(dict(prefixes("birthday")))

As an aside, here is an alternative implementation of `prefixes()`:

def prefixes(seq):
    return prefixes(seq[:-1])+[seq] if seq else []

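However, this is a recursive function, and Python is not optimized for recursion, so it is the worse approach. It also builds a list rather than being a generator, which is less memory-efficient in some cases.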
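One practical consequence can be sketched as follows, assuming CPython 3.5 or later with its usual recursion limit: the recursive version fails on an input longer than that limit, while the loop-based generator does not. The names `prefixes_recursive`, `prefixes_iterative` and `long_word` are only for this illustration.

```python
import sys


def prefixes_recursive(seq):
    return prefixes_recursive(seq[:-1]) + [seq] if seq else []


def prefixes_iterative(seq):
    for i in range(len(seq)):
        yield seq[:i + 1]


# One call per character, so an input longer than the limit cannot finish.
long_word = "a" * (sys.getrecursionlimit() + 10)

try:
    prefixes_recursive(long_word)
    recursion_failed = False
except RecursionError:
    recursion_failed = True

# The generator handles the same input one prefix at a time.
count = sum(1 for _ in prefixes_iterative(long_word))

print(recursion_failed)
print(count == len(long_word))
```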

score 0 · Accepted Answer

Martijn was faster than I was, but I have a few additions.

def add_prefixs(word, prefix_dict):
    lst=[]
    for n, letter in enumerate(word):
        if n > 0:
            lst.append(word[0:n])
    lst.append(word)
    for elem in lst:
        b=lst.index(elem)
        prefix_dict[elem]=lst[b:]
    return prefix_dict

Why add the 0th entry at all if you remove it right away?

Another simplification is

def add_prefixs(word, prefix_dict):
    #lst=[word[0:n] for n, letter in enumerate(word) if n > 0] + [word]
    # why do I think so complicated? Better use
    lst=[word[0:n+1] for n, letter in enumerate(word)]
    prefix_dict.update((elem, lst[b:]) for b, elem in enumerate(lst))
    return prefix_dict

With a class like

class Segments(object):
    def __init__(self, string, minlength=1):
        self.string = string
        self.minlength = minlength
    def __getitem__(self, index):
        s = self.string[:self.minlength + index]
        if len(s) < self.minlength + index: raise IndexError
        if index >= len(self): raise IndexError # alternatively
        return s
    def cut(self, num):
        return type(self)(self.string, self.minlength + num)
    def __repr__(self):
        return repr(list(self))
    def __len__(self):
        return len(self.string) - self.minlength + 1

you can simplify even further:

def add_prefixes(word, prefix_dict):
    lst = Segments(word)
    prefix_dict.update((prefix, lst.cut(n)) for n, prefix in enumerate(lst))
    return prefix_dict

Hmm. On second thought, this is not really a simplification. But it does avoid holding many copies of essentially the same data, or of parts of it...
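To give a rough sense of the duplication being avoided, the sketch below counts how many list slots the plain list-slicing version stores for one word. For a word of length n that is n(n+1)/2, because every key gets its own sliced list (the strings themselves are shared references; only the list slots are copied). The name `total_entries` is only illustrative.

```python
def add_prefixs(word, prefix_dict):
    lst = [word[0:n + 1] for n in range(len(word))]
    prefix_dict.update((elem, lst[b:]) for b, elem in enumerate(lst))
    return prefix_dict


word = "birthday"
table = add_prefixs(word, {})

# Each value is a separate list: 8 + 7 + ... + 1 entries in total.
total_entries = sum(len(children) for children in table.values())
n = len(word)

print(total_entries)                      # 36
print(total_entries == n * (n + 1) // 2)  # True
```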

score 0 · Accepted Answer

I think the most Pythonic approach is:

def add_prefixs(word, prefix_dict):
    lst = [word[0:n+1] for n in range(len(word))]
    prefix_dict.update((k, lst[n:]) for n, k in enumerate(lst))
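`dict.update()` accepts an iterable of key/value pairs, which is what the generator expression supplies. The sketch below compares this variant with the explicit loop on "hello"; the two function names are made up for the comparison.

```python
def add_prefixs_loop(word, prefix_dict):
    lst = [word[0:n + 1] for n in range(len(word))]
    for b, elem in enumerate(lst):
        prefix_dict[elem] = lst[b:]


def add_prefixs_update(word, prefix_dict):
    lst = [word[0:n + 1] for n in range(len(word))]
    # update() consumes (key, value) pairs from the generator.
    prefix_dict.update((k, lst[n:]) for n, k in enumerate(lst))


from_loop, from_update = {}, {}
add_prefixs_loop("hello", from_loop)
add_prefixs_update("hello", from_update)

print(from_loop == from_update)  # True
print(from_update["hell"])       # ['hell', 'hello']
```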
