python - ワンホットエンコーディングの文字列表現を生成する

Question

Python では、文字をその文字の定義済みの「ワンホット」表現にdictマップするを生成する必要があります。例として、は次のようになります。dict

{ 'A': '1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0',
  'B': '0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0', # ...
}

アルファベットの文字ごとに 1 ビット (文字として表される) があります。したがって、各文字列には 25 個のゼロと 1 個の 1 が含まれます。の位置は1、アルファベットの対応する文字の位置によって決まります。

これを生成するコードを思いつきました：

# Character set is explicitly specified for fine grained control
_letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
n = len(_letters)
one_hot = [' '.join(['0']*a + ['1'] + ['0']*b)
            for a, b in zip(range(n), range(n-1, -1, -1))]
outputs = dict(zip(_letters, one_hot))

同じことを行うためのより効率的/クリーン/よりPythonicな方法はありますか?

score 7 · Accepted Answer

これはより読みやすいと思います：

from string import ascii_uppercase

one_hot = {}
for i, l in enumerate(ascii_uppercase):
    bits = ['0']*26; bits[i] = '1'
    one_hot[l] = ' '.join(bits)

より一般的なアルファベットが必要な場合は、文字列を列挙し['0']*26、['0']*len(alphabet).

score 2 · Accepted Answer

Python 2.5 以降では、条件演算子を使用できます。

from string import ascii_uppercase

one_hot = {}
for i, c in enumerate(ascii_uppercase):
    one_hot[c] = ' '.join('1' if j == i else '0' for j in range(26))

score 1 · Accepted Answer

one_hot = [' '.join(['0']*a + ['1'] + ['0']*b)
            for a, b in zip(range(n), range(n-1, -1, -1))]
outputs = dict(zip(_letters, one_hot))

特に、この 2 行には多くのコードが詰め込まれています。Introduce Explaining Variableリファクタリングを試すことができます。または多分抽出方法。

一例を次に示します。

def single_onehot(a, b):
    return ' '.join(['0']*a + ['1'] + ['0']*b)

range_zip = zip(range(n), range(n-1, -1, -1))
one_hot = [ single_onehot(a, b) for a, b in range_zip]
outputs = dict(zip(_letters, one_hot))

あなたは私の命名に同意しないかもしれませんが.

score -1 · Accepted Answer

-1

それは私には非常に明確で、簡潔で、Pythonic のように思えます。

于 2009-10-10T20:23:28.080 に答える

python - ワンホット エンコーディングの文字列表現を生成する

4 に答える 4

Related

Reference

python - ワンホットエンコーディングの文字列表現を生成する