4

I have the following input:

input = [('dog', 'dog', 'cat', 'mouse'), ('cat', 'ruby', 'python', 'mouse')]

I am trying to get the following output:

outputlist = [[0, 0, 1, 2], [1, 3, 4, 2]]

outputmapping = {0: 'dog', 1: 'cat', 2: 'mouse', 3: 'ruby', 4: 'python'}

Any hints on how to handle this with scalability in mind? (The input can become very large.)

4

4 Answers

6

You probably want something like this:

import collections
import itertools

def build_catalog(L):
    # Each name seen for the first time gets the next integer from the counter.
    counter = itertools.count().__next__
    names = collections.defaultdict(counter)
    result = []
    for t in L:
        new_t = [names[item] for item in t]
        result.append(new_t)
    # Invert the name -> index mapping into index -> name.
    catalog = dict((idx, name) for name, idx in names.items())
    return result, catalog

Using it:

>>> input = [('dog', 'dog', 'cat', 'mouse'), ('cat', 'ruby', 'python', 'mouse')]
>>> outputlist, outputmapping = build_catalog(input)
>>> outputlist
[[0, 0, 1, 2], [1, 3, 4, 2]]
>>> outputmapping
{0: 'dog', 1: 'cat', 2: 'mouse', 3: 'ruby', 4: 'python'}
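
Since the question mentions very large input, here is a minimal sketch of a lazy variant, written for Python 3 and assuming the rows can be consumed one at a time. The name `iter_encoded` is hypothetical and not part of the code above. Only the name-to-index mapping has to stay in memory, and it grows with the number of distinct names rather than with the number of rows.

```python
import collections
import itertools


def iter_encoded(rows, names):
    """Yield each row as a list of integer IDs, one row at a time.

    `names` is expected to be a mapping that assigns a fresh ID on first
    lookup (hypothetical helper, for illustration only).
    """
    for row in rows:
        yield [names[item] for item in row]


# The defaultdict hands out the next integer the first time a key is looked up.
names = collections.defaultdict(itertools.count().__next__)
rows = [('dog', 'dog', 'cat', 'mouse'), ('cat', 'ruby', 'python', 'mouse')]

# Collected into a list here only for demonstration; with a large input you
# would write each encoded row out as it is produced.
encoded = list(iter_encoded(rows, names))

# Invert name -> index into index -> name once all rows have been seen.
catalog = {idx: name for name, idx in names.items()}
```

`rows` can be any iterable, for example a generator that reads tuples from a file, so the full input never needs to be held in a list.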
answered 2010-04-28T12:11:02.227
2

This class automatically maps objects to increasing integer values:

class AutoMapping(object):
    def __init__(self):
        self.map = {}
        self.objects = []

    def __getitem__(self, val):
        if val not in self.map:
            self.map[val] = len(self.objects)
            self.objects.append(val)
        return self.map[val]

Example usage for your input:

>>> input = [('dog', 'dog', 'cat', 'mouse'), ('cat', 'ruby', 'python', 'mouse')]
>>> map = AutoMapping()
>>> [[map[x] for x in y] for y in input]
[[0, 0, 1, 2], [1, 3, 4, 2]]
>>> map.objects
['dog', 'cat', 'mouse', 'ruby', 'python']
>>> dict(enumerate(map.objects))
{0: 'dog', 1: 'cat', 2: 'mouse', 3: 'ruby', 4: 'python'}
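
The same idea can also be sketched as a `dict` subclass that uses the `__missing__` hook. This is only an illustration, assuming Python 3.7+ (where a plain `dict` preserves insertion order, so no separate list of objects is kept); the class name `AutoIndex` is hypothetical.

```python
class AutoIndex(dict):
    """Maps each new key to the next integer ID (0, 1, 2, ...)."""

    def __missing__(self, key):
        # Called by dict.__getitem__ only when `key` is not present yet.
        new_id = len(self)
        self[key] = new_id
        return new_id


rows = [('dog', 'dog', 'cat', 'mouse'), ('cat', 'ruby', 'python', 'mouse')]
index = AutoIndex()
encoded = [[index[x] for x in row] for row in rows]

# Keys come back in insertion order, which matches the order of the IDs.
reverse = dict(enumerate(index))
```

Because IDs are handed out as `len(self)`, this sketch assumes keys are never deleted from the mapping.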
answered 2010-04-28T12:12:45.830
0

I ran into the same problem frequently in my own projects, so a while ago I put together a class that does exactly this:

class UniqueIdGenerator(object):
    """A dictionary-like class that can be used to assign unique integer IDs to
    names.

    Usage:

    >>> gen = UniqueIdGenerator()
    >>> gen["A"]
    0
    >>> gen["B"]
    1
    >>> gen["C"]
    2
    >>> gen["A"]      # Retrieving already existing ID
    0
    >>> len(gen)      # Number of already used IDs
    3
    """

    def __init__(self, id_generator=None):
        """Creates a new unique ID generator. `id_generator` specifies how we
        assign new IDs to elements that do not have an ID yet. If it is `None`,
        elements will be assigned integer identifiers starting from 0. If it is
        an integer, elements will be assigned identifiers starting from the given
        integer. If it is an iterator or generator, `next()` will be called on
        it every time a new ID is needed."""
        if id_generator is None:
            id_generator = 0
        if isinstance(id_generator, int):
            import itertools
            self._generator = itertools.count(id_generator)
        else:
            self._generator = id_generator
        self._ids = {}

    def __getitem__(self, item):
        """Retrieves the ID corresponding to `item`. Generates a new ID for `item`
        if it is the first time we request an ID for it."""
        try:
            return self._ids[item]
        except KeyError:
            self._ids[item] = next(self._generator)
            return self._ids[item]

    def __len__(self):
        """Retrieves the number of added elements in this UniqueIdGenerator"""
        return len(self._ids)

    def reverse_dict(self):
        """Returns the reversed mapping, i.e., the one that maps generated IDs to their
        corresponding items"""
        return dict((v, k) for k, v in self._ids.items())

    def values(self):
        """Returns the list of items added so far. Items are ordered according to
        the standard sorting order of their IDs, so the values will be exactly
        in the same order they were added if the ID generator generates IDs in
        ascending order. This holds, for instance, for numeric ID generators that
        assign integers starting from a given number."""
        return sorted(self._ids.keys(), key = self._ids.__getitem__)

Usage example:

>>> input = [('dog', 'dog', 'cat', 'mouse'), ('cat', 'ruby', 'python', 'mouse')]
>>> gen = UniqueIdGenerator()
>>> outputlist = [[gen[x] for x in y] for y in input]
>>> print(outputlist)
[[0, 0, 1, 2], [1, 3, 4, 2]]
>>> outputmapping = gen.reverse_dict()
>>> print(outputmapping)
{0: 'dog', 1: 'cat', 2: 'mouse', 3: 'ruby', 4: 'python'}
answered 2010-04-28T12:49:23.613
0

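This is one possible solution, although it is not the best. It could be made slightly more efficient if you know in advance how many elements each entry in the list will have, by preallocating them.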

# `input` is the list of tuples from the question.
labels = []
label2index = {}
outputlist = []
for group in input:
    current = []
    for label in group:
        if label not in label2index:
            # First time this label is seen: give it the next free index.
            label2index[label] = len(labels)
            labels.append(label)
        current.append(label2index[label])
    outputlist.append(current)

outputmapping = {}
for idx, val in enumerate(labels):
    outputmapping[idx] = val
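
The preallocation mentioned above could look roughly like this. It is a sketch for Python 3 that assumes each group supports `len()` (as tuples do); the variable name `input_rows` is only used here to avoid shadowing the built-in `input`.

```python
input_rows = [('dog', 'dog', 'cat', 'mouse'), ('cat', 'ruby', 'python', 'mouse')]

labels = []
label2index = {}
outputlist = []
for group in input_rows:
    # Allocate the row once at its final size instead of growing it by append.
    current = [0] * len(group)
    for pos, label in enumerate(group):
        if label not in label2index:
            label2index[label] = len(labels)
            labels.append(label)
        current[pos] = label2index[label]
    outputlist.append(current)

# Index -> label mapping, built directly from the ordered list of labels.
outputmapping = dict(enumerate(labels))
```

Whether this is measurably faster than `append` depends on the row sizes, so it is worth timing on real data before relying on it.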
answered 2010-04-28T12:08:26.463