python - A recursive setattr()-like function in Python for working with nested dictionaries

Question

There are a lot of good getattr()-like functions for parsing nested dictionary structures.

I would like to make a parallel setattr(). Essentially, given:

cmd = 'f[0].a'
val = 'whatever'
x = {"a":"stuff"}

I'd like to write a function that lets me assign:

x['f'][0]['a'] = val

More or less, this would work the same way as:

setattr(x,'f[0].a',val)

Yielding:

>>> x
{"a":"stuff","f":[{"a":"whatever"}]}

I'm currently calling it setByDot():

setByDot(x,'f[0].a',val)

One problem with this is that if a key in the middle doesn't exist, you have to check for it and create the intermediate key yourself. That is, for the case above:

>>> x = {"a":"stuff"}
>>> x['f'][0]['a'] = val
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'f'

So you first have to make:

>>> x['f']=[{}]
>>> x
{'a': 'stuff', 'f': [{}]}
>>> x['f'][0]['a']=val
>>> x
{'a': 'stuff', 'f': [{'a': 'whatever'}]}
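The manual steps above can be sketched with dict.setdefault, which creates a missing intermediate but leaves an existing one alone. This is only an illustration for this one fixed path; the path is hard-coded here rather than parsed from 'f[0].a'.

```python
x = {"a": "stuff"}
val = "whatever"

# Create x['f'] only if it is missing; an existing list would be kept as-is.
f = x.setdefault("f", [])

# Make sure index 0 exists before assigning into it.
if len(f) < 1:
    f.append({})

f[0]["a"] = val
print(x)  # {'a': 'stuff', 'f': [{'a': 'whatever'}]}
```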

Another is that the keying when the next item is a list is different from the keying when the next item is a string, i.e.:

>>> x = {"a":"stuff"}
>>> x['f']=['']
>>> x['f'][0]['a']=val
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment

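...fails because the placeholder was an empty string instead of an empty dict. An empty dict is the right placeholder for every non-list level of the dict until the very last one, which may be a list or a value.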
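One way to picture the choice of placeholder is to look at the token that comes next in the path. The sketch below assumes the path has already been split into tokens by hand, with ints for list indexes; placeholder_for is a hypothetical helper name, not something from the question.

```python
def placeholder_for(next_token):
    """Return an empty container that can be indexed by next_token.

    Hypothetical helper: an integer token (from '[0]') needs a list,
    a string token (from '.a') needs a dict.
    """
    return [] if isinstance(next_token, int) else {}


tokens = ["f", 0, "a"]  # hand-written tokens for the path 'f[0].a'

# Look one token ahead to decide what each missing level should be.
for current, following in zip(tokens, tokens[1:]):
    print(repr(current), "->", type(placeholder_for(following)).__name__)
# 'f' -> list
# 0 -> dict
```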

A further problem, pointed out in the comments below by @TokenMacGuy, is that when you have to create a list that doesn't exist yet, you may have to create an awful lot of blank values. So,

setattr(x,'f[10].a',val)

...may mean the underlying algorithm has to create intermediates like:

>>> x['f']=[{},{},{},{},{},{},{},{},{},{},{}]
>>> x['f'][10]['a']=val

Yielding:

>>> x 
{"a":"stuff","f":[{},{},{},{},{},{},{},{},{},{},{"a":"whatever"}]}

...so that this is the setter matching the getter:

>>> getByDot(x,"f[10].a")
"whatever"

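More importantly, the intermediates should /not/ overwrite values that already exist.

Below is the junky idea I have so far. I can tell lists from dicts and other data types, and create them where they don't exist. However, I don't see (a) where to put the recursive call, (b) how to "build" the deep object as I iterate through the list, or (c) how to distinguish the /probing/ I do while constructing the deep object from the /setting/ I have to do when I reach the end of the stack.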
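A minimal sketch of the padding step, under the assumption that empty dicts are the right filler: pad_list is a hypothetical helper that only appends what is missing, so entries already in the list are never replaced.

```python
def pad_list(lst, index, factory=dict):
    """Extend lst in place with factory() values until lst[index] exists."""
    missing = index - len(lst) + 1
    if missing > 0:
        lst.extend(factory() for _ in range(missing))
    return lst


x = {"a": "stuff"}
f = x.setdefault("f", [])
pad_list(f, 10)
f[10]["a"] = "whatever"
print(len(f))   # 11
print(f[10])    # {'a': 'whatever'}

# An index that already exists is left alone.
existing = [{"keep": 1}]
pad_list(existing, 0)
print(existing)  # [{'keep': 1}]
```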

def setByDot(obj,ref,newval):
    ref = ref.replace("[",".[")
    cmd = ref.split('.')
    numkeys = len(cmd)
    count = 0
    for c in cmd:
        count = count+1
        while count < numkeys:
            if c.find("["):
                idstart = c.find("[")
                numend = c.find("]")
                try:
                    deep = obj[int(idstart+1:numend-1)]
                except:
                    obj[int(idstart+1:numend-1)] = []
                    deep = obj[int(idstart+1:numend-1)]
            else:
                try:
                    deep = obj[c]
                except:
                    if obj[c] isinstance(dict):
                        obj[c] = {}
                    else:
                        obj[c] = ''
                    deep = obj[c]
        setByDot(deep,c,newval)

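This is super tricky because you have to look ahead to check the type of the /next/ object when you're making placeholders, and you have to look behind to build up the path as you go.

UPDATE

I recently had this question answered, too, which might be relevant or helpful.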

score 2 · Accepted Answer

>>> class D(dict):
...     def __missing__(self, k):
...         ret = self[k] = D()
...         return ret
... 
>>> x=D()
>>> x['f'][0]['a'] = 'whatever'
>>> x
{'f': {0: {'a': 'whatever'}}}
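Note that this creates nested dicts only, so the 0 ends up as a dict key rather than a list index. For comparison, here is a minimal sketch of the same autovivification idea using collections.defaultdict from the standard library; tree is just a name picked for this example, and the limitation is the same.

```python
from collections import defaultdict


def tree():
    """Return a defaultdict whose missing keys are themselves trees."""
    return defaultdict(tree)


x = tree()
x["f"][0]["a"] = "whatever"

print(x["f"][0]["a"])            # whatever
print(isinstance(x["f"], dict))  # True: 'f' maps to a dict keyed by 0
print(isinstance(x["f"], list))  # False: no list is ever created
```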

score 2 · Accepted Answer

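I separated this into two steps. In the first step, the query string is broken down into a series of instructions. This decouples the problem: we can view the instructions before running them, and there is no need for recursive calls.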

def build_instructions(obj, q):
    """
    Breaks down a query string into a series of actionable instructions.

    Each instruction is a (_type, arg) tuple.
    arg -- The key used for the __getitem__ or __setitem__ call on
           the current object.
    _type -- Used to determine the data type for the value of
             obj.__getitem__(arg)

    If a key/index is missing, _type is used to initialize an empty value.
    In this way _type provides the ability to
    """
    arg = []
    _type = None
    instructions = []
    for i, ch in enumerate(q):
        if ch == "[":
            # Begin list query
            if _type is not None:
                arg = "".join(arg)
                if _type == list and arg.isalpha():
                    _type = dict
                instructions.append((_type, arg))
                _type, arg = None, []
            _type = list
        elif ch == ".":
            # Begin dict query
            if _type is not None:
                arg = "".join(arg)
                if _type == list and arg.isalpha():
                    _type = dict
                instructions.append((_type, arg))
                _type, arg = None, []

            _type = dict
        elif ch.isalnum():
            if i == 0:
                # Query begins with alphanum, assume dict access
                _type = type(obj)

            # Fill out args
            arg.append(ch)
        elif ch == "]":
            # Closing bracket: the next "[" or "." flushes the pending arg
            continue
        else:
            raise TypeError("Unrecognized character: {}".format(ch))

    if _type is not None:
        # Finish up last query
        instructions.append((_type, "".join(arg)))

    return instructions

For your example:

>>> x = {"a": "stuff"}
>>> print(build_instructions(x, "f[0].a"))
[(<class 'dict'>, 'f'), (<class 'list'>, '0'), (<class 'dict'>, 'a')]

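The expected return value of an instruction is simply the _type (first item) of the next tuple in the instructions. This is very important because it allows us to correctly initialize/reconstruct missing keys.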

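This means that our first instruction operates on a dict, either sets or gets the key 'f', and is expected to return a list. Similarly, our second instruction operates on a list, either sets or gets the index 0, and is expected to return a dict.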
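To make the "look at the next tuple" rule concrete, here is a small sketch that pairs each instruction with its successor. The instruction list is written out by hand (it is what the example above returns), so the snippet runs on its own.

```python
# Instructions as returned for the query "f[0].a" on a dict.
instructions = [(dict, "f"), (list, "0"), (dict, "a")]

# Pair every instruction with its successor: the successor's _type is the
# kind of container the current step must produce if the key is missing.
pairs = [
    (container.__name__, key, expected.__name__)
    for (container, key), (expected, _) in zip(instructions, instructions[1:])
]

for container, key, expected in pairs:
    print("on a {}: key {!r} should hold a {}".format(container, key, expected))
# on a dict: key 'f' should hold a list
# on a list: key '0' should hold a dict
```

The last instruction has no successor, which is why it is handled separately and receives the actual value.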

Now let's create our _setattr function. It gets the proper instructions and goes through them, creating key-value pairs as necessary. Finally, it also sets the val we give it.

def _setattr(obj, query, val):
    """
    This is a special setattr function that will take in a string query,
    interpret it, add the appropriate data structure to obj, and set val.

    We only define two actions that are available in our query string:
    .x -- dict.__setitem__(x, ...)
    [x] -- list.__setitem__(x, ...) OR dict.__setitem__(x, ...)
           the calling context determines how this is interpreted.
    """
    instructions = build_instructions(obj, query)
    for i, (_, arg) in enumerate(instructions[:-1]):
        _type = instructions[i + 1][0]
        obj = _set(obj, _type, arg)

    _type, arg = instructions[-1]
    _set(obj, _type, arg, val)

def _set(obj, _type, arg, val=None):
    """
    Helper function for calling obj.__setitem__(arg, val or _type()).
    """
    if val is not None:
        # Time to set our value
        _type = type(val)

    if isinstance(obj, dict):
        if val is not None:
            # Final step: always store val, even if the key already exists
            obj[arg] = val
        elif arg not in obj:
            # Intermediate step: create a missing key, never overwrite one
            obj[arg] = _type()
        obj = obj[arg]
    elif isinstance(obj, list):
        n = len(obj)
        arg = int(arg)
        if n <= arg:
            # Need to amplify our list, initialize empty values with _type()
            obj.extend([_type() for x in range(arg - n + 1)])
        if val is not None:
            # Final step: store val; existing intermediates stay untouched
            obj[arg] = val
        obj = obj[arg]
    return obj

Just because we can, here's a _getattr function.

def _getattr(obj, query):
    """
    Very similar to _setattr. Instead of setting attributes they will be
    returned. As expected, an error will be raised if a __getitem__ call
    fails.
    """
    instructions = build_instructions(obj, query)
    for i, (_, arg) in enumerate(instructions[:-1]):
        _type = instructions[i + 1][0]
        obj = _get(obj, _type, arg)

    _type, arg = instructions[-1]
    return _get(obj, _type, arg)


def _get(obj, _type, arg):
    """
    Helper function for calling obj.__getitem__(arg).
    """
    if isinstance(obj, dict):
        obj = obj[arg]
    elif isinstance(obj, list):
        arg = int(arg)
        obj = obj[arg]
    return obj

In action:

>>> x = {"a": "stuff"}
>>> _setattr(x, "f[0].a", "test")
>>> print(x)
{'a': 'stuff', 'f': [{'a': 'test'}]}
>>> print(_getattr(x, "f[0].a"))
test

>>> x = ["one", "two"]
>>> _setattr(x, "3[0].a", "test")
>>> print(x)
['one', 'two', [], [{'a': 'test'}]]
>>> print(_getattr(x, "3[0].a"))
test

Now for some cool stuff. Unlike Python, our _setattr function can set unhashable dict keys.

>>> x = []
>>> _setattr(x, "1.4", "asdf")
>>> print(x)
[{}, {'4': 'asdf'}]  # A list, which isn't hashable

>>> y = {"a": "stuff"}
>>> _setattr(y, "f[1.4]", "test")  # We're indexing f with 1.4, which is a list!
>>> print(y)
{'a': 'stuff', 'f': [{}, {'4': 'test'}]}
>>> print(_getattr(y, "f[1.4]"))  # Works for _getattr too
test

We aren't really using unhashable dict keys, but it looks like we are in our query language, so who cares, right?

Lastly, you can run multiple _setattr calls on the same object. Just give it a try yourself.
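If you want to try repeated calls without copying all of the functions above, here is a compact, self-contained sketch of the same parse-then-walk idea. It is not the code from this answer: parse_path and set_by_path are names made up for the example, the tokenizer is a regular expression, and it assumes list indexes are always written as [n].

```python
import re

# "[digits]" becomes a list index, any other run of characters a dict key.
_TOKEN = re.compile(r"\[(\d+)\]|([^.\[\]]+)")


def parse_path(path):
    """Turn 'f[0].a' into ['f', 0, 'a'] (ints for list indexes)."""
    return [int(i) if i else k for i, k in _TOKEN.findall(path)]


def set_by_path(obj, path, val):
    """Assign val at path, creating missing containers along the way."""
    tokens = parse_path(path)
    for token, following in zip(tokens, tokens[1:]):
        # Look ahead: the next token decides what a missing level must be.
        empty = list if isinstance(following, int) else dict
        if isinstance(token, int):
            while len(obj) <= token:  # pad, never replace existing items
                obj.append(empty())
        else:
            obj.setdefault(token, empty())
        obj = obj[token]
    last = tokens[-1]
    if isinstance(last, int):
        while len(obj) <= last:
            obj.append(None)
    obj[last] = val


x = {"a": "stuff"}
set_by_path(x, "f[0].a", "test")
set_by_path(x, "f[0].b", "more")  # reuses the existing f[0]
set_by_path(x, "f[2].a", "far")   # pads f[1] with an empty dict
print(x)
# {'a': 'stuff', 'f': [{'a': 'test', 'b': 'more'}, {}, {'a': 'far'}]}
```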

score 2 · Accepted Answer

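You can synthesize recursively setting items/attributes by overriding __getitem__ to return a proxy that can set a value in the original object.

I happen to be working on a library that does a few things similar to this, so I was working on a class that can dynamically assign its own subclasses at instantiation. It makes this sort of thing easier, but if that kind of hacking makes you squeamish, you can get similar behavior by creating a ProxyObject similar to the one I create, and by creating the individual classes used by the ProxyObject dynamically in a function. Something like: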

class ProxyObject(object):
    ... #see below

def instanciateProxyObjcet(val):
   class ProxyClassForVal(ProxyObject,val.__class__):
       pass
   return ProxyClassForVal(val)

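If you implement it that way, you can make it significantly faster by using a dictionary like the one I use in FlexibleObject below. The code I'm providing, though, uses FlexibleObject. Right now it only supports classes that, like almost all of Python's builtin classes, can be generated by taking an instance of themselves as the sole argument to their __init__/__new__. In the next week or two, I'll add support for anything pickleable and link to a github repository that contains it. Here's the code: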

class FlexibleObject(object):
    """ A FlexibleObject is a baseclass for allowing type to be declared
        at instantiation rather than in the declaration of the class.

        Usage:
        class DoubleAppender(FlexibleObject):
            def append(self,x):
                super(self.__class__,self).append(x)
                super(self.__class__,self).append(x)

        instance1 = DoubleAppender(list)
        instance2 = DoubleAppender(bytearray)
    """
    classes = {}
    def __new__(cls,supercls,*args,**kws):
        if isinstance(supercls,type):
            supercls = (supercls,)
        else:
            supercls = tuple(supercls)
        if (cls,supercls) in FlexibleObject.classes:
            return FlexibleObject.classes[(cls,supercls)](*args,**kws)
        superclsnames = tuple([c.__name__ for c in supercls])
        name = '%s%s' % (cls.__name__,superclsnames)
        d = dict(cls.__dict__)
        d['__class__'] = cls
        if cls == FlexibleObject:
            d.pop('__new__')
        try:
            d.pop('__weakref__')
        except KeyError:
            pass
        d['__dict__'] = {}
        newcls = type(name,supercls,d)
        FlexibleObject.classes[(cls,supercls)] = newcls
        return newcls(*args,**kws)

Then, to use FlexibleObject to synthesize looking up attributes and items of a dictionary-like object, you can do something like this:

class ProxyObject(FlexibleObject):
    @classmethod
    def new(cls,obj,quickrecdict,path,attribute_marker):
        self = ProxyObject(obj.__class__,obj)
        self.__dict__['reference'] = quickrecdict
        self.__dict__['path'] = path
        self.__dict__['attr_mark'] = attribute_marker
        return self
    def __getitem__(self,item):
        path = self.__dict__['path'] + [item]
        ref = self.__dict__['reference']
        return ref[tuple(path)]
    def __setitem__(self,item,val):
        path = self.__dict__['path'] + [item]
        ref = self.__dict__['reference']
        ref.dict[tuple(path)] = ProxyObject.new(val,ref,
                path,self.__dict__['attr_mark'])
    def __getattribute__(self,attr):
        if attr == '__dict__':
            return object.__getattribute__(self,'__dict__')
        path = self.__dict__['path'] + [self.__dict__['attr_mark'],attr]
        ref = self.__dict__['reference']
        return ref[tuple(path)]
    def __setattr__(self,attr,val):
        path = self.__dict__['path'] + [self.__dict__['attr_mark'],attr]
        ref = self.__dict__['reference']
        ref.dict[tuple(path)] = ProxyObject.new(val,ref,
                path,self.__dict__['attr_mark'])

class UniqueValue(object):
    pass

class QuickRecursiveDict(object):
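    def __init__(self,dictionary=None):
        # A {} default would be shared by every instance created without one
        self.dict = {} if dictionary is None else dictionary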
        self.internal_id = UniqueValue()
        self.attr_marker = UniqueValue()
    def __getitem__(self,item):
        if item in self.dict:
            val = self.dict[item]
            try:
                if val.__dict__['path'][0] == self.internal_id:
                    return val
                else:
                    raise TypeError
            except:
                return ProxyObject.new(val,self,[self.internal_id,item],
                        self.attr_marker)
        try:
            if item[0] == self.internal_id:
                return ProxyObject.new(KeyError(),self,list(item),
                        self.attr_marker)
        except TypeError:
            pass #Item isn't iterable
        return ProxyObject.new(KeyError(),self,[self.internal_id,item],
                    self.attr_marker)
    def __setitem__(self,item,val):
        self.dict[item] = val

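The implementation details will vary depending on what you want. It's obviously a lot easier to just override __getitem__ in the proxy than it is to override both __getitem__ and __getattribute__ or __getattr__. The syntax you are using in setbydot makes it look like you would be happiest with a solution that overrides a mixture of the two.

If you are just using the dictionary to compare values with ==, <=, >= and so on, overriding __getattribute__ works really nicely. If you want to do something more sophisticated, you will probably be better off overriding __getattr__ and doing some checks in __setattr__ to decide whether to synthesize setting the attribute by setting a value in the dictionary, or to actually set the attribute on the item you've obtained. Or you might want to handle it so that, if your object has an attribute, __getattribute__ returns a proxy to that attribute and __setattr__ always just sets the attribute on the object (in which case you can omit it completely). All of these things depend on exactly what you are trying to use the dictionary for.

You may also want to create __iter__ and the like. It takes a bit of effort to write them, but the details should follow from the implementation of __getitem__ and __setitem__.

Finally, I'm going to briefly summarize the behavior of QuickRecursiveDict in case it's not immediately clear from inspection. The try/excepts are just shorthand for checking whether the ifs can be performed. The one major defect of synthesizing the recursive setting, rather than finding a way to really do it, is that you can no longer raise KeyErrors when you try to access a key that hasn't been set. However, you can come pretty close by returning a subclass of KeyError, which is what I do in the example. I haven't tested it, so I won't add it to the code, but you may want to pass some human-readable representation of the key to KeyError.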
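As a rough idea of what a human-readable key representation could look like, here is an untested-against-the-code-above sketch. MissingKey is a hypothetical class, not part of QuickRecursiveDict, and it assumes the path is available as a plain sequence of keys.

```python
class MissingKey(KeyError):
    """Hypothetical KeyError subclass that remembers the missing path."""

    def __init__(self, path):
        self.path = tuple(path)
        # Render the path the way it would be typed: [0][13]['forever']
        readable = "".join("[{!r}]".format(key) for key in self.path)
        super().__init__(readable)


placeholder = MissingKey([0, 13, "forever"])
print(isinstance(placeholder, KeyError))  # True
print(placeholder.args[0])                # [0][13]['forever']
```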

But aside from all that, it works rather nicely.

>>> qrd = QuickRecursiveDict()
>>> qrd[0][13] # returns an instance of a subclass of KeyError
>>> qrd[0][13] = 9
>>> qrd[0][13] # 9
>>> qrd[0][13]['forever'] = 'young'
>>> qrd[0][13] # 9
>>> qrd[0][13]['forever'] # 'young'
>>> qrd[0] # returns an instance of a subclass of KeyError
>>> qrd[0] = 0
>>> qrd[0] # 0
>>> qrd[0][13]['forever'] # 'young'

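One other caveat is that the thing being returned is not quite what it looks like. It's a proxy to what it looks like. If you want the int 9, you need int(qrd[0][13]), not qrd[0][13]. For ints this doesn't matter much, since +, -, = and all that bypass __getattribute__, but for lists you would lose attributes like append if you didn't recast them. (You'd keep len and other builtin functions, just not the attributes of list. You lose __len__.)

So that's it. The code's a little bit convoluted, so let me know if you have any questions. I probably can't answer them until tonight unless the answer is really brief. I wish I had seen this question sooner. It's a really cool question, and I'll try to post a cleaner solution soon. I had fun trying to code a solution into the wee hours of last night. :)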
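The bypass is easy to see with a toy class. Intercepting below is a hypothetical stand-in for the proxy, assuming CPython's usual behaviour that operators and builtins such as len() look up special methods on the type, while explicit attribute access goes through __getattribute__.

```python
class Intercepting(list):
    """List subclass whose explicit attribute lookups are all redirected."""

    def __getattribute__(self, name):
        return "intercepted:" + name


p = Intercepting([1, 2, 3])

print(len(p))     # 3: len() finds __len__ on the type, not via the instance
print(p + [4])    # [1, 2, 3, 4]: operators bypass __getattribute__ too
print(p.append)   # intercepted:append, the real method is out of reach
print(p.__len__)  # intercepted:__len__

plain = list(p)   # recasting gives back an ordinary list
plain.append(4)
print(plain)      # [1, 2, 3, 4]
```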
