python - Expanding and flattening a ragged nested list

Question

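I know that flattening nested lists has been covered in great detail before, but my task is a bit different and I couldn't find any information on it.

I'm writing a scraper, and its output is a nested list. Each top-level list element is supposed to become one row of data in spreadsheet form. However, the nested lists often differ in length, so I need to expand (pad) them before flattening.

Here is an example. I have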

   [ [ "id1", [["x", "y", "z"], [1, 2]],    ["a", "b", "c"]],
     [ "id2", [["x", "y", "z"], [1, 2, 3]], ["a", "b"]],
     [ "id3", [["x", "y"],      [1, 2, 3]], ["a", "b", "c", ""]] ]

and the output I ultimately want is

   [[ "id1", "x", "y", "z", 1, 2, "", "a", "b", "c", ""],
    [ "id2", "x", "y", "z", 1, 2,  3, "a", "b",  "", ""],
    [ "id3", "x", "y",  "", 1, 2,  3, "a", "b", "c", ""]]

However, an intermediate list like this

   [ [ "id1", [["x", "y", "z"], [1, 2, ""]], ["a", "b", "c", ""]],
     [ "id2", [["x", "y", "z"], [1, 2,  3]], ["a", "b",  "", ""]],
     [ "id3", [["x", "y",  ""], [1, 2,  3]], ["a", "b", "c", ""]] ]

would also be fine, since I can simply flatten it.
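As a minimal sketch of that final flattening step, assuming a row has already been padded: flatten_row is a hypothetical helper name, not something from the original post, and it handles nested lists only (no tuples or other iterables).

```python
def flatten_row(row):
    """Recursively flatten one already-padded row into a flat list."""
    flat = []
    for item in row:
        if isinstance(item, list):
            # Descend into nested lists and splice their items in order.
            flat.extend(flatten_row(item))
        else:
            flat.append(item)
    return flat


padded_row = ["id1", [["x", "y", "z"], [1, 2, ""]], ["a", "b", "c", ""]]
flat_row = flatten_row(padded_row)
```

Applied to every row of the intermediate list, this would give the spreadsheet-style rows shown earlier.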

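The top-level list elements (rows) are created one per iteration and appended to the full list. I assume it's easier to transform the full list at the end?

The structure in which the elements are nested should be the same, but I can't be sure of that at this point. I suppose I'd have a problem if the structure looked like this,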

   [ [ "id1", [["x", "y", "z"], [1, 2]],             ["a", "b", "c"]],
     [ "id2", [["x", "y", "z"], [1, 2, 3]], ["bla"], ["a", "b"]],
     [ "id3", [["x", "y"],      [1, 2, 3]],          ["a", "b", "c", ""]] ]

which should become

   [[ "id1", "x", "y", "z", 1, 2, "",    "", "a", "b", "c", ""],
    [ "id2", "x", "y", "z", 1, 2,  3, "bla", "a", "b",  "", ""],
    [ "id3", "x", "y",  "", 1, 2,  3,    "", "a", "b", "c", ""]]

Thanks for any comments, and apologies if this is trivial; I'm new to Python.

score 6 · Accepted Answer

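I found a simple solution for the "same structure" case, using a recursive generator together with the izip_longest function from itertools. This code is for Python 2, but with a few tweaks (noted in the comments) it can be made to work in Python 3 as well.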

from itertools import izip_longest # in py3, this is renamed zip_longest

def flatten(nested_list):
    return zip(*_flattengen(nested_list)) # in py3, wrap this in list()

def _flattengen(iterable):
    for element in izip_longest(*iterable, fillvalue=""):
        if isinstance(element[0], list):
            for e in _flattengen(element):
                yield e
        else:
            yield element

In Python 3.3 and later this gets even simpler thanks to PEP 380: the recursive step, for e in _flattengen(element): yield e, can be written as yield from _flattengen(element).
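For reference, here is a sketch of the same approach with the Python 3 adjustments applied (zip_longest instead of izip_longest, and yield from for the recursive step). It assumes Python 3.3 or later. Converting each row to a list is my own choice for readability; the original returns tuples.

```python
from itertools import zip_longest


def flatten(nested_list):
    # Transpose the flattened columns back into rows.
    return [list(row) for row in zip(*_flattengen(nested_list))]


def _flattengen(iterable):
    # Walk the rows in parallel, padding shorter ones with "".
    for element in zip_longest(*iterable, fillvalue=""):
        if isinstance(element[0], list):
            # Same nesting in every row is assumed: recurse into the sublists.
            yield from _flattengen(element)
        else:
            # A tuple of scalars is one finished output column.
            yield element


data = [["id1", [["x", "y", "z"], [1, 2]],    ["a", "b", "c"]],
        ["id2", [["x", "y", "z"], [1, 2, 3]], ["a", "b"]],
        ["id3", [["x", "y"],      [1, 2, 3]], ["a", "b", "c", ""]]]
table = flatten(data)
```

Because only element[0] is inspected, this sketch still relies on every row having the same nesting structure, as the answer states.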

score 3 · Accepted Answer

In practice, there is no solution for the general case where the structures differ. For example, an ordinary algorithm would line up ["bla"] with ["a", "b", "c"], and the result would be:

 [  [ "id1", "x", "y", "z", 1, 2, "",   "a", "b", "c", "",  "",  ""],
    [ "id2", "x", "y", "z", 1, 2,  3, "bla",  "",  "", "", "a", "b"],
    [ "id3", "x", "y",  "", 1, 2,  3,   "a", "b", "c", "",  "",  ""]]
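To make the misalignment concrete, here is a small sketch of what aligning purely by position does. The helper name align_by_position is hypothetical, and the rows are assumed to be already broken into flat groups, with the ID wrapped in its own one-item list.

```python
from itertools import zip_longest

# Rows already split into flat groups; the second row has an extra group.
rows = [
    [["id1"], ["x", "y", "z"], [1, 2],    ["a", "b", "c"]],
    [["id2"], ["x", "y", "z"], [1, 2, 3], ["bla"], ["a", "b"]],
    [["id3"], ["x", "y"],      [1, 2, 3], ["a", "b", "c", ""]],
]


def align_by_position(rows, fill=""):
    # Transpose so each tuple holds the groups that share one position;
    # rows that run out of groups contribute an empty group.
    columns = list(zip_longest(*rows, fillvalue=[]))
    # Each position is as wide as its longest group.
    widths = [max(len(group) for group in column) for column in columns]
    result = []
    for row_index in range(len(rows)):
        flat = []
        for column, width in zip(columns, widths):
            group = column[row_index]
            # Pad the group to the common width, then append it to the row.
            flat.extend(group + [fill] * (width - len(group)))
        result.append(flat)
    return result


aligned = align_by_position(rows)
```

Nothing in the data says that ["bla"] belongs to a different column than ["a", "b", "c"], so the code cannot know either; that is why the extra group ends up under "a".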

However, if you know that you will have a number of rows, each starting with an ID and followed by a nested list structure, the algorithm below should work.

import itertools

def normalize(l):
    # just hack the first item to have only lists of lists or lists of items
    for sublist in l:
        sublist[0] = [sublist[0]]

    # break the nesting
    def flatten(l):
        for item in l:
            if not isinstance(item, list) or 0 == len([x for x in item if isinstance(x, list)]):
                yield item
            else:
                for subitem in flatten(item):
                    yield subitem

    l = [list(flatten(i)) for i in l]

    # extend all lists to greatest length
    list_lengths = { }
    for i in range(0, len(l[0])):
        for item in l:
            list_lengths[i] = max(len(item[i]), list_lengths.get(i, 0))

    for i in range(0, len(l[0])):
        for item in l:
            item[i] += [''] * (list_lengths[i] - len(item[i]))

    # flatten each row
    return [list(itertools.chain(*sublist)) for sublist in l]

l = [ [ "id1", [["x", "y", "z"], [1, 2]],    ["a", "b", "c"]],
      [ "id2", [["x", "y", "z"], [1, 2, 3]], ["a", "b"]],
      [ "id3", [["x", "y"],      [1, 2, 3]], ["a", "b", "c", ""]] ]
l = normalize(l)
print(l)  # parenthesized so it also runs on Python 3

score 0 · Accepted Answer

def recursive_pad(l, spacer=""):
    # Make the function never modify its arguments: copy the outer list
    # and each sublist, because list_pad below appends in place.
    l = [list(x) if isinstance(x, list) else x for x in l]

    # Build real lists rather than map objects: in Python 3, map() is lazy
    # and could only be scanned once.
    are_subelements_lists = [isinstance(x, list) for x in l]
    if not any(are_subelements_lists):
        return l

    # Would catch [[], [], "42"]
    if not all(are_subelements_lists):
        raise Exception("Cannot mix lists and non-lists!")

    lengths = [len(x) for x in l]
    if max(lengths) == min(lengths):
        # We're already done
        return l
    # Pad every sublist out to the longest length
    for sublist in l:
        list_pad(sublist, spacer, max(lengths))
    return l

def list_pad(l, spacer, pad_to):
    # Append spacer until l has pad_to items (modifies l in place).
    for i in range(len(l), pad_to):
        l.append(spacer)

if __name__ == "__main__":
    print(recursive_pad([
        [[["x", "y", "z"], [1, 2]],    ["a", "b", "c"]],
        [[["x", "y", "z"], [1, 2, 3]], ["a", "b"]],
        [[["x", "y"],      [1, 2, 3]], ["a", "b", "c", ""]],
    ]))

Edit: Actually, I misread your question. This code solves a slightly different problem.
