python - Creating a tree/deeply nested dict from an indented text file in Python

Question

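Basically, I want to iterate through a file and put the contents of each line into a deeply nested dict, the structure of which is defined by the amount of whitespace at the start of each line.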

Essentially, the aim is to take something like this:

a
    b
        c
    d
        e

and turn it into this:

{"a":{"b":"c","d":"e"}}

Or this:

apple
    colours
        red
        yellow
        green
    type
        granny smith
    price
        0.10

into this:

{"apple":{"colours":["red","yellow","green"],"type":"granny smith","price":0.10}}

so that I can send it off to Python's JSON module and produce JSON.

At the moment I'm trying to build a dict and a list in steps, like this:

{"a":""} ["a"]
{"a":"b"} ["a"]
{"a":{"b":"c"}} ["a","b"]
{"a":{"b":{"c":"d"}}} ["a","b","c"]
{"a":{"b":{"c":"d"},"e":""}} ["a","e"]
{"a":{"b":{"c":"d"},"e":"f"}} ["a","e"]
{"a":{"b":{"c":"d"},"e":{"f":"g"}}} ["a","e","f"]

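etc.

The list acts like "breadcrumbs" showing where I last put something into the dict.

To do this I need a way to iterate through the list and generate something like dict["a"]["e"]["f"] to get at that last dict. I've had a look at the AutoVivification class that someone has written, which looks very useful, but I'm really unsure of:

Whether I'm using the right data structure for this (I'm planning to send it to the JSON library to create a JSON object)
How to use AutoVivification in this case
Whether there's a better way in general to approach this problem.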
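
For reference, the two ideas mentioned above (an autovivifying dict and following a "breadcrumb" list of keys) might look roughly like the sketch below. This is only an illustration: the class body is the common `__missing__` idiom and may differ from the AutoVivification class referred to here, and this `get_nested` takes just a dict and a list of keys, which is an assumed signature.

```python
import json
from functools import reduce


class AutoVivification(dict):
    """A dict that creates missing nested dicts on first access."""

    def __missing__(self, key):
        # Called by dict.__getitem__ when the key is absent.
        value = self[key] = type(self)()
        return value


def get_nested(tree, breadcrumbs):
    """Follow a list of keys down into nested dicts and return what is there."""
    return reduce(lambda node, key: node[key], breadcrumbs, tree)


tree = AutoVivification()
tree["a"]["b"]["c"] = "d"   # intermediate dicts are created on demand
tree["a"]["e"]["f"] = "g"

print(get_nested(tree, ["a", "e"]))       # {'f': 'g'}
print(get_nested(tree, ["a", "b", "c"]))  # d
print(json.dumps(tree))  # {"a": {"b": {"c": "d"}, "e": {"f": "g"}}}
```

Because the class is a plain `dict` subclass, `json.dumps` accepts it directly. Note that looking up a missing key through `get_nested` would also create it, which may or may not be wanted.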

I came up with the following function, but it doesn't work:

def get_nested(dict,array,i):
    if i != None:
        i += 1
        if array[i] in dict:
            return get_nested(dict[array[i]],array)
        else:
            return dict
    else:
        i = 0
        return get_nested(dict[array[i]],array)

Any help would be appreciated!

(The rest of my very incomplete code is here:)

#Import relevant libraries
import codecs
import sys

#Functions
def stripped(str):
    if tab_spaced:
        return str.lstrip('\t').rstrip('\n\r')
    else:
        return str.lstrip().rstrip('\n\r')

def current_ws():
    if whitespacing == 0 or not tab_spaced:
        return len(line) - len(line.lstrip())
    if tab_spaced:
        return len(line) - len(line.lstrip('\t\n\r'))

def get_nested(adict,anarray,i):
    if i != None:
        i += 1
        if anarray[i] in adict:
            return get_nested(adict[anarray[i]],anarray)
        else:
            return adict
    else:
        i = 0
        return get_nested(adict[anarray[i]],anarray)

#initialise variables
jsondict = {}
unclosed_tags = []
debug = []

vividfilename = 'simple.vivid'
# vividfilename = sys.argv[1]
if len(sys.argv)>2:
    jsfilename = sys.argv[2]
else:
    jsfilename = vividfilename.split('.')[0] + '.json'

whitespacing = 0
whitespace_array = [0,0]
tab_spaced = False

#open the file
with codecs.open(vividfilename,'rU', "utf-8-sig") as vividfile:
    for line in vividfile:
        #work out how many whitespaces at start
        whitespace_array.append(current_ws())

        #For first line with whitespace, work out the whitespacing (eg tab vs 4-space)
        if whitespacing == 0 and whitespace_array[-1] > 0:
            whitespacing = whitespace_array[-1]
            if line[0] == '\t':
                tab_spaced = True

        #strip out whitespace at start and end
        stripped_line = stripped(line)

        if whitespace_array[-1] == 0:
            jsondict[stripped_line] = ""
            unclosed_tags.append(stripped_line)

        if whitespace_array[-2] < whitespace_array[-1]:
            oldnested = get_nested(jsondict,whitespace_array,None)
            print oldnested
            # jsondict.pop(unclosed_tags[-1])
            # jsondict[unclosed_tags[-1]]={stripped_line:""}
            # unclosed_tags.append(stripped_line)

        print jsondict
        print unclosed_tags

print jsondict
print unclosed_tags

score 3 · Accepted Answer

The following code takes a block-indented file and converts it into an XML tree. This:

foo
bar
baz
  ban
  bal

...becomes:

<cmd>foo</cmd>
<cmd>bar</cmd>
<block>
  <name>baz</name>
  <cmd>ban</cmd>
  <cmd>bal</cmd>
</block>

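The basic technique is:

Set the current indent to 0
For each line, get its indent
If > current, step down and save the current block/indent to a stack
If == current, append to the current block
If < current, pop from the stack until you reach the matching indent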
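
The steps above can be sketched in isolation, without any XML, as a function that only tracks nesting depth. This is an illustrative sketch rather than part of the answer's code; the name `nesting_levels` and the choice to skip blank lines are assumptions.

```python
def nesting_levels(text):
    """Return (depth, content) pairs for each non-blank line of indented text."""
    stack = []          # indent widths of the enclosing, still-open blocks
    current_indent = 0  # start at indent 0
    result = []
    for raw in text.splitlines():
        content = raw.lstrip()
        if not content:
            continue  # assumption: blank lines carry no structure
        indent = len(raw) - len(content)
        if indent > current_indent:
            # Stepped down a level: remember the indent we came from.
            stack.append(current_indent)
        elif indent < current_indent:
            # Stepped up: pop until the matching indent is found.
            while stack:
                if stack.pop() == indent:
                    break
            else:
                raise ValueError("indent does not match any open block")
        current_indent = indent
        result.append((len(stack), content))
    return result


print(nesting_levels("foo\nbar\nbaz\n  ban\n  bal"))
# [(0, 'foo'), (0, 'bar'), (0, 'baz'), (1, 'ban'), (1, 'bal')]
```

The depth is simply the number of saved indents on the stack, so the actual indent width (two spaces, four spaces, a tab) does not matter as long as it is consistent.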

So:

from lxml import builder
C = builder.ElementMaker()

def indent(line):
    strip = line.lstrip()
    return len(line) - len(strip), strip

def parse_blockcfg(data):
    top = current_block = C.config()
    stack = []
    current_indent = 0

    for line in data.split('\n'):
        # skip blank lines so they are not treated as indent-0 commands
        if not line.strip():
            continue
        i, line = indent(line)

        if i==current_indent:
            pass

        elif i > current_indent:
            # we've gone down a level, convert the <cmd> to a block
            # and then save the current indent and block to the stack
            prev.tag = 'block'
            prev.append(C.name(prev.text))
            prev.text = None
            stack.insert(0, (current_indent, current_block))
            current_indent = i
            current_block = prev

        elif i < current_indent:
            # we've gone up one or more levels, pop the stack
            # until we find out which level and return to it
            found = False
            while stack:
                parent_indent, parent_block = stack.pop(0)
                if parent_indent==i:
                    found = True
                    break
            if not found:
                raise Exception('indent not found in parent stack')
            current_indent = i
            current_block = parent_block

        prev = C.cmd(line)
        current_block.append(prev)

    return top
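
The same stack idea can be adapted to build plain dicts for the `json` module instead of XML. The sketch below is an adaptation, not the code above: `parse_indented` and `collapse` are made-up names, and the stack is handled slightly differently (every open line is pushed, and entries are popped while their indent is not smaller than the current line's). It assumes a key with only leaf children becomes a single value or a list, that leaves stay strings (so `0.10` is not converted to a number), and that repeated keys at one level overwrite each other.

```python
import json


def collapse(children):
    """Turn a list of (text, children) nodes into a str, list, or dict."""
    if not children:
        return ""
    if all(not kids for _, kids in children):
        values = [text for text, _ in children]
        return values[0] if len(values) == 1 else values
    return {text: collapse(kids) for text, kids in children}


def parse_indented(text):
    """Build a nested dict from whitespace-indented lines."""
    root = []             # children of an imaginary top-level node
    stack = [(-1, root)]  # (indent, children list) of the open blocks
    for raw in text.splitlines():
        content = raw.strip()
        if not content:
            continue
        indent = len(raw) - len(raw.lstrip())
        # Close every block that cannot be a parent of this line.
        while stack[-1][0] >= indent:
            stack.pop()
        children = []
        stack[-1][1].append((content, children))
        stack.append((indent, children))
    return {key: collapse(kids) for key, kids in root}


sample = """\
apple
    colours
        red
        yellow
        green
    type
        granny smith
    price
        0.10
"""

print(json.dumps(parse_indented("a\n    b\n        c\n    d\n        e")))
# {"a": {"b": "c", "d": "e"}}
print(json.dumps(parse_indented(sample)))
# {"apple": {"colours": ["red", "yellow", "green"], "type": "granny smith", "price": "0.10"}}
```

Unlike the XML version, this sketch does not raise an error on a dedent that matches no earlier indent; it attaches the line to the nearest shallower block.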
