python - Algorithm for parsing a (poorly formatted) list of attributes

Question

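I have seen a good algorithm for this before, but I am having trouble finding it again.

I have (poorly formatted) output from a tool that I need to parse, and I cannot control the output style.

It looks like this: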

NameOfItemA
attribute1 = values1
attribute2 = values2
...
attributen = valuesn
NameOfItemB
attribute1 = values1
attribute2 = values2
...
attributen = valuesn

NameOfItemX and attributeX come from a well-defined, known set of names. I need to turn this into reasonable objects:

ObjectForA.attribute1 = values1

etc.

I know I have done this before, I just can't remember how I did it. It looked something like this:

for line in textinput:
    if line.find("NameOfItem"):
        ... parse until next one ...

Hopefully what I am describing makes sense and someone can help.

score 2 · Accepted Answer

This is similar to mgilson's answer, except that it puts the data in a nested dictionary.

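from collections import defaultdict

itemname = None
d = defaultdict(dict)
for line in data:  # data is assumed to be an iterable of lines, e.g. an open file
    line = line.strip()
    if not line:
        continue  # skip blank lines
    if '=' in line:
        # split on the first '=' only, so values may themselves contain '='
        attr, value = line.split('=', 1)
        d[itemname][attr.strip()] = value.strip()
    else:
        # any line without '=' starts a new item
        itemname = line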
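
To try the loop on the sample input, it can be wrapped in a function. The following is a minimal sketch, not part of the original answer: the name `parse_items` and the sample text are assumptions made for illustration.

```python
from collections import defaultdict


def parse_items(lines):
    """Group 'attr = value' lines under the most recent header line.

    parse_items is a hypothetical helper name used only for this sketch.
    """
    items = defaultdict(dict)
    itemname = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if '=' in line:
            attr, value = line.split('=', 1)
            items[itemname][attr.strip()] = value.strip()
        else:
            itemname = line
    return dict(items)  # plain dict, so missing names raise KeyError


# Illustrative sample in the format described in the question
sample = """\
NameOfItemA
attribute1 = values1A
attribute2 = values2A
NameOfItemB
attribute1 = values1B
"""

parsed = parse_items(sample.splitlines())
print(parsed["NameOfItemA"]["attribute2"])
```

Returning a plain `dict` instead of the `defaultdict` means a lookup of an unknown item name fails loudly rather than silently creating an empty entry.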

score 1 · Accepted Answer

Here is a pyparsing solution that you might be interested in. I have added comments to pretty much walk you through the code.

data = """\
NameOfItemA
attribute1 = values1A
attribute2 = values2A
attributen = valuesnA
NameOfItemB
attribute1 = values1B
attribute2 = values2B
attributen = valuesnB
"""

from pyparsing import Suppress, Word, alphas, alphanums, \
              empty, restOfLine, Dict, OneOrMore, Group

# define some basic elements - suppress the '=' sign because, while
# it is important during the parsing process, it is not an interesting part
# of the results
EQ = Suppress('=')
ident = Word(alphas, alphanums)

# an attribute definition is an identifier, and equals, and whatever is left
# on the line; the empty advances over whitespace so lstrip()'ing the
# values is not necessary
attrDef = ident + EQ + empty + restOfLine

# define a section as a lone ident, followed by one or more attribute 
# definitions (using Dict will allow us to access attributes by name after 
# parsing)
section = ident + Dict(OneOrMore(Group(attrDef)))

# overall grammar is defined as a series of sections - again using Dict to
# give us attribute-name access to each section's attributes
sections = Dict(OneOrMore(Group(section)))

# parse the string, which gives back a pyparsing ParseResults
s = sections.parseString(data)

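# get data using dotted attribute notation
print(s.NameOfItemA.attribute2)

# or access data like it was a nested dict; list() materializes the
# iterators that keys() and items() may return on Python 3, and the key
# order can differ between Python versions
print(list(s.keys()))
for k in s.keys():
    print(list(s[k].items()))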

Prints:

values2A
['NameOfItemB', 'NameOfItemA']
[('attribute2', 'values2B'), ('attribute1', 'values1B'), ('attributen', 'valuesnB')]
[('attribute2', 'values2A'), ('attribute1', 'values1A'), ('attributen', 'valuesnA')]

score 1 · Accepted Answer

How about having it as a nested dict?

x = {'NameOfItemA': {'attribute1': 'value1', 'attribute2': 'value2'},...}

Then you can refer to the values like this:

value2 = x['NameOfItemA']['attribute2']

And, assuming the attributes and values always follow a heading like NameOfItemN:

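items = {}
headline = None
for line in textinput:
    line = line.strip()
    if not line:
        continue  # skip blank lines
    # startswith() rather than find(): find() returns -1 (truthy) when the
    # text is absent and 0 (falsy) when the line begins with it
    if line.startswith("NameOfItem"):
        headline = line
        items[headline] = {}
    else:
        attr, val = line.split('=', 1)
        items[headline][attr.strip()] = val.strip()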
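
If the dotted access from the question (`ObjectForA.attribute1`) is wanted, a nested dict of this shape can be wrapped afterwards. This is only a sketch under assumptions: the sample dict is made up, and using `types.SimpleNamespace` is one possible choice, not something the answer itself proposes. It only works when every attribute name is a valid Python identifier.

```python
from types import SimpleNamespace

# Illustrative nested dict of the shape built by the loop above
nested = {
    "NameOfItemA": {"attribute1": "values1", "attribute2": "values2"},
    "NameOfItemB": {"attribute1": "values1"},
}

# One namespace object per item; each attribute becomes a real attribute
objects = {name: SimpleNamespace(**attrs) for name, attrs in nested.items()}

print(objects["NameOfItemA"].attribute2)
```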

score 0 · Accepted Answer

How about:

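class obj(object):
    pass

items = {}
for line in textinput:
    line = line.strip()
    if line.find("NameOfItem") != -1:
        # header line: create a new object, keyed by the name minus its prefix
        current_object = items[line.replace("NameOfItem", "")] = obj()
    elif '=' in line:
        # attribute line: attach it to the most recent object
        attr, val = line.split('=', 1)
        setattr(current_object, attr.strip(), val.strip())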

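Of course, you could leave out the basic object class if you already have a class you want to use... In the end, you have a dictionary of objects, with keys = object names and values = objects whose attributes are stored as strings (if a value is not supposed to be a string, you will have to convert it to whatever type it should be).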
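
One way to handle that conversion is a lookup table from attribute name to converter. The sketch below is an assumption-laden illustration: the attribute names `count`, `ratio`, and `label`, the `CONVERTERS` table, and the `convert` helper are all hypothetical and do not come from the tool's real output.

```python
# Hypothetical mapping from attribute name to the type it should have;
# anything not listed here is kept as a string.
CONVERTERS = {"count": int, "ratio": float}


def convert(attr, raw):
    """Convert a raw string value using the converter registered for attr."""
    return CONVERTERS.get(attr, str)(raw.strip())


class Item(object):
    pass


item = Item()
# Made-up (attribute, raw value) pairs standing in for parsed lines
for attr, raw in [("count", " 3"), ("ratio", "0.5 "), ("label", " abc ")]:
    setattr(item, attr, convert(attr, raw))

print(item.count + 1, item.ratio * 2, item.label)
```

Since the question says the attribute names form a known set, a table like this can be written once up front.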

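Also note that this input file format looks a lot like the format used by the ConfigParser module. You could probably read the file, change the "NameOfItem" lines to [item] lines, and then pass the result to ConfigParser as a StringIO object...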
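
A rough sketch of that idea for Python 3, where the module is named `configparser` and `read_string()` accepts the text directly instead of a StringIO object. The helper name `to_ini`, its `prefix` parameter, and the sample text are assumptions for illustration, and the sketch assumes header lines start with a known prefix and contain no '='.

```python
import configparser


def to_ini(text, prefix="NameOfItem"):
    """Rewrite header lines as [section] lines (to_ini is a hypothetical helper)."""
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(prefix) and '=' not in stripped:
            out.append("[%s]" % stripped)
        else:
            out.append(stripped)
    return "\n".join(out)


# Illustrative sample in the format described in the question
sample = """\
NameOfItemA
attribute1 = values1A
attribute2 = values2A
NameOfItemB
attribute1 = values1B
"""

parser = configparser.ConfigParser()
parser.optionxform = str  # keep attribute names case-sensitive
parser.read_string(to_ini(sample))

print(parser["NameOfItemA"]["attribute2"])
```

By default `ConfigParser` lowercases option names; overriding `optionxform` as above keeps them exactly as they appear in the input.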
