python - What is the best (error-proof/foolproof) way to parse a file with the following format using Python?

Question

########################################
# some comment
# other comment
########################################

block1 {
    value=data
    some_value=some other kind of data
    othervalue=032423432
    }

block2 {
    value=data
    some_value=some other kind of data
    othervalue=032423432
    }

score 6 · Accepted Answer

The best approach is to use an existing format such as JSON.
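As a minimal sketch of that suggestion, assuming you control the file format: the two blocks from the question could be stored as JSON and read with the standard-library json module. The JSON layout below is an assumption (the question does not define one). othervalue is written as a string because JSON numbers may not start with a leading zero, and the comment header is dropped because JSON has no comment syntax.

```python
import json

# Hypothetical JSON equivalent of the sample file in the question.
# "othervalue" is a string because JSON numbers cannot start with 0.
text = """
{
    "block1": {
        "value": "data",
        "some_value": "some other kind of data",
        "othervalue": "032423432"
    },
    "block2": {
        "value": "data",
        "some_value": "some other kind of data",
        "othervalue": "032423432"
    }
}
"""

# json.loads raises json.JSONDecodeError on malformed input,
# so syntax errors are reported instead of silently ignored.
config = json.loads(text)
print(config["block1"]["some_value"])
print(int(config["block2"]["othervalue"]))
```

The benefit is that error handling and edge cases are already covered by a well-tested parser, so no custom parsing code is needed.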

Here is an example parser for your format:

from lepl import (AnyBut, Digit, Drop, Eos, Integer, Letter,
                  NON_GREEDY, Regexp, Space, Separator, Word)

# EBNF
# name = ( letter | "_" ) , { letter | "_" | digit } ;
name = Word(Letter() | '_',
            Letter() | '_' | Digit())
# words = word , space+ , word , { space+ , word } ;
# two or more space-separated words (non-greedy to allow comment at the end)
words = Word()[2::NON_GREEDY, ~Space()[1:]] > list
# value = integer | word | words  ;
value = (Integer() >> int) | Word() | words
# comment = "#" , { all characters - "\n" } , ( "\n" | EOF ) ;
comment = '#' & AnyBut('\n')[:] & ('\n' | Eos())

with Separator(~Regexp(r'\s*')):
    # statement = name , "=" , value ;
    statement = name & Drop('=') & value > tuple
    # suite     = "{" , { comment | statement } , "}" ;
    suite     = Drop('{') & (~comment | statement)[:] & Drop('}') > dict
    # block     = name , suite ;
    block     = name & suite > tuple
    # config    = { comment | block } ;
    config    = (~comment | block)[:] & Eos() > dict

from pprint import pprint

pprint(config.parse(open('input.cfg').read()))

Output:

[{'block1': {'othervalue': 32423432,
             'some_value': ['some', 'other', 'kind', 'of', 'data'],
             'value': 'data'},
  'block2': {'othervalue': 32423432,
             'some_value': ['some', 'other', 'kind', 'of', 'data'],
             'value': 'data'}}]

score 4 · Accepted Answer

Well, the data looks pretty regular, so you could do something like this (untested):

class Block(object):
    def __init__(self, name):
        self.name = name

infile = open(...)  # insert filename here
current = None
blocks = []

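for line in infile:
    if line.lstrip().startswith('#'):
        # comment line: skip it
        continue
    elif line.rstrip().endswith('{'):
        # "blockN {" opens a new block named by the first word
        current = Block(line.split()[0])
    elif '=' in line:
        # split on the first '=' only, so values may contain '='
        attr, value = line.strip().split('=', 1)
        try:
            value = int(value)
        except ValueError:
            pass
        setattr(current, attr, value)
    elif line.rstrip().endswith('}'):
        # closing brace: the current block is complete
        blocks.append(current)

infile.close()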

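The result will be a list of Block instances, where block.name is the name ('block1', 'block2', etc.) and the other attributes correspond to the keys in your data. So blocks[0].value will be 'data', and so on. Note that this only handles strings and integers as values.

(There is an obvious bug here if your keys can ever include 'name'. If that can happen, you might want to change self.name to self._name or something similar.)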
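One way to sidestep the 'name' collision entirely is to keep each block's keys in a plain dict instead of setting attributes on an object. The following is a sketch of that variant, not the code from the answer above. The function name parse_blocks and the sample text are made up for illustration.

```python
def parse_blocks(lines):
    """Return {block_name: {key: value}} from an iterable of lines."""
    blocks = {}
    current = None  # dict of the block being filled, or None outside a block
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith('#'):
            continue  # skip blank lines and comments
        if stripped.endswith('{'):
            # "block1 {" starts a new block named by the text before the brace
            current = {}
            blocks[stripped[:-1].strip()] = current
        elif stripped == '}':
            current = None
        elif '=' in stripped and current is not None:
            # split on the first '=' only, so values may contain '='
            key, value = stripped.split('=', 1)
            try:
                value = int(value)
            except ValueError:
                value = value.strip()
            current[key.strip()] = value
    return blocks


# Hypothetical sample: a key called "name" no longer clashes with anything.
sample = """\
# some comment
block1 {
    name=not a problem
    othervalue=032423432
    }
"""
blocks = parse_blocks(sample.splitlines())
print(blocks)
```

As in the original, values that look like integers are converted with int(), so the leading zero of 032423432 is lost. Lines that match none of the patterns are silently skipped, which is convenient for regular input but hides mistakes in malformed files.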

HTH!

score 3 · Accepted Answer

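If you do not really mean parsing but rather text processing, and the input data really is that regular, then go with John's solution. If you really need parsing (for example, if the data you are getting follows somewhat more complex rules), then, depending on the amount of data you need to parse, I would go with either pyparsing or simpleparse. I have tried both of them, and in my case pyparsing was too slow.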
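To illustrate where simple text processing ends and validation begins, here is a sketch that uses only the standard-library re module and rejects anything that does not fit the expected format, reporting the line number. It is a stand-in for the grammar a parsing library would express and does not use the pyparsing or simpleparse APIs. The names parse_strict, ConfigSyntaxError, BLOCK_OPEN and ASSIGNMENT are hypothetical, and the rule that names consist of word characters is an assumption about the format.

```python
import re

# Assumed format: names are word characters; one statement per line.
BLOCK_OPEN = re.compile(r'^(\w+)\s*\{$')
ASSIGNMENT = re.compile(r'^(\w+)\s*=\s*(.*)$')


class ConfigSyntaxError(ValueError):
    """Raised when the input breaks the expected format."""


def parse_strict(text):
    """Return {block_name: {key: string_value}} or raise ConfigSyntaxError."""
    blocks, current = {}, None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith('#'):
            continue  # blank lines and comments are allowed anywhere
        opened = BLOCK_OPEN.match(line)
        assigned = ASSIGNMENT.match(line)
        if opened and current is None:
            name = opened.group(1)
            if name in blocks:
                raise ConfigSyntaxError(
                    f'line {lineno}: duplicate block {name!r}')
            current = blocks[name] = {}
        elif line == '}' and current is not None:
            current = None
        elif assigned and current is not None:
            # values stay strings, so leading zeros are preserved
            current[assigned.group(1)] = assigned.group(2)
        else:
            raise ConfigSyntaxError(f'line {lineno}: unexpected {line!r}')
    if current is not None:
        raise ConfigSyntaxError('unexpected end of input: missing "}"')
    return blocks


sample = "# comment\nblock1 {\n    value=data\n    othervalue=032423432\n    }\n"
parsed = parse_strict(sample)
print(parsed)

try:
    parse_strict("block1 {\n    value=data\n")  # closing brace is missing
except ConfigSyntaxError as exc:
    print(exc)
```

Once the rules grow beyond this (nested blocks, quoting, trailing comments), a hand-written line matcher becomes hard to maintain, which is the point at which a parsing library is worth the dependency.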

score 2 · Accepted Answer

You might look into something like pyparsing.

answered 2009-01-29T22:15:12.523
