python - Parsing a text file with unusual delimiters using Python

Question

While supporting a legacy system, I have run into a field data collector that stores its data in the following format.

# This is a comment <- because it starts at the beginning of the file
# This is a comment <- see above
# 1. Item one <- not a comment because it starts with 1.
# Description of Item 1 <- not a comment as it is after a line that starts with a number
data point 1
data point 2
data point etc
3 <-- represents number of data points under Item one

# 2. Item two <-- not a comment
# Description of item 2 <-- not a comment
data point 1
data point ..
data point 100
100
#3. Item three <--- not a comment
# Item three description
0

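I am not sure of the right way to parse this file so that each item ends up in its own list. Note that the data sometimes, though not always, has random whitespace added between two different items.

What is the correct way to parse a file like this?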

score 1 · Accepted Answer

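I would do this in three steps:

Remove all the comments from the top of the file
Find every other comment in the file by splitting on a regular expression (see here for an example of how to split with a regular expression)
Parse the remaining lines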
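The three steps above can be sketched roughly as follows. This is a minimal illustration, not the answerer's own code: instead of splitting the whole text, it checks each line against a regular expression for item headers. The name `parse_items` and the pattern `ITEM_HEADER` are hypothetical, and the sketch assumes every item starts with a line such as `# 1. Item one` or `#3. Item three`, and that the `<-` annotations in the question are not part of the real file.

```python
import re

# Assumed shape of an item header: "#", an optional space, digits, then a dot.
ITEM_HEADER = re.compile(r"^#\s?\d+\.")


def parse_items(lines):
    """Group the lines of the file into one list per item."""
    items = []
    current = None  # stays None while we are still in the leading comments
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore the random blank lines between items
        if ITEM_HEADER.match(line):
            # A new item begins: start a fresh list for it.
            current = [line]
            items.append(current)
        elif current is not None:
            # Description, data point, or count line of the current item.
            current.append(line)
        # Otherwise the line is a comment at the top of the file and is dropped.
    return items
```

Each returned list holds the header line, the description line, the data points, and the trailing count, in file order. Checking the count against the number of data points is left out here because the question does not say how mismatches should be handled.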

score 1 · Accepted Answer

You can split using a regex like this: ^(?=\# ?\d+\.)

An example with an explanation is here: http://regex101.com/r/gB3xD1
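As a rough sketch of how that pattern might be used from Python: the lookahead makes the match zero-width, so the header text stays in each chunk, and `re.MULTILINE` lets `^` match at the start of every line. The function name `split_items` is hypothetical, and splitting on an empty match assumes Python 3.7 or later.

```python
import re

# The pattern from the answer; MULTILINE makes ^ match at each line start.
ITEM_START = re.compile(r"^(?=\# ?\d+\.)", re.MULTILINE)


def split_items(text):
    """Split the file contents into one text chunk per item."""
    # Splitting on a zero-width match needs Python 3.7+.
    chunks = ITEM_START.split(text)
    # chunks[0] is whatever precedes the first item header, i.e. the
    # comments at the top of the file (or an empty string), so skip it.
    return [chunk.strip() for chunk in chunks[1:] if chunk.strip()]
```

Calling `chunk.splitlines()` on each result then gives the individual lines of an item, with any blank lines still to be filtered out.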
