python - Organizing XML data into a dictionary

Question

I'm trying to organize my data from XML into a dictionary format. It will be used to run Monte Carlo simulations.

Here is an example of what a couple of the entries in the XML look like:

<retirement>
    <item>
        <low>-0.34</low>
        <high>-0.32</high>
        <freq>0.0294117647058824</freq>
        <variable>stock</variable>
        <type>historic</type>
    </item>
    <item>
        <low>-0.32</low>
        <high>-0.29</high>
        <freq>0</freq>
        <variable>stock</variable>
        <type>historic</type>
    </item>
</retirement>

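My current data set has only two variables, and the type can be one of 3 or possibly 4 discrete types. Hard-coding two variables isn't a problem, but I'd like to start working with data that has many more variables and automate this process. My goal is to import this XML data into a dictionary automatically, so that I can manipulate it further later without hard-coding the array titles and variables.

Here is what I have: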

# Import XML Parser
import xml.etree.ElementTree as ET

# Parse XML directly from the file path
tree = ET.parse('xmlfile')

# Create iterable item list
Items = tree.findall('item')

# Create Master Dictionary
masterDictionary = {}

# Assign variables to dictionary
for Item in Items:
    thisKey = Item.find('variable').text
    if thisKey in masterDictionary == False:
        masterDictionary[thisKey] = []
    else:
        pass

thisList = masterDictionary[thisKey]
newDataPoint = DataPoint(float(Item.find('low').text), float(Item.find('high').text), float(Item.find('freq').text))
thisSublist.append(newDataPoint)

I'm getting a KeyError at thisList = masterDictionary[thisKey]

I am also trying to create a class to deal with some of the other elements of the XML:

# Define a class for each data point that contains low, hi and freq attributes
class DataPoint:
 def __init__(self, low, high, freq):
  self.low = low
  self.high = high
  self.freq = freq

Would I then be able to check a value with something like this?

masterDictionary['stock'] [0].freq

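Any and all help is appreciated.

Update

Thanks for the help, John. The indentation problems are sloppiness on my part; this is my first time posting on Stack and I didn't get the copy/paste right. The part after the else: is in fact indented to be part of the for loop, and the class is indented with four spaces in my code. I'll keep the capitalization convention in mind. Your suggestion did indeed work, and now using the commands: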

print masterDictionary.keys()
print masterDictionary['stock'][0].low

Yields:

['inflation', 'stock']
-0.34

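These are indeed my two variables, and the values match the XML listed at the top.

Update 2

Well, I thought I had this figured out, but I was careless again. It turns out I hadn't fully fixed the problem. The previous solution ended up writing all of the data to both dictionary keys, so I had two identical lists of all the data assigned to two different dictionary keys. The idea is to have distinct sets of data assigned from the XML to the matching dictionary key. Here is the current code: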

# Import XML Parser
import xml.etree.ElementTree as ET

# Parse XML directly from the file path
tree = ET.parse(xml file)

# Create iterable item list
items = tree.findall('item')

# Create class for historic variables
class DataPoint:
    def __init__(self, low, high, freq):
        self.low = low
        self.high = high
        self.freq = freq

# Create Master Dictionary and variable list for historic variables
masterDictionary = {}
thisList = []

# Loop to assign variables as dictionary keys and associate their values with them
for item in items:
    thisKey = item.find('variable').text 
    masterDictionary[thisKey] = thisList
    if thisKey not in masterDictionary:
        masterDictionary[thisKey] = []
    newDataPoint = DataPoint(float(item.find('low').text), float(item.find('high').text), float(item.find('freq').text))
    thisList.append(newDataPoint)

When I enter:

print masterDictionary['stock'][5].low
print masterDictionary['inflation'][5].low
print len(masterDictionary['stock'])
print len(masterDictionary['inflation'])

The results are identical for both keys ('stock' and 'inflation'):

-.22
-.22
56
56

The XML file has 27 items tagged stock and 29 tagged inflation. How can I make each list assigned to a dictionary key pull only its own data within the loop?
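The identical results are consistent with both keys pointing at one shared list: in the loop above, masterDictionary[thisKey] = thisList stores a reference to the same thisList object under every key, so the not in check never fires and every append lands in that one list. A minimal sketch of the effect, using made-up values rather than the real XML (the names shared and master are illustrative, and print() is the Python 3 form):

```python
# Both keys are bound to the SAME list object, as in the loop above.
shared = []
master = {}
for key, value in [("stock", -0.34), ("inflation", 0.01), ("stock", -0.32)]:
    master[key] = shared      # stores a reference to the list, not a copy
    shared.append(value)

print(master["stock"] is master["inflation"])          # True: one list, two names
print(len(master["stock"]), len(master["inflation"]))  # 3 3
```

Each key needs its own list object for the data to end up separated.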

Update 3

It seems to work with 2 loops, but I have no idea how or why it won't work in 1 single loop. I managed this by accident:

# Import XML Parser
import xml.etree.ElementTree as ET

# Parse XML directly from the file path
tree = ET.parse(xml file)

# Create iterable item list
items = tree.findall('item')

# Create class for historic variables
class DataPoint:
    def __init__(self, low, high, freq):
        self.low = low
        self.high = high
        self.freq = freq

# Create Master Dictionary and variable list for historic variables
masterDictionary = {}

# Loop to assign variables as dictionary keys and associate their values with them
for item in items:
    thisKey = item.find('variable').text
    thisList = []
    masterDictionary[thisKey] = thisList

for item in items:
    thisKey = item.find('variable').text
    newDataPoint = DataPoint(float(item.find('low').text), float(item.find('high').text), float(item.find('freq').text))
    masterDictionary[thisKey].append(newDataPoint)

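I have tried a number of permutations to make it happen in one loop, but no luck. I can either get all of the data listed under both keys, as identical arrays of all the data (not very helpful), or the data sorted correctly into 2 distinct arrays for the two keys but holding only the last single data entry (the loop overwrites itself each time, leaving only one entry in the array).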
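One way to do this in a single pass is dict.setdefault, which returns the existing list for a key, or stores and returns a fresh empty list the first time the key is seen. The following is a sketch under stated assumptions, not a definitive implementation: sample_xml is a made-up inline stand-in for the real file (its inflation item is invented), and the code is written for Python 3.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the real file; the inflation values are invented.
sample_xml = """
<retirement>
    <item><low>-0.34</low><high>-0.32</high><freq>0.0294117647058824</freq>
          <variable>stock</variable><type>historic</type></item>
    <item><low>0.01</low><high>0.02</high><freq>0.5</freq>
          <variable>inflation</variable><type>historic</type></item>
    <item><low>-0.32</low><high>-0.29</high><freq>0</freq>
          <variable>stock</variable><type>historic</type></item>
</retirement>
"""

class DataPoint:
    def __init__(self, low, high, freq):
        self.low = low
        self.high = high
        self.freq = freq

root = ET.fromstring(sample_xml.strip())
masterDictionary = {}

for item in root.findall('item'):
    thisKey = item.find('variable').text
    newDataPoint = DataPoint(float(item.find('low').text),
                             float(item.find('high').text),
                             float(item.find('freq').text))
    # setdefault creates a new list only the first time thisKey appears,
    # so each variable gets its own list and nothing is overwritten.
    masterDictionary.setdefault(thisKey, []).append(newDataPoint)

print(len(masterDictionary['stock']), len(masterDictionary['inflation']))
```

collections.defaultdict(list) would serve the same purpose if the explicit setdefault call feels noisy.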

score 2 · Accepted Answer

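You have a serious indentation problem after the (unnecessary) else: pass. Fix that up and try again. Does the problem occur with your sample input data? Other data? First time around the loop? What is the value of thisKey that is causing the problem [hint: it's reported in the KeyError error message]? What are the contents of masterDictionary just before the error happens [hint: sprinkle a few print statements around your code]?

Other remarks not relevant to your problem:

Instead of if thisKey in masterDictionary == False: consider using if thisKey not in masterDictionary: ... comparisons against True or False are almost always redundant and/or a bit of a "code smell".

Python convention is to reserve names with an initial capital letter (like Item) for classes.

Using only one space per indentation level makes code almost illegible and is strongly discouraged. Always use 4 (unless you have a good reason, but I've never heard of one).

Update: I was wrong: thisKey in masterDictionary == False is worse than I thought. Because in is a relational operator, chained evaluation is used (like a <= b < c), so you have (thisKey in masterDictionary) and (masterDictionary == False), which will always evaluate to False, and so the dictionary is never updated. The fix is as I suggested: use if thisKey not in masterDictionary: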
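The chaining can be checked directly. This small sketch (Python 3; the variable names chained, grouped and idiomatic are illustrative) evaluates the three spellings against an empty dictionary:

```python
masterDictionary = {}
thisKey = 'stock'

# Chained form: (thisKey in masterDictionary) and (masterDictionary == False)
chained = thisKey in masterDictionary == False
# Parenthesised form: compares the membership result itself with False.
grouped = (thisKey in masterDictionary) == False
# Idiomatic form.
idiomatic = thisKey not in masterDictionary

print(chained, grouped, idiomatic)  # False True True
```

Only the chained form reports False for a missing key, which is why the list was never created and the later lookup raised KeyError.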

Also, it looks like thisList (initialised but not used) should be thisSublist (used but not initialised).
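Putting the two remarks together (the not in test and a single list name), the original loop might be repaired along these lines. This is a sketch only: rows is a hypothetical list of already-parsed tuples standing in for the <item> elements, the inflation row is invented, and print() is the Python 3 form.

```python
class DataPoint:
    def __init__(self, low, high, freq):
        self.low = low
        self.high = high
        self.freq = freq

# Hypothetical pre-parsed rows standing in for the <item> elements:
# (variable, low, high, freq). The inflation row is invented.
rows = [
    ('stock', -0.34, -0.32, 0.0294117647058824),
    ('stock', -0.32, -0.29, 0.0),
    ('inflation', 0.01, 0.02, 0.5),
]

masterDictionary = {}
for thisKey, low, high, freq in rows:
    # Create the list only when the key is new.
    if thisKey not in masterDictionary:
        masterDictionary[thisKey] = []
    # Fetch the list for this key and append through that same name.
    thisList = masterDictionary[thisKey]
    thisList.append(DataPoint(low, high, freq))

print(masterDictionary['stock'][0].freq)
```

Everything after the if sits inside the for loop, so each row is appended to the list belonging to its own key.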

score 0 · Accepted Answer

Change:

if thisKey in masterDictionary == False:

to

if thisKey not in masterDictionary:

That appears to be why you were getting that error. Also, you need to assign something to 'thisSublist' before appending to it. Try:

thisSublist = []
thisSublist.append(newDataPoint)

score -1 · Accepted Answer

There is an error in the if statement inside the for loop. Instead of

if thisKey in masterDictionary == False:

write

if (thisKey in masterDictionary) == False:

Given the rest of your original code, you will be able to access the data like so:

masterDictionary['stock'][0].freq

John Machin makes some valid points regarding style and smell (and you should think about his suggested changes), but those things will come with time and experience.
