python - Extracting from a very complex JSON file in Python

Question

I am trying to pull information out of a very complex JSON file using Python. Here is one object from the file:

{
"__metadata": {
"uri": "/Students/news/_vti_bin/ListData.svc/Posts(4)", "etag": "W/\"2\"", "type": "Microsoft.SharePoint.DataService.PostsItem"
}, "Title": "Term 2 Round 2 draws", "Body": "<div class=\"ExternalClass0BC1BCA4D3EE45A4A1F34086034FE827\"><p>\u200bAs there is no Gonzagan this week the following Senior Sport information has been provided here.\r\n\t    </p>\r\n<ul><li><a target=\"_blank\" href=\"/Intranet/students/news_resources/2011/Term2/Knox _wet_weather.pdf\">Knox _wet_weather</a> Cancellations, please see <a target=\"_blank\" href=\"http://www.twitter.com/SACWetWeather\">twitter page</a> for further news.</li>\r\n<li><a target=\"_blank\" href=\"/Intranet/students/news_resources/2011/Term2/2011_Football_round_2.pdf\">2011 Football draw Round 2</a></li>\r\n<li><a target=\"_blank\" href=\"/Intranet/students/news_resources/2011/Term2/2011_Rugby_round_2.pdf\">2011 Rugby draw Round 2</a></li></ul>\r\n<p></p></div>", "Category": {
"__deferred": {
"uri": "/Students/news/_vti_bin/ListData.svc/Posts(4)/Category"
}
}, "Published": "\/Date(1308342960000)\/", "ContentTypeID": "0x0110001F9F7104FDD3054AAB40D8561196E09E", "ApproverComments": null, "Comments": {
"__deferred": {
"uri": "/_vti_bin/ListData.svc/Posts(4)/Comments"
}
}, "CommentsId": 0, "ApprovalStatus": "0", "Id": 4, "ContentType": "Post", "Modified": "\/Date(1309122092000)\/", "Created": "\/Date(1309120597000)\/", "CreatedBy": {
"__deferred": {
"uri": "/Students/news/_vti_bin/ListData.svc/Posts(4)/CreatedBy"
}
}, "CreatedById": 1, "ModifiedBy": {
"__deferred": {
"uri": "/Students/news/_vti_bin/ListData.svc/Posts(4)/ModifiedBy"
}
}, "ModifiedById": 1, "Owshiddenversion": 2, "Version": "1.0", "Path": "/Students/news/Lists/Posts"
},

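I can't get my head around how to work with this. When I convert it into a Python dictionary, the order of the attributes gets jumbled, and I can no longer tell where one object starts and the next begins. What is the best way to extract just the key-value pairs for "Title", "Body" and "Published"? And how do I do that for multiple objects?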

score 1 · Accepted Answer

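import json

# json_input is assumed to hold the JSON text as a string
obj = json.loads(json_input)

for record in obj:
    # Key names are case-sensitive and must match the JSON exactly
    print(record["Title"])
    print(record["Body"])
    print(record["Published"])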

This assumes json_input is the snippet above, either as a string or already read in from a file. Also note that I am guessing, based on your question, that the snippet above is one element of a collection.

Update

Based on your example, there is another layer that was not present in the snippet originally posted.

Change the loop to:

for record in obj["d"]["results"]:
    ...
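Putting the update together with the original loop, a minimal sketch could look like the following. The sample JSON is a shortened, hypothetical stand-in for the real file, and the helper name extract_fields is made up for illustration; only the "d" / "results" layers and the three key names come from the thread.

```python
import json

# Hypothetical, shortened sample that mimics the {"d": {"results": [...]}} wrapper.
json_input = '''
{"d": {"results": [
  {"Title": "Term 2 Round 2 draws", "Body": "<div>...</div>",
   "Published": "/Date(1308342960000)/", "Id": 4},
  {"Title": "Second post", "Body": "<p>text</p>",
   "Published": "/Date(1309120597000)/", "Id": 5}
]}}
'''

WANTED = ("Title", "Body", "Published")


def extract_fields(raw, wanted=WANTED):
    """Return one small dict per record, keeping only the wanted keys."""
    obj = json.loads(raw)
    # .get() yields None instead of raising if a record lacks a key.
    return [{key: record.get(key) for key in wanted}
            for record in obj["d"]["results"]]


posts = extract_fields(json_input)
for post in posts:
    print(post["Title"], post["Published"])
```

Because each record is looked up by key name, the order in which the dictionary stores its attributes does not matter here.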

score 1 · Accepted Answer

I am assuming the main JSON object is an array of these objects. Here is how to print the information you are after:

import json

# json.load expects an open file object, not a file name
with open('my_json_file.json') as json_file:
    main_array = json.load(json_file)

for sub_object in main_array:
    print("Title: {}\nBody: {}\nPublished: {}\n".format(
        sub_object['Title'], sub_object['Body'], sub_object['Published']
    ))
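The "Published" value prints as a string such as /Date(1308342960000)/ once the JSON is decoded. If a real date is wanted, something along these lines may help. It is only a sketch: it assumes the number is milliseconds since the Unix epoch in UTC, which the thread does not confirm, and parse_ms_date is a hypothetical helper name.

```python
import re
from datetime import datetime, timezone

# Matches the decoded form "/Date(<digits>)/"; the JSON escape "\/" becomes "/".
_DATE_RE = re.compile(r"/Date\((-?\d+)\)/")


def parse_ms_date(value):
    """Convert '/Date(1308342960000)/' to an aware datetime.

    Assumption: the number is milliseconds since 1970-01-01 UTC.
    """
    match = _DATE_RE.fullmatch(value)
    if match is None:
        raise ValueError("unexpected date format: %r" % (value,))
    millis = int(match.group(1))
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


published = parse_ms_date("/Date(1308342960000)/")
print(published.isoformat())
```

The function raises ValueError on anything that does not match the pattern, so an unexpected format shows up early instead of producing a wrong date.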
