python - Match text between nested brackets with RegEx in Python

Question

I have a large CSV file whose rows look like this:

id_85,
{
    "link": "some link",
    "icon": "hello.gif",
    "name": "Wall Photos",
    "comments": {
        "count": 0
    },
    "updated_time": "2012-03-12",
    "object_id": "400",
    "is_published": true,
    "properties": [
        {
            "text": "University",
            "name": "By",
            "href": "some link"
        }
    ],
    "from": {
        "id": "7778",
        "name": "Let"
    },
    "message": "Hello World! :D",
    "id": "id_85",
    "created_time": "2012-03-12",
    "to": {
        "data": [
            {
                "id": "100",
                "name": "March"
            }
        ]
    },
    "message_tags": {
        "0": [
            {
                "id": "100",
                "type": "user",
                "name": "Marcelo",
                "length": 7,
                "offset": 0
            }
        ]
    },
    "type": "photo",
    "caption": "Hello world!"
}

I am trying to get the JSON part between the first and last curly braces.

Here is my Python regex code so far:

import re
# The row is wrapped in single quotes so the double quotes inside the JSON do not end the string.
str = 'id_85,{"link": "some link", "icon": "hello.gif", "name": "Wall Photos", "comments": {"count": 0}, "updated_time": "2012-03-12", "object_id": "400", "is_published": true, "properties": [{"text": "University", "name": "By", "href": "some link"}], "from": {"id": "777", "name": "Let"}, "message": "Hello World! :D", "id": "id_85", "created_time": "2012-03-12", "to": {"data": [{"id": "100", "name": "March"}]}, "message_tags": {"0": [{"id": "100", "type": "user", "name": "March", "length": 7, "offset": 0}]}, "type": "photo", "caption": "Hello world!"} '
m = re.match(r'.*,({.*}$)', str)
if m:
    print(m.group(1))

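In some cases this does not pick up the first and last curly braces, as in { ... }. How can I make sure that the match contains only the text between the first and last curly braces, and nothing else?

The desired output looks like this: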

{"link": "some link", "icon": "hello.gif", "name": "Wall Photos", "comments": {"count": 0}, "updated_time": "2012-03-12", "object_id": "400", "is_published": true, "properties": [{"text": "University", "name": "By", "href": "some link"}], "from": {"id": "777", "name": "Let"}, "message": "Hello World! :D", "id": "id_85", "created_time": "2012-03-12", "to": {"data": [{"id": "100", "name": "March"}]}, "message_tags": {"0": [{"id": "100", "type": "user", "name": "March", "length": 7, "offset": 0}]}, "type": "photo", "caption": "Hello world!"}

Thanks!

score 0 · Accepted Answer

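Assuming there is one JSON element per line of your CSV (as originally posted),

re.match(r'^[^{]*({.*})[^}]*$',str).group(1)

should do the trick. That is, discard everything that is not a { until you find the first one, then put everything into the group until you hit a } that has no other } after it.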
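
As a rough illustration of how that pattern behaves, here is a minimal Python 3 sketch. The helper name extract_json and the shortened sample row are made up for this example, and re.DOTALL is an addition (not part of the pattern above) so that the match also works if the JSON spans several lines. Parsing the result with json.loads is only a sanity check that the captured text really is one complete object.

```python
import json
import re

# Same pattern as above; re.DOTALL is an assumption added for multi-line rows.
PATTERN = re.compile(r'^[^{]*({.*})[^}]*$', re.DOTALL)

def extract_json(row):
    """Return the text from the first '{' to the last '}' in row, or None."""
    # [^{]* skips everything before the first '{'.
    # The greedy {.*} then extends to the last '}', because [^}]*$ only
    # permits non-'}' characters between the group and the end of the row.
    m = PATTERN.match(row)
    return m.group(1) if m else None

# Hypothetical, shortened row: id, comma, nested JSON, trailing space.
row = 'id_85,{"name": "Wall Photos", "comments": {"count": 0}, "to": {"data": [{"id": "100"}]}} '
json_text = extract_json(row)
data = json.loads(json_text)  # raises ValueError if the capture is not valid JSON
```

Because the match is anchored on the first { and the last }, nesting depth does not matter here; the approach only breaks down if a row holds more than one top-level object.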

score 0 · Accepted Answer

This matches the entire JSON part after the first comma. I'm not sure whether this is what you wanted; an example of the desired output would help.

re.match(r'[^,]*,(.*)', s).group(1)
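
A small Python 3 sketch of the same idea follows, assuming the id before the first comma never contains a comma itself. The helper name json_after_first_comma and the sample row are hypothetical; re.DOTALL and the strip() call are additions so that multi-line rows and trailing whitespace do not get in the way.

```python
import json
import re

def json_after_first_comma(row):
    """Parse whatever follows the first comma in row as JSON (hypothetical helper)."""
    # [^,]* consumes the leading id, the comma is skipped, (.*) takes the rest.
    m = re.match(r'[^,]*,(.*)', row, re.DOTALL)
    if m is None:
        raise ValueError("no comma found in row")
    # Strip whitespace left around the JSON text before parsing it.
    return json.loads(m.group(1).strip())

# Hypothetical, shortened row in the same shape as the question's data.
row = 'id_85,{"id": "id_85", "from": {"id": "777", "name": "Let"}} '
record = json_after_first_comma(row)
```

Unlike the brace-anchored pattern, this one keeps any text that follows the closing brace, so json.loads will raise an error if the row carries extra columns after the JSON.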
