python - Splitting a JSON file into equal/smaller parts with Python

Question

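I am currently working on a project that applies sentiment analysis to Twitter posts. I classify the tweets with Sentiment140. The tool lets me classify up to 1,000,000 tweets per day, and I have collected about 750,000 tweets, so that should be fine. The only problem is that I can send at most 15,000 tweets at a time to the JSON bulk classification.

My whole code is set up and running. The only problem is that my JSON file contains all 750,000 tweets.

So my question: what is the best way to split the JSON into smaller files with the same structure? I would prefer to do this in Python.

I have thought about iterating through the file. But how do I specify in the code that it should start a new file after, for example, 5,000 elements?

I would appreciate some hints on what the most reasonable approach is. Thank you!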

Edit: This is the code I have at the moment.

import itertools
import json
from itertools import izip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

# Open JSON file
values = open('Tweets.json').read()
#print values

# Adjust formatting of JSON file
values = values.replace('\n', '')    # do your cleanup here
#print values

v = values.encode('utf-8')
#print v

# Load JSON file
v = json.loads(v)
print type(v)

for i, group in enumerate(grouper(v, 5000)):
    with open('outputbatch_{}.json'.format(i), 'w') as outputfile:
        json.dump(list(group), outputfile)

The output is:

["data", null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, ...]

in a file called "outputbatch_0.json".
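
That output is what you would expect if `json.loads` returns a dict whose only key is "data": iterating over a dict yields its keys, and `izip_longest` then pads that single key out to the chunk length with the fill value `None`, which `json.dump` writes as `null`. A minimal sketch that reproduces the effect, assuming Python 3 (where `izip_longest` is named `zip_longest`); the two-tweet dict and the chunk size of 4 are made-up stand-ins, not the real file:

```python
import json
from itertools import zip_longest


def grouper(iterable, n, fillvalue=None):
    """Collect data into fixed-length chunks, padding the last one."""
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


# Hypothetical stand-in for the parsed Tweets.json (assumption, not real data)
v = {"data": [{"text": "a", "id": "1"}, {"text": "b", "id": "2"}]}

# Iterating over the dict yields only its key, so the chunk is key + padding
first_group = list(next(grouper(v, 4)))
dumped = json.dumps(first_group)
print(dumped)  # ["data", null, null, null]
```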

Edit 2: This is the structure of the JSON.

{
"data": [
{
"text": "So has @MissJia already discussed this Kelly Rowland Dirty Laundry song? I ain't trying to go all through her timelime...",
"id": "1"
},
{
"text": "RT @UrbanBelleMag: While everyone waits for Kelly Rowland to name her abusive ex, don't hold your breath. But she does say he's changed: ht\u00e2\u20ac\u00a6",
"id": "2"
},
{
"text": "@Iknowimbetter naw if its weak which I dont think it will be im not gonna want to buy and up buying Kanye or even Kelly Rowland album lol",
"id": "3"}
]
}
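
Given this structure, one possible approach is to chunk the list stored under "data" rather than the top-level object, and to wrap each chunk in the same `{"data": [...]}` shape so every output file keeps the original structure. The following is only a sketch under stated assumptions: it targets Python 3, the whole file fits in memory, and the function name `split_json` and its parameters are hypothetical, chosen for illustration.

```python
import json


def split_json(in_path, chunk_size=5000, key="data", prefix="outputbatch_"):
    """Write the list under `key` to numbered files of at most chunk_size items.

    Hypothetical helper: each output file repeats the {key: [...]} wrapper.
    Returns the list of paths that were written.
    """
    with open(in_path, encoding="utf-8") as f:
        items = json.load(f)[key]

    paths = []
    # Slicing avoids the None padding that a fill-value grouper would add
    for n, start in enumerate(range(0, len(items), chunk_size)):
        out_path = "{}{}.json".format(prefix, n)
        with open(out_path, "w", encoding="utf-8") as out:
            json.dump({key: items[start:start + chunk_size]}, out)
        paths.append(out_path)
    return paths
```

Calling `split_json("Tweets.json")` would write `outputbatch_0.json`, `outputbatch_1.json`, and so on. With roughly 750,000 tweets and 5,000 per file that comes to about 150 files, each well under the 15,000-tweet limit; the last file simply holds whatever remains.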
