python - Save entries from Mongo documents to CSV + format ISODate

Question

I have data in a Mongo collection called "hello". The documents look like this:

{ 
name: ..., 
size: ..., 
timestamp: ISODate("2013-01-09T21:04:12Z"), 
data: { text:..., place:...},
other: ...
}

I want to export the timestamp and the text from each document to a CSV file, with the timestamp in the first column and the text in the second.

I tried creating a new collection (hello2) whose documents contain only the timestamp and the text:

data = db.hello
for i in data:
    try:
        connection.me.hello2.insert(i["data"]["text"], i["timestamp"])
    except:
        print "Unable", sys.exc_info()

Then I wanted to use mongoexport:

mongoexport --db me --collection hello2 --csv --out /Dropbox/me/hello2.csv

But this is not working, and I don't know how to proceed.

PS: Instead of ISODate("2013-01-09T21:04:12Z"), I would like to store only the time part of the ISODate in the CSV file, i.e. just 21:04:12.

Thank you for your help.

score 2 · Accepted Answer

You can export directly from the data collection; there is no need for a temporary collection.

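# 'text' is nested under 'data', so project 'data.text' and read it via r['data']['text'].
# (PyMongo 3 and later name this argument 'projection' instead of 'fields'.)
for r in db.hello.find(fields=['data.text', 'timestamp']):
    print('"%s","%s"' % (r['data']['text'], r['timestamp'].strftime('%H:%M:%S')))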

Or, to write to a file:

# 'output' is the path of the CSV file to create
with open(output, 'w') as fp:
    for r in db.hello.find(fields=['data.text', 'timestamp']):
        fp.write('"%s","%s"\n' % (r['data']['text'], r['timestamp'].strftime('%H:%M:%S')))
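The hand-built format string above does not escape quotes or commas that appear inside the text itself. A minimal sketch of the same export using the standard csv module follows. It assumes Python 3, puts the timestamp in the first column as the question asks, and uses a made-up in-memory document in place of a real query; the function name write_time_and_text is hypothetical.

```python
import csv
import io
from datetime import datetime


def write_time_and_text(docs, fp):
    """Write one CSV row per document: time of day first, then the text."""
    writer = csv.writer(fp, quoting=csv.QUOTE_ALL)
    for doc in docs:
        # PyMongo hands ISODate values back as datetime.datetime objects,
        # so strftime can keep just the time of day.
        writer.writerow([doc["timestamp"].strftime("%H:%M:%S"),
                         doc["data"]["text"]])


# Stand-in document shaped like the ones in the question (made-up values).
sample_docs = [
    {"timestamp": datetime(2013, 1, 9, 21, 4, 12),
     "data": {"text": 'say "hi", then leave', "place": "here"}},
]

buffer = io.StringIO()
write_time_and_text(sample_docs, buffer)
print(buffer.getvalue())
```

Against a real collection, the same function could probably be fed the cursor returned by db.hello.find(...) and a file opened with newline='' instead of the in-memory buffer.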

To filter out duplicates and print only the most recent entry for each text, the process has to be split into two steps. First, accumulate the data in a dictionary:

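recs = {}  # maps each text to the latest timestamp seen for it
for r in db.hello.find(fields=['data.text', 'timestamp']):
    text, time = r['data']['text'], r['timestamp']
    # keep the timestamp only if the text is new or this one is more recent
    if text not in recs or recs[text] < time:
        recs[text] = time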

Then output the contents of the dictionary:

for text, time in recs.items():
    print('"%s","%s"' % (text, time.strftime('%H:%M:%S')))
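The two steps can also be tried without a database. The sketch below assumes Python 3 and uses made-up documents with the same shape as those in the question; the function name latest_time_per_text is hypothetical.

```python
from datetime import datetime


def latest_time_per_text(docs):
    """Map each distinct text to the most recent timestamp seen for it."""
    latest = {}
    for doc in docs:
        text, when = doc["data"]["text"], doc["timestamp"]
        if text not in latest or latest[text] < when:
            latest[text] = when
    return latest


# Made-up documents: "a" appears twice, and only its later timestamp should survive.
sample_docs = [
    {"timestamp": datetime(2013, 1, 9, 21, 4, 12), "data": {"text": "a"}},
    {"timestamp": datetime(2013, 1, 9, 22, 0, 0), "data": {"text": "a"}},
    {"timestamp": datetime(2013, 1, 9, 20, 30, 5), "data": {"text": "b"}},
]

latest = latest_time_per_text(sample_docs)
for text, when in sorted(latest.items()):
    print('"%s","%s"' % (text, when.strftime("%H:%M:%S")))
```

Because the dictionary is keyed by text, memory use grows with the number of distinct texts, not with the number of documents.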
