python - Parsing N-Triples by streaming

Question

I was quite confused about this for a while, but I finally learned how to parse a large N-Triples RDF store (.nt) using Raptor and the Redland Python Extensions.

The common example goes like this:

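import RDF

parser = RDF.Parser(name="ntriples")
model = RDF.Model()  # no storage argument: the model is kept in memory
# parse_into_model() loads every triple in the file into the model
stream = parser.parse_into_model(model, "file:./mybigfile.nt")
for triple in model:
    print triple.subject, triple.predicate, triple.object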

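By default, parse_into_model() loads the triples into memory (the model's default storage), so when parsing a large file you could consider giving the model a HashStorage instead and serializing it that way.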

But what if you just want to read the file and add the triples to MongoDB, without loading them into a model or anything complicated like that?
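N-Triples is a line-based format with one triple per line, which is why it can in principle be streamed without any model. The sketch below illustrates that idea with the standard library only. It is a deliberately simplified, hypothetical helper (`iter_ntriples` is not part of Redland or Raptor): its regular expression does not validate IRIs and does not decode escape sequences in literals, so a real loader should still prefer a full parser such as Raptor.

```python
import re
from typing import Iterable, Iterator, Tuple

# Simplified term pattern (assumption, not the full N-Triples grammar):
# IRIs <...>, blank nodes _:label, and literals "..." with an optional
# @lang tag or ^^<datatype>. Escapes such as \" are matched but NOT decoded.
_TERM = r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)'
_TRIPLE = re.compile(
    r"^" + _TERM + r"\s+" + _TERM + r"\s+" + _TERM + r"\s*\.\s*(?:#.*)?$"
)


def iter_ntriples(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (subject, predicate, object) strings, one input line at a time."""
    for lineno, line in enumerate(lines, start=1):
        text = line.strip()
        if not text or text.startswith("#"):
            continue  # skip blank lines and comment lines
        match = _TRIPLE.match(text)
        if match is None:
            raise ValueError(f"line {lineno}: unsupported N-Triples syntax")
        yield match.group(1), match.group(2), match.group(3)
```

Because a Python file object is itself an iterator over lines, `with open("mybigNTfile.nt", encoding="utf-8") as f: for s, p, o in iter_ntriples(f): ...` reads the file lazily, so only the current line needs to be in memory.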

score 2 · Accepted Answer

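import RDF

# NTriplesParser is the N-Triples-specific subclass of RDF.Parser
parser = RDF.NTriplesParser()

# parse_as_stream() returns a Stream of Statements that can be iterated
# directly, instead of first loading everything into a Model
for triple in parser.parse_as_stream("file:./mybigNTfile.nt"):
    print triple.subject, triple.predicate, triple.object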
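To get from a stream of triples to MongoDB without holding the whole file in memory, one option is to buffer a fixed number of documents and hand each batch to a bulk-insert call. The sketch below assumes each triple has already been turned into a tuple of three strings; the function name `load_triples_in_batches`, the document field names, and the default batch size are hypothetical choices, and the insert call is passed in as a plain callable so the batching logic does not depend on any database driver.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def load_triples_in_batches(
    triples: Iterable[Triple],
    insert_many: Callable[[List[Dict[str, str]]], object],
    batch_size: int = 1000,
) -> int:
    """Send triples to `insert_many` in fixed-size batches; return the count."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[Dict[str, str]] = []
    total = 0
    for subject, predicate, obj in triples:
        batch.append({"subject": subject, "predicate": predicate, "object": obj})
        if len(batch) == batch_size:
            insert_many(batch)
            total += len(batch)
            batch = []  # start a fresh list; never reuse one already handed off
    if batch:  # flush the final, partially filled batch
        insert_many(batch)
        total += len(batch)
    return total
```

With pymongo, `insert_many` could be a collection's `insert_many` method, and the triples could come from the Redland stream in the answer by converting each statement's `subject`, `predicate` and `object` with `str()`. The exact string form Redland produces for nodes is not shown here, so check it before relying on it as a stored key.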
