python - Python ijson - parse error: trailing garbage // bz2.decompress()

Question

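I get an error while parsing JSON with ijson.

Background: I have a series (approx. 1000) of large files of Twitter data compressed in '.bz2' format. I need to get elements from the files into a pd.DataFrame for further analysis. I have identified the keys I need to extract. I am cautious about posting the Twitter data itself.

Attempt with bz2.decompress: I was able to decompress the files using the following code: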

## Code in loop specific for decompressing and parsing

import bz2
import ijson

# `file` is the path supplied by the enclosing loop
with open(file, 'rb') as source:
    # Decompress the whole file in memory (one file at a time rather than a stream)
    json_r = bz2.decompress(source.read())
    json_decom = json_r.decode('utf-8')

    # Parse the JSON with ijson
    parser = ijson.parse(json_decom)
    for prefix, event, value in parser:
        # Print selected items as part of testing
        if prefix in ("created_at", "text", "user.id_str"):
            print(value)

This results in the following error:

IncompleteJSONError: parse error: trailing garbage
          estamp_ms":"1609466366680"}  {"created_at":"Fri Jan 01 01:59
                     (right here) ------^

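Two things:

Is my decompression method correct, and does it give ijson the right type of input to parse (ijson takes both bytes and str)?
Is this a JSON error? If it is a JSON error, is it possible to develop an error handler that moves on to the next file?

Any assistance is greatly appreciated.

Thanks, James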
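
A possible direction, offered only as a sketch: the error excerpt shows one object closing (`}`) and another opening (`{`) right after it, which suggests each file holds several top-level JSON values back to back rather than a single document. Assuming that is the case, the standard library's `json.JSONDecoder.raw_decode` (swapped in here for ijson) can walk through the values one by one, and a `try`/`except` around each file lets the loop move on when a file cannot be parsed. The helper names `iter_json_values` and `extract_rows` are hypothetical, and the sketch assumes every top-level value is a JSON object.

```python
import bz2
import json


def iter_json_values(text):
    """Yield each top-level JSON value from a string of concatenated documents."""
    decoder = json.JSONDecoder()
    pos, end = 0, len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so skip it by hand.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos == end:
            break
        value, pos = decoder.raw_decode(text, pos)
        yield value


def extract_rows(paths):
    """Collect selected fields per tweet; skip any file that fails to parse."""
    rows, failed = [], []
    for path in paths:
        try:
            # Read and decode the compressed file as text.
            with bz2.open(path, "rt", encoding="utf-8") as source:
                text = source.read()
            file_rows = [
                {
                    "created_at": tweet.get("created_at"),
                    "text": tweet.get("text"),
                    # "user" may be missing or null, hence the `or {}`.
                    "user_id_str": (tweet.get("user") or {}).get("id_str"),
                }
                for tweet in iter_json_values(text)
            ]
        except (OSError, EOFError, ValueError) as exc:
            # OSError/EOFError: corrupt or truncated bz2 data.
            # ValueError: covers json.JSONDecodeError and UnicodeDecodeError.
            failed.append((path, exc))
            continue  # move on to the next file
        rows.extend(file_rows)
    return rows, failed
```

The returned `rows` list can be passed to `pd.DataFrame(rows)`, and `failed` records which files were skipped and why. A file that fails partway through contributes no rows in this sketch. To stay with ijson, `ijson.parse` and `ijson.items` accept `multiple_values=True` to allow several top-level values, and `ijson.JSONError` (the base class of `IncompleteJSONError`) can be caught in the same way.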
