avro - Avro schema evolution

Question

I have two questions.

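Is it possible to use the same reader to parse records that were written with two compatible schemas, Schema V1 and Schema V2? I think the answer is no, but if it is yes, how do I do it?
I tried writing a record with Schema V1 and reading it with Schema V2, but I get the following error: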

org.apache.avro.AvroTypeException: Found foo, expecting foo

I am using avro-1.7.3:

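   DatumWriter<GenericData.Record> writer = new GenericDatumWriter<GenericData.Record>(SchemaV1);
   // GenericDatumReader(Schema writer, Schema reader): the first argument is the writer's schema
   DatumReader<GenericData.Record> reader = new GenericDatumReader<GenericData.Record>(SchemaV2, SchemaV1);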

Here are the two schemas (I also tried adding a namespace, but that did not help):

Schema V1:

{
"name": "foo",
"type": "record",
"fields": [{
    "name": "products",
    "type": {
        "type": "array",
        "items": {
            "name": "product",
            "type": "record",
            "fields": [{
                "name": "a1",
                "type": "string"
            }, {
                "name": "a2",
                "type": {"type": "fixed", "name": "a3", "size": 1}
            }, {
                "name": "a4",
                "type": "int"
            }, {
                "name": "a5",
                "type": "int"
            }]
        }
    }
}]
}

Schema V2:

{
"name": "foo",
"type": "record",
"fields": [{
    "name": "products",
    "type": {
        "type": "array",
        "items": {
            "name": "product",
            "type": "record",
            "fields": [{
                "name": "a1",
                "type": "string"
            }, {
                "name": "a2",
                "type": {"type": "fixed", "name": "a3", "size": 1}
            }, {
                "name": "a4",
                "type": "int"
            }, {
                "name": "a5",
                "type": "int"
            }]
        }
    }
}, {
    "name": "purchases",
    "type": ["null", {
        "type": "array",
        "items": {
            "name": "purchase",
            "type": "record",
            "fields": [{
                "name": "a1",
                "type": "int"
            }, {
                "name": "a2",
                "type": "int"
            }]
        }
    }]
}]
}

Thanks in advance.

score 0 · Accepted Answer

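You can do it the other way around: data written with Schema V2 can be parsed with Schema V1. Fields that were written to the file but are not declared in the reader's schema are simply skipped at read time, so that case works. But if the writer's schema has fewer fields than the reader's, the reader cannot find its extra fields in the written data, and you get an error.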
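The rule this answer describes is Avro's record schema-resolution rule. Per the Avro specification, fields that exist only in the writer's data are skipped, and a field that exists only in the reader's schema is filled from the reader's default value, or rejected if it has none. The plain-Java sketch below models just that rule; `ToyField` and `ToyResolver` are hypothetical names, not Avro classes, and a decoded record is assumed to be a simple name-to-value map. Under that model, reading V1 data with the V2 schema fails because `purchases` has no default, which is likely what produces the `Found foo, expecting foo` error; declaring the field with `"default": null` (valid here because `"null"` is the first branch of the union) should let a V2 reader accept V1 data.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy model of a reader-schema field: a name plus an optional default. Not an Avro API. */
final class ToyField {
    final String name;
    final boolean hasDefault;
    final Object defaultValue; // may be null, mirroring "default": null in an .avsc file

    private ToyField(String name, boolean hasDefault, Object defaultValue) {
        this.name = name;
        this.hasDefault = hasDefault;
        this.defaultValue = defaultValue;
    }

    static ToyField required(String name) {
        return new ToyField(name, false, null);
    }

    static ToyField withDefault(String name, Object defaultValue) {
        return new ToyField(name, true, defaultValue);
    }
}

/** Hypothetical resolver sketching Avro's record-resolution rule on name-to-value maps. */
final class ToyResolver {
    static Map<String, Object> resolve(Map<String, Object> written, List<ToyField> readerFields) {
        Map<String, Object> result = new LinkedHashMap<>();
        // Only reader fields are visited, so writer-only fields are skipped.
        for (ToyField field : readerFields) {
            if (written.containsKey(field.name)) {
                result.put(field.name, written.get(field.name));
            } else if (field.hasDefault) {
                // Reader-only field with a default: use the reader's default.
                result.put(field.name, field.defaultValue);
            } else {
                // Reader-only field without a default: resolution fails.
                throw new IllegalStateException("missing required field " + field.name);
            }
        }
        return result;
    }
}
```

For example, resolving a V2 record (with `products` and `purchases`) against a reader list containing only `products` returns a map holding just `products`, which matches the answer's claim that V2 data can be read with V1.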

score -1 · Accepted Answer

The best approach is to keep a schema mapping that maintains your schemas, such as the Confluent Avro Schema Registry.

Key points:

1.  Unlike Thrift, Avro-serialized objects do not carry their schema.
2.  Because no schema is stored in the serialized byte array, the reader has to be given the schema the data was written with.
3.  The Confluent Schema Registry provides a service for maintaining schema versions.
4.  Confluent provides a cached schema registry client (CachedSchemaRegistryClient), which checks its local cache before sending a request over the network.
5.  The JSON schema in an ".avsc" file is different from the schema present in the Avro object.
6.  All Avro objects implement the GenericRecord interface.
7.  During serialization: a schema ID for the Avro object's schema is requested from the Confluent Schema Registry.
8.  The schema ID, a 4-byte integer, is converted to bytes and prepended, after a single magic byte (0), to the serialized Avro object.
9.  During deserialization: the 5-byte header (the magic byte plus the 4 ID bytes) is stripped from the byte array, and the 4 ID bytes are converted back to the integer schema ID.
10. The schema is requested from the Confluent Schema Registry (or the client's cache), and the remaining bytes are deserialized with that schema.
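Steps 7 to 10 amount to a small framing protocol. In Confluent's wire format the header is one magic byte (0) followed by the 4-byte big-endian schema ID, and the Avro-encoded body follows it. The sketch below shows only that framing plus a cache-first lookup in plain Java; `WireFormatSketch` and `CachingSchemaLookup` are hypothetical names, not the Confluent serializer or `CachedSchemaRegistryClient` API, the Avro encoding itself is treated as opaque bytes, and the network call to the registry is replaced by a function passed in by the caller.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

/** Hypothetical helper illustrating Confluent-style framing; not the Confluent serializer API. */
final class WireFormatSketch {
    static final byte MAGIC_BYTE = 0x0;
    static final int HEADER_SIZE = 1 + 4; // magic byte + 4-byte schema ID

    /** Prepends the magic byte and the big-endian schema ID to an already-encoded Avro body. */
    static byte[] frame(int schemaId, byte[] avroBody) {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_SIZE + avroBody.length); // big-endian by default
        buf.put(MAGIC_BYTE);
        buf.putInt(schemaId);
        buf.put(avroBody);
        return buf.array();
    }

    /** Reads the schema ID from a framed message, rejecting an unknown magic byte. */
    static int schemaId(byte[] framed) {
        ByteBuffer buf = ByteBuffer.wrap(framed);
        if (buf.get() != MAGIC_BYTE) {
            throw new IllegalArgumentException("unknown magic byte");
        }
        return buf.getInt();
    }

    /** Returns the Avro-encoded body that follows the header. */
    static byte[] body(byte[] framed) {
        return Arrays.copyOfRange(framed, HEADER_SIZE, framed.length);
    }
}

/** Hypothetical cache-first schema lookup standing in for a registry client. */
final class CachingSchemaLookup {
    private final Map<Integer, String> cache = new HashMap<>();
    private final IntFunction<String> remoteFetch; // stands in for an HTTP call to the registry
    int remoteCalls = 0;

    CachingSchemaLookup(IntFunction<String> remoteFetch) {
        this.remoteFetch = remoteFetch;
    }

    /** Returns the cached schema for this ID, fetching it remotely only on a cache miss. */
    String schemaFor(int id) {
        return cache.computeIfAbsent(id, key -> {
            remoteCalls++;
            return remoteFetch.apply(key);
        });
    }
}
```

A consumer would call `schemaId`, look the schema up with `schemaFor`, and then decode `body` using that schema as the writer's schema.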

http://bytepadding.com/big-data/spark/avro/avro-serialization-de-serialization-using-confluent-schema-registry/
