Apache Avro

avro.apache.org

About this website

Apache Avro is a data serialization system that provides rich data structures, a compact binary format, and container files for persistent storage. Created by Doug Cutting within the Apache Hadoop project, Avro is widely used for record-oriented data and is a common choice for streaming data pipelines in big data ecosystems. Avro schemas are written in JSON and describe the structure of the data being serialized: field names, types, and default values. The primitive types are null, boolean, int, long, float, double, bytes, and string; the complex types are record, enum, array, map, union, and fixed.

One of Avro's key strengths is schema evolution: fields that declare default values can be added or removed without breaking existing readers or writers, which enables backward and forward compatibility between producers and consumers without coordinated deployments. In the container file format, the writer's schema is stored in the file header alongside the serialized data, which makes each file self-describing. Avro also supports remote procedure calls (RPC): client and server exchange their protocol definitions, including the schemas, during the connection handshake, so later messages can be sent without repeating them.

The Apache project maintains implementations for Java (also usable from other JVM languages such as Kotlin and Scala), Python, C, C++, C#, PHP, Ruby, Rust, JavaScript, and Perl. Avro is widely used with Apache Kafka for event streaming (often together with Confluent Schema Registry), with Apache Hadoop for MapReduce data interchange, alongside columnar formats such as Apache Parquet and Apache ORC, and in data warehouse pipelines on AWS, GCP, and Azure. The stable version listed here is 1.12.0, released under the Apache License 2.0.
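To make the schema evolution idea concrete, here is a minimal sketch in plain Python (standard library only, not the Avro library). The `User` schemas, their field names, and the `resolve_record` helper are all hypothetical, invented for illustration. The helper mimics only one part of Avro's record resolution: fields are matched by name, fields missing from the writer are filled from the reader's default, and fields the reader does not declare are dropped. It ignores aliases, type promotion, and the binary encoding itself.

```python
import json

# Hypothetical schema used by an older producer (the "writer").
WRITER_SCHEMA = json.loads("""
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "legacy_flag", "type": "boolean", "default": false}
  ]
}
""")

# Hypothetical schema used by a newer consumer (the "reader"):
# it drops legacy_flag and adds an optional email with a default.
READER_SCHEMA = json.loads("""
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "email", "type": ["null", "string"], "default": null}
  ]
}
""")


def resolve_record(datum, writer_schema, reader_schema):
    """Project a decoded writer record onto the reader's schema.

    Simplified illustration: matches fields by name only.
    """
    writer_fields = {f["name"] for f in writer_schema["fields"]}
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_fields:
            # Field exists on both sides: keep the written value.
            resolved[name] = datum[name]
        elif "default" in field:
            # Field is new to the reader: fall back to its default.
            resolved[name] = field["default"]
        else:
            # No written value and no default: the schemas are incompatible.
            raise ValueError(f"reader field {name!r} has no default")
    # Writer-only fields (here legacy_flag) are never copied over.
    return resolved


old_record = {"id": 42, "name": "Ada", "legacy_flag": True}
new_view = resolve_record(old_record, WRITER_SCHEMA, READER_SCHEMA)
print(new_view)  # {'id': 42, 'name': 'Ada', 'email': None}
```

The `ValueError` branch is the reason defaults matter: a reader field with no default cannot be filled when older data lacks it, so readers and writers could no longer be upgraded independently.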
