Apache Avro
avro.apache.org
About this website
Apache Avro is a data serialization system that provides rich data structures, a compact binary format, and container files for persistent data storage. Originally created within the Apache Hadoop project by Doug Cutting, Avro is now widely used for serializing record data, particularly in streaming data pipelines in big data ecosystems. Avro schemas are defined in JSON and describe the structure of the data being serialized: field names, types, and default values. Types include primitives (null, boolean, int, long, float, double, bytes, string) and complex types (record, enum, array, map, union, fixed). One of Avro's key strengths is schema evolution: a field can be added or removed as long as it has a default value, which lets producers and consumers stay backward and forward compatible without coordinated deployments. In the container file format, the writer's schema is stored in the file header alongside the serialized data, making the files self-describing. Avro also supports remote procedure calls (RPC) through Avro RPC, which exchanges schema information during the connection handshake so that individual messages need not carry it. The Apache project maintains implementations for Java (also usable from other JVM languages such as Kotlin and Scala), Python, C, C++, C#, PHP, Ruby, Rust, JavaScript, and Perl. Avro is widely used with Apache Kafka for event streaming (often via Confluent Schema Registry), with Apache Hadoop for MapReduce data interchange, alongside columnar formats such as Apache Parquet and Apache ORC, and in data warehouse pipelines on AWS, GCP, and Azure. The current stable version is listed as 1.12.0, released under the Apache License 2.0.
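The schema evolution behavior described above can be illustrated with a small sketch. It uses only the Python standard library and is not the Avro library's API: the `User` schemas, their field names, and the `resolve_record` helper are all hypothetical, and the helper handles only flat records (real Avro schema resolution also covers type promotion, aliases, unions, and nested types). It shows how a reader with a newer schema can consume a record written with an older one by ignoring removed fields and filling added fields from their defaults.

```python
import json

# Hypothetical writer schema (version 1), written as Avro-style JSON.
WRITER_SCHEMA = json.loads("""
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "nickname", "type": "string", "default": ""}
  ]
}
""")

# Hypothetical reader schema (version 2): drops "nickname" and adds
# "email" with a default, so older data can still be read.
READER_SCHEMA = json.loads("""
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "email", "type": ["null", "string"], "default": null}
  ]
}
""")


def resolve_record(record, writer_schema, reader_schema):
    """Project an already-decoded record onto the reader's fields.

    Simplified illustration only: fields unknown to the reader are
    dropped, and fields the writer did not have come from the reader's
    default. A missing default is an incompatibility.
    """
    writer_fields = {f["name"] for f in writer_schema["fields"]}
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_fields:
            resolved[name] = record[name]
        elif "default" in field:
            resolved[name] = field["default"]
        else:
            raise ValueError(f"no default for reader field {name!r}")
    return resolved


# A record produced under the old schema, viewed through the new one.
old_record = {"id": 7, "name": "Ada", "nickname": "ada7"}
new_view = resolve_record(old_record, WRITER_SCHEMA, READER_SCHEMA)
print(new_view)
```

Swapping the two schema arguments models the opposite direction: an older reader consuming newer data drops `email` and fills `nickname` from its default, which is why both added and removed fields need defaults for compatibility in both directions.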