Avro

Revision as of 14:52, 6 April 2016 by FAURE.ADRIEN (talk | contribs)

Apache Avro

Avro is a remote procedure call (RPC) and data serialization framework developed within the Apache Hadoop project. It uses JSON to define data types and protocols, and serializes data in a compact binary format. Its primary use is in Apache Hadoop, where it provides both a serialization format for persistent data and a wire format for communication between Hadoop nodes and from client programs to Hadoop services. It is similar to Thrift, but does not require running a code-generation program when a schema changes (unless desired for statically typed languages).
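
As a rough illustration of a JSON-defined schema paired with a compact binary encoding, the sketch below hand-encodes one record using only the Python standard library. It assumes Avro's binary rules for the few types it covers (int and long as zigzag variable-length integers, string as a length prefix followed by UTF-8 bytes, record fields concatenated in schema order). The User schema, its field names, and the helper functions are hypothetical and are not part of any Avro API; a real program would normally use an Avro library instead.

```python
import json

# Hypothetical example schema; the record and field names are illustrative only.
SCHEMA = json.loads("""
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age",  "type": "int"}
  ]
}
""")

def encode_long(n):
    """Encode an int/long as a zigzag variable-length integer."""
    z = (n << 1) ^ (n >> 63)  # zigzag: small magnitudes become small unsigned values
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)

def encode_string(s):
    """Encode a string as its byte length (a long) followed by UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data

# Only the primitive types used in this sketch are handled.
ENCODERS = {"int": encode_long, "long": encode_long, "string": encode_string}

def encode_record(schema, datum):
    """Concatenate the encoded fields in the order the schema declares them."""
    return b"".join(ENCODERS[f["type"]](datum[f["name"]]) for f in schema["fields"])

encoded = encode_record(SCHEMA, {"name": "Ada", "age": 36})
print(encoded.hex())  # 0641646148 (5 bytes; no field names or type tags)
```

The encoded bytes carry no field names or type tags, which is why a reader needs the schema to interpret them.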

Languages with APIs

Although in principle any language could use Avro, APIs have been written for the following languages:

  • Java
  • Scala
  • C#
  • C
  • C++
  • Python
  • Ruby

Features

Avro provides:

  • Rich data structures.
  • A compact, fast, binary data format.
  • A container file, to store persistent data.
  • Remote procedure call (RPC).
  • Simple integration with dynamic languages. Code generation is not required to read or write data files, nor to use or implement RPC protocols. Code generation is an optional optimization, only worth implementing for statically typed languages.
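
The point about dynamic languages can be sketched as follows: a generic reader parses the JSON schema at run time and walks it to decode binary data into a plain dictionary, so no classes need to be generated. This is a minimal standard-library sketch, not the Avro library's API. It assumes the binary rules for int, long, and string (zigzag variable-length integers and length-prefixed UTF-8), and the Reading schema, its fields, and the function names are hypothetical.

```python
import io
import json

def decode_long(stream):
    """Read one zigzag variable-length integer from a binary stream."""
    shift = 0
    z = 0
    while True:
        b = stream.read(1)[0]
        z |= (b & 0x7F) << shift
        if not b & 0x80:  # continuation bit clear: last byte of this number
            break
        shift += 7
    return (z >> 1) ^ -(z & 1)  # undo zigzag

def decode_string(stream):
    """Read a length-prefixed UTF-8 string."""
    length = decode_long(stream)
    return stream.read(length).decode("utf-8")

# Only the primitive types used in this sketch are handled.
DECODERS = {"int": decode_long, "long": decode_long, "string": decode_string}

def decode_record(schema, stream):
    """Decode fields in schema order into an ordinary dict."""
    return {f["name"]: DECODERS[f["type"]](stream) for f in schema["fields"]}

# The schema arrives as JSON text at run time (hypothetical example).
schema_text = """
{"type": "record", "name": "Reading",
 "fields": [{"name": "sensor", "type": "string"},
            {"name": "value",  "type": "long"}]}
"""
schema = json.loads(schema_text)

record = decode_record(schema, io.BytesIO(b"\x04t1\x09"))
print(record)  # {'sensor': 't1', 'value': -5}
```

Because the schema is ordinary data, changing it only means loading different JSON text; nothing has to be regenerated or recompiled.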