Avro

Avro is an open-source project that provides data serialization and data exchange services for Apache Hadoop.

Avro is a language-neutral data serialization system. It was developed by Doug Cutting, the father of Hadoop. Because Hadoop's Writable classes are not portable across languages, Avro is quite helpful: its data formats can be processed by multiple languages. Avro is a preferred tool for serializing data in Hadoop.

Avro is schema-based. A language-independent schema is associated with its read and write operations, and the data Avro serializes carries that schema with it. Avro serializes data into a compact binary format that any application with access to the schema can deserialize.

Avro uses JSON to declare data structures. Presently, it supports languages such as Java, C, C++, C#, Python, and Ruby.
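As a rough illustration of what such a declaration looks like, the sketch below writes a record schema as JSON and loads it with Python's standard json module. The record name `User`, the namespace, and the fields are hypothetical and chosen only for this example; a real project would pass the same JSON to an Avro library.

```python
import json

# Hypothetical schema for illustration; none of these names come from a real project.
USER_SCHEMA_JSON = """
{
  "type": "record",
  "name": "User",
  "namespace": "example.avro",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"},
    {"name": "email", "type": ["null", "string"], "default": null}
  ]
}
"""

# Because the schema is plain JSON, any JSON parser can read it.
schema = json.loads(USER_SCHEMA_JSON)
field_names = [field["name"] for field in schema["fields"]]
print(schema["name"], field_names)
```

The `email` field uses a union of `null` and `string` with a default of `null`, which is a common way to mark a field as optional.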

Avro depends heavily on its schema. The schema used to write the data is always available when the data is read, so each datum can be written with no per-value overhead. This makes serialization fast and the serialized data small. When Avro data is stored in a file, the schema is stored along with it for any further processing.
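To make the size claim concrete, here is a minimal hand-written sketch of how the Avro binary encoding lays out a record with one string field and one int field. It is not an Avro library, only an illustration of the encoding rules in the Avro specification: int and long values use variable-length zig-zag coding, a string is its UTF-8 length followed by the bytes, and a record is its fields concatenated in schema order. The function names and the two-field `User` record are assumptions made for this example.

```python
import json


def zigzag_varint(n: int) -> bytes:
    """Encode a signed 64-bit integer with zig-zag, variable-length coding."""
    z = (n << 1) ^ (n >> 63)  # zig-zag maps small magnitudes to small unsigned values
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, with the continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Encode a string as its UTF-8 byte length followed by the bytes."""
    data = s.encode("utf-8")
    return zigzag_varint(len(data)) + data


def encode_user(name: str, age: int) -> bytes:
    """Encode a hypothetical User record: fields in schema order, no field names."""
    return encode_string(name) + zigzag_varint(age)


encoded = encode_user("Ann", 30)
as_json = json.dumps({"name": "Ann", "age": 30}).encode("utf-8")
print(encoded.hex())               # 06416e6e3c
print(len(encoded), len(as_json))  # 5 26
```

The binary form carries no field names or type tags, which is why a reader needs the schema to interpret it.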

In RPC, the client and the server exchange schemas during the connection handshake. Because each side then has the other's schema, differences between the two, such as same-named fields, missing fields, and extra fields, can be resolved.
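The field-matching idea can be sketched in a few lines. The function below is a simplified illustration, not the Avro library's resolution logic: it works on already-decoded dictionaries, matches fields by name, drops fields the reader does not declare, and fills in the reader's default for fields the writer did not send. It ignores type promotion and aliases. The function name and the sample fields are hypothetical.

```python
def resolve_record(writer_record: dict, reader_fields: list) -> dict:
    """Project a record written with one schema onto a reader's field list."""
    resolved = {}
    for field in reader_fields:
        name = field["name"]
        if name in writer_record:
            # Same-named field: take the writer's value.
            resolved[name] = writer_record[name]
        elif "default" in field:
            # Missing from the writer: fall back to the reader's default.
            resolved[name] = field["default"]
        else:
            raise ValueError(f"no value or default for reader field {name!r}")
    # Extra writer fields are never copied, so they are dropped.
    return resolved


# The writer sent "nickname", which the reader does not know about,
# and did not send "email", which the reader expects.
writer_record = {"name": "Ann", "age": 30, "nickname": "A"}
reader_fields = [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"},
    {"name": "email", "type": ["null", "string"], "default": None},
]

print(resolve_record(writer_record, reader_fields))
```

A reader field with neither a written value nor a default is treated as an error here, since the reader has no way to fill it in.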

Avro schemas are defined in JSON, which simplifies implementation in languages that already have JSON libraries.

Avro is not the only serialization mechanism used with Hadoop; others include Sequence Files, Protocol Buffers, and Thrift.
