Data Governance Using Apache Atlas

Enterprises today hold data on the network, in the cloud, and on endpoints. Governing that data is therefore a critical step toward understanding where it comes from, when it was last updated, how it is classified, and how data and data sources relate to one another.

Apache Atlas makes it possible to analyze metadata and then act on it, applying appropriate policies as needed.

What is Apache Atlas?

“Atlas is a scalable and extensible set of core foundational governance services – enabling enterprises to effectively and efficiently meet their compliance requirements within Hadoop and allows integration with the whole enterprise data ecosystem.”

Apache Atlas is a tool for data governance and metadata management on enterprise Hadoop clusters. It is a one-stop solution for gathering, processing, and maintaining metadata.

Features of Atlas

Centralized Metadata

Atlas provides the ability to define new metadata types and also facilitates easy exchange of metadata by enabling any metadata consumer to share a common metadata store.

Data Classification

Atlas provides the ability to create classifications dynamically, such as PII, EXPIRES_ON, DATA_QUALITY, and SENSITIVE. Classifications can include attributes, such as the expiry_date attribute in the EXPIRES_ON classification.
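As a rough illustration, classifications like these can be described as JSON and submitted to the Atlas v2 type-definition endpoint (POST /api/atlas/v2/types/typedefs). The sketch below only builds the payload. The helper name classification_def, the descriptions, and the choice of the "date" type for expiry_date are assumptions made for this example, not something the article specifies.

```python
import json


def classification_def(name, description, attributes=None):
    """Build one classification definition as a plain dict.

    `attributes` maps attribute name -> Atlas type name (assumed values).
    """
    attribute_defs = [
        {
            "name": attr_name,
            "typeName": attr_type,
            "isOptional": True,
            "cardinality": "SINGLE",
        }
        for attr_name, attr_type in (attributes or {}).items()
    ]
    return {
        "name": name,
        "description": description,
        "superTypes": [],
        "attributeDefs": attribute_defs,
    }


# Payload shaped for the typedefs endpoint; nothing is sent here.
payload = {
    "classificationDefs": [
        classification_def("PII", "Personally identifiable information"),
        classification_def(
            "EXPIRES_ON",
            "Data that should be removed after a given date",
            {"expiry_date": "date"},
        ),
    ]
}

print(json.dumps(payload, indent=2))
```

Keeping the definition as plain data makes it easy to review or version before it is posted to a running Atlas instance.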

Lineage

Atlas provides an intuitive UI to view the lineage of data as it moves through various processes, as well as REST APIs to access and update lineage.
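To make the lineage API more concrete, here is a minimal sketch that builds a lineage request URL and walks a lineage response to list everything upstream of an entity. The endpoint path follows the Atlas v2 REST API (/api/atlas/v2/lineage/{guid}). The host, the GUIDs, the entity names, and the trimmed-down response in `sample` are hypothetical and exist only for illustration.

```python
from urllib.parse import urlencode


def lineage_url(base_url, guid, direction="BOTH", depth=3):
    """Build the URL for a lineage lookup on one entity."""
    query = urlencode({"direction": direction, "depth": depth})
    return f"{base_url}/api/atlas/v2/lineage/{guid}?{query}"


def upstream_names(lineage, guid):
    """Return names of all entities that feed into `guid`, nearest first."""
    parents = {}
    for rel in lineage["relations"]:
        parents.setdefault(rel["toEntityId"], []).append(rel["fromEntityId"])

    seen = []
    stack = list(parents.get(guid, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against revisiting shared ancestors
        seen.append(current)
        stack.extend(parents.get(current, []))
    return [lineage["guidEntityMap"][g]["attributes"]["name"] for g in seen]


# Hypothetical, heavily trimmed lineage response: table -> process -> table.
sample = {
    "guidEntityMap": {
        "g1": {"typeName": "hive_table", "attributes": {"name": "raw_orders"}},
        "g2": {"typeName": "hive_process", "attributes": {"name": "load_orders"}},
        "g3": {"typeName": "hive_table", "attributes": {"name": "clean_orders"}},
    },
    "relations": [
        {"fromEntityId": "g1", "toEntityId": "g2"},
        {"fromEntityId": "g2", "toEntityId": "g3"},
    ],
}

print(lineage_url("http://localhost:21000", "g3"))
print(upstream_names(sample, "g3"))
```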

Terms and Components used in Apache Atlas

Types:

A ‘Type’ in Atlas is a definition of how a particular kind of metadata object is stored and accessed. It represents one attribute or a collection of attributes that define the properties of the metadata object. Some of the types already available are:

  • hive_table
  • jms_topic
  • avro_collection
  • avro_schema
  • storm_bolt etc.
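Beyond the built-in types listed above, a new type can be described in the same way. The following is a sketch of what an entity type definition might look like as a payload for the v2 typedefs endpoint. The type name sample_report and its attributes owner_team and refresh_hours are invented for this example; DataSet is used as the supertype on the assumption that the new type models a data asset.

```python
import json


def entity_type_def(name, super_type, attributes):
    """Describe an entity type as a named collection of attributes."""
    return {
        "name": name,
        "superTypes": [super_type],
        "attributeDefs": [
            {
                "name": attr_name,
                "typeName": attr_type,
                "isOptional": True,
                "cardinality": "SINGLE",
            }
            for attr_name, attr_type in attributes.items()
        ],
    }


# Hypothetical custom type; nothing is sent to a server here.
typedefs = {
    "entityDefs": [
        entity_type_def(
            "sample_report",
            "DataSet",
            {"owner_team": "string", "refresh_hours": "int"},
        )
    ]
}

print(json.dumps(typedefs, indent=2))
```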

Entities:

An ‘entity’ in Atlas is a specific value or instance of a ‘type’ and thus represents a specific metadata object in the real world.
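For example, one concrete Hive table would be an entity of the hive_table type. The sketch below builds such an entity as a payload of the kind accepted by the v2 entity endpoint (POST /api/atlas/v2/entity). The database, table, cluster, and owner values are made up, and in practice hive_table entities are usually registered by the Hive hook rather than by hand, so treat this purely as an illustration of the type/entity relationship.

```python
import json


def hive_table_entity(db, table, cluster, owner):
    """Build one hive_table entity that points at its parent hive_db."""
    return {
        "entity": {
            "typeName": "hive_table",
            "attributes": {
                # Conventional unique name: <db>.<table>@<cluster>
                "qualifiedName": f"{db}.{table}@{cluster}",
                "name": table,
                "owner": owner,
                # Reference to the parent database by its unique attribute.
                "db": {
                    "typeName": "hive_db",
                    "uniqueAttributes": {"qualifiedName": f"{db}@{cluster}"},
                },
            },
        }
    }


# Hypothetical values for illustration only.
entity = hive_table_entity("sales", "orders", "dev", "etl_user")
print(json.dumps(entity, indent=2))
```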

Type System:

The type system allows users to define and manage types and entities. All new kinds of metadata in Atlas are modeled using types and represented as entities.

Graph Engine:

Internally, Atlas persists the metadata objects it manages using a graph model. The graph engine component is responsible for translating between the types and entities of the Atlas type system and the underlying graph persistence model. Atlas uses JanusGraph to store the metadata objects.

Ingest / Export:

The Ingest component allows metadata to be added to Atlas. Similarly, the Export component raises the metadata changes detected by Atlas as events. Consumers can subscribe to these change events to react to metadata changes in real time.

Integration with Apache Atlas

Users can manage metadata in Atlas and create new metadata models using two methods:

REST API

All functionality of Atlas is exposed to end users via a REST API that allows types and entities to be created, updated and deleted.
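A minimal sketch of what a REST call might look like from Python, using only the standard library. It prepares a POST to the v2 entity endpoint with HTTP basic authentication but does not send it, so it runs without a server. The host and port (localhost:21000), the admin/admin credentials, and the sample hive_db entity are assumptions for a local test setup and will differ in a real deployment.

```python
import base64
import json
import urllib.request


def build_request(base_url, path, payload, user, password):
    """Prepare a JSON POST request with a basic-auth header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


# Hypothetical entity and connection details for illustration.
body = {
    "entity": {
        "typeName": "hive_db",
        "attributes": {
            "qualifiedName": "sales@dev",
            "name": "sales",
            "clusterName": "dev",
        },
    }
}
request = build_request(
    "http://localhost:21000", "/api/atlas/v2/entity", body, "admin", "admin"
)

print(request.get_method(), request.full_url)
# To actually send it against a running server:
#   with urllib.request.urlopen(request) as response:
#       print(response.status)
```

Updates and deletes follow the same pattern with a different HTTP method and path.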

Messaging

In addition to the API, users can choose to integrate with Atlas using a messaging interface that is based on Kafka. Atlas uses Apache Kafka as a notification server for communication between hooks and downstream consumers of metadata notification events. Events are written by the hooks and Atlas to different Kafka topics.
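By default the two topics are named ATLAS_HOOK (messages from hooks to Atlas) and ATLAS_ENTITIES (change events from Atlas to consumers). The sketch below shows how a downstream consumer might summarize one entity change event once it has been read from Kafka. The envelope layout in `sample_value` is a simplified assumption about the message shape, not a full specification, and the Kafka client itself is left out so the example stays self-contained.

```python
import json

HOOK_TOPIC = "ATLAS_HOOK"          # hooks -> Atlas
ENTITIES_TOPIC = "ATLAS_ENTITIES"  # Atlas -> downstream consumers


def summarize_event(raw_value):
    """Reduce one notification message to the fields a consumer often needs."""
    envelope = json.loads(raw_value)
    # Assumed layout: the event sits under "message"; fall back to the top level.
    message = envelope.get("message", envelope)
    entity = message.get("entity", {})
    return {
        "operation": message.get("operationType", "UNKNOWN"),
        "type": entity.get("typeName"),
        "guid": entity.get("guid"),
    }


# Hypothetical message value as it might arrive from the ATLAS_ENTITIES topic.
sample_value = json.dumps(
    {
        "message": {
            "operationType": "ENTITY_CREATE",
            "entity": {"typeName": "hive_table", "guid": "g3"},
        }
    }
).encode("utf-8")

print(summarize_event(sample_value))
```

In a real consumer, any Kafka client library subscribed to the entities topic would pass each message value to a function like this.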


#Data Governance #Metadata Management


