Governing Apache Ranger

We all know the great potential of Apache Ranger:

  • Authorization: controls ecosystem access to solutions such as Atlas, NiFi, HDFS, YARN, Hive, Impala, Kafka, Solr, HBase, Knox and many more ...
  • Auditing: logs all of those accesses.
  • Anonymisation: can anonymise the columns of a query on the fly, based on ABAC data attributes or RBAC user roles.

But we also know that, like many open source solutions, its interface is very much oriented towards operations and, logically, very little towards the business.

So we are going to use its REST API to build a governance dashboard that helps us easily govern who is accessing what, from where (IP and location) and how many times.


Accessing the Ranger API

Like the Atlas REST API that we already saw in another article, the Ranger REST API is also quite easy to use once you have the right examples.

The first thing we are going to do is to choose the AccessAudit function.


The REST API Swagger interface, reached from the Ranger web UI.

In code, we can create a function to call the API and retrieve the information. The first thing it returns is metadata about the data it will send and the resulting pages. The amount of data the audit can collect is astonishing.


The getAccessAudit function (Python).

As we are only interested in the data of non-system users, we will pass this parameter in the call: excludeServiceUser=true.

I separate the KOs and the OKs into two different calls, so I use AccessResult=0 and AccessResult=1 respectively.


Likewise, after the initial load we will only be interested in the data we don't yet have, so we have to use the StartDate parameter to take that into account, as you can see in the getAccessAudit function.
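Here is a minimal sketch, in Python, of what such a getAccessAudit call could look like. The endpoint path (/service/assets/accessAudit), the exact parameter names (excludeServiceUser, accessResult, startDate, pageSize, startIndex) and the vXAccessAudits response field are assumptions based on a typical Ranger Admin instance, so check them against your own Swagger UI and against the function in the screenshot above.

```python
import requests


def get_access_audit(base_url, user, password, access_result,
                     start_date=None, page_size=1000):
    """Fetch Ranger access-audit events, paging until the server runs out.

    access_result: 0 for denied (KO) events, 1 for allowed (OK) events.
    start_date:    only fetch events from this date on, for incremental loads.
    NOTE: endpoint and parameter names are assumptions; verify them in Swagger.
    """
    events, start_index = [], 0
    while True:
        params = {
            "excludeServiceUser": "true",   # skip system/service users
            "accessResult": access_result,  # 0 = KO, 1 = OK
            "pageSize": page_size,
            "startIndex": start_index,
        }
        if start_date:
            params["startDate"] = start_date
        resp = requests.get(f"{base_url}/service/assets/accessAudit",
                            params=params, auth=(user, password))
        resp.raise_for_status()
        batch = resp.json().get("vXAccessAudits", [])
        events.extend(batch)
        if len(batch) < page_size:          # last page reached
            break
        start_index += page_size
    return events


# e.g. all OK events since the last load
# ok_events = get_access_audit("https://ranger-host:6182", "admin", "***",
#                              access_result=1, start_date="01/31/2024")
```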


Transforming and loading the data into a table

This part is simpler, but interesting if we want to locate the requests on a map.

Ranger returns the client's IP, but if we want to know its latitude and longitude, we have to match it against an external database that associates this kind of data with IP ranges. In this example I have used the IP2Location LITE database table.

And when working with IPs, it is convenient to convert them to integers, so that it is easier to compare them against ranges without overloading the engines with text searches.


In Python, the ipaddress library helps here.
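A minimal sketch of that conversion; the DataFrame and column names (clientIP, client_ip_num) are just illustrative placeholders for whatever your audit extract uses.

```python
import ipaddress

import pandas as pd


def ip_to_int(ip_str):
    """Convert a dotted IPv4 (or IPv6) address string to its integer form."""
    try:
        return int(ipaddress.ip_address(ip_str))
    except ValueError:  # hostnames, empty strings, malformed values...
        return None


# assuming the audit events ended up in a pandas DataFrame with a 'clientIP' column
df = pd.DataFrame({"clientIP": ["192.168.1.10", "8.8.8.8"]})
df["client_ip_num"] = df["clientIP"].apply(ip_to_int)
print(df)
```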

We will now save the data we collect from the API into a Hive or Impala table, using Parquet or Iceberg as the file/table format. We can do this with a Spark session.


Converting the pandas DataFrame to a Spark DataFrame and writing it to the table.
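A sketch of that step, assuming the audit events sit in a pandas DataFrame called df (as in the previous snippet) and that the target table is governance.ranger_audit; both names are illustrative.

```python
from pyspark.sql import SparkSession

# a Spark session with Hive support, so saveAsTable registers the table in the metastore
spark = (SparkSession.builder
         .appName("ranger-audit-load")
         .enableHiveSupport()
         .getOrCreate())

# convert the pandas DataFrame into a Spark DataFrame
sdf = spark.createDataFrame(df)

# append the new audit events to the governance table as Parquet
(sdf.write
    .format("parquet")
    .mode("append")
    .saveAsTable("governance.ranger_audit"))
```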

The geolocated IPs table

I downloaded this database as a CSV and loaded it into a table, first converting the addresses to numbers as we have seen.

Now we can create views that join the tables together, so we can plot the location of each access on the map.
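For example, a view along these lines attaches the geo data to each audit record by joining on the IP ranges; the geo_ips column names follow the IP2Location LITE CSV layout and the table names are the illustrative ones used above.

```python
spark.sql("""
    CREATE VIEW IF NOT EXISTS governance.ranger_audit_geo AS
    SELECT a.*,
           g.country_name,
           g.city_name,
           g.latitude,
           g.longitude
    FROM governance.ranger_audit a
    JOIN governance.geo_ips g
      ON a.client_ip_num BETWEEN g.ip_from AND g.ip_to
""")
```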


Detail of the GeoIPs table.


The dashboard

I generate some widgets about access:


  • First audit access timestamp
  • Last audit access timestamp
  • OK records from today and yesterday, filtered with SQL, to easily keep an eye on query access (see the SQL sketch after this list).
  • KO records from today and yesterday, filtered the same way.
  • Geo Map to understand external internet IP access.



  • Most accessed tables to understand usage patterns.
  • OK and KO distribution by user, IP and service type.


You can easily see the user, IP and query or path that generated the OK or the KO.

  • List of the last KO and OK events, etc.
  • Whatever else you want!
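As an illustration of those SQL filters, the query behind the "OK records from today and yesterday" widget can be as simple as this; the table and column names (access_result, event_time) are the illustrative ones used earlier, with access_result = 1 meaning OK and 0 meaning KO.

```python
spark.sql("""
    SELECT COUNT(*) AS ok_events
    FROM governance.ranger_audit_geo
    WHERE access_result = 1
      AND to_date(event_time) >= date_sub(current_date(), 1)
""").show()
```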


The code

As always, here is the code on GitHub.

Thank you for your time!
