Governing Apache Ranger
Santiago Merchán Casado
Presales Manager | Solutions Engineer | Data Architect | Business Developer | BigData | AI | Cloud | Partners
We all know the great potential of Apache Ranger.
But we also know that, like so many open source solutions, its interface is heavily oriented toward operations and, understandably, much less toward the business.
So we are going to use its REST API to build a governance dashboard that makes it easy to see who is accessing what, from where (IP address and location), and how many times.
Accessing the Ranger API
Like the Apache Atlas REST API we looked at in another article, the Ranger REST API is quite easy to use once you find the right examples.
The first step is to pick the AccessAudit endpoint.
In code, we can write a function that calls this endpoint and retrieves the information. The response starts with metadata describing the records it returns and how they are split into pages. The amount of data the audit can collect is astonishing.
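Below is a minimal sketch of such a function in Python, using only the standard library. It is not the author's code. It assumes the following, so check each point against your Ranger version:

- The Ranger Admin resource is `/service/assets/accessAudit`.
- The resource pages with `startIndex` and `pageSize`.
- Each page returns its records under `vXAccessAudits`, together with a `totalCount` field.
- Basic authentication is accepted.

`get_access_audit` is a hypothetical name and is not the `getAccessAudit` function from the article.

```python
import json
import urllib.parse
import urllib.request
from base64 import b64encode
from typing import Callable, Dict, Iterator

# Assumed path of the Ranger Admin access-audit resource; verify it for your version.
AUDIT_PATH = "/service/assets/accessAudit"


def build_audit_url(base_url: str, params: Dict[str, object]) -> str:
    """Join the Ranger Admin base URL, the audit path and URL-encoded query parameters."""
    return base_url.rstrip("/") + AUDIT_PATH + "?" + urllib.parse.urlencode(params)


def http_get_json(url: str, user: str, password: str) -> dict:
    """GET a URL with HTTP basic authentication and decode the JSON response body."""
    token = b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url, headers={"Authorization": f"Basic {token}", "Accept": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_audit_records(fetch_page: Callable[[int], dict]) -> Iterator[dict]:
    """Yield audit records page by page.

    fetch_page(start_index) returns one decoded page. The keys "vXAccessAudits"
    (record list) and "totalCount" (paging metadata) are assumptions.
    """
    start = 0
    while True:
        page = fetch_page(start)
        records = page.get("vXAccessAudits") or []
        yield from records
        start += len(records)
        # Stop on an empty page or once every announced record has been read.
        if not records or start >= page.get("totalCount", 0):
            break


def get_access_audit(base_url: str, user: str, password: str,
                     params: Dict[str, object], page_size: int = 1000) -> Iterator[dict]:
    """Hypothetical helper: stream every record matching `params` from Ranger Admin."""
    def fetch_page(start: int) -> dict:
        query = dict(params, startIndex=start, pageSize=page_size)
        return http_get_json(build_audit_url(base_url, query), user, password)
    return iter_audit_records(fetch_page)
```

As a usage example, with a placeholder host and credentials, `for rec in get_access_audit("http://ranger-host:6080", "admin", "secret", {"excludeServiceUser": "true"}): ...` would stream every matching record. Passing the page fetcher in as a function keeps the paging loop testable without a live server.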
As we are only interested in the activity of regular users, not service (system) users, we add this parameter to the call: excludeServiceUser=true
I separate denied requests (KOs) from allowed ones (OKs) into two different calls, using accessResult=0 and accessResult=1 respectively.
Likewise, after the initial load we only want the data we don't already have, so we use the startDate parameter, as you can see in the getAccessAudit function.
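The parameters for these calls can be assembled as shown in this Python sketch. Two points are assumptions to check against your Ranger version:

- The `MM/dd/yyyy` format for `startDate`.
- Whether restarting from the last loaded day returns events already stored.

The function names are hypothetical.

```python
from datetime import date
from typing import Dict, List, Optional

DENIED, ALLOWED = 0, 1  # accessResult values: 0 = denied (KO), 1 = allowed (OK)


def audit_params(access_result: int, start_date: Optional[date] = None) -> Dict[str, object]:
    """Query parameters for one access-audit call."""
    if access_result not in (DENIED, ALLOWED):
        raise ValueError("access_result must be 0 (denied) or 1 (allowed)")
    # Only non-service users, and only one access result per call.
    params: Dict[str, object] = {"excludeServiceUser": "true", "accessResult": access_result}
    if start_date is not None:
        # Assumption: startDate is parsed as MM/dd/yyyy; adjust for your Ranger version.
        params["startDate"] = start_date.strftime("%m/%d/%Y")
    return params


def incremental_calls(last_loaded: Optional[date]) -> List[Dict[str, object]]:
    """One parameter set for the KOs and one for the OKs.

    last_loaded is the day of the newest event already stored (None on the
    initial load). Restarting from that day can return events that are already
    stored, so de-duplicate on the audit record id when appending.
    """
    return [audit_params(result, last_loaded) for result in (DENIED, ALLOWED)]
```

Working at day granularity keeps the call simple at the cost of some re-fetched records. Deduplicating on the record id afterwards is cheaper than trying to resume at an exact timestamp.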
Transforming and loading the data into a table
This part is simpler, but interesting if we want to place the requests on a map.
Ranger returns the client's IP address, but to get its latitude and longitude we have to look it up in an external database that associates this kind of data with IP ranges. In this example I used the IP2Location LITE database.
When working with IPs, it is convenient to convert them to integers: checking whether an address falls within a range then becomes a simple numeric comparison, without overloading the engines with text searches.
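The conversion and a range lookup can be sketched in Python as follows. The sketch makes these assumptions:

- It handles IPv4 only.
- It assumes the ranges do not overlap.
- `IpRangeIndex` is a hypothetical helper that only illustrates why integers make range matching cheap. In the article, the matching itself is done with views over tables.

```python
import bisect
import ipaddress
from typing import Any, List, Optional, Tuple


def ip_to_int(ip: str) -> int:
    """Convert a dotted IPv4 address a.b.c.d to a*2^24 + b*2^16 + c*2^8 + d."""
    return int(ipaddress.IPv4Address(ip))


class IpRangeIndex:
    """Lookup over sorted, non-overlapping [ip_from, ip_to] integer ranges."""

    def __init__(self, ranges: List[Tuple[int, int, Any]]):
        self._ranges = sorted(ranges, key=lambda r: r[0])
        self._starts = [r[0] for r in self._ranges]

    def lookup(self, ip: str) -> Optional[Any]:
        """Return the payload of the range containing ip, or None if no range matches."""
        n = ip_to_int(ip)
        # The rightmost range whose start is <= n is the only candidate.
        i = bisect.bisect_right(self._starts, n) - 1
        if i >= 0:
            start, end, payload = self._ranges[i]
            if start <= n <= end:
                return payload
        return None
```

For example, `ip_to_int("1.2.3.4")` returns 16909060 (1*2^24 + 2*2^16 + 3*2^8 + 4). A lookup is then a binary search over integers instead of a text comparison.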
Next, we save the data collected from the API to a Hive or Impala table, using Parquet or Iceberg as the file/table format. We can do this with a Spark session.
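A minimal PySpark sketch of this step follows. It assumes an existing `SparkSession` with Hive support. The following names are hypothetical:

- The JSON field names (taken to match the accessAudit response).
- The column list.
- The table name `governance.ranger_access_audit`.

Values are cast to the target column types so that schema verification does not fail on, for example, a numeric `eventTime`.

```python
from typing import Iterable, List, Tuple

# (JSON field, DDL column, converter); field names are assumed from the accessAudit response.
AUDIT_FIELDS = [
    ("id", "id BIGINT", int),
    ("eventTime", "event_time STRING", str),
    ("requestUser", "request_user STRING", str),
    ("clientIP", "client_ip STRING", str),
    ("repoName", "repo_name STRING", str),
    ("resourcePath", "resource_path STRING", str),
    ("accessType", "access_type STRING", str),
    ("accessResult", "access_result INT", int),
]
AUDIT_SCHEMA = ", ".join(ddl for _, ddl, _ in AUDIT_FIELDS)


def to_rows(records: Iterable[dict]) -> List[Tuple]:
    """Flatten audit records into typed tuples in AUDIT_FIELDS order; missing fields become None."""
    return [
        tuple(None if rec.get(field) is None else conv(rec[field])
              for field, _, conv in AUDIT_FIELDS)
        for rec in records
    ]


def save_audits(spark, records: Iterable[dict],
                table: str = "governance.ranger_access_audit") -> int:
    """Append records to a Parquet table through an existing SparkSession; returns the row count."""
    rows = to_rows(records)
    if not rows:
        return 0
    df = spark.createDataFrame(rows, schema=AUDIT_SCHEMA)
    df.write.mode("append").format("parquet").saveAsTable(table)
    return len(rows)
```

For an existing Iceberg table in a configured Iceberg catalog, `df.writeTo(table).append()` (the DataFrameWriterV2 API) is one alternative to the Parquet write.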
The geolocated IPs table
I downloaded this table as a CSV file and loaded it into a table, converting the addresses to numbers beforehand as described above.
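Here is a plain-Python sketch of the parsing step. It assumes the column order of an IP2Location LITE DB5-style file:

1. ip_from
2. ip_to
3. country_code
4. country_name
5. region
6. city
7. latitude
8. longitude

Because the article converts addresses before loading, the hypothetical `to_ip_number` accepts either a dotted IPv4 address or a value that is already numeric. In Spark, the same logic could be applied after `spark.read.csv`.

```python
import csv
import io
import ipaddress
from typing import List, Tuple


def to_ip_number(value: str) -> int:
    """Return the integer form of a dotted IPv4 address, or parse an already-numeric value."""
    value = value.strip()
    return int(ipaddress.IPv4Address(value)) if "." in value else int(value)


def load_geo_ranges(csv_text: str) -> List[Tuple[int, int, str, str, str, str, float, float]]:
    """Parse geo rows into (ip_from, ip_to, country_code, country, region, city, lat, lon).

    The eight-column layout is an assumption based on IP2Location LITE DB5.
    """
    ranges = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        ip_from, ip_to, code, country, region, city, lat, lon = row[:8]
        ranges.append((to_ip_number(ip_from), to_ip_number(ip_to),
                       code, country, region, city, float(lat), float(lon)))
    return ranges
```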
Now we can create views that join the tables, so that the location of each access can be plotted on the map.
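Such a view boils down to a range join on the numeric IP. The sketch below runs the SQL in an in-memory SQLite database purely as a stand-in for Hive or Impala. The table, column, and view names are hypothetical.

Hive and Impala accept a comparable `BETWEEN` join condition. However, non-equi joins against a large geo table can be expensive, so it may pay to precompute the match during loading.

```python
import sqlite3

VIEW_DDL = """
CREATE TABLE access_audit (
    request_user TEXT, resource_path TEXT, client_ip_num INTEGER, access_result INTEGER);
CREATE TABLE ip_geo (
    ip_from INTEGER, ip_to INTEGER, country_code TEXT, city TEXT, latitude REAL, longitude REAL);
-- Range join: each access gets the geo row whose range contains its numeric IP.
-- LEFT JOIN keeps accesses whose IP has no match (e.g. private addresses).
CREATE VIEW v_access_geo AS
SELECT a.request_user, a.resource_path, a.access_result,
       g.country_code, g.city, g.latitude, g.longitude
FROM access_audit a
LEFT JOIN ip_geo g
  ON a.client_ip_num BETWEEN g.ip_from AND g.ip_to;
"""


def build_demo_db() -> sqlite3.Connection:
    """Create the two tables and the joined view in an in-memory SQLite database."""
    con = sqlite3.connect(":memory:")
    con.executescript(VIEW_DDL)
    return con
```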
The dashboard
I built several widgets to visualize access activity.
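Widgets like these typically sit on top of simple aggregations. In the dashboard they would usually be GROUP BY queries over the views. The same counts are sketched here in Python to show what they compute. The record keys are the assumed accessAudit field names, and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable


def access_counts(records: Iterable[dict]) -> Counter:
    """How many times each (user, resource, client IP, result) combination occurs."""
    return Counter(
        (r.get("requestUser"), r.get("resourcePath"), r.get("clientIP"), r.get("accessResult"))
        for r in records
    )


def denied_by_user(records: Iterable[dict]) -> Counter:
    """Number of denied requests (accessResult == 0) per user."""
    return Counter(r.get("requestUser") for r in records if r.get("accessResult") == 0)
```

Calling `access_counts(records).most_common(10)` would then give the ten most frequent access patterns, which is a natural source for a "top accesses" widget.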
The code
As always, the code is available on GitHub.
Thank you for your time!