Alvin + Mode: Data Lineage and Usage for your Reports & Charts
This is an article by our developer Mariana Medeiros ??
As soon as I joined Alvin as a Data Engineer I was given the task of building an integration with Mode, a collaborative data platform that we’re seeing get some rapid adoption. It was the first time I worked with core software engineering and it has been a really nice journey so far with lots of learning, accountability, and building stuff.
As a former civil engineer, building things is something I have always enjoyed. Now instead of construction, I am helping build a really amazing data product with an awesome and talented team at Alvin ??
It is not news that the data space has been getting a lot of attention (and tools!) in recent years. At Alvin, we are building a product that helps improve the quality and usability of data, offering features such as impact analysis (what happens if I delete this column?), data lineage, problem tracing and usage analytics, all within a lightweight data catalog.
To achieve this, we are integrating with the most prominent data tools. We currently have integrations with data stores such as RedShift, BigQuery and Snowflake, BI tools such as Looker and Tableau, data transformation frameworks such as DBT, and data orchestration platforms such as Airflow.
To prioritize our integrations, we always listen to the data community, and this time they shouted: Mode!
How did we integrated with Mode
As described on the Mode?website ?:?”Mode is a modern analytics and BI solution that combines SQL, Python, R and visual analysis to answer questions faster”
Mode has?3 metadata sources ?which are:
To make sure the integration works, regardless of Mode plan (at the time of writing at least!), I decided to use the Core API.
We used?Amundsen databuilder ?extractors as starting point to create our own metadata extractors, and we took inspiration from their ETL framework to load that data into our backend.
Although Amundsen has some ready to use Mode extractors they do not fit our use case, so I created extractors based on their?RestApiExtractor ?that utilises?RestApiQuery ?to extract data.
The?RestApiQuery ?basically works by chaining a sequence of API calls (chaining RestApiQuery objects) as we hardly get any of the data we want by making the request to a single endpoint.
To tackle this,?RestApiQuery?response payload is parsed using JSON expressions. This seems a bit like magic but it basically just means that we can extract values from certain paths in the response object easily. A simple example is if you have an object like?{"name":"alvin"}?you ask for “name” and you get “alvin” back. The extracted fields can then be used to construct the next API call/endpoint for the next?RestApiQuery?object — for instance supplying parameters for pagination.
For instance, let’s say we want to extract reports metadata that exists inside spaces. This would require us to retrieve data from?https://app.mode.com/api/spaces/<space_token>/reports/. So, prior to getting the reports, we would have to first get the list of the available spaces in the workspace/organisation to plug the?space_token?in this URL.
That’s what the?ModeDashboardExtractor ?does, which is an extractor from Amundsen. So, first it gets the?spaces query ?building a?ModePaginatedRestApiQuery?object which is just the?RestApiQuery?with some pagination logic added to it.
From the?API response payload ?it parses the JSON to get the?token?(which is like the space id),?name?and?description?and builds a python dictionary object for each record in?spaces.
These records are then passed to the parent?RestApiQuery?which is the reports query.
And then we have the the?space_token?(which they call?dashboard_group_id) available to call the reports endpoint. For now, we don’t need any more API calls to get the data we want so with this final?RestApiQuery?object we build the?RestAPIExtractor?to then start extracting the records from Mode.
Besides building the extractors, I also built our own transformers and mappers to create our data entities model objects to load to our database (we have our customised Loader class for that!).
One thing to note here is that Mode does not provide granular usage data (i.e. individual report views) through their API, but just the accumulated view count. This isn’t much good to us as we need to know when the views took place. For example, a report could have 500 views, but if they all took place more than12 months ago, then it doesn’t have much business value anymore. I solved this by calculating the daily view count in our backend.
The main objective of this integration was to get data from Mode reports and charts, including the queries from reports so we can generate the lineage, which in turn powers our impact analysis feature.
领英推荐
? Milestone achieved: our plug and play Mode integration is up and running!
How does it look in Alvin?
Connecting Mode in Alvin is easy peasy lemon squeezy (just like all of our integrations!). Here are the quick steps to follow:
1. Get Mode API token
Get your Mode API token:
It’s important to note that the only data we access is read only metadata from Mode resources.
2. Add Mode as a platform
After that, go to your Admin panel in Alvin, click on?ADD PLATFORM and choose Mode. It’s important to note that the only data we access is read only metadata from Mode resources.
Then fill out the following fields:
-?Name: unique name for your connection
-?Display Name: name to display on the Alvin UI
-?Access Token?and?Password: Retrieved from API Tokens in your Mode Account
-?Workspace Username: username of the workspace (previously referred to as Organization). You can retrieve this information directly from your url?https://app.mode.com/<workspace_username>/spaces
After filling the credentials, hit the sync button. In a few minutes you’ll be able to navigate through your Mode reports and charts ??
And with that we are done! Now let’s check out a couple of examples of what all that metadata powers in the Alvin UI:
Upstream lineage:?Find out which tables and columns are being used to in your reports.
Downstream lineage:?View all reports that use a certain table or column in your data warehouse.
Impact analysis (I love this one!):?Find out which reports/charts will be affected by your changes (e.g. dropping a table or column).
Time series usage stats:?See views for your reports over time. See what is popular and remove those that have gone stale.
Wrapping Up
So that wraps up how I got on integrating Mode with Alvin. If you fancy trying it out for yourself, you can sign up to the beta?here ?- I’d love your feedback!
So that’s one down, many, many more to go ?? Feel free to let me know what you think our next integration should be in the comments or via?Linkedin . If you want to know more about Alvin and our mission, check out our other posts?here . Otherwise, stay tuned for my next one!
Want to try Alvin?
Alvin is revolutionising how scaling teams govern their data through automation. We’re committed to building Alvin with the input of the data engineering community, and we decide together what to build next.
If you’re interested in a demo, you can?request it here