"MicroPDT?" - Innovative solution to Optimizing Performance in Looker (11 min read)
Mehul Rajen Shah
Delivery Lead | GenAI, Data & Analytics | 2x C-Phi-O Leaderboard & TCS Young Achiever Award
In the dynamic landscape of data analytics, optimizing performance is a constant pursuit. Looker, a powerful Data Modelling and exploration platform along with Dashboarding Capabilities, offers a solution to enhance query speed and efficiency through Persistent Derived Tables (PDTs).
In this article, we'll explore an innovative solution "MicroPDT" in Looker and how they can be leveraged for performance optimization.
Article Agenda:
Persistent Derived Tables (PDTs) and its challenges:
Looker strongly recommends to use PDT's to optimize query performance
Major challenge are:
Aggregate Awareness
Looker also recommends using aggregate awareness (AA).
Some of its major challenges
MicroPDT
Micro PDT is a Hybrid solution leveraging the best of both the worlds - PDT and AA and at the same time avoiding the limitations they have.
In Layman terms, Understand this as a smallest dataset with required columns and rows which is required for Dashboard KPI or visualization. Instead of Dashboard going directly to Parent PDT's it will retrieve data from MicroPDT's which is quite smaller in size.
Lets understand how it works:
Thinking why MicroPDT - "I couldn't thing of a better name but "MicroPDT?" since it really works at micro level and giving performance at larger level much better than PDT's" - Mehul Rajen Shah
Steps to implement MicroPDT
Step 1> Explore the long running KPI from the Dashboard
Step 2> Open the SQL tab (under Data ribbon) and Click on Open in SQL Runner
领英推荐
Step 3> Run the query to see the results
Step 4> Click Settings and Get Derived Table LookML
Step 5> This should give you something like below, click add it to your project.
Step 6> Just select the right Project and give a standard view name. Click Add.
Step 7> Now the SQL Derived table view file (i.e. MicroPDT) has been added to your Project. Please note: this is still a Derived table and to make it a PDT you need to add persistence strategy.
To know more adding persistence please read this article
Step 8> In the code please remove any aggregations and replace that with column. We do not want any pre aggregations happening since this will limit us to slice and dice the data later.
Step 9> Add required dimensions and measures in the view file required for the Dashboard filters to work. Save.
Step 10> In the model file using this view file create an explore e.g. explore_sql_runner_query. Save & Validate.
Step 10> Now using this newly created explore build the original KPI in the explore
Step 11> Save as to a New or existing dashboard.
Now if you notice the original query scanned size and time would be much higher compared to the newly created one. MicroPDT has much lower dataset and hence the time to run the query against it is comparatively low.
Do we have any Downside, Yes:
Our Story
I and my team at Quantiphi have developed and worked on this innovative solution which really helped to bring down the run times to massive 60+%. This has not only helped reduce the dashboard run times but also save a bit of cost of running heavy queries on large datasets.
About Me
Mehul Rajen Shah - have a more than a decade experience in Data Analytics & GenAI driving excellence in Delivery & Automation, Optimization Initiatives. I love to connect with people and discover new ideas.
Please share your thoughts if this helps and your feedback.
Framework Engineer at Quantiphi || Looker || GCP || Terraform
1 年This really helped alot. Thanks Mehul Rajen Shah for exploring this innovative Micro PDT solution.
Delivery Lead | GenAI, Data & Analytics | 2x C-Phi-O Leaderboard & TCS Young Achiever Award
1 年Note - this might look lengthy, but trust me the ROI of efforts is huge. Please comment if you have tried/tested and/or facing issues.