"MicroPDT?" - Innovative solution to Optimizing Performance in Looker (11 min read)

"MicroPDT?" - Innovative solution to Optimizing Performance in Looker (11 min read)

In the dynamic landscape of data analytics, optimizing performance is a constant pursuit. Looker, a powerful Data Modelling and exploration platform along with Dashboarding Capabilities, offers a solution to enhance query speed and efficiency through Persistent Derived Tables (PDTs).

In this article, we'll explore an innovative solution "MicroPDT" in Looker and how they can be leveraged for performance optimization.


Article Agenda:

  • Persistent Derived Tables (PDT) and its challenges - 1 min
  • Aggregate awareness (AA) and its challenges - 1 min
  • MicroPDT? solution and its advantages over (PDT) and (AA) - 1 min
  • Steps to implement MicroPDT - 8 mins


Persistent Derived Tables (PDTs) and its challenges:

Looker strongly recommends to use PDT's to optimize query performance

Learn more about PDT

Major challenge are:

  • It is a huge dataset with almost all columns and rows
  • Since the dataset is huge, the query has to traverse a lot of columns/rows to find results.


Aggregate Awareness

Looker also recommends using aggregate awareness (AA).

Learn more about Aggregate Awareness

Some of its major challenges

  1. Static combination of filters which needs to be defined?during build. the filter change then AA does not work
  2. Limitations with custom dimensions and measures
  3. It is a Native Derived table hence it introduces warnings while using user attributes and hence you cannot implement Row Level Security


MicroPDT

Micro PDT is a Hybrid solution leveraging the best of both the worlds - PDT and AA and at the same time avoiding the limitations they have.

In Layman terms, Understand this as a smallest dataset with required columns and rows which is required for Dashboard KPI or visualization. Instead of Dashboard going directly to Parent PDT's it will retrieve data from MicroPDT's which is quite smaller in size.

MicroPDT's are like child PDT's which sits in between the Parent PDT and the actual Dashboard KPI.


Lets understand how it works:

  1. Its is a view Based solution and a SQL Derived table it works well with user attributes and hence can apply Row Level security.
  2. We do not apply the filters unless required. The final filtration happens directly in the dashboard. This really avoids limitations which AA has.
  3. We can leverage the generated SQL and custom build it for pre-storing the results
  4. Data scanned is low since it only has to invoke the MicroPDT’s which is very small in size (MB's or KB's)
  5. Cost of the query is less since it scans very less data. Hence saves some ??

Thinking why MicroPDT - "I couldn't thing of a better name but "MicroPDT?" since it really works at micro level and giving performance at larger level much better than PDT's" - Mehul Rajen Shah


Steps to implement MicroPDT

Step 1> Explore the long running KPI from the Dashboard

Explore from here to open in a new tab


Step 2> Open the SQL tab (under Data ribbon) and Click on Open in SQL Runner


Step 3> Run the query to see the results

Step 4> Click Settings and Get Derived Table LookML

Step 5> This should give you something like below, click add it to your project.

Step 6> Just select the right Project and give a standard view name. Click Add.

Step 7> Now the SQL Derived table view file (i.e. MicroPDT) has been added to your Project. Please note: this is still a Derived table and to make it a PDT you need to add persistence strategy.

To know more adding persistence please read this article

Step 8> In the code please remove any aggregations and replace that with column. We do not want any pre aggregations happening since this will limit us to slice and dice the data later.

Step 9> Add required dimensions and measures in the view file required for the Dashboard filters to work. Save.

Step 10> In the model file using this view file create an explore e.g. explore_sql_runner_query. Save & Validate.


Step 10> Now using this newly created explore build the original KPI in the explore

Step 11> Save as to a New or existing dashboard.



Now if you notice the original query scanned size and time would be much higher compared to the newly created one. MicroPDT has much lower dataset and hence the time to run the query against it is comparatively low.

Do we have any Downside, Yes:

  • If you have a table grid in the dashboard and if you turn that into a MicroPDT you may not see much of a size difference
  • Since sits on top of Parent PDT it uses extra space from your storage
  • Additional columns many not be by-default present. Hence if you need new dimensions/measures you may need to add it to the query first.


Our Story

I and my team at Quantiphi have developed and worked on this innovative solution which really helped to bring down the run times to massive 60+%. This has not only helped reduce the dashboard run times but also save a bit of cost of running heavy queries on large datasets.


About Me

Mehul Rajen Shah - have a more than a decade experience in Data Analytics & GenAI driving excellence in Delivery & Automation, Optimization Initiatives. I love to connect with people and discover new ideas.


Please share your thoughts if this helps and your feedback.

Rutuja Gugale

Framework Engineer at Quantiphi || Looker || GCP || Terraform

1 年

This really helped alot. Thanks Mehul Rajen Shah for exploring this innovative Micro PDT solution.

Mehul Rajen Shah

Delivery Lead | GenAI, Data & Analytics | 2x C-Phi-O Leaderboard & TCS Young Achiever Award

1 年

Note - this might look lengthy, but trust me the ROI of efforts is huge. Please comment if you have tried/tested and/or facing issues.

回复

要查看或添加评论,请登录

Mehul Rajen Shah的更多文章

社区洞察

其他会员也浏览了