登录查看更多内容

Simplifying pagination in hierarchical structures

Leapfrog Technology, Inc.

Building digital experiences of tomorrow by innovating better, faster.

发布日期: 2024年5月14日

Written by: Irajan Dhakal , Software Engineer

Pagination is crucial when it comes to designing an API as it enhances performance, UX, scalability, and much more. It’s a quite straightforward approach when it comes to independent data sources like lists of existing users, blog posts, or any other use cases.?

On the contrary, when you consider hierarchical data sources like trees or graphs with no limitation of nodes count or child count, performing pagination can be difficult. Unfortunately, such a situation doesn't have a predefined solution.

Faced with a similar issue, in this blog, I will share my experience of dealing with this problem, and how I came up with the solution in my development journey.

Problem statement

Let’s start at the source of the problem. I want to have a list of tasks that can contain unlimited sub-tasks, and subsequently, sub-tasks that contain other sub-tasks inside them. There’s no limitation on depth or node count as presented below.

Storing data

Firstly, let’s look for a way to store the above structural relation in our relational database. There are several ways to store such relations, which might depend on the use case. How frequently will the data be updated? Does the update of the parent affect the child or not? Regrettably, uncovering the best way is a completely different topic that deserves a separate blog of its own.

Considering the task at hand, let’s choose how we can paginate the data. For this, we can store it in any way we want—the only things that matter the most are the nodes ‘root_id’ and ‘immediate_parent_id’. As for the above example, we can maintain a table named ‘Task’ with 3 columns: Name, root_id and immediate_parent_id, as shown below.

Fetching the data

While fetching the data, we need to consider its actual completion. In simple words, if Task 1 is being fetched, the whole subtree of Task 1.1 and Task 1.1.1 should also be in our response. We cannot just return Task 1, Task 1.1 in the first page, and the remaining data Task 1.1.1 and Task 2 in the second page. While fetching the data, we cannot try out the recursive query as we might have fixed pagination size. Moreover, by doing so, an appropriate stopping condition might not be there to stop the recursive query.

For the solution, let’s think out of the box. The problem with pagination is the page size and the complete data. We cannot compromise on the completion of data—but what if we trade off page size with data completion? Of course, we can do this first, given that we have a page size of 50. Now, let’s calculate the child count of each root node with a simple query.

We can gain the data in a basic json format:

Notice that child_count is the number of data that we need to fetch if the root node is fetched. Since we have pre information on the number of data that we need to fetch, we can add up the total child count until we exceed the page size of 50. In the above json response, continue adding child count from the first element and simultaneously compare it with the page size using simple code as shown:

Mirko Peters 3 个月前

What to be Wary of With Iterators and Collections in C#

Nick Cosentino 1 年前

Simplifying API Development with GraphQL

Mantra Labs 4 个月前

After the completion of the above piece of code—provided that the ‘tasks’ variable does have the above json object—the value stored on the variable grouped_roots will simply be [[1,2],[3]] .

Notice that regardless of the child_count being greater than page size (50), we are inserting them in the grouped items in order to achieve completion of data. So, for page 1, we will be returning the whole tree with root_id 1 and 2, whereas for page 2, we will be returning the whole tree with root_id of 3. We can achieve this from the simple ‘where’ clause. The data volume is somehow windowed with a controlling number, i.e. 50, which is our page size.

Applying filter

Applying the filter is now much more simpler than expected. Now, we just need the ‘where’ clause on two different queries, i.e. when we count the children, and when we fetch the data itself.

Since the same ‘where’ clause is used while fetching child count as well as fetching the data itself, both of these will yield the same data.

Creating metadata

It is the responsibility of the backend service to create the metadata, such as total data, page number and page size, to let us know that data is still available while building paginated APIs. These meta data are then used by consumers, whether it be frontend services or any other third party services, that require data to calculate if we have additional data to consume or not.

Basic formula for calculation assumes that the page size will always be the same as provided. Conversely, for the above case, page size is the only controlling variable. There isn’t any guarantee that the API will exactly generate the same number of data for all pages.

Thus, let’s add additional information on metadata, which will indicate if there is more data available or not.

Here, we are totally relying on the value of grouped_roots to paginate our response. We can notice that the last index of the grouped_roots values will be the last response that our API will provide. So, we can calculate the metadata ‘hasMore’ with a simple comparison as:

hasMore = page < len(grouped_roots)

Finally, our meta data for the above response for page 1 and page 2 will look like this:

Conclusion

Paginating the hierarchical structure is not as big a deal as it sounds. The key is to rely on the root_id and how much data it holds. From a bird’s eye view, we can conclude the process in simple algorithm as:

Fetch the child count for each root node with the required filter.
Group the fetch root nodes in a way that the number of children they have is not less than the page size.
Fetch the actual data with the required filter and additional filter of root_id present in a grouped array of root nodes.
Build the tree from the fetched data.
Calculate metadata to include hasMore field.
Send response with list of trees and metadata including hasMore field.

I hope this blog helps you demystify the complexities of pagination in hierarchical structures. If there are any queries, I’d love to answer them in the comment section!

Simplifying pagination in hierarchical structures

Leapfrog Technology, Inc.

Building digital experiences of tomorrow by innovating better, faster.

Problem statement

Storing data

Fetching the data

领英推荐

Applying filter

Creating metadata

Conclusion

Look at the Byte Side

23,116 位关注者

Leapfrog Technology, Inc.的更多文章

社区洞察

其他会员也浏览了

REST, GraphQL, and gRPC: Comparing and Contrasting Modern API Design Patterns

Your Front-end needs a BFF!

An In-depth Look at Apollo Client for Angular Applications

Handling Side Effects in Jetpack Compose: A Deep Dive into LaunchedEffect, SideEffect, and DisposableEffect

Properties of Pub Sub Architecture

Mastering APIs for Data Engineers: REST, GraphQL, and Beyond

Tracking Asynchronous Lambda Function Invocation Lifecycle

GraphQL: Modern API Query Language for Flexible Data Retrieval

Command and Query Responsibility Segregation (CQRS)

How to Use CQRS to Build Fast and Scalable Applications

Problem statement

Storing data

Fetching the data

领英推荐

Applying filter

Creating metadata

Conclusion

Look at the Byte Side

23,116 位关注者

Leapfrog Technology, Inc.的更多文章

How Clean Architecture saved my sanity

Lead as an action, not a position

Jsontology: My tinkering project to simplify JSON

Building for billing: Venturing into the US healthcare system

It's not your job to make your team happy

Leapfrog’s role in shaping Nepal’s future tech leaders

Project Leader vs. Project Manager

The magic sauce for connecting with people

Edge AI: Lights, Camera, OAK-tion

Kill your darlings!

社区洞察

其他会员也浏览了

REST, GraphQL, and gRPC: Comparing and Contrasting Modern API Design Patterns

Your Front-end needs a BFF!

An In-depth Look at Apollo Client for Angular Applications

Handling Side Effects in Jetpack Compose: A Deep Dive into LaunchedEffect, SideEffect, and DisposableEffect

Properties of Pub Sub Architecture

Mastering APIs for Data Engineers: REST, GraphQL, and Beyond

Tracking Asynchronous Lambda Function Invocation Lifecycle

GraphQL: Modern API Query Language for Flexible Data Retrieval

Command and Query Responsibility Segregation (CQRS)

How to Use CQRS to Build Fast and Scalable Applications