Navigating large and nested data structures effectively
Problem Statement:
Data volumes are growing rapidly, and the demand for performant web applications grows with them. Applications often have to process large, deeply nested data structures, and they have to do it quickly. We faced this problem with a nested data structure of more than 6 MB that forms the backbone of our application and drives its most-used functionality. The front-end technology in use, AngularJS, could not handle a document of that size while keeping the application user friendly.
About application:
This application provides an interactive and customizable checklist of disclosure requirements in the assurance domain, used to enhance the efficiency of financial reporting. Each checklist incorporates applicable provisions of accounting and reporting pronouncements promulgated by different regulatory bodies, and provides the user with links to the underlying guidance, which is available in Inform.
The application contains a checklist edit screen backed by multiple nested data structures, with leaf nodes nested roughly 6 to 7 levels deep. The front end was built with AngularJS 1.6.
Expand, search, and filter are integral to the checklist-filling process. They were taking ~140s to finish and creating an enormous ~34 MB of HTML to be rendered. This put a heavy load on the JavaScript and HTML rendering engines of the different browsers and caused the browser's Wait/Kill pop-up to appear several times.
The scope of these changes is:
- Significantly improve the performance of Checklist rendering.
- Remodel the Checklist to enhance the user experience of scrolling and searching.
Significant performance issues with the Disclosure Checklist have been identified for the following cases:
- When the user expands all of the checklist.
- When the user filters the checklist on some flags.
- When the user tries to fill in the checklist.
- When the user searches for different parts of the Checklist.
Some JavaScript profiling information before the redesign:
1. Expand All:
2. Filling Checklist:
3. Filter:
Solution:
The performance improvement for the Checklist was achieved through the following steps:
1. React and Redux
2. Conversion of nested tree data to flat list
3. Infinite scroll
4. MS Word style search and navigation
1. React and Redux
The first step was to use React alongside AngularJS as the UI layer for the Checklist edit screen. AngularJS performance degrades as the data size increases, because the number of watchers grows with the amount of bound data. (Watchers are the mechanism AngularJS uses to implement two-way data binding.) On every digest cycle, AngularJS checks all watchers for changes that may have come from a user action on the HTML or from a server-pushed event that changed a controller variable, and it does so regardless of whether any data actually changed. The commonly cited guideline is to stay under ~2,000 watchers. With checklist sizes of ~6 MB and all data sets being used for some UI check, the watcher count rose to 100,000 or more, depending on the user's selections during checklist creation.
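The cost of a large watcher count can be illustrated with a minimal sketch of dirty checking. This is a simplified model, not the AngularJS source: the names Watcher and digest and the shape of the scope object are hypothetical, and the real digest loop repeats passes until no watcher reports a change. The point it shows is that a single pass evaluates every watcher even when nothing has changed.

```typescript
// Simplified model of dirty checking; NOT the real AngularJS implementation.
type Watcher<S> = {
  get: (scope: S) => unknown;          // reads the watched value from the scope
  last: unknown;                       // value seen on the previous pass
  onChange: (value: unknown) => void;  // listener invoked when the value differs
};

// Runs one pass over all watchers and returns how many were evaluated.
function digest<S>(scope: S, watchers: Watcher<S>[]): number {
  let evaluated = 0;
  for (const w of watchers) {
    const current = w.get(scope);
    evaluated++;
    if (current !== w.last) {
      w.last = current;
      w.onChange(current);
    }
  }
  return evaluated;
}

// 100,000 watchers, the order of magnitude mentioned above.
const scope = { answer: "yes" };
const watchers: Watcher<typeof scope>[] = [];
let changes = 0;
for (let i = 0; i < 100000; i++) {
  watchers.push({
    get: (s) => s.answer,
    last: "yes",
    onChange: () => { changes++; },
  });
}

// Nothing changed, yet every watcher is still evaluated.
const checkedWhenIdle = digest(scope, watchers);
```

In this model the work per pass is proportional to the number of watchers, not to the number of values that actually changed.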
React has the upper hand over AngularJS when it comes to rendering UI components. React uses JSX, an HTML-like syntax embedded in JavaScript, which is converted to plain JavaScript calls that create the DOM elements.
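The original example is not available here, so the following is only a sketch of the idea. Under the classic JSX transform, an element written in JSX becomes a nested call to React.createElement. To keep the sketch runnable without React installed, it uses a hypothetical stand-in function named createElement that returns a plain object instead of a real React element; the row markup is also made up for illustration.

```typescript
// Stand-in for React.createElement so the sketch runs without React.
// The real function returns a React element; this one returns a plain object.
type VNode = {
  type: string;
  props: Record<string, unknown>;
  children: Array<VNode | string>;
};

function createElement(
  type: string,
  props: Record<string, unknown> | null,
  ...children: Array<VNode | string>
): VNode {
  return { type, props: props ?? {}, children };
}

// JSX source (for reference only, not compiled here):
//   <li className="row"><span>Question 1</span></li>
//
// Roughly what a JSX compiler emits for it under the classic transform:
const row = createElement(
  "li",
  { className: "row" },
  createElement("span", null, "Question 1")
);
```

The resulting object tree is what a virtual DOM is built from: a description of the UI that can be compared cheaply before any real DOM node is touched.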
This JavaScript code, together with the JSX, is compiled, and at runtime React uses it to build a virtual DOM. The virtual DOM is an in-memory representation of the DOM that React rebuilds whenever a state change occurs, for example when an event is raised by user input. React then uses its diffing algorithm to work out what actually changed and applies the minimal set of DOM manipulations needed to render it. A re-render is triggered only by a state change, typically caused by a user action, and it does not touch every DOM node. So when a large number of data sets has to be shown, or used to evaluate the logic behind showing some node, React is more efficient because it can respond to browser events directly instead of keeping watchers and dirty checking all of them on every cycle.
React is built around one-way data flow, and state management can be handled with Flux or Redux. We used Redux. Below is the diagram depicting the data flow:
To give some perspective, let's take the classic model-view-controller (MVC) pattern, since most developers are familiar with it. In an MVC architecture there is a clear separation between data (model), presentation (view) and logic (controller). There is one issue with this, especially in large-scale applications: the flow of data is bidirectional. With two-way data binding, for example, one change (a user input or an API response) can affect the state of the application in many places in the code. That can be hard to maintain and debug. Redux works on one-way data flow, as depicted in the diagram above. Components subscribe to the store to stay in sync with the current state, and user actions are dispatched to it. With one-way data flow there is always a single source of truth, the store, with no confusion about how to sync the current state with actions and other parts of the application, so the application becomes more predictable. State in a Redux store is treated as immutable, i.e. no direct mutations are made. To change something, one makes a copy of the affected data and applies the change to the copy. This makes it possible to use React's Pure Components.
A Pure Component is a component that re-renders on a state change only if its own props or state have changed, and is unaffected by changes to data that its rendering does not depend on. As shown in the diagram above, if a branch component needs to be updated by a user action to show a different child component, only that component and its parent need to be updated. The rest of the components are not affected by the change, which reduces DOM manipulation. Because the data in the Redux store is immutable, we can check simple referential equality instead of recursively comparing all the data, which speeds up the same evaluation. Redux stores are very helpful for state management, especially in a framework that implements one-way data binding. A few of the advantages include:
- Predictable state updates make it easier to understand how the data flow works in the application
- The use of "pure" reducer functions makes logic easier to test, and enables useful features like "time-travel debugging".
- Centralizing the state makes it easier to implement things like logging changes to the data, or persisting data between page refreshes.
- State in a Redux store is treated as immutable, which brings its own set of advantages, such as change detection by reference.
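The link between immutable updates and cheap change detection can be sketched as follows. This is a minimal illustration written without the Redux library: the State, Row and Action shapes, the 'ANSWER' action type and the shouldRender helper are all assumptions made for the example, not the application's actual code.

```typescript
// Hypothetical state shape for a checklist; not the application's real model.
type Row = { id: string; answer: string };
type State = { rows: Row[]; filter: string };
type Action = { type: "ANSWER"; id: string; answer: string };

// Pure reducer: returns a new state and copies only the row that changed.
// Untouched rows keep their original object reference.
function reducer(state: State, action: Action): State {
  switch (action.type) {
    case "ANSWER":
      return {
        ...state,
        rows: state.rows.map((r) =>
          r.id === action.id ? { ...r, answer: action.answer } : r
        ),
      };
    default:
      return state;
  }
}

// The kind of check a pure component can rely on: compare references,
// not the contents of the objects.
function shouldRender(prevRow: Row, nextRow: Row): boolean {
  return prevRow !== nextRow;
}

const prevState: State = {
  rows: [
    { id: "a", answer: "" },
    { id: "b", answer: "" },
  ],
  filter: "",
};
const nextState = reducer(prevState, { type: "ANSWER", id: "b", answer: "Yes" });
```

Here only row "b" gets a new object, so a reference comparison is enough to tell that row "a" does not need to re-render.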
We replaced the Checklist rendering logic with an AngularJS directive consisting of a single div element. ReactDOM used this div element as the container in which to render the tree. The Redux store was initialised from this directive using the data fetched during page load. React dispatches the ‘FETCH’ action to get its first set of data and renders the higher-level component, which in turn renders the presentational components.
2. Conversion of nested tree data to flat list
The next issue on our plate was filtering the data based on the given inputs. Searching inside a nested data structure with more than 100K rows of JSON elements is complex and time consuming because the traversal is recursive. Any data mutation arising from a user action likewise requires recursively finding the target node before changing it. We converted the nested tree into a flat list using a depth-first search (DFS) traversal, with a ‘sequence’ attribute recording how deeply each node is nested in the checklist. The checklist used this attribute to apply left padding to each row, giving the user the feel of a tree. The screenshot below shows a simple nested data structure.
In the screenshot above, row 1 has sequence 0, row 2 has sequence 1 (left padding 15px), row 3 has sequence 2 (left padding 30px), and so on.
So if we need to mutate data, e.g. for row 2, we just need to find the id of row 2 in the list instead of using a recursive strategy to find the node. This can be improved further by keeping a map from id to list index, looking up the index in the map, and updating the data at that index.
Implementing the flat list instead of a tree data set brought down the time taken by any action performed after expand, because searching became more efficient, while the tree structure remained intact from the user's perspective.
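The flattening step can be sketched as a pre-order depth-first traversal that records the depth of each node. The TreeNode and FlatRow shapes and the function names below are hypothetical; only the 'sequence' attribute and the 15px-per-level padding come from the description above.

```typescript
// Hypothetical node shapes for illustration.
type TreeNode = { id: string; text: string; children?: TreeNode[] };
type FlatRow = { id: string; text: string; sequence: number };

// Pre-order depth-first traversal: a parent is emitted before its children,
// so the flat list keeps the same top-to-bottom order as the expanded tree.
function flatten(nodes: TreeNode[], depth = 0, out: FlatRow[] = []): FlatRow[] {
  for (const node of nodes) {
    out.push({ id: node.id, text: node.text, sequence: depth });
    flatten(node.children ?? [], depth + 1, out);
  }
  return out;
}

// Left padding derived from the nesting level: 15px per level.
function paddingPx(row: FlatRow): number {
  return row.sequence * 15;
}

const tree: TreeNode[] = [
  {
    id: "1",
    text: "Section",
    children: [
      { id: "2", text: "Topic", children: [{ id: "3", text: "Question" }] },
      { id: "4", text: "Another topic" },
    ],
  },
];
const rows = flatten(tree);
```

For this sample tree the rows come out in the order 1, 2, 3, 4 with sequences 0, 1, 2, 1, which is enough to draw the indentation without keeping the nested structure around.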
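The map-based lookup could look like the sketch below. The names buildIndex and updateRow and the FlatRow fields are assumptions for illustration. The update returns a new array rather than mutating in place, to stay compatible with an immutable store, and the index map remains valid only as long as the order of the list does not change.

```typescript
// Hypothetical flat row shape for illustration.
type FlatRow = { id: string; sequence: number; answer: string };

// Build the id -> list index map once, after flattening.
function buildIndex(rows: FlatRow[]): Map<string, number> {
  const index = new Map<string, number>();
  rows.forEach((row, i) => index.set(row.id, i));
  return index;
}

// Locate the row through the map instead of searching, then return a new
// array in which only that row has been replaced. The input is not mutated.
function updateRow(
  rows: FlatRow[],
  index: Map<string, number>,
  id: string,
  patch: Partial<FlatRow>
): FlatRow[] {
  const i = index.get(id);
  if (i === undefined) return rows; // unknown id: nothing to change
  const next = rows.slice();
  next[i] = { ...rows[i], ...patch };
  return next;
}

const rows: FlatRow[] = [
  { id: "row-1", sequence: 0, answer: "" },
  { id: "row-2", sequence: 1, answer: "" },
  { id: "row-3", sequence: 2, answer: "" },
];
const index = buildIndex(rows);
const updated = updateRow(rows, index, "row-2", { answer: "Yes" });
```

Finding the row is a single map lookup; no tree traversal or list scan is needed.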
3. Infinite scroll
Below is the graph depicting application performance for the expand all operation before and after the React/Redux and flat list implementation.
Using React and Redux in harmony with AngularJS solved part of the problem, namely the first load of the whole tree after an expand all operation, and brought the rendering time down to ~50s. The HTML also shrank because React drops the extra markup that the HTML rendering engine does not use, e.g. attributes starting with ng-*. However, the HTML was still too big for browsers to handle, because the rendering engine has to compute the layout for, and then paint, every element in it. HTML of this size was a serious problem for an enterprise application. A user cannot see all the rows at once, and there were ~4000 rows per checklist, so we decided to add infinite scroll to the page and render only 100 rows at any given point in time.
When a user scrolls through the checklist and gets close to the bottom, 20 rows are removed from the top and 20 rows are added at the bottom, keeping the row count fixed at 100. The same applies when a user toggles a top-level element to show every node directly underneath it: if the count goes beyond 100, only 100 rows are kept and the rest are not rendered. This limited the DOM size and reduced the rendering time significantly. The user experience improved because users no longer had to collapse nodes to get good speed, and there was no visible lag.
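The window arithmetic behind this kind of scrolling can be sketched independently of any UI library. The figures 100 and 20 come from the description above; the RowWindow type and the functions makeWindow, scrollDown and scrollUp are hypothetical, and the clamping at both ends of the list is an assumption about how the edges are handled.

```typescript
const WINDOW_SIZE = 100; // rows kept in the DOM at any time
const STEP = 20;         // rows swapped in and out per scroll trigger

// Range of list indices currently rendered; end is exclusive.
type RowWindow = { start: number; end: number };

// Build a window starting at the given index, clamped to the list bounds.
function makeWindow(start: number, total: number): RowWindow {
  const maxStart = Math.max(0, total - WINDOW_SIZE);
  const s = Math.max(0, Math.min(start, maxStart));
  return { start: s, end: Math.min(total, s + WINDOW_SIZE) };
}

// Near the bottom: drop STEP rows from the top, add STEP rows at the bottom.
function scrollDown(w: RowWindow, total: number): RowWindow {
  return makeWindow(w.start + STEP, total);
}

// Near the top: the reverse operation.
function scrollUp(w: RowWindow, total: number): RowWindow {
  return makeWindow(w.start - STEP, total);
}

const total = 4000;                 // approximate rows per checklist
let view = makeWindow(0, total);    // rows 0..99
view = scrollDown(view, total);     // rows 20..119
```

Rendering then amounts to slicing the flat list with view.start and view.end, so the number of DOM rows never exceeds the window size regardless of how long the checklist is.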
4. MS Word style search and navigation
This was a usability enhancement to speed up navigation to any row of the Checklist. We took the idea from the search and navigation pane in MS Word. Every checklist node with a sequence of 0 or 1 was shown in a navigation pane on the left of the checklist. The navigation pane included a search box that a user can use to find any row containing the given input, and clicking a search result takes them to that row in the checklist. This helped users navigate the checklist quickly and return to a particular part of it later if required.
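Over a flat list, both parts of the navigation pane reduce to simple filters. The sketch below is an illustration only: the FlatRow shape and the function names are hypothetical, and the case-insensitive substring match is an assumption, since the matching rule is not described above.

```typescript
// Hypothetical flat row shape for illustration.
type FlatRow = { id: string; text: string; sequence: number };

// Entries shown in the pane: rows at the top two nesting levels only.
function navigationEntries(rows: FlatRow[]): FlatRow[] {
  return rows.filter((r) => r.sequence <= 1);
}

// Search over all rows. Each hit carries the list index so that the
// caller can jump to that position in the checklist.
function searchRows(
  rows: FlatRow[],
  query: string
): Array<{ id: string; index: number }> {
  const q = query.trim().toLowerCase();
  const hits: Array<{ id: string; index: number }> = [];
  if (q === "") return hits;
  rows.forEach((row, index) => {
    if (row.text.toLowerCase().indexOf(q) !== -1) {
      hits.push({ id: row.id, index });
    }
  });
  return hits;
}

const rows: FlatRow[] = [
  { id: "1", text: "Leases", sequence: 0 },
  { id: "2", text: "Lessee disclosures", sequence: 1 },
  { id: "3", text: "Has the entity disclosed lease liabilities?", sequence: 2 },
  { id: "4", text: "Revenue", sequence: 0 },
];
const pane = navigationEntries(rows);
const hits = searchRows(rows, "lease");
```

Because a hit already carries its position in the flat list, navigating to it only requires positioning the rendered rows around that index.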
Profiling Statistics after React:
1. Expand all
2. Filling Checklist
3. Filter
Comparative analysis
Conclusion
AngularJS is good for building applications with smaller data sets and is very fast to develop with. When the data set grows, one may have to deviate from the AngularJS path and use React, along with a good data design, to enhance the user experience of a web-based application. On a bigger scale, a technology should not put a constraint on what an application can be. In today's web applications user experience comes first, and when it comes to UI performance it always bears repeating: “measure, measure, measure…”.