If at first you succeed - “Trie” again!
Many of us with a computer science background may be familiar with the concept of “Trie” data structure. Wikipedia has a decent explanation for them, and where this kind of representation of data might be useful. As part of our ongoing efforts for optimizing page performance – we were challenged to look hard at every call being made on our pages before the page finished rendering and that included calls to our A/B Testing framework as well – which meant one of my teams responsible for this was on the hook for one of those calls – on every page across the entire domain.
Now, usually, in an effort to have the ability to conduct A/B Tests as seamlessly as possible – we were including calls to the A/B Testing framework to determine whether there was something to be done on a particular page, and if you look at most Client-Side A/B testing frameworks, they would require some sort of check like this back to the servers from the web page. And in order to provide as smooth (i.e. Flicker free) visual experience, you want to make that call as high (early) up in the page as possible. This call may be relatively fast, but none-the-less, it does entail a network round-trip, and thus was fair game to be looked at closely.
As it happens, there is an optimization that we can do there. Since we are responsible for generating and deploying the A/B test on any page, we do know which pages have a A/B Test in the first place as a-priori knowledge. Could we leverage this piece of information somehow to pre-empt the call from happening on ALL pages regardless of whether or not it had any A/B test to begin with ?. Turns out we can. If we can provide the list of page URL's currently having a A/B test to the browser session at first touch, a small script running on every page, can fairly quickly determine whether to make the expensive network round-trip just to find if it needed to load any new content associated with a test. And most pages don't have any test on them at any particular point of time. Score one!
At this point you may be wondering what's this got to do with “Trie” data-structures at all. The initial thought was to just expose the list of URL's as is, and the checking script could compare the current page URL against this list to see if it matched with any of these “N” URL's that have an A/B test on them. But even if this is a relatively fast operation when compared to a Network call, those of you with an algorithmic bent of mind would quickly recognize that this is a O(N) operation, granted that in this case the “N” is relatively small. But still, taken across each page for the entire site, this quickly adds up to the cost of rendering the page. Could we improve on this ? Enter the “Trie”.
If instead of the raw list of URL's, we can pre-compute the “Trie” that represents this list, then not only does it produce a much more compact structure, where each URL sub-path that is common across any of these “N” URL's needs to be represented only once! So for example consider two URL's which have an A/B Test running on them (a/b/c/d/e and a/b/c/d/x/y). Here the “Trie” will ensure that the two paths will share a common path up to a/b/c/d and only bifurcate at node “d” with two child nodes for “e” and “x” respectively. The best part is that the testing script now running on every page, only needs to tokenize the current page URL separating each sub-path on “/”, and traverse the “Trie” just once for each sub-path until it either reaches a leaf node that matches with the last last token OR it reaches an intermediate node from which there is no child node matching the next token. This is a O(1) operation. Saving on both Time as well as Space complexity.
There are some minor details such as incorporating “*” in the URL's to account for regular expressions that need to be considered both while building the “Trie” as well as while traversing it when checking any particular page URL against it. However those are easily solved. But hopefully you see why it is important to “Trie” again!
Fortune 500 (Ex-Dell, IBM) & Startup Leader | Distinguished Engineer | Invented 58 US patents | PhD Computer Engineering | Infrastructure Pioneer | Invented Data Security Algorithms | Product Management | CTO
6 年Great post, Shantanu. You have a knack for simplifying complex, algorithmic topics. Kudos!