Rule Resolution Algorithm - PEGA - What makes it fast?
Bhuvaneshwaran Srinivasan
PEGA LSA (Lead System Architect) | Full Stack Web Developer | Python | Devops Expert | Banking & Telecom Sector Expert | Innovator, Proven Mentor & Team Leader
Brief Introduction:
I have split this article into two parts: Simplified Theory (for those who have little or no familiarity with the Rule Resolution Algorithm) and what makes it fast. Feel free to jump to the relevant section based on your preference.
Rule Resolution - Theory:
What is Rule Resolution?
Rules In-Scope:
It considers most rules that are instances of classes derived from the abstract Rule- base class. Some of them include:
Rules Out-Of-Scope:
Rule resolution does not apply to records that are instances of classes derived from any other abstract base class such as Data-, System-, or Work-. Some of them include:
Input Parameters for the algorithm:
Output:
How does it work?
How rule cache is populated?
Theory - Conclusion:
Put simply, it is a search, ranking (sorting), and filtering algorithm that takes certain input parameters into account and applies inclusion and exclusion logic to arrive at a single successful rule match.
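The search, filter, and rank idea can be sketched in a few lines of Python. This is a simplified illustration, not PEGA's implementation: the `Rule` fields, the `resolve` function, and the ranking order (class closeness, then ruleset order, then highest version) are assumptions made for the example, and real rule resolution also considers factors such as circumstancing that are left out here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    # Hypothetical, simplified rule record (not a PEGA data structure).
    name: str
    applies_to: str         # class the rule is defined on
    ruleset: str
    version: str            # e.g. "01-02-03"
    available: bool = True


def resolve(candidates, name, class_hierarchy, ruleset_list):
    """Return the single best-matching rule, or None if nothing qualifies.

    class_hierarchy and ruleset_list are ordered from most to least specific.
    """
    # Search: keep only rules with the requested name.
    matches = [r for r in candidates if r.name == name]

    # Filter: exclude unavailable rules and rules outside the caller's scope.
    matches = [
        r for r in matches
        if r.available
        and r.ruleset in ruleset_list
        and r.applies_to in class_hierarchy
    ]
    if not matches:
        return None

    # Rank: closest class first, then ruleset order, then highest version.
    def rank(rule):
        version_key = [-int(part) for part in rule.version.split("-")]
        return (
            class_hierarchy.index(rule.applies_to),
            ruleset_list.index(rule.ruleset),
            version_key,
        )

    return min(matches, key=rank)
```

Calling `resolve` with a list of candidate rules, a rule name, the caller's class hierarchy, and its ruleset list returns one winner, which mirrors the "single successful rule match" described above.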
Rule Resolution - What makes it so fast in picking the right rule?:
The key to this is the first block in the rule resolution logic: the rule cache. Let's take a detailed look at it. As PEGA states:
An in-memory rule cache helps the rule resolution process operate faster. If the system finds an instance (or instances) of the rule in question in the cache, it accepts what is in the cache as the candidate rules and skips many steps in the resolution process.
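A minimal sketch of this cache-first behaviour is shown below. The `RuleCache` class, its key (rule name, class, ruleset list), and the `resolver` callback are hypothetical names chosen for illustration. For brevity the sketch caches the final result of resolution, whereas the quote above describes caching candidate rules, so treat it as an approximation of the idea rather than PEGA's design.

```python
class RuleCache:
    """Hypothetical in-memory cache placed in front of a slow resolver."""

    def __init__(self, resolver):
        self._resolver = resolver   # expensive lookup, e.g. a database search
        self._entries = {}
        self.misses = 0             # how often the full resolution had to run

    def get(self, name, class_name, ruleset_list):
        key = (name, class_name, tuple(ruleset_list))
        if key not in self._entries:
            # Cache miss: run the full resolution once and remember the result.
            self.misses += 1
            self._entries[key] = self._resolver(name, class_name, ruleset_list)
        # Cache hit: the expensive steps are skipped entirely.
        return self._entries[key]

    def invalidate(self, name):
        # Drop every entry for a rule that has changed.
        for key in [k for k in self._entries if k[0] == name]:
            del self._entries[key]
```

Repeated requests for the same rule in the same context reach the resolver only once; after `invalidate` is called for a changed rule, the next request runs the full resolution again.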
Why do we need a rule cache and how does it bring efficiency?:
Imagine a scenario where a "master" table holds information about people who hold either full licenses or automatic licenses, and whether or not they are eligible to apply. You are tasked with identifying a winner based on the following criteria and coming up with an efficient algorithm.
Now, if you apply the logic one by one based on the above algorithm, the resulting steps would be
For a small number of rows this is fairly straightforward, but what if there are millions of rows? It becomes resource intensive and inefficient when the same lookup is requested thousands of times by hundreds of users at the same time. It would result in far too much database traffic.
But what if there is a "cache" table that holds "delta" records for the more frequently accessed records, and an entry stays valid unless the underlying record is added, deleted, or modified? It instantly eliminates many resource-intensive operations: the 'right' match has already been found, so there is no need to query the "master" table with its millions of records, and the record can be picked from the cache table instead.
Likewise, this is PEGA's idea behind caching rules that have already been through the extensive rule resolution search, so the search need not be re-run again and again against the master table. The cache is available per node, and its definition is given below:
Rule cache — Per node. Reduces PegaRULES database traffic, contains copies of rules recently accessed or recently updated by developers. Occupies virtual memory. The PegaRULES agent, during the periodic system pulse, invalidates rules in the rule cache that were recently updated on another node.
There was still a problem with rule cache - System Pulse:
As rules were committed to the database, the System Pulse table (pr_sys_updatescache) was also updated to reflect which cached rules should be invalidated. In earlier versions of PEGA, up to 7.1, a System Pulse agent ran every 60 seconds to 2 minutes, read from the pulse table in batches, and updated the local caches to keep records up to date. With this mechanism, rule changes could take anywhere between 1 and 2 minutes to be reflected on all nodes.
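The polling approach can be sketched as follows. `PulseTable` stands in for the pr_sys_updatescache table and `Node` for a server with its own rule cache; both classes and their methods are hypothetical names for illustration, and the timer is replaced by an explicit `pulse()` call so the example stays deterministic.

```python
class PulseTable:
    """Stand-in for the shared pulse table: an ordered log of changed rules."""

    def __init__(self):
        self.rows = []                      # (sequence number, rule name)

    def record_change(self, rule_name):
        self.rows.append((len(self.rows) + 1, rule_name))

    def rows_after(self, sequence):
        return [row for row in self.rows if row[0] > sequence]


class Node:
    """Stand-in for one server node with a local rule cache."""

    def __init__(self, pulse_table):
        self.pulse_table = pulse_table
        self.cache = {}                     # rule name -> cached rule
        self.last_seen = 0                  # last pulse row already processed

    def pulse(self):
        """Read new pulse rows in a batch and drop stale cache entries."""
        invalidated = []
        for sequence, rule_name in self.pulse_table.rows_after(self.last_seen):
            if self.cache.pop(rule_name, None) is not None:
                invalidated.append(rule_name)
            self.last_seen = sequence
        return invalidated
```

Until a node's next `pulse()` runs, it keeps serving the old cached rule. That gap between the commit and the next pulse is the delay described above.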
To see how serious this is in production, imagine a rule delegated to a business user who is responsible for withdrawing a Black Friday offer at midnight on Saturday. If it takes up to 2 minutes to "synchronize" across all nodes, and 100,000 customers place an order between 12:00 am and 12:02 am, the business would incur a huge loss.
What appeared to be trivial turns out to be complicated and problematic!
PEGA's optimisation: updating the rule cache across nodes in near real time:
When clustering (Hazelcast and Ignite) was introduced in PEGA 7.1.7, the updates began to occur in near real time. Instead of being saved to the database during commits, update messages are gathered and broadcast via a cluster-wide Topic post-commit. Listeners on each node receive the messages (in order) and process them. A diagrammatic representation of how the cache is loaded at each node is given below for an embedded Hazelcast.
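The publish and subscribe flow can be illustrated with a small in-process sketch. It does not use the Hazelcast or Ignite APIs: `Topic`, `Node`, and `commit_rule_change` are hypothetical stand-ins that only show the order of events (commit first, then broadcast, then each listener invalidates its local cache).

```python
class Topic:
    """In-process stand-in for a cluster-wide topic."""

    def __init__(self):
        self._listeners = []

    def subscribe(self, listener):
        self._listeners.append(listener)

    def publish(self, message):
        # Deliver to every listener, in the order messages are published.
        for listener in self._listeners:
            listener(message)


class Node:
    """Stand-in for a server node that listens for rule-change messages."""

    def __init__(self, name, topic):
        self.name = name
        self.cache = {}          # rule name -> cached rule
        self.received = []       # messages seen, in arrival order
        topic.subscribe(self.on_message)

    def on_message(self, rule_name):
        self.received.append(rule_name)
        self.cache.pop(rule_name, None)   # drop the stale entry right away


def commit_rule_change(database, topic, rule_name, new_value):
    """Save the rule first, then broadcast the change post-commit."""
    database[rule_name] = new_value
    topic.publish(rule_name)
```

With this arrangement no node has to wait for a polling interval: the stale entry is dropped as soon as the message is delivered, and the next request reloads the updated rule.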
If you would like to know more about it, here is the link below
Now, with this in-memory cache being updated in near real time, the rule resolution algorithm operates even faster than in previous releases of PEGA.
Conclusion:
When the Pega Platform rule cache was optimized to use a cluster-based pulse, the system started sending pulse messages directly to the other nodes, bypassing the SystemPulse agent. As a result, the pr_sys_updatescache table remains empty at all times. With this mechanism, changes to the contents of the rule cache are synchronized during the pulse operation, and any change to a rule saved on one node is reflected almost immediately in processing on every other node. This makes the whole rule resolution algorithm faster, more effective, and more reliable.