Distributed Complex Event Processing and data locality
1. Abstract
I would like to share the architecture of a CEP system that takes advantage of data locality. I am not a CEP expert like Tim Bass or Mark Proctor, so I will only scratch the surface of CEP.
2. Event correlation and load
Most CEP solutions require some form of correlation, which means loading events and objects from an external system into the RETE network.
Even if the external store is a high-performance distributed cache with fast read access, reading context information and loading it into the RETE network over the LAN adds latency.
3. Data locality
I decided to test how well a solution performs when the RETE network is executed on the same machine that holds the correlated data, taking advantage of data locality. As a base I took HBase, a Bigtable implementation, because of the following features:
3.1. Multi version support
HBase supports multiple versions of a value out of the box.
3.2. Time to live
HBase automatically expires cells, and with them whole rows, once their time to live is reached.
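To make the two features above concrete, here is a minimal sketch of versioned cells with a time to live. It is a toy in-memory model in Python, not HBase code: the class VersionedCells, its parameters and the sample keys and values are all made up for illustration, and timestamps are passed in explicitly to keep the example deterministic.

```python
from collections import defaultdict


class VersionedCells:
    """Toy model of versioned cells with a time to live (hypothetical, not HBase)."""

    def __init__(self, max_versions, ttl_seconds):
        self.max_versions = max_versions
        self.ttl_seconds = ttl_seconds
        self._cells = defaultdict(list)  # key -> [(timestamp, value), ...], newest first

    def put(self, key, value, timestamp):
        versions = self._cells[key]
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)
        del versions[self.max_versions:]  # keep only the newest max_versions entries

    def get(self, key, now, versions=1):
        # Expired versions are simply filtered out at read time.
        live = [(ts, v) for ts, v in self._cells.get(key, [])
                if now - ts < self.ttl_seconds]
        return live[:versions]


store = VersionedCells(max_versions=3, ttl_seconds=60)
store.put("order-42", "CREATED", timestamp=0)
store.put("order-42", "PAID", timestamp=10)
store.put("order-42", "SHIPPED", timestamp=20)
store.put("order-42", "DELIVERED", timestamp=30)

recent = store.get("order-42", now=35, versions=3)   # three newest versions
partly = store.get("order-42", now=75, versions=3)   # oldest kept version has expired
nothing = store.get("order-42", now=100, versions=3) # everything has expired
```

For a CEP workload this means that the history of a short-lived process is available for correlation without extra bookkeeping, and stale context disappears on its own.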
3.3. Co-location of the keys from the same range
A region manages all rows between the region’s start key and end key. I designed the keys of events and objects so that all the context information was hosted in one region.
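One way to get this co-location is to put the correlation identifier first in the row key. The sketch below is only an assumption about how such keys could look: the key layout, the function names row_key and region_for, and the split point are hypothetical and are not taken from the test described here. It also assumes that region boundaries fall between processes rather than in the middle of one.

```python
import bisect


def row_key(process_id, kind, item_id):
    """Hypothetical composite key: the process id comes first so related rows sort together."""
    return f"{process_id}|{kind}|{item_id}".encode("utf-8")


def region_for(key, split_points):
    """Index of the region whose [start key, end key) range contains the key."""
    return bisect.bisect_right(split_points, key)


keys = [
    row_key("proc-0777", "event", "e9"),
    row_key("proc-0042", "object", "customer"),
    row_key("proc-0042", "event", "e1"),
    row_key("proc-0042", "event", "e2"),
]
split_points = [b"proc-0500"]  # assumed boundary between two regions

# Keys are ordered byte by byte, so rows of one process end up next to each other.
ordered = sorted(keys)
regions = {key: region_for(key, split_points) for key in ordered}
```

With fixed-width process ids, every event and object of proc-0042 lands in the first region and can be read there without touching another server.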
3.4. MemStore and BlockCache
The MemStore holds data that has not yet been written to HDFS and serves the most recent edits. The BlockCache keeps recently read blocks in memory, so that the next request for data stored in the same block can be served from memory. Because the test correlated events belonging to particular short-lived business processes, 80% of the reads were served from the MemStore.
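The read path described above can be sketched roughly as follows. This is a deliberately simplified toy model in Python, not how HBase is implemented: the class ToyRegionStore and all of its fields are hypothetical, a dictionary stands in for the files on HDFS, and a real read merges several sources instead of stopping at the first hit.

```python
class ToyRegionStore:
    """Toy read path: MemStore first, then BlockCache, then 'disk' (hypothetical model)."""

    def __init__(self, block_size=2):
        self.block_size = block_size
        self.memstore = {}     # recent writes that have not been flushed yet
        self.block_cache = {}  # block id -> block, for recently read blocks
        self.disk_blocks = {}  # block id -> block, standing in for files on HDFS
        self.index = {}        # key -> id of the flushed block that holds it
        self.stats = {"memstore": 0, "block_cache": 0, "disk": 0}

    def put(self, key, value):
        self.memstore[key] = value

    def flush(self):
        # Write the sorted MemStore content out in fixed-size blocks.
        keys = sorted(self.memstore)
        for i in range(0, len(keys), self.block_size):
            block_id = len(self.disk_blocks)
            block = {k: self.memstore[k] for k in keys[i:i + self.block_size]}
            self.disk_blocks[block_id] = block
            for k in block:
                self.index[k] = block_id
        self.memstore.clear()

    def get(self, key):
        if key in self.memstore:
            self.stats["memstore"] += 1
            return self.memstore[key]
        block_id = self.index.get(key)
        if block_id is None:
            return None
        if block_id in self.block_cache:
            self.stats["block_cache"] += 1
        else:
            self.stats["disk"] += 1  # slow path: load the whole block into the cache
            self.block_cache[block_id] = self.disk_blocks[block_id]
        return self.block_cache[block_id][key]


store = ToyRegionStore(block_size=2)
store.put("p1|e1", "a")
store.put("p1|e2", "b")
store.flush()
store.put("p1|e3", "c")

first = store.get("p1|e3")   # still in the MemStore
second = store.get("p1|e1")  # read from 'disk', the block is cached
third = store.get("p1|e2")   # same block, now served from the cache
```

A short-lived business process mostly reads what was written moments ago, which is why such a workload tends to stay on the first, cheapest branch.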
3.5. Coprocessors
A coprocessor is code that runs in-process on each region server, which makes it possible to extend the framework with custom code. I used a coprocessor both as the hook for incoming events and as the host of the RETE network.
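The control flow of such a hook could look roughly like the sketch below. HBase coprocessors are written in Java, so this Python is only a model of the idea: ToyRegion, CorrelationObserver, post_put and scan_prefix are hypothetical names, and a single hard-coded rule stands in for a real RETE network.

```python
class ToyRegion:
    """Toy region that calls its observers after every write (hypothetical model)."""

    def __init__(self):
        self.rows = {}
        self.observers = []

    def put(self, key, value):
        self.rows[key] = value
        for observer in self.observers:
            observer.post_put(self, key, value)

    def scan_prefix(self, prefix):
        return {k: v for k, v in sorted(self.rows.items()) if k.startswith(prefix)}


class CorrelationObserver:
    """Hypothetical hook that stands in for the RETE network host."""

    def __init__(self):
        self.alerts = []

    def post_put(self, region, key, value):
        process_id, kind, _ = key.split("|")
        if kind != "event":
            return
        # The context is read from the same region: no network hop is involved.
        context = region.scan_prefix(process_id + "|")
        events = [v for k, v in context.items() if "|event|" in k]
        # Toy rule instead of a RETE network: two failed payments in one process.
        if events.count("PAYMENT_FAILED") >= 2:
            self.alerts.append(process_id)


region = ToyRegion()
observer = CorrelationObserver()
region.observers.append(observer)

region.put("proc-0042|object|customer", "alice")
region.put("proc-0042|event|e1", "PAYMENT_FAILED")
region.put("proc-0042|event|e2", "PAYMENT_FAILED")
region.put("proc-0777|event|e1", "PAYMENT_FAILED")
```

The point of the design is that the event arrives, the context is looked up and the rule fires inside one process, on the machine that already holds the data.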
4. Conclusions
Performance was surprisingly good, probably because most of the read requests that load context information into the RETE network hit the MemStore. I can say that for some use cases, such an architecture could be beneficial.