How can you process millions of transactions per second and actually make sense of them in real time?
Deepesh Jain
Founder & CEO, Durapid Technologies | Enterprise Architect | Assisting Enterprises With Seamless Digital Transformation
Here's an interesting challenge we encountered recently, and how we solved it.
We had an e-commerce client whose business was growing fast. Their data processing?
Not so much.
Transactional data from their various platforms was piling up by the second, faster than their systems could keep up with. That meant long delays in acting on opportunities, from inventory decisions to personalized marketing.
During our strategy session, we outlined a two-tier approach:
build something that could handle both the volume and the speed they needed.
We chose Kafka and Spark as the core technologies. Kafka is ideal for handling very large data streams in a fault-tolerant way, while Spark processes them in near real time.
Our solution was built around Kafka's streaming capability: we set up dedicated topics for transactions, user behavior, and inventory so that no single data stream could turn into a bottleneck.
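For anyone curious what that topic layout can look like, here's a minimal sketch using the kafka-python admin client. The topic names, partition counts, replication factor, and broker address are illustrative assumptions, not the client's actual configuration.

```python
# Illustrative topic layout: one topic per stream so no single feed becomes a bottleneck.
from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")  # broker address is a placeholder

topics = [
    NewTopic(name="transactions", num_partitions=12, replication_factor=3),
    NewTopic(name="user-behavior", num_partitions=6, replication_factor=3),
    NewTopic(name="inventory-updates", num_partitions=6, replication_factor=3),
]

# Create all three topics in one call.
admin.create_topics(new_topics=topics)
```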
Processing was handled by Spark Streaming: the team's jobs pulled from those Kafka topics and ran real-time analysis without hurting performance.
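Here's a rough idea of what the consuming side can look like, sketched with PySpark Structured Streaming (the newer API that has largely replaced DStream-based Spark Streaming). The schema, topic name, and windowed aggregation are illustrative assumptions, and reading from Kafka requires the spark-sql-kafka connector package on the cluster.

```python
# Illustrative Structured Streaming job: read transactions from Kafka and keep
# a rolling per-SKU sales total so inventory stays near real-time.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json, window
from pyspark.sql.types import StructType, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("realtime-transactions").getOrCreate()

# Assumed message schema for the "transactions" topic.
schema = (StructType()
          .add("sku", StringType())
          .add("amount", DoubleType())
          .add("event_time", TimestampType()))

# Subscribe to the topic and parse each message's JSON payload.
transactions = (spark.readStream
                .format("kafka")
                .option("kafka.bootstrap.servers", "localhost:9092")  # placeholder
                .option("subscribe", "transactions")
                .load()
                .select(from_json(col("value").cast("string"), schema).alias("t"))
                .select("t.*"))

# One-minute windowed sales per SKU, tolerating 5 minutes of late data.
sales = (transactions
         .withWatermark("event_time", "5 minutes")
         .groupBy(window(col("event_time"), "1 minute"), col("sku"))
         .sum("amount"))

# Console sink for the sketch; in practice this would feed an inventory or offers store.
query = (sales.writeStream
         .outputMode("update")
         .format("console")
         .start())
query.awaitTermination()
```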
The results were good.
Inventory updates went from hours behind to near real-time. They could see trends as they happened, and personalized offers got out the door while customers were still browsing.
In retrospect, the results vindicated our architectural decisions. Once you get the fundamentals right, everything downstream tends to fall into place rather organically.
What's your experience with high-volume data processing?
Share your story in the comments below!