Beyond Caching
Cache vs. Fast Data Store

Developers and Architects: You may not know it, but you've moved way beyond caching, and that is a good thing!

There is widespread and growing misuse of the term “caching” in the software industry, particularly among developers and architects. The technology community is doing itself a disservice, devaluing mission-critical use cases and their role in the IT landscape.

Using the wrong term, in this case “caching,” has implications for securing budgets and attracting top talent to your projects, and it may even affect adoption after your application is live.

In this blog, we explore what caching is and where it adds value, then dispel some myths by drilling into areas that are often associated with caching but are really more advanced patterns. We conclude with the benefits of those more advanced patterns.

tl;dr

Do yourself a favor: embrace the value of what you’re doing with real-time data by using terms like Real-time Operational Data Store, Real-time Data Mesh, Data Fabric, or Digital Integration Hub.

Don’t undersell and devalue it by incorrectly calling it “caching.”

Let’s go into this topic and start to sort out this terminology mess.

Caching Is:

Key/value operations focused on enabling sub-millisecond reads of application data. Caching can follow various patterns such as write-through/read-through, write-behind, cache-aside, etc. But these still follow the basic NoSQL pattern, and the primary goal is faster read operations on frequently used data, or else providing a fast shared memory store for use cases such as web session replication and clustering. Caching is fine for the edge use cases where it is needed, and it isn’t going away. However, let’s stop using the term too broadly and incorrectly!
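To ground these patterns, here is a minimal cache-aside sketch in Python. The backing-store callback and the `user:1` key are made up for illustration; the point is simply that reads check the cache first and only fall back to the slower system of record on a miss:

```python
class CacheAside:
    """Minimal cache-aside: check the cache first, fall back to the
    backing store on a miss, then populate the cache for next time."""

    def __init__(self, backing_store):
        self.backing_store = backing_store   # e.g. a database lookup
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.cache:                # cache hit: served from memory
            self.hits += 1
            return self.cache[key]
        self.misses += 1                     # cache miss: read through
        value = self.backing_store(key)
        self.cache[key] = value
        return value


# A dict stands in for a slower system of record.
db = {"user:1": {"name": "Ada"}}
cache = CacheAside(lambda k: db[k])
cache.get("user:1")   # first read misses and loads from the store
cache.get("user:1")   # second read is served from the cache
```

The more requests served from the cache (a higher hit ratio), the less load reaches the backing store, which is exactly the narrow value proposition of caching.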

From Wikipedia: In computing, a cache {...} is a hardware or software component that stores data so that future requests for that data can be served faster; the data stored in a cache might be the result of an earlier computation or a copy of data stored elsewhere. A cache hit occurs when the requested data can be found in a cache, while a cache miss occurs when it cannot. Cache hits are served by reading data from the cache, which is faster than recomputing a result or reading from a slower data store; thus, the more requests that can be served from the cache, the faster the system performs.

Caching Is Not:

  • Storing All Values Needed for Operational Processing - when you load a unified view or complete data set to be used for real-time, low-latency processing, you’re going into the realm of the Operational Data Store.
  • Cross-Sectional Data Access - across objects in a NoSQL data store, in-memory data grid, or database. Even if you also keep the data in another database, if your application is primarily using Hazelcast or similar technologies as its primary data store for operational purposes, that is not a cache. If you are using SQL to query this data, that is also beyond a cache.
  • Building Indexes to Optimize Query Performance - if you’re just following a key/value pattern and doing fast reads, your client probably already uses a consistent hashing algorithm and knows where to get the data without indexes.
  • On-Cluster Computations - such as, in Hazelcast, the EntryProcessor and Executor Service APIs. This is not caching, even if it uses cached data for the analytics. Optimizations can be made to ensure data locality, serialization can be tuned for the use case, etc.
  • Doing Aggregations, particularly on changing data - definitely not “caching.”
  • Using CDC and Stream Processing - such as to continuously sync data into a fast operational data store.
  • Using Queues and Topics - this would be messaging, or “eventing,” not caching.
  • Using Our CP Subsystem - it was introduced to enable workloads that would normally run on a database. Caches are generally AP systems.
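The point about indexes can be illustrated with a toy consistent-hash ring in Python. This is a sketch, not Hazelcast’s actual partitioning scheme; the member names and virtual-node count are made up. The client hashes the key and computes the owning member locally, which is why plain key/value reads need no index at all:

```python
import hashlib
from bisect import bisect


class ConsistentHashRing:
    """Toy consistent-hash ring: each key maps to exactly one member,
    computed locally from the key, with no central index or lookup."""

    def __init__(self, members, vnodes=64):
        # Place several virtual nodes per member for smoother balance.
        self.ring = sorted(
            (self._hash(f"{m}#{i}"), m)
            for m in members
            for i in range(vnodes)
        )

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def owner(self, key):
        # First ring position at or after the key's hash, wrapping around.
        idx = bisect(self.ring, (self._hash(key),)) % len(self.ring)
        return self.ring[idx][1]


ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
owner = ring.owner("user:42")   # the client resolves the owner itself
```

Once you start adding secondary indexes and running predicate or SQL queries across entries, you have left this simple routing model behind, and with it, plain caching.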


These patterns are common to Real-time Operational Data Stores, or similar patterns such as the Digital Integration Hub or an emerging pattern, the Real-time Data Mesh/Fabric.

Real-time/Fast Operational Data Store:

Effectively, this is a “real-time, continuously updating, materialized view” that often spans multiple data sources and often includes reverse ETL to load and join data from data lakes and warehouses with real-time data. In many cases, this layer is architected to deliver higher uptime than the underlying databases, which reduces the cost of those databases while delivering an always-on experience for your customers.
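As a rough illustration of the “continuously updating, materialized view” idea, here is a minimal Python sketch. The event shape, source names, and customer fields are hypothetical; in practice the events would arrive via CDC or stream processing rather than direct calls:

```python
class RealTimeView:
    """Sketch of a continuously updated, materialized customer view:
    change events from several source systems are merged into one
    in-memory record per customer as they arrive."""

    def __init__(self):
        self.view = {}   # customer_id -> unified record

    def apply(self, event):
        # event: {"source": ..., "customer_id": ..., "fields": {...}}
        record = self.view.setdefault(event["customer_id"], {})
        record.update(event["fields"])   # simple last-write-wins merge


view = RealTimeView()
view.apply({"source": "crm", "customer_id": 7, "fields": {"name": "Ada"}})
view.apply({"source": "payments", "customer_id": 7, "fields": {"balance": 120}})
# view.view[7] now unifies CRM and payments data for customer 7
```

The essential difference from a cache is that this store is the operational read model itself, kept current by ingestion, rather than a disposable copy of some other system’s answers.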

Common Business Problems Solved:

  • Unified Real-time View of Customer, or Operations, or Transactions, etc.
  • Digital Modernization through a digital data layer in front of older systems of record and legacy systems, enabling Digital innovation and agility, as well as offering an always-on 5 Nines architecture suitable for real-time applications.
  • Real-time P2P or Card Payments Processing - Authorization, Orchestration and Fast Data Calculations.
  • Real-time P2P or Card Payments Fraud - enabling instant execution of fraud Machine Learning rules, as well as doing real-time feature engineering and serving as a real-time feature store.
  • Digital Banking - Always-On Fast Unified Data Layer working across digital channels and decoupling them from Legacy systems and older databases.
  • Smart Retail - continuous inventory management
  • Real-time Personalized Offers for retail, media, travel, banking, insurance or other consumer facing applications.
  • Fast Edge Data Processing such as in Smart Warehouses and Smart Manufacturing - for example, enabling many companies to offer same-day delivery of products.
  • Faster, Smarter Trade Processing and Monitoring, as well as risk calculations and continuously updated position tracking.
  • Market data aggregation for faster data access in a simplified architecture.


Using the wrong term undervalues your efforts, and that sends the wrong message to both your IT and business leadership. “Caching” is perceived as low-value, solving a technical need but generally not viewed as solving a mission-critical business problem.

Using the proper high-value term will improve your ability to communicate with business stakeholders and IT leadership. This helps you convince them to fund further innovation around real-time data and can help you fight for budgets in a constrained environment. It also helps when requesting the right level of mission-critical infrastructure from IT and when educating Operations teams on how critical this part of the architecture is.

By the way, this evolution from caching to fast operational data stores is actually the beginning of your journey toward real-time data processing. That journey leads toward a real-time version of the data mesh pattern.

Real-time Data Mesh - What's Next

A Real-time Data Mesh can take the architecture to the next level.

  • Data Products - reusable data assets, now continuously updated in real-time.
  • Decentralized ownership, Centralized reuse - creating a real-time unified view of your business.
  • Real-time Ingest for ETL and Reverse ETL that happens continuously, ensuring the right data is available to the right applications at the instant needed.
  • Real-time Stream Processing for fast analytics connected to action.
  • Real-time Machine Learning for intelligent, automated action to create autonomous applications.
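To make the stream-processing bullet concrete, here is a minimal sliding-window aggregate in Python. The window size and event timestamps are illustrative only; a real stream processor does this at scale with event-time semantics, watermarks, and fault tolerance:

```python
from collections import deque


class SlidingWindowSum:
    """Minimal sliding-window aggregate over a stream of events: the
    kind of continuously updating analytic a stream processor maintains."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()   # (timestamp, amount), oldest first
        self.total = 0.0

    def add(self, ts, amount):
        self.events.append((ts, amount))
        self.total += amount
        self._evict(ts)

    def _evict(self, now):
        # Drop events that have aged out of the window.
        while self.events and self.events[0][0] < now - self.window:
            _, old = self.events.popleft()
            self.total -= old


w = SlidingWindowSum(window_seconds=60)
w.add(0, 10.0)
w.add(30, 5.0)
w.add(90, 2.0)   # the event at t=0 has aged out of the 60-second window
```

Connecting aggregates like this to downstream action, rather than just serving reads, is what separates real-time stream processing from anything a cache does.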


I'd like to thank Fawaz Ghali for his review of this blog and particularly for his contribution of the graphic included.

Don Campbell

Experienced technology executive focused on emerging and disruptive technologies: in-memory data grids, real-time streaming, and machine learning. Experience building commercial enterprise and federal sales teams.

1 year ago

John, great article. It's all about speaking the same language with agreed definitions.

Caspar Thomas

Senior Director - Global Integration Architecture Lead

2 years ago

Good article John - interesting read - it's more than just a question of using the right terms, it's understanding the architectural implications of the concepts.
