The Document Versioning Pattern in Azure Cosmos DB

Michele Arpaia

CEO evoila Italy

Published: August 24, 2020

In highly regulated industries such as finance, healthcare, and insurance, tracking the history of some portion of the data is paramount. This may be for auditing, for reporting, or simply for comparison and statistical analysis.

For instance, one of my current customers needs to keep track of "amendments" to premiums and claims for each specific client. This is very typical.

One of the key features of Azure Cosmos DB is the change feed. Cosmos DB exposes, through its API, the underlying log of changes to the documents in a collection. The changes are persisted, can be processed asynchronously and incrementally, and the output can be distributed across one or more consumers for parallel processing. This enables a variety of applications, such as feeding a microservices architecture, alerting in real time, or triggering functions that execute a piece of business logic. At the time of writing, the change feed covers inserts and updates (deletions are on the roadmap) and exposes only the most recent change to each item, which means that intermediate changes are not visible.
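To make the "only the most recent change" behaviour concrete, here is a minimal in-memory sketch. It is not the Azure Cosmos DB SDK: the `InMemoryContainer` class, its `_seq` sequence number, and the integer continuation token are all assumptions made up for illustration.

```python
from __future__ import annotations

from typing import Any


class InMemoryContainer:
    """Toy stand-in for a container; NOT the Azure Cosmos DB SDK."""

    def __init__(self) -> None:
        self._items: dict[str, dict[str, Any]] = {}
        self._clock = 0  # hypothetical logical sequence number

    def upsert(self, doc: dict[str, Any]) -> None:
        self._clock += 1
        # Overwrites in place: the previous state of the item is gone.
        self._items[doc["id"]] = {**doc, "_seq": self._clock}

    def read_changes(self, continuation: int = 0) -> tuple[list[dict[str, Any]], int]:
        """Return items modified after `continuation`, oldest first, plus a new token."""
        changed = sorted(
            (d for d in self._items.values() if d["_seq"] > continuation),
            key=lambda d: d["_seq"],
        )
        return changed, self._clock


container = InMemoryContainer()
container.upsert({"id": "policy-1", "premium": 100})
container.upsert({"id": "policy-1", "premium": 120})  # second write before any read
container.upsert({"id": "policy-2", "premium": 80})

# policy-1 shows up once, in its latest state; the premium=100 state is never seen.
changes, token = container.read_changes()

# Reading again from the returned token yields nothing new.
later, token = container.read_changes(token)
```

A consumer that needs every intermediate state therefore has to read the feed often enough, or capture each version explicitly at write time.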

As you may have already guessed, the change feed cannot cover the whole versioning requirement by itself, but it plays an important role in the overall solution.

SOLUTION

In short, for each collection whose items are subject to versioning, you can set up a secondary, or shadow, collection. The result is one collection that holds the latest (and most queried) data and another that holds all revisions of that data, linked back to the first.
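One way the two collections might be linked is through the document shape itself. The sketch below assumes the current document carries a `version` counter, and the field names `entityId` and `snapshot` and the `<id>:v<version>` id scheme are hypothetical choices, not something the pattern prescribes.

```python
from __future__ import annotations

import copy
from typing import Any


def to_history_doc(current: dict[str, Any]) -> dict[str, Any]:
    """Build the shadow-collection document for one revision of `current`."""
    entity_id = current["id"]
    version = current["version"]
    return {
        # Unique per revision, so every revision is a separate document.
        "id": f"{entity_id}:v{version}",
        # Link back to the document in the main collection.
        "entityId": entity_id,
        "version": version,
        # Full copy of the document as it looked at this version.
        "snapshot": copy.deepcopy(current),
    }


claim = {"id": "claim-7", "version": 3, "amount": 1500}
revision = to_history_doc(claim)
```

Using the entity id as the partition key of the shadow collection would keep all revisions of one entity together, though that choice depends on the query patterns.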

Enter the Document Versioning Pattern. Let's follow the famous GoF pattern structure to make things easier.

Intent. Ensure that each entity in a collection, when updated, maintains the history of its changes.
Motivation. It is important to track the history of entities throughout their lifecycle.
Applicability. As mentioned in the intro to this post, many companies need to track document changes for auditing, reporting, and statistical purposes.
Structure. The following diagram shows the simple structure of this pattern. The key to understanding it is that, in order to keep the state of objects, every update has to be turned into an "append" operation. In addition, a "shadow" container has to be set up to keep the log of changes. In the example below, the "change" is represented by the whole new version of the document.

Two collections connected via change feed for writes

Participants. The whole mechanism is realised through the change feed, which is used to implement a "Materialized View".
Consequences. Typically, this pattern works well when documents do not accumulate many revisions (that is, histories stay short) and most queries target the current version of the document. If these criteria are not met, this pattern might not be the right fit and may also suffer from performance degradation. In fact, if data changes frequently, keeping versions becomes very write-intensive, given that there are two collections to keep in sync.
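The structure described above can be sketched end to end with two in-memory collections. This is an illustration under stated assumptions, not an implementation against the real service: `VersionedStore`, `save`, `process_feed`, and `revisions` are hypothetical names, and the list of pending ids merely stands in for the change feed.

```python
from __future__ import annotations

import copy
from typing import Any


class VersionedStore:
    """Main collection plus append-only shadow collection (in-memory sketch)."""

    def __init__(self) -> None:
        self.current: dict[str, dict[str, Any]] = {}  # latest version per entity
        self.history: list[dict[str, Any]] = []       # every captured version
        self._seen: set[tuple[str, int]] = set()      # (id, version) already appended
        self._pending: list[str] = []                 # stand-in for the change feed

    def save(self, entity_id: str, fields: dict[str, Any]) -> dict[str, Any]:
        """Write to the main collection, bumping the version counter."""
        version = self.current.get(entity_id, {}).get("version", 0) + 1
        doc = {**fields, "id": entity_id, "version": version}
        self.current[entity_id] = doc
        if entity_id not in self._pending:
            self._pending.append(entity_id)
        return doc

    def process_feed(self) -> int:
        """Append the latest state of each changed entity to the shadow collection."""
        appended = 0
        for entity_id in self._pending:
            doc = self.current[entity_id]
            key = (doc["id"], doc["version"])
            if key not in self._seen:  # idempotent: replays do not duplicate
                self._seen.add(key)
                self.history.append(copy.deepcopy(doc))
                appended += 1
        self._pending.clear()
        return appended

    def revisions(self, entity_id: str) -> list[dict[str, Any]]:
        return [d for d in self.history if d["id"] == entity_id]


store = VersionedStore()
store.save("policy-1", {"premium": 100})
store.process_feed()
store.save("policy-1", {"premium": 120})
store.process_feed()
versions = [d["version"] for d in store.revisions("policy-1")]
```

Note that every logical update costs two writes here, one per collection, which is the write amplification mentioned under Consequences. Also, because the feed surfaces only the latest state of an item, two saves between calls to `process_feed` would leave a gap in the history in this sketch.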

A few parting thoughts

Three quick considerations.

First off, there are a few nuances that need to be factored in. For example, what happens on deletion? Should we delete the whole history? Can we promote an old version to be the current one? These questions, and a few others, are fundamentally about the business outcome, so the answers must be driven by it.
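As one possible answer among several, both operations can themselves be modelled as appends, so the history is never rewritten. The sketch below is only that, a sketch: `promote`, `soft_delete`, and the `restoredFrom` and `deleted` fields are hypothetical, and whether a tombstone is acceptable (for example under data-erasure rules) is a business decision.

```python
from __future__ import annotations

import copy
from typing import Any


def next_version(history: list[dict[str, Any]]) -> int:
    return max((d["version"] for d in history), default=0) + 1


def promote(history: list[dict[str, Any]], version: int) -> dict[str, Any]:
    """Make an old version current again by appending a copy of it as a new version."""
    source = next(d for d in history if d["version"] == version)
    restored = {
        **copy.deepcopy(source),
        "version": next_version(history),
        "restoredFrom": version,  # keeps the audit trail explicit
    }
    history.append(restored)
    return restored


def soft_delete(history: list[dict[str, Any]]) -> dict[str, Any]:
    """Record a deletion as a tombstone version instead of erasing the history."""
    latest = max(history, key=lambda d: d["version"])
    tombstone = {"id": latest["id"], "version": next_version(history), "deleted": True}
    history.append(tombstone)
    return tombstone


policy_history = [
    {"id": "policy-1", "version": 1, "premium": 100},
    {"id": "policy-1", "version": 2, "premium": 120},
]
restored = promote(policy_history, 1)   # version 3, premium back to 100
tombstone = soft_delete(policy_history)  # version 4, marked as deleted
```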

Secondly, I have intentionally left out any implementation details and performance considerations. A GitHub repo would do the job. Hopefully soon ;-)

Thirdly, there is the die-hard myth about reusability. It deserves a separate post, and I've already started writing it. However, design patterns are a great level of granularity for reuse, striking a balance between the "too generic" and "too detailed" conundrum.

Last but not least - check out this great post by Andrei who has initiated what I believe is the most meaningful way to "talk business" about NoSQL patterns.

I welcome any ideas about creating a solid, referenceable catalogue of NoSQL patterns. A new literature is possible.

Hans Wieser

Even AI Agents need a memory | Principal Product Manager | Azure Cosmos DB | Advisor, Mentor & Coach

4 years ago

Love the description Michele! I would argue our versioning is even a bit better than the approach in the picture!

Gopinath Rajee

Data Engineer at Everest Reinsurance Company

4 years ago

Excellent article. Are the "premiums and claims" applications OLTP applications?
