Don’t make a mesh (unless you have to…)
Chris Pedder
Chief Data Officer @ OBRIZUM | Board advisor | Data transformation leader | Posting in a personal capacity.
Apologies for the punny title, it’s a bit clickbaity, but I want to talk a bit about one of the current hypes in software and data: meshes. In software engineering, meshes are everywhere. The first iteration of this new approach to design came as a logical extension of the software architecture world moving towards microservices. In case you don’t know, microservices are a way of splitting the components of your software system into atomised pieces, each responsible for a single piece of the puzzle. Essentially, this is object-orientation at the service level.
As an example, imagine we want to write a piece of software to do bank transfers, broken down into a handful of steps: validate both accounts, check the sender has enough money, move the funds, and notify both parties.
In a monolithic application, each of these steps might be done inside a different object in the code, but ultimately the code runs as one piece on a single physical or virtual machine. If one part breaks, everything breaks. In a microservices architecture, each of these jobs would be handled not simply by a different object, but by a different “service” which lives independently from the rest of the routines.
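To make that concrete, here is a toy sketch of the same transfer written monolith-style and microservice-style. The service names, endpoints and helper functions are all invented for illustration, and the HTTP calls assume the third-party requests library.

```python
# Toy, purely illustrative sketch; service names and endpoints are invented.
import requests  # third-party HTTP client, assumed installed

# --- Monolith: every step is a function call inside one process ---
def validate_account(account): ...          # stubs standing in for real logic
def check_balance(account, amount): ...
def move_money(sender, receiver, amount): ...
def notify(sender, receiver, amount): ...

def transfer_monolith(sender, receiver, amount):
    validate_account(sender)
    validate_account(receiver)
    check_balance(sender, amount)
    move_money(sender, receiver, amount)
    notify(sender, receiver, amount)

# --- Microservices: each step lives behind its own independently deployed service ---
def transfer_microservices(sender, receiver, amount):
    for url, payload in [
        ("http://accounts-svc/validate", {"account": sender}),
        ("http://accounts-svc/validate", {"account": receiver}),
        ("http://ledger-svc/check-balance", {"account": sender, "amount": amount}),
        ("http://ledger-svc/transfer", {"from": sender, "to": receiver, "amount": amount}),
        ("http://notify-svc/send", {"to": [sender, receiver], "amount": amount}),
    ]:
        requests.post(url, json=payload, timeout=5).raise_for_status()
```

The logic is identical; what changes is that every step now crosses a network boundary and can be deployed, scaled and broken independently.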
Of course, there’s no such thing as a free lunch, so in return for the oft-vaunted advantages of microservices (reliability, scalability, adaptability), we should expect that there’s a downside too. The main overhead is that we now have to make sure all of these services can communicate with one another.
In practice, this has meant developers becoming experts in API calls (or other, faster interfaces like gRPC), and also in the extensive use of SSL certificates or API tokens to allow services to communicate securely. Since a lot of these things are (fairly) standardised, this has meant an explosion in the amount of code (mostly boilerplate-ish) which is needed to deploy a service. If only there were a better way…
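For a flavour of what that boilerplate looks like, here is a minimal, hypothetical sketch of one service calling another over HTTPS with a bearer token. The URL, the token source and the retry policy are all invented; this is just the shape of the thing.

```python
# Hypothetical service-to-service call: auth, TLS verification, retries, timeouts.
import os
import requests  # third-party HTTP client, assumed installed

def call_ledger_service(payload: dict) -> dict:
    token = os.environ["LEDGER_API_TOKEN"]           # issued out-of-band, rotated regularly
    headers = {"Authorization": f"Bearer {token}"}
    for attempt in range(3):                          # naive retry loop
        try:
            resp = requests.post(
                "https://ledger-svc.internal/transfer",
                json=payload,
                headers=headers,
                timeout=2.0,
                verify=True,                          # check the server's TLS certificate
            )
            resp.raise_for_status()
            return resp.json()
        except requests.RequestException:
            if attempt == 2:
                raise
```

None of this is business logic, yet every service ends up carrying something like it, which is exactly the load a mesh promises to take off your hands.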
Enter, stage left, the service mesh. It’s really a set of pre-built plumbing (as the name suggests) which you can plug your microservices into. By default, everything is off, so you have to define a set of rules which allow services to communicate with one another. There are two important scaling factors here - the overall number of services, and how many of the other services a particular one is likely to communicate with. It’s a clever idea, and it takes the load off, provided the number of services in your mesh isn’t so small that you could get by with direct API calls anyway. But there’s another, potentially hidden complexity: you want to architect your system so that you don’t need to touch thousands of rules to add or refactor a service. This idea of “weak coupling” is key to good microservices architecture, so that’s not usually too much of an issue. Worst case scenario, there are some more strongly coupled components, but those couplings are explicit, so you can track them easily, right? That’s what the service layer is all about.
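In a real mesh these rules live in the control plane’s configuration (Istio, Linkerd and friends each have their own formats); the toy Python below, with invented service names, just illustrates the bookkeeping and why weak coupling keeps the rule count manageable.

```python
# Toy allow-list of the kind a mesh control plane enforces: caller -> allowed callees.
# Service names are invented; real meshes express this as declarative config.
mesh_rules = {
    "transfer-api": {"accounts-svc", "ledger-svc", "notify-svc"},
    "accounts-svc": {"audit-svc"},
    "ledger-svc":   {"audit-svc"},
    "notify-svc":   set(),
}

def allowed(caller: str, callee: str) -> bool:
    return callee in mesh_rules.get(caller, set())

def rules_touched_by_removing(service: str) -> int:
    """How many rules need editing if we refactor `service` away?"""
    incoming = sum(1 for callees in mesh_rules.values() if service in callees)
    outgoing = len(mesh_rules.get(service, set()))
    return incoming + outgoing

print(allowed("transfer-api", "ledger-svc"))    # True
print(rules_touched_by_removing("audit-svc"))   # 2: weakly coupled, cheap to change
```

The point is that the cost of a change is proportional to how many rules mention the service you’re touching - keep couplings few and explicit and the mesh stays cheap to evolve.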
The idea behind a data mesh is essentially similar, once we restate what a service mesh does in a business-oriented way. A service mesh abstracts away the business logic of a microservice (i.e. who it needs to talk to to achieve its aims) from the code itself. The same idea applied to data says “let’s separate the data storage and operational databases from the information we might seek to find within it through analytics”. Since finding meaning in data is still very much a human-level intelligence task (I have seen scant evidence we will automate data analysts any time soon!), this requires that human beings create the rules for what that data represents. As you might have guessed from the outro to my previous paragraph, this is where couplings can become a problem.
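To give a flavour of what “humans writing down the rules” might look like, here is a hypothetical data-product descriptor. The fields, names and format are invented; real data-mesh implementations each have their own contract conventions.

```python
# Hypothetical "data product" contract: the meaning of the data is written down
# by people, separately from wherever the bytes happen to be stored.
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    name: str
    owner: str                      # a human team accountable for the semantics
    schema: dict                    # column -> plain-language meaning and unit
    consumers: list = field(default_factory=list)

transfers = DataProduct(
    name="completed_transfers",
    owner="payments-analytics",
    schema={
        "amount": "value moved, in minor currency units (pence/cents)",
        "initiated_at": "UTC timestamp when the sender confirmed the transfer",
        "channel": "app / web / branch",
    },
    consumers=["fraud-model", "monthly-revenue-dashboard"],
)
```

The storage can live anywhere; what matters is that the meaning of each field, and who owns it, is written down by a person.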
Whereas API calls in the service layer can be built to be a priori as uncoupled as possible, the data layer is a more complex beast, and there can be strong but implicit couplings in it. You might think that slightly changing the way you collect one variable will have only very local effects in your mesh, affecting only the obviously-correlated components. But you can be wrong in subtle ways, and worse still - there aren’t really good ways to test for that. So you really need diligent, highly data-literate people manning the gates when it comes to defining the components of your data mesh. Just to be clear, I’m absolutely not saying that data meshes don’t work, I’m saying that they are more complex beasts that need more care and maintenance than service meshes. They’re not for the little guys (like where I work) right now.
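To see how subtle this can be, here is a deliberately tiny, invented example: an upstream team changes the unit of one field, nothing crashes, and the downstream metric quietly changes meaning.

```python
# Toy example of an implicit coupling: a unit change upstream is silently
# absorbed downstream, producing a plausible-looking but wrong metric.
transfers_v1 = [{"amount": 12.50}, {"amount": 3.20}]   # amounts in pounds
transfers_v2 = [{"amount": 1250}, {"amount": 320}]     # "same" field, now in pence

def average_transfer(transfers):
    return sum(t["amount"] for t in transfers) / len(transfers)

print(average_transfer(transfers_v1))   # 7.85  - correct
print(average_transfer(transfers_v2))   # 785.0 - no error raised, just a wrong answer
```

A type checker, a schema validator and the mesh’s access rules would all wave this change straight through.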
This brings me to the latest part of the mesh revolution - the ML/AI mesh. This is essentially viewed as an extension of the service mesh, where services can include interacting ML models. The problem here is that we don’t just have a service layer (where model inference sits) or a data layer (where the model source data comes from), we also have a deep, implicit coupling between the two. Machine learning models have to be trained. So the data flowing in the data layer directly affects the performance of the services in the services layer. To be more explicit, let’s consider what happens when we spot an opportunity to improve model performance. We decide to retrain a root-level ML system, using a different architecture to get a 5% uplift in performance of that model. The new model does better in the task, but it introduces new biases into the models downstream of it, which make them perform worse. So we need to retrain that model too. You can see where I’m going with this - we end up with (potentially) combinatorial complexity each and every time we choose to retrain one piece of the architecture. Scary stuff.
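A back-of-the-envelope way to see why this gets expensive: treat the models as a dependency graph (the model names below are hypothetical) and count what a single upstream retrain drags along with it.

```python
# Toy dependency graph between models: retraining one model forces its
# downstream consumers to be revalidated (and often retrained) too.
from collections import deque

downstream = {                        # hypothetical model names
    "embedding-model": ["ranking-model", "tagging-model"],
    "ranking-model":   ["recommendation-model"],
    "tagging-model":   ["recommendation-model", "moderation-model"],
    "recommendation-model": [],
    "moderation-model": [],
}

def retrain_cascade(root: str) -> list:
    """All models potentially affected when `root` is retrained (BFS order)."""
    seen, queue, order = {root}, deque([root]), []
    while queue:
        model = queue.popleft()
        order.append(model)
        for child in downstream.get(model, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return order

print(retrain_cascade("embedding-model"))
# ['embedding-model', 'ranking-model', 'tagging-model',
#  'recommendation-model', 'moderation-model']
```

One retrain at the root and suddenly every downstream model needs revalidating - and each of those retrains can trigger the same question again one level further down.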
So, in conclusion: meshes work best in situations where their components are weakly coupled, and coupled only to a few other building blocks. In general, data systems don’t have this property, and so don’t scale in a helpful way when the meshes get large and changes have to be made. Is that the end of the story? Well, to give the classic data scientist answer: it depends. If you can find a way of making your systems weakly coupled, and you can operate in a zone where the combinatorics isn’t too murderous (the Goldilocks zone: not too big, not too small!), then meshes can also work in the data space. They are just a tool, at the end of the day, and we all know that if you find yourself using a rake as a hammer, you’ve probably fallen for the marketing…
Should have said originally: big HT to Debmalya Biswas for starting me thinking about this!
Debmalya Biswas | AI/Analytics @ Wipro | x- Nokia, SAP, Oracle | 50+ patents | PhD - INRIA
Thanks Chris, very interesting article - one of the very few articles (that I have seen) trying to bring together the overlapping concepts of API/Services Mesh, Data Mesh & AI/ML Mesh. As you rightly pointed out, the AI/ML mesh is the most complex as it extends the Service & Data Mesh with "interacting ML models". The combinatorial complexity further increases when we are not only considering a performance improvement of an existing model ("using a different architecture to get a 5% uplift in performance of that model") but a composition of existing services, e.g., a Computer Vision & NLP model, with a new Service layer on top, or reusing their combined training + inference data to train a new model. https://www.dhirubhai.net/pulse/ai-mesh-future-enterprise-debmalya-biswas/ On the +ve side, this would enable maximum reuse & agility in enterprise use-cases. So interesting times ahead I guess :)