Architecting Reliable Platforms for the AI-Driven Future
It’s been a while since my last update, and I’m excited to reconnect with you through this new edition.
During this period of silence, I’ve been deeply engaged in the fascinating world of Artificial Intelligence (AI) and building some complex data platforms. As the initial hype around AI has started to settle, it has given me a valuable opportunity to learn, experiment, and interact with various industry leaders, product owners, consulting partners and innovative product companies.
I’ve been keeping a close eye on the exciting innovations in the AI landscape, including remarkable offerings such as Google’s Gemini, Meta’s Llama3, Anthropic’s Claude3, OpenAI’s GPT-4, and of course, Apple Intelligence.
While it’s still early to predict who will secure a larger market share, my analysis suggests that Nvidia and OpenAI are the new players making significant strides. Other players, meanwhile, seem to be focusing on maintaining their economic moat by enhancing user experiences through AI, at least in the short to medium term. For instance, it’s unlikely that existing O365 customers would switch to Google Apps just because Google's AI Gemini outperforms Microsoft Copilot. Similarly, the introduction of generative art capabilities in Adobe Firefly is unlikely to cause a mass migration from other favored photo editors.
Interestingly, I foresee a potential shift in the search engine market. Google’s dominance may be challenged in the medium to long term as Bing is making impressive progress within the enterprise sector.
On a personal note, I have discontinued my OpenAI subscription and am now fully utilizing WebUI-Ollama-Llama3, Gemma, Mixtral, along with Microsoft Copilot for tasks within our company’s firewall. I’m also encouraging my mentees to integrate Cody and Codellama into their development workflows for routine tasks.
An AI system is a three-legged stool
Visualize a robust three-legged stool, where each leg signifies a crucial element. The seat of the stool is a metaphor for the AI system, which depends on all three legs for stability and functionality.
I use the term “foundational” with great care because without high-quality data and a functioning data governance model, no AI project can succeed.
What makes a foundational data platform great?
The three architectural cornerstones that support any enterprise-grade data platform (or any application, for that matter) are Reliability, Scalability, and Maintainability.
Let’s start with Reliability.
Picture a symphony orchestra, where each musician contributes to a harmonious melody. What happens if one musician hits a wrong note? The symphony doesn’t halt; it continues, albeit with a slight hiccup. This is the essence of system reliability - the system continues to function even when things don’t go as planned. These hiccups, or ‘faults’, are distinct from ‘failures’. A failure is when the entire system ceases to function, while a fault is a dip in performance or an error in one component, but the overall system continues to operate. Building reliable systems is much like orchestrating a fault-tolerant symphony, where the melody persists, despite the occasional off-key note.
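To make the fault-versus-failure distinction concrete, here is a minimal Python sketch. It assumes a hypothetical read path served by several replicas; the names `read_with_fallback`, `broken_replica`, `healthy_replica`, and `AllReplicasFailed` are illustrative and not taken from any real library. A single replica raising an error is a fault that the caller never sees; only when every replica faults does the system as a whole fail.

```python
from typing import Callable, Sequence


class AllReplicasFailed(Exception):
    """Raised only when every replica faults, i.e. the whole system fails."""


def read_with_fallback(replicas: Sequence[Callable[[], str]]) -> str:
    """Try each replica in order and return the first successful read."""
    fault_count = 0
    for replica in replicas:
        try:
            return replica()
        except Exception:
            # A fault in one component: note it and move on to the next replica.
            fault_count += 1
    # No replica answered, so the fault has become a failure.
    raise AllReplicasFailed(f"{fault_count} replicas faulted")


# Hypothetical replicas, used only for illustration.
def broken_replica() -> str:
    raise ConnectionError("replica unreachable")


def healthy_replica() -> str:
    return "record-42"


# One replica hits a wrong note, but the read still succeeds.
result = read_with_fallback([broken_replica, healthy_replica])
print(result)
```

The design choice here is simply that faults are absorbed at the lowest level that can handle them, so the caller only sees an error when no redundancy is left.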
Faults can stem from various sources. Hardware faults can occur in disks, memory, interface cards, power systems, networks, CPUs/GPUs, and so on.
For example, the average hard disk has a Mean Time To Failure (MTTF) of 10 years. So, in a system with 10,000 disks, you can expect about 2.74 disks to fail per day on average (10,000 disks divided by 3,650 days). Reliable systems address this by incorporating redundancy, such as RAID configurations. With the advent of virtualization and cloud technologies, we now have a whole new set of parameters to consider for redundancy and fault tolerance.
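The disk arithmetic above can be checked with a short Python sketch. It assumes a constant failure rate, so that failures are spread evenly over the MTTF period, and a 365-day year; the function name `expected_daily_failures` is my own and purely illustrative.

```python
def expected_daily_failures(disk_count: int, mttf_years: float,
                            days_per_year: int = 365) -> float:
    """Average number of disks expected to fail per day.

    Assumes a constant failure rate: each disk fails on average once
    every mttf_years, so the fleet fails at disk_count / MTTF-in-days.
    """
    mttf_days = mttf_years * days_per_year
    return disk_count / mttf_days


# The figures from the text: 10,000 disks with a 10-year MTTF.
daily = expected_daily_failures(disk_count=10_000, mttf_years=10)
print(round(daily, 2))  # 2.74
```

Running the same function with a smaller fleet or a longer MTTF shows how quickly the daily rate drops, which is why fleet size matters as much as the quality of any single disk.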
Software faults could be dormant bugs that only surface under specific conditions (like a leap year or a World Cup final) or newer types of bugs introduced by AI systems, such as hallucinations. Faults could also arise from end users doing something unusual that wasn’t anticipated during the design phase. As applications are increasingly built as microservices, a single seemingly insignificant service in the middle of a critical flow can trigger cascading errors, much like the butterfly effect.
So, how do we measure Reliability?
I recommend checking this excellent source for further details about these metrics.
In subsequent editions, I will delve into the other two crucial architectural considerations: Scalability and Maintainability. Stay tuned!