登录查看更多内容

Enterprise Architecture: Architectural Building Blocks for an AI Platform

Guillermo Wrba

Autor de "Designing and Building Solid Microservice Ecosystems", Consultor Independiente y arquitecto de soluciones ,evangelizador de nuevas tecnologias, computacion distribuida y microservicios.

发布日期: 2025年2月24日

It's clear that building a software platform from scratch is not an easy thing, in fact, it's a huge endeavor that involves separate teams to be coordinated together in terms of integration, data management, cross-cutting concerns such as scalability/configuration management, SLA/SLOs, SDLC and software quality, and many many other aspects that turns out the process to be like a "puzzle" with multiple parts moving in different directions.

From an architectural perspective, such a sum of moving pieces needs to be properly controlled and organized, and proper visibility must be given to achieve the expected goals, otherwise, the whole initiative can be jeopardized.

For sure, this is not an exception for building an AI platform, which not only requires the above to be achieved regarding team coordination, synchronization, and common visibility but also to plan for an architecture that can evolve over time efficiently and quickly as a business needs change over time for the organization that makes use of such platform.

As with any other bigger multi-domain initiative, this requires to take into consideration multiple things, but the list below represents a "must", that one cannot get rid of as an enterprise architect, in order to ensure the proper achievement of goals in time:

Ensure AI Models are fully understood and properly mapped to costs: AI Models must become visible from the very beginning, and self-hosted/externally-hosted models must be fully understood and evaluated before taking any decision on which models must be supported, and which ones must be excluded. In this case, a trade-off analysis must be done between AI model capabilities versus costs; costs is not a minor thing here, and in fact, is a crucial factor. Even though nowadays AI models are created day-after-day, with increased capabilities, most of these solutions respond to paid models that are primarily hosted by cloud vendors - typical example, GPT-4, Gemini-1.5. These models fall typically into the category of "closed-source" rather than "open-source". Of course, there are open sourced models such as GPT-3.5-turbo or GPT-4-turbo that can become available for donwloading in the form of GGUF files if you google a bit, however, these models have their own limitations regarding context length, which can represent a constraint if we are looking for model's inference completness and consistency as well as "memorization" of past interactions - all these are largely affected by the context length limitatio

When starting up, less is more: start with few,smaller ,self-hosted, open-source models that can be hosted locally can be better than just bringing everything from scratch, which can be counterproductive from the running costs perspective. Models such as openai/gpt-3.5-turbo and openai/gpt-4-turbo can help to bring interesting AI capabilities into the platform to start with, so that users can start experimenting. Experimentation "builds" the experience and paves the road for further platform refinement and product evolution. As the platform sees some ROI, investment could be done on acquiring more refined, specialized, externally-hosted models to bring additional AI capabilities including video and image processing, complex tabular data processing, image recognition, and others.

I already introduced and explored the concept of "efficiency-driven iterative model tuning approach" in a previous article where this concept around evolving from smaller, less coss-intensive models towards more evolved fine-tuned models as the solutions turn out to be mature, is fully explained

Build a strong, solid Content Pipeline: this is a crucial point. In a typical AI Platform, the content pipeline is responsible for the overall ingestion of pieces of information in multiple formats, further extraction and embedding into the knowledge database (KDB). A knowledge database stores the embeddings and the reference to the data, so that an LLM model can perform RAG and LoRA efficiently. RAG stands for retrieval-augmented-generation, a technique to provide an LLM with additional information on-demand, rather than having to re-train a model with new information, providing high flexibility. There are three key aspects when designing a sustainable content pipeline: scalability, performance and data consistency. The pipeline must be able to scale based on demand; ingestion of thousands of documents must be parallelized ,and content must be chunked so that individual segments of data can be processed simultaneously; parallelization of work is a must across all content pipeline phases, from ingestion, through extraction and final data embedding and this includes especially the data store design since data stores can easily become the "bottleneck". It's important to make sure that no backpressure is introduced, and that all data flows seamlessly as demand increases.

The content pipeline represents a core component that is critical for a successful AI Plarform implementation, since it guarantees that RAG capabilities are available on-demand for the platform for various data taxonomies, including text, tables and OCR images across various document formats. A content pipeline that lags behind due to performance issues lead to a broken user experience because data does not become available whenever needed.

Avoid "Siloed"-style kind of work: siloed team work can lead to siloed solutions across platform domain tracks, especially considering the larger size of an AI Platform. Teams can be distributed geographically with teams working even on different time zones while building the platform. It's of high importance to have a common visibility about what is being built, and that architecture guidelines are shared and evangelized across individual architects so that the product becomes more stable. Architectural governance here is the key, and there must be alignment about common decisions and NFRs.

As a matter of fact, a siloed architecture is not bad on itself, but it depends on the nature of the problem being solved. In a typical AI platform, being a distributed architecture by nature, with multiple moving interconnected pieces, a siloed approach and architecture leads to more problems rather than solutions, and must be avoided.

Mix-and-Match between microservices and semi-monolithic architecture: while microservices are a good approach, going with semi-monlithic architecture for the platform components can be better choice ,especially when trying to deal with coarse-grained functionality that can be borrowed from already existing OSS semi-monolithitc solutions, that, once combined together can achieve the expected functionality. For business-specific fine-grain domain logic microservices are the way to go because of their inherent flexibility. Typical OSS examples are Apache Airflow, Camunda and KubeFlow which are inherently, workflow management and workload orchestration solutions that can interoperate at different levels.

Yes, microservices can be leveraged ,at some extent, however, the overall architecture vision must not be completely tied to a microservice architecture, and instead, must be thought as a combination of smaller pieces that need to talk each other. Whether these pieces are coarse-grained or fine-grained, must fit together and be able to be replaced if needed.

Safety Guardrails / Security: security is a big topic in the AI sphere. Models must be protected against intrusion and security exploits that may compromise the AI integrity. Sophisticated, AI-directed attacks can easily take models down and start behaving erratically, leading to hallucination, and affecting the C-A-C (completeness, accuracy, consistency) quality attributes. The security topic can be broader, and cover multiple things from SDLC and CAS in the pipeline, implementing border protection at the platform ingress layer - using specialized services to analyze traffic and detect potential threats before they occur - , to prompt monitoring and prompt poisoning counter-measures, as well as hallucination-detecting techniques that can help to detect this kind of situation sooner rather than later. One important source of information regarding security and risk managements is the Risks and Management Framework (RMF) , published by the NIST .
Always adhere to industry standards such as OpenAI's API spec: when it comes to designing and building platform APIs that interact with the models, it's advisable to make use of industry standards, and one good example is OpenAI API Spec. By using standard APIs we ensure that our architecture can have movable pieces that can be replaced if needed while keeping the ability to "talk" with other components because standard contracts are in place. This is especially true when designing the deep platform layers (MLOPs) that interact directly with the living running models and their implementation.

Encourage to use standard building blocks for arch blueprints: having a vision on a set of common building blocks is crucial to ensure that we build AI solutions using the proper AI language and ontology. One way to achieve these goals is to ensure that solution architects use a standard set of building blocks to describe the blueprints and high-level architecture diagrams for individual domains. By providing a common set of building blocks, we also provide common cohesion across architects to use the same language. The below diagram depicts a set of building blocks i dused to ensure that the architecture follows a common approach when ideating and creating the architecture artifacts including diagrams and blueprints

Think about evolutionary, flexible architecture: A platform is rather a "living" thing than a static solution. Pieces can move and get replaced as new functionality gets added or changed over time, as business needs may change. This is particularly true in the AI field, with emerging technologies and new frameworks/libraries becoming available too frequently. The architecture must be planned in order to be able to incorporate changes with enough flexibility - changes must not be disruptive and should be done by simply evolving the existing solution and/or by changing components. In this sense, such an AI Platform is not an exception, and the platform itself must be designed adhering to an evolutionary architecture pattern, - "mosaic" architecture - for which changes can be accommodated with ease, rather than spending too much time on changing smaller functionality, typical of more tight-coupled designs for which changing components can result in higher effort.
Keep an eye on TCO and operational / running costs: getting insights about running costs for the whole platform must be a high priority from moment zero, especially because of the high resource usage that is associated with hosting and running models both in the on-premises and especially, on the cloud. Keeping track of the money invested while building the platform ensures that the project runs within the budget from a FinOps perspective. Run costs - especially for emerging AI projects - can become a surprise if not taken seriously, and if there's no previous expertise on the field. Cloud vendors apply different cost policies to provided services, that can be based on different criteria of usage, such as CPU/GPU utilization, number of tokens spent, or even costs can run by usage time directly, or number of connected users. It's important to fully understand the price model in place before onboarding new models or services into the AI Platform that we are building. Cost forecasting, in this sense, helps by having some running costs numbers available sooner, before the running costs impact directly on the face of the business. Some industry standards implementing cost forecasting are FOCUS and OPENCOST. FOCUS enables to transmit of standardized time-series data over-the-wire regarding cost utilization from various cloud vendors that can be used to create specialized dashboards to gain enough insights about how a platform contributes to overall TCO.

Yes, Here, one more time, the concept around avoid burning all the budget on initial stages still applies here, nothing new under the sun. Ultimately, any AI initiative falls under business directions and is driven ultimately, by the business needs. The main goal must be to have value proven initially by bringing some business value that can be then captured by the stakeholders and translated into a quick ROI. At the very beginning we want to keep TCO lower, so that we can have more margin for ROI; as the platform evolves over time, and once expectations are more clear, we can have a larger TCO margin, at expense of increased ROI over time.

Alejandro Perez

Technical Lead, Engineering en GlobalLogic Latinoamerica

3 天前

Excelente articulo, ?gracias! Un punto que va tomando dificultad con el tiempo es la discusion de costos esperados vs realidad. Todos los puntos que tocaste necesitan de equipos especializados, licencias, hardware (o instancias cloud) en los que se necesita invertir para que funcione todo como se espera, pero esto muchas veces es tapado por las constantes noticias de IA, casos de exitos y reduccion de costos que no hacen mas que meter ruido en las discusiones que realmente importan.

Juan Alfredo Rella

Software Solutions Architect at GlobalLogic Latinoamérica & Practice Expert

1 周

Gracias por compartir

1 次回应

查看更多评论

要查看或添加评论，请登录

Guillermo Wrba的更多文章

Enterprise Architecture: Assessing IT organization maturity with ITMAF

2025年2月26日

Enterprise Architecture: Assessing IT organization maturity with ITMAF

Sometimes, when involved on digital transformation engagements and pre-sales activities, and especially during the…

1 条评论
Efficiency-driven Iterative Model Tuning Approach: how to tune AI Models efficiently while keeping the running costs controlled

2024年10月2日

Efficiency-driven Iterative Model Tuning Approach: how to tune AI Models efficiently while keeping the running costs controlled

AI technology as of today are growing at an exponential rate, as faster adoption occurs around AI , generative AI…
AI Content Engineering and Complex Tabular Data Processing

2024年5月27日

AI Content Engineering and Complex Tabular Data Processing

It's not a surprise that an AI Models can be trained to be able to process data in various formats and adhering to…

3 条评论
Continuous Streaming Data Ingestion in AI Models

2024年2月27日

Continuous Streaming Data Ingestion in AI Models

In this second article about AI Architecture, i'm going to cover the concept of continuous data streaming, in the…
Exploring Generative AI LLMs using OpenAI API

2024年1月22日

Exploring Generative AI LLMs using OpenAI API

As i promised, here's the full article i prepared on how to leverage existing large language models available from…

2 条评论
Moving Mission-Critical SQL Server Workloads Effectively to Cloud with AWS DMS

2023年7月11日

Moving Mission-Critical SQL Server Workloads Effectively to Cloud with AWS DMS

Quite recently, i was involved on driving a big data migration project following a lift-and-shift approach to move one…
Architecture Deliverables Governance Framework [ AGDF ]

2023年5月10日

Architecture Deliverables Governance Framework [ AGDF ]

Requirements Management Document Purpose The goal of this document is to set a framework on how to handle the…
ML/AI, un hijo de modelos de regresion matematica

2023年4月6日

ML/AI, un hijo de modelos de regresion matematica

Modelos de Regresion Matematico Hoy dia escuchamos el termino "AI" o "Artificial Intelligence" practicamente a diario…
Leveraging Event Multiplexing in Even-Driven Architectures

2023年3月26日

Leveraging Event Multiplexing in Even-Driven Architectures

Implementing Event Multiplexing in EDA In order to deal with multiplexing, microservices should be capable of…
Persistent Storage in AWS Cloud-Native Microservices with AWS FSX Lustre

2022年7月14日

Persistent Storage in AWS Cloud-Native Microservices with AWS FSX Lustre

One characteristic of any modular microservice development is that , microservices are stateless by nature, which means…

See all articles

Guillermo Wrba的更多文章

Enterprise Architecture: Assessing IT organization maturity with ITMAF

Efficiency-driven Iterative Model Tuning Approach: how to tune AI Models efficiently while keeping the running costs controlled

AI Content Engineering and Complex Tabular Data Processing

Continuous Streaming Data Ingestion in AI Models

Exploring Generative AI LLMs using OpenAI API

Moving Mission-Critical SQL Server Workloads Effectively to Cloud with AWS DMS

Architecture Deliverables Governance Framework [ AGDF ]

ML/AI, un hijo de modelos de regresion matematica

Leveraging Event Multiplexing in Even-Driven Architectures

Persistent Storage in AWS Cloud-Native Microservices with AWS FSX Lustre