Enterprise Architecture: Architectural Building Blocks for an AI Platform

Enterprise Architecture: Architectural Building Blocks for an AI Platform

It's clear that building a software platform from scratch is not an easy thing, in fact, it's a huge endeavor that involves separate teams to be coordinated together in terms of integration, data management, cross-cutting concerns such as scalability/configuration management, SLA/SLOs, SDLC and software quality, and many many other aspects that turns out the process to be like a "puzzle" with multiple parts moving in different directions.

From an architectural perspective, such a sum of moving pieces needs to be properly controlled and organized, and proper visibility must be given to achieve the expected goals, otherwise, the whole initiative can be jeopardized.

For sure, this is not an exception for building an AI platform, which not only requires the above to be achieved regarding team coordination, synchronization, and common visibility but also to plan for an architecture that can evolve over time efficiently and quickly as a business needs change over time for the organization that makes use of such platform.

As with any other bigger multi-domain initiative, this requires to take into consideration multiple things, but the list below represents a "must", that one cannot get rid of as an enterprise architect, in order to ensure the proper achievement of goals in time:

  • Ensure AI Models are fully understood and properly mapped to costs: AI Models must become visible from the very beginning, and self-hosted/externally-hosted models must be fully understood and evaluated before taking any decision on which models must be supported, and which ones must be excluded. In this case, a trade-off analysis must be done between AI model capabilities versus costs; costs is not a minor thing here, and in fact, is a crucial factor. Even though nowadays AI models are created day-after-day, with increased capabilities, most of these solutions respond to paid models that are primarily hosted by cloud vendors - typical example, GPT-4, Gemini-1.5. These models fall typically into the category of "closed-source" rather than "open-source". Of course, there are open sourced models such as GPT-3.5-turbo or GPT-4-turbo that can become available for donwloading in the form of GGUF files if you google a bit, however, these models have their own limitations regarding context length, which can represent a constraint if we are looking for model's inference completness and consistency as well as "memorization" of past interactions - all these are largely affected by the context length limitatio

  • When starting up, less is more: start with few,smaller ,self-hosted, open-source models that can be hosted locally can be better than just bringing everything from scratch, which can be counterproductive from the running costs perspective. Models such as openai/gpt-3.5-turbo and openai/gpt-4-turbo can help to bring interesting AI capabilities into the platform to start with, so that users can start experimenting. Experimentation "builds" the experience and paves the road for further platform refinement and product evolution. As the platform sees some ROI, investment could be done on acquiring more refined, specialized, externally-hosted models to bring additional AI capabilities including video and image processing, complex tabular data processing, image recognition, and others.

I already introduced and explored the concept of "efficiency-driven iterative model tuning approach" in a previous article where this concept around evolving from smaller, less coss-intensive models towards more evolved fine-tuned models as the solutions turn out to be mature, is fully explained
AI Model Magic Quadrant

  • Build a strong, solid Content Pipeline: this is a crucial point. In a typical AI Platform, the content pipeline is responsible for the overall ingestion of pieces of information in multiple formats, further extraction and embedding into the knowledge database (KDB). A knowledge database stores the embeddings and the reference to the data, so that an LLM model can perform RAG and LoRA efficiently. RAG stands for retrieval-augmented-generation, a technique to provide an LLM with additional information on-demand, rather than having to re-train a model with new information, providing high flexibility. There are three key aspects when designing a sustainable content pipeline: scalability, performance and data consistency. The pipeline must be able to scale based on demand; ingestion of thousands of documents must be parallelized ,and content must be chunked so that individual segments of data can be processed simultaneously; parallelization of work is a must across all content pipeline phases, from ingestion, through extraction and final data embedding and this includes especially the data store design since data stores can easily become the "bottleneck". It's important to make sure that no backpressure is introduced, and that all data flows seamlessly as demand increases.

The content pipeline represents a core component that is critical for a successful AI Plarform implementation, since it guarantees that RAG capabilities are available on-demand for the platform for various data taxonomies, including text, tables and OCR images across various document formats. A content pipeline that lags behind due to performance issues lead to a broken user experience because data does not become available whenever needed.


Demonstration AI Content Pipeline

  • Avoid "Siloed"-style kind of work: siloed team work can lead to siloed solutions across platform domain tracks, especially considering the larger size of an AI Platform. Teams can be distributed geographically with teams working even on different time zones while building the platform. It's of high importance to have a common visibility about what is being built, and that architecture guidelines are shared and evangelized across individual architects so that the product becomes more stable. Architectural governance here is the key, and there must be alignment about common decisions and NFRs.

As a matter of fact, a siloed architecture is not bad on itself, but it depends on the nature of the problem being solved. In a typical AI platform, being a distributed architecture by nature, with multiple moving interconnected pieces, a siloed approach and architecture leads to more problems rather than solutions, and must be avoided.

  • Mix-and-Match between microservices and semi-monolithic architecture: while microservices are a good approach, going with semi-monlithic architecture for the platform components can be better choice ,especially when trying to deal with coarse-grained functionality that can be borrowed from already existing OSS semi-monolithitc solutions, that, once combined together can achieve the expected functionality. For business-specific fine-grain domain logic microservices are the way to go because of their inherent flexibility. Typical OSS examples are Apache Airflow, Camunda and KubeFlow which are inherently, workflow management and workload orchestration solutions that can interoperate at different levels.

Yes, microservices can be leveraged ,at some extent, however, the overall architecture vision must not be completely tied to a microservice architecture, and instead, must be thought as a combination of smaller pieces that need to talk each other. Whether these pieces are coarse-grained or fine-grained, must fit together and be able to be replaced if needed.

  • Safety Guardrails / Security: security is a big topic in the AI sphere. Models must be protected against intrusion and security exploits that may compromise the AI integrity. Sophisticated, AI-directed attacks can easily take models down and start behaving erratically, leading to hallucination, and affecting the C-A-C (completeness, accuracy, consistency) quality attributes. The security topic can be broader, and cover multiple things from SDLC and CAS in the pipeline, implementing border protection at the platform ingress layer - using specialized services to analyze traffic and detect potential threats before they occur - , to prompt monitoring and prompt poisoning counter-measures, as well as hallucination-detecting techniques that can help to detect this kind of situation sooner rather than later. One important source of information regarding security and risk managements is the Risks and Management Framework (RMF) , published by the NIST .
  • Always adhere to industry standards such as OpenAI's API spec: when it comes to designing and building platform APIs that interact with the models, it's advisable to make use of industry standards, and one good example is OpenAI API Spec. By using standard APIs we ensure that our architecture can have movable pieces that can be replaced if needed while keeping the ability to "talk" with other components because standard contracts are in place. This is especially true when designing the deep platform layers (MLOPs) that interact directly with the living running models and their implementation.

  • Encourage to use standard building blocks for arch blueprints: having a vision on a set of common building blocks is crucial to ensure that we build AI solutions using the proper AI language and ontology. One way to achieve these goals is to ensure that solution architects use a standard set of building blocks to describe the blueprints and high-level architecture diagrams for individual domains. By providing a common set of building blocks, we also provide common cohesion across architects to use the same language. The below diagram depicts a set of building blocks i dused to ensure that the architecture follows a common approach when ideating and creating the architecture artifacts including diagrams and blueprints



  • Think about evolutionary, flexible architecture: A platform is rather a "living" thing than a static solution. Pieces can move and get replaced as new functionality gets added or changed over time, as business needs may change. This is particularly true in the AI field, with emerging technologies and new frameworks/libraries becoming available too frequently. The architecture must be planned in order to be able to incorporate changes with enough flexibility - changes must not be disruptive and should be done by simply evolving the existing solution and/or by changing components. In this sense, such an AI Platform is not an exception, and the platform itself must be designed adhering to an evolutionary architecture pattern, - "mosaic" architecture - for which changes can be accommodated with ease, rather than spending too much time on changing smaller functionality, typical of more tight-coupled designs for which changing components can result in higher effort.
  • Keep an eye on TCO and operational / running costs: getting insights about running costs for the whole platform must be a high priority from moment zero, especially because of the high resource usage that is associated with hosting and running models both in the on-premises and especially, on the cloud. Keeping track of the money invested while building the platform ensures that the project runs within the budget from a FinOps perspective. Run costs - especially for emerging AI projects - can become a surprise if not taken seriously, and if there's no previous expertise on the field. Cloud vendors apply different cost policies to provided services, that can be based on different criteria of usage, such as CPU/GPU utilization, number of tokens spent, or even costs can run by usage time directly, or number of connected users. It's important to fully understand the price model in place before onboarding new models or services into the AI Platform that we are building. Cost forecasting, in this sense, helps by having some running costs numbers available sooner, before the running costs impact directly on the face of the business. Some industry standards implementing cost forecasting are FOCUS and OPENCOST. FOCUS enables to transmit of standardized time-series data over-the-wire regarding cost utilization from various cloud vendors that can be used to create specialized dashboards to gain enough insights about how a platform contributes to overall TCO.

Yes, Here, one more time, the concept around avoid burning all the budget on initial stages still applies here, nothing new under the sun. Ultimately, any AI initiative falls under business directions and is driven ultimately, by the business needs. The main goal must be to have value proven initially by bringing some business value that can be then captured by the stakeholders and translated into a quick ROI. At the very beginning we want to keep TCO lower, so that we can have more margin for ROI; as the platform evolves over time, and once expectations are more clear, we can have a larger TCO margin, at expense of increased ROI over time.


Alejandro Perez

Technical Lead, Engineering en GlobalLogic Latinoamerica

3 天前

Excelente articulo, ?gracias! Un punto que va tomando dificultad con el tiempo es la discusion de costos esperados vs realidad. Todos los puntos que tocaste necesitan de equipos especializados, licencias, hardware (o instancias cloud) en los que se necesita invertir para que funcione todo como se espera, pero esto muchas veces es tapado por las constantes noticias de IA, casos de exitos y reduccion de costos que no hacen mas que meter ruido en las discusiones que realmente importan.

回复
Juan Alfredo Rella

Software Solutions Architect at GlobalLogic Latinoamérica & Practice Expert

1 周

Gracias por compartir

要查看或添加评论,请登录

Guillermo Wrba的更多文章