Computing for AI at Scale – Identification 3 of 8
Alongside data, computing makes AI. In terms of value creation, AI computing happens in the very places discussed in the previous article on Digitalization: in products, services, customer solutions, business processes and business systems. Once AI use cases have been integrated into business operations, the rubber meets the road thru computing.
Computing for Inference is about putting AI models into use to generate classifications or predictions based on data, or in the case of generative AI, to generate new data. Inference is the process where an AI model is applied to actual data, thereby directly connecting to AI-driven value creation.
But there’s more to the picture of AI computing. A lot needs to happen before those AI models are ready to create value in products, services and processes: AI model training is the other great AI computing domain. It is as important as inference but very different in nature.
Training and Inference are equally important AI computing domains but different in nature
AI models, once developed and trained and before inference can take place, need to be deployed across all those digital computing environments and value creation contexts. Ease of AI model deployment, while not about computing per se, creates a new set of requirements for computing solutions.
Ease of AI model deployment is an elementary aspect of AI computing solutions
AI computing needs
AI model training
AI model training is computing-intensive due to many factors:
- Large-scale data – Training typically involves processing large datasets, often consisting of millions of data points. Each data point contributes to the training process but requires multiple computations.
- High dimensionality – Deep learning models are often based on high-dimensional vector spaces, leading to a large number of parameters. Each parameter requires computation during the training process.
- Complex operations – Mathematical operations used in training are computationally heavy, e.g. multiplication of large matrices.
- Iterative process – Training involves many iterations over the dataset, continuously adjusting parameters to minimize loss, defined as the difference between the model-generated prediction and the right answer. The goal of AI model training is to minimize this loss, i.e. to improve the model's predictions. However, this iterative nature increases the overall computational load (a minimal training-loop sketch follows this list).
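To make the iterative loss-minimization loop concrete, here is a minimal training-loop sketch in Python using PyTorch; the synthetic data, network size and optimizer settings are placeholder assumptions for illustration, not a recommended configuration.

```python
import torch
from torch import nn

# Synthetic stand-ins for a real dataset (assumption for illustration only)
X = torch.randn(10_000, 128)   # 10k data points, 128-dimensional features
y = torch.randn(10_000, 1)     # "right answers" the model should learn to predict

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 1))
loss_fn = nn.MSELoss()         # loss: difference between prediction and the right answer
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

for epoch in range(10):               # iterative process: many passes over the dataset
    for i in range(0, len(X), 256):   # mini-batches of 256 data points
        xb, yb = X[i:i + 256], y[i:i + 256]
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)  # heavy matrix multiplications happen here
        loss.backward()                # gradients computed for every parameter
        optimizer.step()               # parameters adjusted to minimize the loss
```

Every factor listed above shows up in this loop: the dataset size sets the number of batches, the parameter count sets the work per batch, and the iterations multiply both.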
Consequently, AI model training requires High-Performance Computing (HPC), often involving Graphics Processing Units (GPUs), Tensor Processing Units (TPUs) or other specialized hardware optimized for parallel processing and deep learning. In addition, depending on the size of the datasets used and the complexity of the models being trained, these computing resources need to be scalable.
AI model training calls for High-Performance Computing with specialized hardware
In addition, data throughput and I/O efficiency are critical for feeding large volumes of data into models during the training process. Sometimes training itself needs to be distributed across multiple nodes to speed up the training process.
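As a hedged illustration of distributing training across nodes, the sketch below shows the typical setup with PyTorch DistributedDataParallel; it assumes a launcher such as torchrun sets the LOCAL_RANK environment variable, that NVIDIA GPUs are available, and that the single-layer model is a placeholder.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumes launch via e.g.: torchrun --nproc_per_node=4 train.py
dist.init_process_group(backend="nccl")        # one process per GPU, possibly across nodes
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(128, 1).cuda(local_rank)   # placeholder model
model = DDP(model, device_ids=[local_rank])        # gradients synchronized across processes

# ...regular training loop runs here; each process works on its own shard of the data...

dist.destroy_process_group()
```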
Access to large amounts of high-quality data is essential for successful AI model training. Computing infrastructure must accommodate data storage requirements in addition to computing needs.
Inference
Once the AI model training process is complete and the results verified, it is time to deploy models to do inference. Depending on the AI use case and model in question, inference may relate to many different things, from traditional analytics-driven classification to Machine Learning powered predictions to the latest Generative AI based new content creation. The term “inference” applies to all of them. That is, inference is agnostic to the underlying AI technology and its evolutionary step – with an important commonality: reliance on new input data to create results.
Inference as a term is agnostic to underlying AI technology
Compared to training, inference is significantly less resource-intensive – especially for models that have already been optimized during training. The efficiency goal for inference is typically about prioritizing speed and low latency over the heavy computational tasks associated with training.
Somewhat depending on the use case, the overall inference goals and trade-offs are typically the following:
- Latency – For real-time applications like autonomous systems or financial trading, inference must be performed with minimal latency. However, other use cases such as report generation tolerate much higher latency and can therefore be handled with batch processes (a simple latency measurement sketch follows this list).
- Throughput – Some AI use cases like the ones utilizing video input require high-throughput inference capability.
- Scalability – Some AI use cases require the ability to scale inference workloads, particularly for applications that need to process large volumes of requests in real-time.
- Resource efficiency – Efficient use of computational resources to balance performance and cost, particularly when deploying models at the Edge or in other resource-constrained environments.
- Model deployment – Seamless AI model deployment mechanisms enable quick updates and scaling of inference across different computing environments.
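To make the latency and throughput trade-off tangible, here is a minimal measurement sketch in Python; the model is a small placeholder and the batch sizes are arbitrary assumptions, so the absolute numbers mean little, but the pattern (per-request latency versus batched throughput) is what real deployments tune.

```python
import time
import torch

model = torch.nn.Sequential(torch.nn.Linear(128, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1))
model.eval()

def avg_time(batch_size: int, runs: int = 100) -> float:
    """Average wall-clock time of one forward pass for a given batch size."""
    x = torch.randn(batch_size, 128)
    with torch.no_grad():                      # inference only, no gradient bookkeeping
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
        return (time.perf_counter() - start) / runs

print(f"batch of 1:   {avg_time(1) * 1e3:.3f} ms per request (latency-oriented)")
print(f"batch of 256: {avg_time(256) * 1e3:.3f} ms per batch (throughput-oriented)")
```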
AI computing solutions
The overall objective for AI computing solutions is to provide a robust, scalable and flexible computing infrastructure that supports the development, deployment and operations of AI models across various use cases, ensuring high performance, security and efficiency.
Serving AI model training and inference needs leads to a mix of cloud, on-premises and edge computing solutions that must meet different business, technical and regulatory requirements.
Inference in digital environments
The previous article on Digitalization introduced the environments where inference takes place: natively digital and digitalized products and services, customer solutions, business processes and business systems. Inference-related computing is distributed by nature. In contrast, AI model training is centralized by default and does not directly relate to the computing environments discussed here.
Digitalization discussed in the previous article connects to inference rather than AI model training
Generalizing, natively digital products and services tend to favor cloud computing, while digitalized physical products and services may lean towards edge and on-premises computing, often integrating with cloud resources. Business processes and systems involving ERP modules can span cloud, on-premises or hybrid environments depending on the specific requirements and prevailing constraints.
Taking a more detailed look, this is how inference typically maps to various computing environments:
Natively digital products
- Cloud – Natively digital products, such as software applications or digital platforms, are often hosted in the cloud. The cloud offers scalability, flexibility and the ability to integrate with various services, which is ideal for natively digital products.
- On-premises – Some natively digital products, particularly those requiring stringent data control, high security or low-latency operations, like financial trading platforms and certain healthcare applications, may be deployed on-premises.
- Edge – While less common, in specific use cases where low latency and real-time processing are critical, edge computing could be employed, for example, gaming platforms requiring real-time feedback.
Digitalized physical products
- Edge – Digitalized physical products often rely on edge computing. The edge enables real-time processing, low latency and the ability to operate with intermittent connectivity, e.g. industrial equipment, IoT devices or smart home products.
- Fog – Fog computing, which involves processing data closer to the edge in local or regional data centers, is another common scenario. This approach balances the need for real-time processing with the computational power of centralized resources.
- Cloud integration – While edge and fog computing are predominant, there is often integration with cloud services for data aggregation, analytics or more complex AI computations.
Natively digital services
- Cloud – Natively digital services like SaaS platforms, streaming services or online marketplaces are predominantly cloud-based due to the need for scalability, global accessibility and seamless integration with other cloud services.
- On-premises – Some natively digital services may also be hosted on-premises, particularly in industries with specific regulatory requirements, data sovereignty concerns, or security needs. Alternatively, a private cloud may be an ideal solution in these cases.
- Hybrid cloud – In some cases, a hybrid cloud model is used to combine the scalability of public cloud with the control and security of on-premises or private cloud infrastructure.
Digitalized services
- On-premises – Digitalized services often still rely on on-premises computing due to their origins in traditional IT environments.
- Cloud – Many digitalized services are transitioning to the cloud, especially as organizations modernize their IT infrastructure. This is particularly common for services that have been adapted to leverage cloud benefits, such as scalability and accessibility.
- Hybrid deployment – Digitalized services may also use a combination of cloud and on-premises computing, especially when certain parts of the service cannot be fully transitioned to the cloud due to technical or regulatory constraints.
Business processes
- Cloud ERP – Many modern ERP modules, especially in a postmodern ERP setup, are cloud-based (either public or private). This enables flexibility, scalability and often reduces the need for heavy in-house infrastructure.
- On-premises ERP – Some ERP modules, particularly in industries with strict data security and regulatory compliance needs, like finance or healthcare, remain on-premises.
- Hybrid ERP – Many organizations use a hybrid ERP model, where some modules like CRM are cloud-based while others like financials and manufacturing remain on-premises, depending on the specific needs and constraints.
Business systems
- Alignment with ERP – Business systems often align with the computing arrangements of their constituent business processes and ERP modules. Corresponding ERP modules can be deployed in the cloud, on-premises or in a hybrid configuration.
- System Integration – Business systems, which often involve complex integrations of multiple ERP modules and other applications, may also leverage middleware or integration Platform as a Service (iPaaS) solutions to ensure seamless computing across different environments.
- Custom Business Systems – In some cases, particularly when a business system involves a high degree of customization or integration with non-ERP systems, more flexibility may be needed, potentially involving all three environments: cloud, on-premises and edge.
AI model deployment and portability
A typical scenario: an AI model is trained in one environment, but for inference it needs to be deployed across various computing environments. For example, an AI model is trained in the public cloud and then deployed in a private cloud, on-premises and on edge devices. Efficiency lies in a flexible, seamless and easy overall deployment process. A lot hinges on the principles of modularity, code reuse and portability.
Consequently, in the Age of AI, microservices architecture, containerization and container orchestration emerge as cornerstone solutions. Microservices architecture brings modularity while containers and container orchestration provide code reuse and model portability.
In the Age of AI, microservices and containers emerge as cornerstone solutions.
In the context of AI computing, microservices architecture has significant benefits (a minimal service sketch follows the list):
- AI models can be packaged as independent services that communicate with other services via APIs. This setup allows different parts of the AI system to be developed, deployed and scaled independently, promoting code reuse and portability.
- By separating AI inference from other components like data ingestion and preprocessing, microservices enable the inference service to be deployed wherever it is needed, whether in the cloud, on-premises or at the edge.
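As a minimal sketch of inference packaged as an independent service behind an API, assuming FastAPI and PyTorch are available; the single-layer model is a placeholder for an artifact that would normally be loaded from a model registry.

```python
import torch
from typing import List
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = torch.nn.Linear(4, 1)   # placeholder; normally loaded from a trained model artifact
model.eval()

class Features(BaseModel):
    values: List[float]         # expects four feature values per request

@app.post("/predict")
def predict(features: Features) -> dict:
    x = torch.tensor(features.values).unsqueeze(0)   # shape (1, 4)
    with torch.no_grad():
        y = model(x)
    return {"prediction": y.item()}

# Run with e.g.: uvicorn inference_service:app --host 0.0.0.0 --port 8000
```

A service like this can then be packaged into a container image and deployed to cloud, on-premises or edge environments without code changes, which is exactly the portability discussed next.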
Correspondingly, containerization and container orchestration come with important capabilities:
- Isolation – Containers provide a consistent runtime environment for AI models, ensuring that they run the same way regardless of where they are deployed. This isolation is crucial for maintaining model performance and reliability across different environments.
- Portability – Containers can be easily moved between different environments like cloud, on-premises and edge, facilitating the reuse of AI models and the associated data (products). An export-for-portability sketch follows this list.
- Orchestration – Container orchestration allows for the automated deployment, scaling and management of containerized applications, including AI models, across large clusters of computing resources. Orchestration capability is vital for managing distributed inference across multiple environments. Kubernetes is the go-to solution for orchestrating containerized AI models. It supports automated scaling, load balancing and self-healing capabilities, ensuring that AI inference can scale horizontally across large clusters of machines.
- Edge orchestration – Specialized orchestration tools and platforms like K3s (a lightweight Kubernetes distribution) can be used to manage containers on edge devices, enabling distributed inference close to the data source.
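Containers typically package a serialized model artifact together with its runtime. One common way to make the artifact itself portable across runtimes, sketched below under the assumption that PyTorch and onnxruntime are available, is to export the model to the ONNX format; this is an illustrative technique, not the only option.

```python
import torch
import onnxruntime as ort

model = torch.nn.Linear(128, 1)      # placeholder for a trained model
model.eval()
dummy_input = torch.randn(1, 128)

# Export to a framework-neutral artifact that can travel inside a container image
torch.onnx.export(model, dummy_input, "model.onnx",
                  input_names=["input"], output_names=["output"])

# The same artifact can then be served in cloud, on-premises or edge runtimes
session = ort.InferenceSession("model.onnx")
outputs = session.run(None, {"input": dummy_input.numpy()})
print(outputs[0].shape)
```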
With these solutions in place, AI models can be efficiently deployed, scaled and managed across public and private clouds, on-premises setups and edge devices. The separation of AI model training (typically in the cloud) and inference (distributed across various environments) emphasizes the need for these solutions.
Separation of AI model training and inference emphasizes the need for modularity thru microservices and portability thru containers
Seamless portability also promotes scalability by leveraging different computing resources as needed, whether in the cloud, on-premises or at the edge.
Furthermore, code reuse leads to adaptability, for example when an AI model is shifted from the cloud to the edge due to latency requirements.
Modular architecture makes it easier to maintain and update AI models and related components without disrupting the entire system. This is particularly important in environments where frequent updates are necessary, such as in edge computing or real-time applications.
Orchestration combined with a high degree of automation leads to cost-efficient operations and reduces the risk of errors.
Distributed inference involves deploying AI models across multiple environments to meet the needs of different applications, e.g. low-latency requirements at the edge, scalability in the cloud. In this context, containerization and microservices become essential for managing these distributed deployments efficiently.
Finally, in a Data Mesh architecture, data products encapsulated in containers can be easily shared and reused across different domains and environments. This portability ensures that AI models always have access to the data they need, regardless of where they are deployed. More about this in the next article on Data Integration.
Cloud computing
Cloud computing allows companies to access computing resources like servers and storage over the internet. Rather than maintaining physical data centers themselves, businesses can rely on cloud providers to serve their computing needs on-demand and with a high degree of flexibility.
The benefits of cloud computing for AI are plain to see:
- Scalability – AI models, especially the ones based on some variant of neural networks and deep learning, often require vast amounts of computing resources. The cloud offers scalable infrastructure where companies pay for the resources they use. This scalability is especially useful during peak training or inference times.
- State-of-the-art tools – Cloud providers offer a suite of AI engineering and machine learning services. These tools are often kept up-to-date with the latest advancements, allowing companies to leverage cutting-edge capabilities without the overhead of building and maintaining them themselves.
- Data storage – AI thrives on data. Cloud platforms provide robust, scalable and distributed storage solutions that can handle vast amounts of structured and unstructured data.
Cloud computing comes in three types: public cloud, private cloud and hybrid cloud, each with its own characteristics and merits:
- Public cloud – Public cloud services are available to anyone who wishes to use or purchase them. Resources are shared among multiple organizations, often referred to as tenants. Public clouds offer virtually unlimited scalability, allowing businesses to quickly scale up or down based on computing demand. Public cloud providers operate on a pay-as-you-go model, reducing the need for large upfront capital investments. Cloud resources are easily accessible from anywhere with an internet connection.
- Private cloud – A cloud environment that is exclusively used by a single organization. It can be hosted on-premises within the organization’s data center or by a third-party provider. Unlike public cloud, resources are not shared with other organizations. In the context of AI computing, the key benefit of private cloud is control over data sovereignty and security. In addition, a private cloud with its dedicated resources leads to more predictable costs compared to the unlimited elasticity of the public cloud.
- Hybrid cloud – Hybrid cloud combines elements of both public and private clouds, allowing data and applications to be shared between them. In the context of AI computing, the key hybrid cloud benefit is the flexibility of doing AI model training in the public cloud and inference in a private cloud. Overall, hybrid cloud offers the best of both worlds by combining public cloud scalability and private cloud control.
A typical strategy is to use public cloud for AI model training and private cloud for inference. The flexibility and vast computational power of public cloud environments make them ideal for handling the intensive workloads associated with training large AI models.
Correspondingly, using private cloud for inference makes sense especially when dealing with sensitive data, stringent privacy requirements, or a need for low-latency processing closer to the data source. This setup helps to ensure that data remains within the organization’s controlled environment and thus enables data sovereignty.
Public cloud used for AI model training and private cloud for inference is a typical strategy
In addition to scalability, better tool availability favours doing AI model training in the public cloud. While many tools and frameworks used for AI model training like TensorFlow, PyTorch and Kubernetes are open-source and can also be deployed in a private cloud, some advanced tools are only supported on public cloud by the respective cloud service providers. Also, access to specialized AI computing hardware may be easier and more cost-effective in the public cloud than in a private cloud.
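A typical hand-over point between the public cloud training environment and the private cloud or on-premises inference environment is a shared model store. The sketch below assumes AWS S3 accessed via boto3; the bucket and key names are purely hypothetical.

```python
import torch
import boto3

# Training side (public cloud): persist and upload the trained weights
model = torch.nn.Linear(128, 1)                          # placeholder trained model
torch.save(model.state_dict(), "model.pt")
s3 = boto3.client("s3")
s3.upload_file("model.pt", "example-model-store", "models/demo/model.pt")   # hypothetical bucket/key

# Inference side (private cloud / on-premises): download and load the same weights
s3.download_file("example-model-store", "models/demo/model.pt", "model.pt")
inference_model = torch.nn.Linear(128, 1)
inference_model.load_state_dict(torch.load("model.pt"))
inference_model.eval()
```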
On-premises computing
On-premises computing refers to the traditional IT model where an organization owns and manages its own data centers and computing infrastructure within its physical premises. This model involves purchasing, maintaining and operating servers, storage, networking and other IT resources in-house. While cloud computing has gained popularity for its flexibility and scalability, on-premises computing remains a viable option for many organizations, especially those with specific needs or regulatory requirements.
Key characteristics of on-premises computing include:
- Ownership and control – Complete ownership and control over computing infrastructure, including hardware, software and networking resources. IT teams can customize and configure the environment to meet specific business needs, with no dependency on third-party providers. In comparison, a private cloud also provides a high degree of control and customization but is either managed by a third-party provider or run as a private cloud within the organization’s own data center.
- Security and compliance – Data and applications are hosted entirely within the organization's premises, providing a higher level of security for sensitive or confidential information. On-premises computing is often preferred in industries with strict regulatory compliance requirements, where data can never leave the organization's control.
- Performance and reliability – On-premises systems offer predictable performance as the organization has dedicated resources without the need to share them with other tenants. Companies can optimize and fine-tune their infrastructure for specific workloads, ensuring consistent and reliable performance.
- Customization and flexibility – On-premises computing allows for deeper customization to support unique business processes, legacy applications and specialized workloads. Organizations can use a wide range of hardware and software configurations tailored to their needs, without limitations imposed by external vendors.
- Cost predictability – On-premises computing involves upfront capital investment and predictable operational costs, making it easier to forecast and budget for IT expenditures. For organizations with stable and predictable workloads, investing in on-premises infrastructure can be more cost-effective over the long term compared to ongoing cloud service fees.
While sunk costs and legacy reasons often drive the decision to maintain or even expand on-premises computing, there are also other factors to consider. Business reasons to choose on-premises computing over private cloud may include:
- Data sovereignty – On-premises computing can offer a level of assurance that the data is stored locally and complies with specific regulations around data sovereignty and residency.
- Ultra-low latency – For applications that require ultra-low latency, such as high-frequency trading or industrial control systems, on-premises computing might provide faster, more reliable performance due to the proximity of the computing resources. In such cases, even the minimal latency introduced by private cloud networks can be too much.
- High-Performance Computing (HPC) – Certain AI workloads, scientific simulations or other intensive computational tasks might benefit from optimized on-premises HPC. While private clouds can offer powerful computing, the customizability and control over specific hardware configurations may give on-premises computing an edge.
- Resource utilization – Organizations that have the expertise to manage and maintain high levels of resource utilization might find that on-premises computing offers better value, especially when resources can be effectively pooled and managed for maximum efficiency.
- In-house expertise – Some organizations have significant in-house expertise in managing on-premises data centers and may prefer to leverage this expertise rather than retrain or shift to cloud management skills.
Legacy-related inertia and sunk costs are not the only reasons to stick with on-premises computing. There are also current business reasons to do so.
Edge computing
As discussed above, digitalized physical products rely on edge computing. Inference done locally at the edge enables real-time processing with very low latency. This is often crucial in e.g. industrial applications and consumer products utilizing AI.
Edge computing operates mainly on transient data for immediate responsiveness. However, that does not exclude the possibility to forward part of the data to the cloud to be processed for deeper insight involving permanent or semipermanent batch data.
Edge computing operates mainly on transient data for immediate responsiveness
An intermediate solution between the edge device or product and the cloud is to process data close to the edge in a local or regional data center. This is typically called fog computing to differentiate it from public or private cloud.
In order to reduce connectivity requirements and related bandwidth usage, the edge device or product needs to preprocess data before sending it to the cloud (or fog) for further analysis. This greatly improves the overall efficiency of the solution in question. Even then, reliable connectivity remains essential for solution operation.
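As a simple illustration of such edge-side preprocessing, the sketch below reduces a window of raw sensor readings to a compact summary before forwarding it; `publish_to_cloud` is a hypothetical stand-in for whatever uplink the solution uses (e.g. MQTT or HTTPS).

```python
import statistics
from typing import Sequence

def summarize(window: Sequence[float]) -> dict:
    """Condense a window of raw readings into a compact summary, cutting bandwidth needs."""
    return {
        "count": len(window),
        "mean": statistics.fmean(window),
        "min": min(window),
        "max": max(window),
    }

def publish_to_cloud(payload: dict) -> None:
    # Hypothetical uplink; in practice an MQTT publish or an HTTPS POST
    print("sending", payload)

readings = [20.1, 20.4, 35.0, 20.2, 20.3]   # e.g. one second of temperature samples
publish_to_cloud(summarize(readings))        # one small message instead of many raw points
```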
With three different computing locations in play, system integration emerges as a crucial element of the overall solution.
Overall, the ability to do inference at the edge, beyond cloud or on-premises computing, significantly adds to the portfolio of high-impact AI use cases.
ERP Strategy in the Age of AI
In the Age of AI, postmodern or composable ERP discussed in the context of Digitalization is not just a technological upgrade but a strategic necessity. Legacy ERP systems now represent technical debt that businesses can no longer afford to carry. The move to modular, decentralized and predominantly cloud-based ERP systems supports versatile AI computing needs and thus becomes a strategic priority.
The technical debt of legacy ERP systems stems from the inflexibility of the monolithic architecture, making it difficult to integrate new technologies, including AI. In addition, these systems suffer from slow adaptation, as they were not designed with AI in mind, making it challenging to incorporate AI-driven insights and automation into business processes.
Monolithic legacy ERP systems represent technical debt that has become payable in the Age of AI
By comparison, postmodern/composable ERP comes with several high-impact benefits:
- Modularity – A postmodern or composable ERP system is modular, allowing organizations to integrate tailored or off-the-shelf solutions to specific business needs. This modularity is crucial for AI at Scale, as it facilitates flexible integration of AI models across various business domains and processes.
- Decentralization – By decentralizing control and allowing business domains to choose and implement their own solutions within the ERP framework, companies can foster innovation and agility. Postmodern/composable ERP systems are designed to scale with the organization's needs.
- Agility – These systems allow for rapid adaptation to new technologies and changing business requirements, which is critical in the AI-defined competitive landscape. The ability to quickly integrate new AI models and deploy updates gives organizations a significant competitive advantage.
While the general principles of ERP strategy apply broadly, the specific needs and challenges can vary across industries:
- Manufacturing – Manufacturing is likely to benefit significantly from AI at Scale, particularly in areas like predictive maintenance, supply chain optimization and quality control. A modular ERP system is crucial for integrating AI models that enhance these processes. However, many manufacturing companies still rely on legacy systems. The transition to composable ERP may require significant investment but is essential for maintaining competitiveness.
- Retail – Retail business builds on customer experience. AI can be deployed to power personalization, inventory management and demand forecasting. The typical retailer focus is on integrating customer data across multiple channels, requiring a flexible ERP system that supports real-time data processing and AI-driven insights.
- Healthcare services – Healthcare has stringent regulatory requirements that must be met. A composable ERP system allows for the integration of AI while ensuring compliance with health regulations. AI is increasingly used for diagnostics, patient care and treatment planning. The ability to incorporate AI use cases with healthcare management systems is essential.
- Financial services – AI has become an essential tool to manage risk and detect fraud in financial services. A modular ERP system facilitates flexible integration of AI models across multiple financial services business domains. However, the financial industry has a high demand for secure data management, making private cloud solutions and robust data governance a priority.
- Utilities – AI can optimize energy distribution, predictive maintenance and customer service for enhanced operational efficiency. A modular ERP system is ideal for supporting these AI-driven improvements. Furthermore, utilities need scalable solutions to manage the large amounts of data generated by a multitude of IoT devices.
Case NVIDIA AI Enterprise
NVIDIA AI Enterprise provides a comprehensive suite of tools, software frameworks and hardware that connect exceptionally well to AI needs and computing environments discussed in this article. NVIDIA's solution is designed to enable AI model training and inference across public and private cloud, on-premises data centers, and edge environments.
The NVIDIA AI Enterprise solution connects to most, if not all, themes discussed in this article.
NVIDIA AI Enterprise provides end-to-end tools and frameworks designed to power AI model training, inference and deployment across these environments while leveraging NVIDIA's specialized AI hardware, including GPUs and DGX platforms. The solution aims to simplify and accelerate AI development, making it an ideal case to explore in the context of Computing for AI at Scale.
Here are some solution highlights closest to the themes discussed in this article:
- AI development and deployment framework – Tools for AI model development, training, deployment and inference with full AI model lifecycle support. Libraries like NVIDIA cuDNN, TensorRT and RAPIDS for AI model development. Triton Inference Server to scale inference workloads across a wide variety of hardware. AI model deployment across various production environments. Real-time AI inference across all environments.
- Cross-environment AI computing – NVIDIA AI Enterprise runs on all major public cloud providers where companies train AI models using NVIDIA's cutting-edge GPUs. Private cloud deployment provides cloud-like flexibility and scalability but in a dedicated environment with data sovereignty. On-premises option leveraging NVIDIA’s DGX Systems purpose-built for AI model training. AI model deployment to the edge, enabling real-time AI inference close to the data source.
- Access to AI optimized hardware – NVIDIA GPUs such as the A100 Tensor Core GPU are optimized for both AI training and inference, making them the backbone of many AI platforms. Multi-GPU and Multi-Node setups enable distributed training, crucial for handling large AI workloads at scale.
- DGX Platforms – Integrated hardware platforms for AI computing optimized for the most demanding AI training workloads, including large-scale training for advanced AI models such as GPT, multimodal AI, and more.
- NVIDIA Jetson – Powerful yet compact solution for running AI models in resource-constrained edge environments, ideal for autonomous machines, robots and IoT devices.
- AI model integration and deployment – Support for MLOps to automate and manage AI models throughout their lifecycle, with tools to automate model retraining, scaling and monitoring, and ensuring that AI models stay accurate and up-to-date as they interact with real-world data. Cross-platform AI model portability with seamless movement of AI models between cloud, on-premises and edge environments.
Constraints Assessment
Constraints assessment on AI computing is about two things: a) Verifying whether and how AI computing needs in terms of training and inference are being served, and b) Assessing the completeness and maturity of AI computing solutions.
Constraints Assessment deals with AI computing needs and solutions’ maturity
While the complete list of things to consider is somewhat extensive, here are some indicative examples from each assessment area:
- AI model training – Access to specialized hardware (GPUs, TPUs) optimized for AI model training? AI training software frameworks supported and optimized for the hardware? How training infrastructure scales to accommodate larger AI models and datasets? Any apparent constraints on data throughput and I/O?
- Inference – Limitations to deploying inference across cloud, on-premises and edge? Limitations in scaling inference capacity to handle increasing demand from AI-driven applications? Tools for monitoring model performance and ensuring that the inference process is running smoothly and efficiently? Infrastructure optimized for low-latency, high-throughput AI applications?
- AI model deployment – How are AI models deployed and updated on edge devices? How containers are used to package and deploy AI models? How container orchestration tools are used to manage the lifecycle of containerized AI models?
- Public cloud capabilities – What are the cloud platforms used for AI workloads? How access to optimized hardware is organized? How cloud-native AI tools and services are utilized?
- Private cloud capabilities – Are there sufficient compute, storage and networking resources available to support AI workloads? What is the level of AI tool support? How seamless is integration between private and public cloud resources?
- On-premises capabilities – Are there sufficient high-performance computing and data storage resources available for training and inference alike?
- Edge computing – Are the edge devices and servers equipped with sufficient computational power to handle inference tasks? How low-latency requirements for real-time AI applications are met? How reliable is the connectivity between edge devices and the cloud or fog computing data centers?
- ERP system – What is the level of readiness and flexibility in terms of AI model integration? How ERP strategy and roadmap support large-scale AI deployment in the long term?
- Security and compliance – How is AI model and data security ensured across different computing environments? Are there any regulatory constraints that affect AI deployment?
Computing Strategy for AI at Scale
AI builds on computing and data. Consequently, the focus of an AI at Scale strategy has to be on serving AI computing needs for training and inference to enable hundreds of highly versatile AI use cases.
Two overarching strategic themes emerge. First, flexibility in terms of arranging the required computing capabilities across various computing environments. Second, with computing emerging as a significant cost of doing business in the Age of AI, skillful cost management becomes crucial. It is for the computing strategy to find the optimum balance between these two.
In addition, data security and sovereignty, and related regulatory and compliance aspects form the third strategic theme alongside computing itself.
Like any strategy, computing strategy is about identifying alternatives and options. After that, it’s about choices and decisions, leading to strategy implementation.
Given the importance of AI-driven value creation, computing strategy emerges as a crucial addition to digital and business strategies.
Computing strategy is a crucial addition to digital and business strategies
Strategic alternatives and options
The main strategic alternatives and options for AI computing relate to computing environments, architectural alternatives and AI model deployment. Let’s have a closer look.
Computing environments
As discussed, the available alternatives with regard to where to do computing for AI model training and inference include public cloud, private cloud, on-premises and edge. Each alternative offers merits and shortcomings regarding scalability, cost, control and security. Alongside computing itself, needs and requirements related to data are a crucial part of the assessment.
Public cloud offers unparalleled scalability, but heavy computing loads lead to increased cost and, what’s worse, somewhat poor cost predictability. Cost unpredictability makes IT budgeting cumbersome and uncertain. Conversely, dedicated computing infrastructure brings predictability and stability to costs.
Overall, a public cloud-only strategy no longer appears viable. Instead, flexibility thru hybrid cloud appears as a more lucrative alternative – especially when combined with seamless integration and interworking between the two.
Private solutions, both cloud and on-premises data center, provide a way to control costs and prevent data leakage. But as total exclusion of the public cloud option is not viable either, a hybrid solution with maximum flexibility emerges as the go-to alternative.
Edge computing, on the other hand, offers an unbeatable solution for inference – especially with tight latency budgets and large real-time data requirements – and comes with cost predictability. Edge can also be seen as a way to offload cloud computing, useful particularly in manufacturing and industrial contexts.
Overall, the AI computing strategy is to accommodate workload-specific needs with the freedom to choose where to run training and inference. That calls for seamless, if not effortless, AI model portability with extensive automation and quality assurance in place.
AI computing strategy is to accommodate workload-specific needs with flexibility in terms of where to run training and inference
Ideally, all computing resources would be seen as a seamless computing continuum, ready to be utilized on a per-workload basis.
Architectural alternatives
The choice between monolithic and modular architectures is decisive in terms of AI computing and especially in relation to AI at Scale.
Microservices and postmodern/composable ERP architectures add complexity and require more from the organization but bring vital modularity on the path to flexibility and scalability. Monolithic architectures appear as technical debt comparable to any legacy IT system not equipped for AI integration at scale.
Modular architectures require more from the organization but the upside is significant
Modular architectures are strong candidates to be added to the digital capability build-up roadmap.
Containerization for AI model deployment
In the context of AI computing, containers and container orchestration emerge as somewhat essential organizational capabilities. Not necessarily mandatory but something that makes life in the Age of AI so much easier by enabling seamless AI model and related data (product) portability across computing environments.
With a high degree of automation, container orchestration related overheads can be managed while maintaining a high level of quality assurance. It is a capability worth investing in.
Container orchestration with automation is a key capability
Strategic objectives for digital capabilities: Computing
An earlier article in the series discussed Alignment by setting five strategic objectives for digital capabilities across the board: Scalability, Quality, Speed, Agility and Innovation. Let’s reflect on these objectives from the AI computing perspective:
- Scalability – The choice of computing environment and architecture directly affects how well AI initiatives scale up. Public cloud offers virtually limitless scalability, and the decision to adopt a microservices architecture with containerization enables quick and efficient AI application scale-up.
- Quality – Decisions related to model deployment affect the quality and reliability of the integrated AI models. Automated testing and deployment processes combined with robust monitoring help to maintain high quality at scale. MLOps for industry-grade AI operations is the key ingredient.
- Speed – Decisions on containerization and container orchestration have significant influence on the speed at which AI models can be deployed and updated.
- Agility – Adoption of modular architectures allows companies to quickly adapt to changing market conditions and technological advancements, maintaining agility in AI development. Microservices-based AI development enables iterative AI model improvement and swift responses to emerging market opportunities and challenges.
- Innovation – As Agile principles and DevOps have taught us, staying competitive in the modern business environment builds on continuous learning thru rapid iteration. In the Age of AI that translates to the ability to train and retrain AI models – calling for strong, versatile and flexible computing capabilities. In the ideal scenario, investments in computing translate to customer value, differentiation, pricing power, margins and growth thru innovation.
Conclusions
Making informed strategic choices and decisions in relation to computing will have a decisive impact on a company’s success in achieving AI at Scale and ultimately in the AI-defined competitive landscape.
Like with other digital capabilities, good decisions are needed to eliminate constraints on computing capabilities. However, before elimination comes identification and understanding thru holistic constraints assessment. Computing is no different from the rest.
Before constraints’ elimination comes their identification and understanding
Urgency depends on the competitive situation, which varies from one industry to another. However, two universal truths apply: First, embedding AI in all aspects of value creation has become the biggest leverage for productivity and competitiveness. Second, as the discussion on computing above shows, building the necessary digital capabilities is not trivial and takes time.
The sooner that journey starts, and the more systematic and disciplined it is, the better. Assessing constraints as they currently are is the natural first step.
Constraints Assessment as a Service
Constraints Assessment as a Service covers all digital capability areas from Strategic Management to Data Culture. See the detailed Service Description.
AI at Scale Workshop
The AI at Scale workshop is a compact one-day event for business and technology executives and managers. The workshop seeks answers to the question: What should we as a business and as an organization do to secure our success in the Age of AI?
Next article in the series
Article series on Digital Capabilities for AI at Scale continues in the article on Data Integration for AI at Scale.