Q*bert Your Inference Infrastructure
Gottlieb's classic arcade game Q*bert was first released in 1982.

One of my favorite arcade games as a kid was Q*bert. The gameplay was fairly simple: you played as Q*bert, a two-legged, long-nosed creature whose goal was to jump from box to box, changing the surface color until the entire pyramid of boxes was a single uniform color.

Enemies such as spring-loaded snakes, obstacles, and other complications made the game challenging and ridiculously fun. I couldn't help but think of Q*bert during a strategy call last week, where a customer's primary concern for the next 12-18 months was AI application sprawl. But why was this a concern?

A Sea of Black Boxes

It's no surprise that most enterprise organizations start their generative AI journey by deploying off-the-shelf solutions versus building their own. These off-the-shelf solutions are almost always containerized applications that are relatively easy to deploy. Whether it's an internal ChatGPT clone, a contract analysis AI app, developer copilot, or one of many other AI applications, almost all deliver their solution in one of two ways:

  1. Using a model (including large language models) served from a public endpoint in the public cloud
  2. Using a model served from an endpoint built into the containerized application

In scenario #1, some applications allow customers to change the endpoint of the foundation model, meaning the customer could serve that model themselves behind their own firewalls and governance. To date, many vendors don't advertise this capability unless asked, and for many solutions it may not be a user-configurable option without a call to tech support.

In scenario #2, the application again obscures the model in use, as well as the inference engine, telemetry, and controls, for simplicity's sake.

While there is Q*bert-level uniformity in this sea of black boxes, it's unfortunately the wrong kind of uniformity for an enterprise organization. This sea of black boxes restricts transparency, reduces operational efficiency, and ultimately does not empower the IT/AI organization to build operational muscle around properly managing this new era of AI applications.

Why do vendors use these architectures?

Vendors use these black-box architectures because they believe customers want the most simplified experience, especially when choosing off-the-shelf solutions - and they're right. Exposing the model and its serving layer adds a surface that may generate more questions from a customer looking to make a purchase and get started quickly.

Unfortunately, as organizations continue to deploy off-the-shelf AI apps, they continue to inject liability into the organization (models running in a black box), often without visibility for IT, compliance, or security.

This is not a shortcoming of the application vendors, per se, as their focus is on delivering the most simplified user experience possible. However, it is wise for customers to start asking application vendors to use the customer's own centralized inference infrastructure, whether public, private, or hybrid in nature. This allows the application vendor to focus on driving maximum value through the solution, and it affords the customer the ability to maintain control over the artificial intelligence within their organization.

The Value of a Centralized Inference Infrastructure

The sea of off-the-shelf AI applications does not need a hard reboot to align to a proper enterprise strategy of governance, scalability, transparency, and compliance. A centralized inference infrastructure offers:

  • centralized governance over models authorized for use within the enterprise, including which apps and which users have access (role-based access control)
  • centralized forecasting of future needs, including additional GPU capacity
  • centralized GPU resources for maximum efficiency at this layer of infrastructure (versus many underutilized GPUs scattered across an organization)
  • centralized troubleshooting of inference performance, which directly affects the user experience of the AI apps
  • centralized auditability and reporting
  • centralized human oversight

Incorporating Off-the-Shelf AI Apps with a Centralized Inference Infrastructure

This is often as simple as asking. That's it. Most AI app vendors use an OpenAI-compatible endpoint, so changing the location of the inference endpoint that serves a model is relatively easy. This is also a great question to ask during AI app evaluations: "What if we wanted to use our own inference infrastructure to serve the models needed for this application?" You may get pushback, but it can often be overcome if your inference infrastructure:

  • supports OpenAI compatible endpoints
  • has the ability to scale out in a predictable manner
  • can support a multitude of model types, which will become increasingly important in a multimodal world
  • is running on an enterprise-grade platform
  • enhances the troubleshooting capabilities of the AI app
  • helps the AI app align with existing security, governance, and legal AI guidelines that have been established
  • enhances the ability to identify when to scale out the AI app environment
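To make "OpenAI-compatible" concrete, here is a minimal sketch of what the swap looks like at the wire level: the app POSTs the same JSON chat-completions body either way, and only the host changes from a public cloud endpoint to one you operate. The endpoint URL and model name below are hypothetical placeholders, not specific to any vendor's product.

```python
# A sketch of the OpenAI-compatible chat-completions request an AI app sends.
# Repointing the app at centralized infrastructure changes only the base URL;
# the request body and headers stay identical. Built with the standard library
# only; nothing is sent over the network here.
import json
import urllib.request

# Hypothetical internal endpoint (e.g. a self-hosted server exposing the /v1 API)
INTERNAL_ENDPOINT = "https://inference.internal.example.com/v1"

def build_chat_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build the standard chat-completions request against the internal
    endpoint; an app pointed at a public cloud sends the identical body."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{INTERNAL_ENDPOINT}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # key issued by your own gateway
        },
        method="POST",
    )

req = build_chat_request("llama-3.1-8b-instruct", "Summarize this clause.", "KEY")
print(req.full_url)
```

Because the request shape is standardized, governance features (RBAC, audit logging, rate limits) can live in the gateway fronting the endpoint rather than in each individual app.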

Conclusion

Many organizations have already started to deploy off-the-shelf AI applications, and it is fairly straightforward to Q*bert them (swap a black-box endpoint for a centrally managed inference endpoint) simply by asking and performing some basic tests. Enterprise organizations thinking through their own centralized inference architecture have options, including solutions such as vLLM, AWS Outposts, and Nutanix Enterprise AI. AI apps are in their infancy of adoption, and building the proper enterprise inference infrastructure will be key to ensuring organizations can continue to prototype, deploy, manage, and govern these value-generating apps.
