Q*bert Your Inference Infrastructure
Gottlieb's classic arcade game Q*bert was first released in 1982.

One of my favorite arcade games as a kid was Q*bert. The gameplay was fairly simple: you played as Q*bert, a two-legged, long-nosed creature whose goal was to jump from box to box, changing the surface color until the entire pyramid of boxes was a single uniform color.

Enemies such as spring-loaded snakes, obstacles, and other complications made the game challenging and ridiculously fun. I couldn't help but think of Q*bert during a strategy call last week, where a customer's primary concern for the next 12-18 months was AI application sprawl. But why was this a concern?

A Sea of Black Boxes

It's no surprise that most enterprise organizations start their generative AI journey by deploying off-the-shelf solutions versus building their own. These off-the-shelf solutions are almost always containerized applications that are relatively easy to deploy. Whether it's an internal ChatGPT clone, a contract analysis AI app, developer copilot, or one of many other AI applications, almost all deliver their solution in one of two ways:

  1. Using a model (including large language models) served from a public endpoint in the public cloud
  2. Using a model served from an endpoint built into the containerized application

In scenario #1, some applications allow customers to change the endpoint of the foundation model, meaning the customer could serve that model themselves behind their own firewalls and governance. To date, many vendors don't advertise this capability unless asked, and for many solutions it may not be a user-configurable option without a call to tech support.

In scenario #2, the application again obscures the model in use, as well as the inference engine, telemetry, and controls, for simplicity's sake.

While there is Q*bert-level uniformity in this sea of black boxes, it's unfortunately the wrong kind of uniformity for an enterprise organization. This sea of black boxes restricts transparency, reduces operational efficiency, and ultimately does not empower the IT/AI organization to build operational muscle around properly managing this new era of AI applications.

Why do vendors use these architectures?

Vendors use these black-box architectures because they believe customers want the most simplified experience, especially when choosing off-the-shelf solutions - and they're right. Exposing the model and its serving layer adds a surface that may generate more questions from a customer looking to make a purchase and get started quickly.

Unfortunately, as organizations continue to deploy off-the-shelf AI apps, they continue to inject liability into the organization (models running in a black box), often without visibility for IT, compliance, or security.

This is not a shortcoming of the application vendors, per se, as their focus is on delivering the most simplified user experience possible. However, it is wise for customers to start asking application vendors to use the customer's own centralized inference infrastructure, whether public, private, or hybrid in nature. This allows the application vendor to focus on driving maximum value through the solution, and it affords the customer the ability to maintain control over the artificial intelligence within their organization.

The Value of a Centralized Inference Infrastructure

The sea of off-the-shelf AI applications does not need a hard reboot to align to a proper enterprise strategy of governance, scalability, transparency, and compliance. A centralized inference infrastructure offers:

  • centralized governance over models authorized for use within the enterprise, including which apps and which users have access (role-based access control)
  • centralized forecasting of future needs, including additional GPU capacity
  • centralized GPU resources for maximum efficiency at this layer of infrastructure (versus many underutilized GPUs scattered across an organization)
  • centralized troubleshooting of inference performance, which directly affects the user experience of the AI apps
  • centralized auditability and reporting
  • centralized human oversight

Incorporating Off-the-Shelf AI Apps with a Centralized Inference Infrastructure

This is often as simple as asking. That's it. Most AI app vendors use an OpenAI-compatible endpoint, so changing the location of the inference endpoint that serves a model is relatively easy. This is also a great question to ask during AI app evaluations: "What if we wanted to use our own inference infrastructure to serve the models needed for this application?" You may get pushback, but it can often be overcome if your inference infrastructure:

  • supports OpenAI compatible endpoints
  • has the ability to scale out in a predictable manner
  • can support a multitude of model types, which will become increasingly important in a multimodal world
  • is running on an enterprise-grade platform
  • enhances the troubleshooting capabilities of the AI app
  • helps the AI app align with existing security, governance, and legal AI guidelines that have been established
  • enhances the ability to identify when to scale out the AI app environment
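To make "OpenAI-compatible" concrete, here is a minimal sketch of what the swap looks like at the wire level: the app POSTs the same JSON chat-completions body either way, and only the host changes from a public cloud endpoint to one you operate. The endpoint URL and model name below are hypothetical placeholders, not specific to any vendor's product.

```python
# A sketch of the OpenAI-compatible chat-completions request an AI app sends.
# Repointing the app at centralized infrastructure changes only the base URL;
# the request body and headers stay identical. Built with the standard library
# only; nothing is sent over the network here.
import json
import urllib.request

# Hypothetical internal endpoint (e.g. a self-hosted server exposing the /v1 API)
INTERNAL_ENDPOINT = "https://inference.internal.example.com/v1"

def build_chat_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build the standard chat-completions request against the internal
    endpoint; an app pointed at a public cloud sends the identical body."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{INTERNAL_ENDPOINT}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # key issued by your own gateway
        },
        method="POST",
    )

req = build_chat_request("llama-3.1-8b-instruct", "Summarize this clause.", "KEY")
print(req.full_url)
```

Because the request shape is standardized, governance features (RBAC, audit logging, rate limits) can live in the gateway fronting the endpoint rather than in each individual app.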

Conclusion

Many organizations have already started to deploy off-the-shelf AI applications, and it is fairly straightforward to Q*bert them (swap a black-box endpoint for a centrally managed inference endpoint) simply by asking and performing some basic tests. Enterprise organizations thinking through their own centralized inference architecture have options, including solutions such as vLLM, AWS Outposts, and Nutanix Enterprise AI. AI apps are in their infancy of adoption, and building the proper enterprise inference infrastructure will be key to ensuring organizations can continue to prototype, deploy, manage, and govern these value-generating apps.
