Serious about Enterprise LLMs? 11 building blocks to keep in mind.

Introduction

ChatGPT amassed 100 million users just two months after launch. A recent Gartner poll indicates that 75% of surveyed organizations are exploring generative AI and Large Language Models (LLMs), with 19% already in pilot or production mode. The buzz is real.

For enterprises, the real opportunity lies beyond content generation and helping workers save time. It lies in reimagining problems, making workflows smarter, and unlocking significant economic value.

Figuring out how to write a prompt is just the tip of the iceberg! Enterprise use cases require a purpose-built architecture that solves for trust, accuracy, relevance, consistency, security, manageability, adoption, and ROI.

Tribyl was founded in 2018, soon after the first transformer models came out. Based on practical experience building Enterprise NLP products (using models like BERT, XLNet, and GPT-3+), this article introduces the foundational capabilities required to make LLMs a real ‘co-pilot’ for Enterprises. I’ve previously referred to these capabilities as a “semantic intelligence layer”.

If you’re a Buyer, this checklist will inform your LLM strategy and vendor evaluation. If you’re a Founder or investor, these insights will prove valuable for your product roadmap and investment theses. Either way, I’d love to hear your thoughts and feedback!

The Enterprise LLM iceberg, exposed.

The Gartner poll mentioned above goes on to say, “initial enthusiasm for a new technology can give way to more rigorous analysis of risks and implementation challenges.” I agree. Here are 11 building blocks to ensure the current A.I. hype doesn’t end up in another A.I. winter.


1. Data lakehouse: Are you providing users with new and actionable intelligence to improve outcomes, or are you replacing dashboards with prompts? Your dashboard can tell you current win rates, but can it come up with a plan to drive them higher?

LLMs have the power to answer the WHY behind metrics, and to recommend HOW to improve them. Doing so requires marrying unstructured (context) data with structured (operational) data.

For example, to diagnose and improve revenue conversion, it’s important to combine three data sources (see the sketch after this list):

  • Your playbook (gathering dust in decks)
  • Voice of the Customer (buried in call recordings, emails, documents, surveys)
  • Operational metrics (siloed in martech, CRM, customer success, and product usage tools)
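
Here’s a minimal sketch of the kind of join a lakehouse enables once these sources live side by side. The file names, columns, and schema below are placeholder assumptions, not a specific product’s layout:

```python
# Minimal sketch: marrying Voice-of-Customer context with CRM outcomes.
# "lakehouse/calls.parquet" and "lakehouse/opportunities.parquet" are
# hypothetical tables; substitute your own schema.
import pandas as pd

# Unstructured side: call transcripts already tagged by discussion topic.
calls = pd.read_parquet("lakehouse/calls.parquet")         # opp_id, use_case, snippet
# Structured side: operational outcomes from the CRM.
opps = pd.read_parquet("lakehouse/opportunities.parquet")  # opp_id, amount, won

# One row per (deal, use case) so a deal's revenue isn't double-counted.
joined = calls.merge(opps, on="opp_id").drop_duplicates(["opp_id", "use_case"])

# Which discussion topics actually show up in won revenue?
by_use_case = (
    joined.groupby("use_case")
          .agg(deals=("opp_id", "nunique"),
               revenue=("amount", "sum"),
               win_rate=("won", "mean"))
          .sort_values("revenue", ascending=False)
)
print(by_use_case.head())
```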

Questions to consider:

  • What holistic data is required for the problems you’re solving? What effort and resources are involved in setting up your own lakehouse? Will IT oblige?
  • What are the risks of LLM applications using their own lakehouses (e.g., consistency, manageability, security)?


2. Context and Relevance: Large Language is not the same as YOUR Language! Is ‘Security and Compliance’ a Use Case, or a step in your procurement process, or both? Is ‘Customer 360’ a Use Case, and ‘Unified Customer Profile’ the enabling product feature? Or is it the other way around?

Without context, LLMs will hallucinate and create noise, driving mistrust and a premature end to your vision.
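
One hedged way to picture the fix: pin every LLM call to a single, org-wide glossary instead of letting each user improvise definitions. A minimal sketch, where the terms and wording are purely illustrative:

```python
# Minimal sketch: ground prompts in one shared taxonomy so a term like
# "Security and Compliance" means the same thing in every LLM call.
# These entries are illustrative assumptions, not a shipped schema.
TAXONOMY = {
    "Security and Compliance": "A buyer Use Case: reducing audit and breach risk.",
    "Customer 360": "A buyer Use Case: one view of each customer across touchpoints.",
    "Unified Customer Profile": "A product feature enabling the Customer 360 Use Case.",
}

def build_system_prompt() -> str:
    definitions = "\n".join(f"- {term}: {meaning}" for term, meaning in TAXONOMY.items())
    return (
        "Use ONLY the definitions below when interpreting these terms. "
        "If a term is ambiguous, ask for clarification instead of guessing.\n"
        + definitions
    )

print(build_system_prompt())
```

Because the glossary lives in one place, updating a definition updates it for every user and every downstream prompt at once.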

Questions to consider:

  • How would you train LLMs on your domain-specific taxonomy and tribal language?
  • Do you see these definitions being set up and managed in multiple LLM applications? What’s the likelihood that User A’s definitions match User B’s?


3. Consistency and Accuracy: If you’ve spent any time writing prompts, you know that the slightest wording change can lead to big differences in output. Imagine showing up to a meeting where everyone’s got their own version of the truth!

Planning to hire an army of expensive prompt engineers to ensure quality and consistency? Look no further than the fate of Business Intelligence and the infamous dashboard backlog. You can’t throw people at the self-service problem.
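
A lighter-weight alternative: centralize prompts as governed templates, so the wording stays fixed and only approved parameters vary. A minimal sketch, with template text that is purely illustrative:

```python
# Minimal sketch: one governed prompt template instead of N ad-hoc prompts.
# The template wording and parameters below are illustrative assumptions.
from string import Template

WIN_LOSS_TEMPLATE = Template(
    "Summarize why we $outcome deals in the $segment segment last $period. "
    "Cite the source call or email for every claim. "
    "Answer in exactly three bullets."
)

def render(outcome: str, segment: str, period: str) -> str:
    # Every user gets the same wording; only governed parameters vary.
    return WIN_LOSS_TEMPLATE.substitute(outcome=outcome, segment=segment, period=period)

print(render("won", "financial services", "quarter"))
```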

Questions to consider:

  • How would you scale LLMs to 100s of users without compromising consistency?
  • What resources will be required to ensure quality across fragmented LLM applications?


4. Transparency and Explainability: ‘Take my word for it’ doesn’t fly when an Executive asks for the reasoning behind your analysis. Unfortunately, LLMs are opaque models that can’t be easily explained.

To get around this problem, it’s important to separate your organization’s knowledge model from generic language models. For example, a knowledge retrieval system (powered by Elasticsearch or a vector database) can accurately identify and annotate call transcripts involving a discussion of the ‘Security and Compliance’ Use Case. An LLM can then synthesize the discussion. Users can easily verify the summary by drilling into the source transcripts and skimming the (contextually relevant) annotations.
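
Here’s a minimal sketch of that retrieve-then-synthesize pattern. The in-memory “index” stands in for Elasticsearch or a vector database, and the transcript data and prompt wording are assumptions for illustration:

```python
# Minimal sketch: retrieval decides WHAT is relevant; the LLM only rephrases
# it, so every claim traces back to a source transcript a user can open.
# Transcript snippets and prompt wording are illustrative assumptions.
TRANSCRIPTS = [
    {"call_id": "c-101", "text": "Their CISO asked how we handle SOC 2 evidence."},
    {"call_id": "c-102", "text": "Pricing came up; they compared us to an incumbent."},
]

def search_transcripts(keywords: list[str]) -> list[dict]:
    # Toy keyword retrieval; swap in a real search or vector query here.
    return [t for t in TRANSCRIPTS
            if any(k.lower() in t["text"].lower() for k in keywords)]

def build_synthesis_prompt(topic: str, hits: list[dict]) -> str:
    sources = "\n".join(f"[{h['call_id']}] {h['text']}" for h in hits)
    return (
        f"Summarize the '{topic}' discussion below. "
        f"Cite a [call_id] for every statement so users can verify it.\n{sources}"
    )

hits = search_transcripts(["SOC 2", "CISO", "compliance"])
print(build_synthesis_prompt("Security and Compliance", hits))
```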

Questions to consider:

  • What if siloed tools take different approaches to explanations? Will users be OK if explanations contradict each other?
  • Should you have a centralized knowledge model for LLM applications?


5. Predictions and metrics: What outcomes are you “hiring” the LLMs for? While saving time is a common benefit (e.g., summarizing or generating content), that alone won’t get you promoted, nor will it get your CFO to fund LLM projects.

Take prospecting emails, for example. Soon, we’ll be inundated by LLM-generated emails that you can’t tell apart from the competition’s. What if LLMs could personalize emails by rinsing and repeating the messaging that drove the highest win rates last quarter?

To drive such outcomes, LLMs need to sift signal from noise. They need to distinguish between correlation and causation. Just because the ‘Security and Compliance’ Use Case got discussed in many deals doesn’t mean it was pivotal to winning all of them!
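
A simple way to see the difference: compare the win rate of deals that discussed a topic against the overall baseline, rather than counting mentions. A minimal sketch with made-up deal data:

```python
# Minimal sketch: frequency is not impact. A topic mentioned everywhere can
# still have zero lift on win rate. The deal data below is made up.
deals = [
    {"id": 1, "topics": {"Security and Compliance"}, "won": True},
    {"id": 2, "topics": {"Security and Compliance"}, "won": False},
    {"id": 3, "topics": {"Pricing"}, "won": True},
    {"id": 4, "topics": set(), "won": False},
]

def win_rate(subset: list[dict]) -> float:
    return sum(d["won"] for d in subset) / len(subset) if subset else 0.0

baseline = win_rate(deals)
for topic in ["Security and Compliance", "Pricing"]:
    discussed = [d for d in deals if topic in d["topics"]]
    print(f"{topic}: mentioned in {len(discussed)} deals, "
          f"lift = {win_rate(discussed) - baseline:+.2f}")

# Note: lift over baseline is still only correlational; a proper causal read
# needs controls (segment, deal size, rep) or an experiment.
```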

Questions to consider:

  • What metrics are you looking to improve? What features might determine causation?
  • How will you assemble the training data set?
  • Will siloed LLM tools need to be trained individually? How will you ensure consistency?


6. Job-specific user experience: High-impact LLM use cases are going to manifest as new or reimagined workflows. To drive adoption, we’ll need job-specific, ‘intelligent UX’ design.

For example, it’s great that your call recording tool uses LLMs to summarize calls.? That’s still 1000s of summaries, though.? Will Product Marketing have time to review them, and produce a monthly win/loss report?? Can Sales Enablement extract repeatable sales plays for landing and expanding customers?? Can reps use the insights to personalize emails and calls in seconds?

As you think through UX, it’s important to keep root cause analysis in mind. For example, if your sales enablement content is stale and generic to begin with, summarizing it with LLMs won’t drive adoption or impact.

Questions to consider:

  • What gaps and inefficiencies are impacting outcomes today? Why is that the case?
  • Are you going beyond organizational and tool silos, and reimagining processes?


7. Training and feedback:

Your business is evolving rapidly. For LLMs to stay relevant and support daily processes, it’s important that non-technical users be able to train the models easily and safely.

Examples from the go-to-market world that’ll trigger retraining -- new use cases, personas, product features, solutions, competitors…
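
One hedged sketch of what “easy and safe” training input from non-technical users could look like: capture their corrections as labeled examples that a retraining job consumes later. The schema below is an assumption, not a prescribed format:

```python
# Minimal sketch: let users correct the model's labels, and queue those
# corrections as training examples. Field names are illustrative assumptions.
import json
import time

FEEDBACK_LOG = "feedback.jsonl"

def record_feedback(snippet_id: str, predicted_label: str,
                    corrected_label: str, user: str) -> None:
    """Append one human correction; a retraining job can consume this later."""
    entry = {
        "snippet_id": snippet_id,
        "predicted": predicted_label,    # what the model said
        "corrected": corrected_label,    # what the user says it should be
        "user": user,
        "ts": time.time(),
    }
    with open(FEEDBACK_LOG, "a") as f:
        f.write(json.dumps(entry) + "\n")

# A rep flags a false positive: the call wasn't about 'Security and Compliance'.
record_feedback("c-101", "Security and Compliance", "Pricing", user="rep-42")
```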

Questions to consider:

  • What special skills are required to train the models? How long will it take?
  • How much training data is needed? How would it be generated?
  • How many LLM tools will need to be retrained?
  • Can users spot false positives and negatives easily? Can they address them?
  • How frequently will the models refresh? Will changes apply retroactively?


8. Performance and latency:

ChatGPT is a single-user experience with usage and prompt-size limits. Enforcing similar limits in Enterprise applications will end badly for adoption.

Yet, applications can come to a standstill without performance optimization. Here’s a scenario:

User 1: What revenue was generated by the ‘Security and Compliance’ Use Case last quarter?

User 2: -- 5 minutes later -- ditto, but only for new customers signed last month.

User 3: -- 10 minutes later -- ditto, but only for the financial services vertical last quarter.

As you can tell, the results from User 1’s prompt are sufficient to answer User 2’s and User 3’s prompts. Enterprise LLM tasks can involve dozens of context filters. So how do we prevent 100s of similar prompts from hitting the data lakehouse over and over?

To ensure reasonable response times and performance, queries must be cached. The caching logic must be smart enough to parse prompts and serve up cached results when possible. The cache must update itself incrementally.
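
To make that concrete, here’s a minimal sketch of a “parse, then cache” layer: prompts are normalized into a structured query, and a narrower query is answered by filtering rows cached for a broader one. The parsing, schema, and data are illustrative assumptions:

```python
# Minimal sketch: cache the broad lakehouse result once; answer narrower
# follow-ups by filtering it. Schema and stub data are assumptions.
CACHE: dict[tuple, list[dict]] = {}

def run_on_lakehouse(metric: str, period: str) -> list[dict]:
    # The expensive call: one row per deal with its attributes. Stubbed here.
    return [
        {"use_case": "Security and Compliance", "vertical": "finserv",
         "customer": "new", "revenue": 120_000},
    ]

def answer(metric: str, period: str, **filters) -> float:
    key = (metric, period)                  # cache the broad result once
    if key not in CACHE:
        CACHE[key] = run_on_lakehouse(metric, period)
    rows = [r for r in CACHE[key]
            if all(r.get(k) == v for k, v in filters.items())]
    return sum(r["revenue"] for r in rows)

# User 1 pays the lakehouse cost; Users 2 and 3 are served from the cache.
answer("revenue", "last_quarter", use_case="Security and Compliance")
answer("revenue", "last_quarter", use_case="Security and Compliance", customer="new")
answer("revenue", "last_quarter", use_case="Security and Compliance", vertical="finserv")
```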

Questions to consider:

  • What caching strategy makes sense for your business?
  • What’s the projected growth in data and usage?
  • What performance SLAs are acceptable?
  • What’s the consequence if each LLM tool pursues its own caching strategy? (e.g., inconsistent results)


9. Cost of ownership:

New use cases, more data sources, growing users, faster performance… all of it can lead to higher costs. ChatGPT is funded by outside capital, so cost to serve isn’t its #1 priority. The same can’t be said of Enterprise LLM applications. You need to forecast and budget for one-time and ongoing costs.
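
A back-of-envelope sketch helps here. Every number below is a placeholder assumption -- substitute your own usage profile and vendor pricing:

```python
# Back-of-envelope sketch of ongoing inference cost. All figures are
# placeholder assumptions, not real vendor prices.
USERS = 500
PROMPTS_PER_USER_PER_DAY = 20
TOKENS_PER_PROMPT = 2_000          # prompt + retrieved context + completion
PRICE_PER_1K_TOKENS = 0.01         # hypothetical blended $/1K tokens

daily_tokens = USERS * PROMPTS_PER_USER_PER_DAY * TOKENS_PER_PROMPT
monthly_cost = daily_tokens / 1_000 * PRICE_PER_1K_TOKENS * 30
print(f"~${monthly_cost:,.0f}/month at these assumptions")  # ~$6,000/month
```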

Questions to consider:

  • Will the hard costs (compute and storage) grow linearly with usage and business growth?
  • Would you need to hire and grow a team of specialists (e.g., prompt engineers)?
  • How much time will go into ongoing training and maintenance?
  • Can you track and allocate costs appropriately?
  • How will you measure ROI?
  • Could “LLM sprawl” across tools make all of the above even harder to manage?


10. Security and compliance: Remember BYOD -- Bring Your Own Device to work? Eventually, the CIO clamped down, and it was a win-win: users got better support, and the company reduced security risks.

The same is going to happen to the current BYOP movement -- Bring Your Own Prompt.

A consequence of using LLMs to make workflows smarter is that intelligence will be stitched together from a variety of data sources, including those that users didn’t have prior access to.
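
One hedged way to contain that risk: enforce the asking user’s permissions at retrieval time, before anything reaches the LLM. A minimal sketch, with hypothetical roles and document tags:

```python
# Minimal sketch: filter by the user's permissions BEFORE retrieval, so the
# LLM can never stitch together data the user lacks access to.
# Roles and document tags below are illustrative assumptions.
DOCS = [
    {"id": "d1", "text": "Q3 win/loss themes", "allowed_roles": {"sales", "marketing"}},
    {"id": "d2", "text": "Board-level pricing strategy", "allowed_roles": {"exec"}},
]

def retrieve_for_user(query: str, user_roles: set[str]) -> list[dict]:
    # Only search over documents this user may already see.
    visible = [d for d in DOCS if d["allowed_roles"] & user_roles]
    return [d for d in visible if query.lower() in d["text"].lower()]

# A sales rep searching "pricing" never sees the exec-only document.
print(retrieve_for_user("pricing", user_roles={"sales"}))  # -> []
```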

Questions to consider:

  • How do you enforce user-level security and access controls, consistent with Infosec policy?
  • Can LLMs be made to forget data for compliance reasons?
  • What observability is available to track model usage and behavior?


11. Future-proofing:

If you’re with me so far, the biggest question to ask is -- how do we solve for the above LLM building blocks in the Enterprise? Here’s why it matters: over two-thirds of respondents in the same Gartner poll said they want to use LLMs to drive revenue growth and customer retention. That corresponds to 1000s of SaaS tools in the marketing, sales, and customer success space. Imagine dealing with as many LLM implementation approaches!

Questions to consider:

  • Should intelligence be fragmented across tools and departments when the goal is to break down silos?
  • What resources do you have available to manage LLM tools across the enterprise?
  • Would vendor lock-in become a risk? Should systems of record and systems of engagement be kept separate from a (shared) system of intelligence?

Conclusion

The real promise of large language models lies in surfacing new intelligence and reimagining Enterprise workflows. Adding shiny LLM features to legacy SaaS tools is barely scratching the surface.

Implementing this vision requires the purpose-built foundation described above. We’re calling it the “semantic intelligence layer”.

Where will this layer sit? In each of the 100s of enterprise tools? Likely not, for the reasons discussed earlier. We think this is a new category, as it requires a ground-up architecture that plays nice with all tools and data sources (current and future). That’s the approach we’ve been taking in building Tribyl.

Who will be the winners and losers because of this disruption? Will the current hype cycle end in another A.I. winter? It’s still too early to tell. A lot depends on whether -- and how fast -- customers and investors make the shift from the SaaS-first playbook they’re used to, to an A.I.-first one. There are bound to be significant differences in GTM, product, pricing, adoption, fundraising, and exit strategies.

What do you think?
