OpenAI DevDay Predictions
...and what it means for startup founders
On November 6, 2023, almost one year after the launch of ChatGPT, OpenAI was responsible for the single biggest startup massacre in recent memory. Or was it? As a repeat founder and recovering investor, I have spent my career building and evaluating companies. There is some truth to the hyperbole, but a lot of fiction as well. To better understand the aftermath – the winners, losers, and survivors – we need to first unpack what happened.
What has changed?
So what did OpenAI announce at DevDay? A lot of small improvements that will impact the user and developer experience but might not significantly change the AI startup landscape: more consistent JSON outputs, model reproducibility (via a seed parameter) for debugging, multimodal inputs for the Chat Completions API, a new text-to-speech API to complement Whisper ASR, and so on (a short code sketch of a couple of these follows below). They also announced a few large improvements that will fundamentally impact broad categories of companies, my own (stealth) company included:
A very loose summary:
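Setting the larger announcements aside for a moment, the smaller API-level changes are easiest to see in code. Here is a minimal sketch of JSON mode and the new seed parameter, assuming the v1 OpenAI Python SDK; the model name and prompt are illustrative, not from the original announcement.

```python
# Minimal sketch of two of the smaller DevDay changes: JSON mode and the seed
# parameter for best-effort reproducible outputs. Assumes the v1 OpenAI Python SDK
# and an OPENAI_API_KEY in the environment; model name and prompt are illustrative.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-1106-preview",               # GPT-4 Turbo preview announced at DevDay
    seed=42,                                   # best-effort reproducibility for debugging
    response_format={"type": "json_object"},   # constrain output to valid JSON
    messages=[
        {"role": "system", "content": "Return a JSON object with keys 'company' and 'category'."},
        {"role": "user", "content": "Classify: Glean is an enterprise search startup."},
    ],
)

print(response.choices[0].message.content)  # e.g. {"company": "Glean", "category": "enterprise search"}
```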
What does this mean?
In H1 2023, the trend in the enterprise was AI experimentation. Public companies had board-level mandates to figure out an AI strategy, and that mandate trickled down as a top priority for the rest of the org. Private companies followed suit. Security and ops teams had to fight a deluge of cool new AI tools. Then, entering the back half of 2023, companies started to realize that good demos don’t necessarily mean good products. Buyer excitement waned. A series of AI-related security vulnerabilities put the power back in the hands of security leaders. Now, companies are focused on AI consolidation, and they are more keen to explore building over buying. Although incumbents with distribution have successfully launched AI-first features – Ironclad’s Contract AI, Loom’s summarization, Gong’s spotlights, Outreach’s Smart Email Assist, People.ai's SalesAI, etc. – new startups have struggled to get traction. Outside of ChatGPT, there is no breakout LLM-first AI app for the enterprise. Willingness to experiment with consumer apps has also waned as we pass the peak of the current AI hype cycle.
With this backdrop, here are three market/buyer predictions that might be helpful to builders:
The biggest losers
How do startups adapt?
This paints a bleak picture for AI-first startups. But not all hope is lost. For one, GPT-4 Turbo is 2-6x cheaper than GPT-4 and has significantly higher rate limits, which breathes new life into consumer startups with poor unit economics that have been forced to eat model costs at scale (we’re still waiting for similar cost reductions for DALL-E); the back-of-the-envelope math below shows where that range comes from. In addition, developers now have more powerful tools to experiment with more functionality.
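For context, here is a rough cost comparison per request. The per-1K-token prices below are the DevDay-era list prices as I recall them and should be treated as assumptions (check OpenAI's current pricing page), but they illustrate why prompt-heavy workloads see roughly a 3-6x reduction.

```python
# Back-of-the-envelope cost per request under DevDay-era list prices (USD per 1K tokens).
# Prices are assumptions for illustration; always check OpenAI's current pricing page.
PRICES = {
    "gpt-4":       {"input": 0.03, "output": 0.06},
    "gpt-4-32k":   {"input": 0.06, "output": 0.12},
    "gpt-4-turbo": {"input": 0.01, "output": 0.03},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request for a given model."""
    p = PRICES[model]
    return (input_tokens / 1000) * p["input"] + (output_tokens / 1000) * p["output"]

# Example: a prompt-heavy request (e.g. RAG with stuffed context) and a short answer.
for model in PRICES:
    cost = request_cost(model, input_tokens=6000, output_tokens=500)
    print(f"{model:12s} ${cost:.3f} per request")
# gpt-4        $0.210
# gpt-4-32k    $0.420
# gpt-4-turbo  $0.075   -> roughly 3-6x cheaper for prompt-heavy workloads
```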
For RAG-based companies (think: enterprise search companies like Glean), a proprietary implementation done the right way will outperform an Assistants API implementation. The most salient features for the ranking step of retrieval tend to be metadata based on user behavior (thumbs up/down, user activity on a subset of documents, time spent on a particular item or document) – commonly known as “personalization”. You can think of this as “contextual awareness” of the problem: if you look up the answer to a question in your internal wiki, what matters may be when the article was last updated, who wrote it, and how many other people viewed it, not just its content alone. This level of data ingestion and feature engineering makes all the difference: even if the model abstraction (the Assistants API) is very good, the metadata and “hyperparameter optimization” can be the difference between 50-60% accuracy/relevance and 90%+. Companies doing out-of-the-box retrieval on the Assistants API will look like POCs next to companies that spend significant effort on retrieval and ranking (think: products built on Heroku vs. products built on AWS or GCP); a hypothetical sketch of such a metadata-aware ranking step follows below. We will explore this more in another blogpost.
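To make the “contextual awareness” point concrete, here is a hypothetical sketch of a reranking step that blends embedding similarity with the behavioral metadata described above. All field names and weights are illustrative assumptions, not drawn from any particular product.

```python
# Hypothetical sketch of a metadata-aware reranking step for RAG (Python 3.10+).
# Blends semantic similarity with behavioral signals ("personalization");
# field names and weights are illustrative assumptions, not a real product's.
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Candidate:
    doc_id: str
    similarity: float        # cosine similarity from the vector store, 0..1
    last_updated: datetime   # freshness of the document (timezone-aware)
    author_is_owner: bool    # e.g. the wiki page's maintainer wrote it
    views_90d: int           # how often others opened it recently
    thumbs_up_rate: float    # share of positive feedback, 0..1


def rerank(candidates: list[Candidate], now: datetime | None = None) -> list[Candidate]:
    """Order retrieval candidates by a weighted mix of similarity and metadata."""
    now = now or datetime.now(timezone.utc)

    def score(c: Candidate) -> float:
        age_days = (now - c.last_updated).days
        freshness = 1.0 / (1.0 + age_days / 90)    # halves after ~a quarter
        popularity = min(c.views_90d / 500, 1.0)   # cap so one hot doc can't dominate
        return (
            0.55 * c.similarity
            + 0.15 * freshness
            + 0.10 * float(c.author_is_owner)
            + 0.10 * popularity
            + 0.10 * c.thumbs_up_rate
        )

    return sorted(candidates, key=score, reverse=True)
```

The exact weights matter far less than the fact that they exist at all: out-of-the-box retrieval scores only on similarity, which is precisely the gap between a POC and a product.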
Here are three suggestions for startups:
OpenAI is owning more and more of the AI stack. They are eating the infrastructure layer from the bottom up: from the model to LLM infra/ops to horizontal use cases. They are eating the application layer from the top down: first ChatGPT as an end-user application (both consumer and enterprise), and now platforms and marketplaces for building both consumer (custom GPTs) and enterprise (Assistants API, enterprise-only GPTs) applications. The good news is that, unless AGI is on the horizon, even if OpenAI owns the AI stack, it will likely not own the entire software stack. History doesn’t repeat itself, but it rhymes, and if the Salesforce AppExchange is any indicator, the next wave of AI-enabled enterprise products will be end-to-end applications that integrate with the underlying technology (foundation model, CRM, etc.), not products owned by the platform. In the next 5 years, AI will likely be a foundational part of the software infrastructure stack alongside databases and application servers, but you are in trouble if 90% of your infrastructure stack is AI.
Ultimately, OpenAI DevDay is just another hurdle for some startup founders to overcome, and for others, a clear win. It’s a much-needed reckoning for the most crowded category since web3, and an opportunity for the best founders to separate themselves from the pack. We are excited about the opportunities that OpenAI DevDay opens up, and we are eager to continue building.
Thanks to Dan Roberts, Michael Graczyk, Lauren Reeder, Florian Juengermann, and Zack Lawryk for your feedback.