OpenAI’s GPT-4o Fine-Tuning: A Game-Changer for B2B AI Solutions and the Future of Enterprise Workflows
Enterprise Workflows
On August 20, OpenAI released the fine-tuning capability for its latest model, GPT-4o. This means that fine-tuning is now possible for all GPT models, from GPT-3.5 Turbo to GPT-4 and GPT-4o mini. The training cost for GPT-4o fine-tuning is $25 per million tokens, with an inference cost of $3.75 per million input tokens and $15 per million output tokens. Fine-tuning reduces the need to provide numerous examples in prompts, which lowers costs and speeds up response times.
OpenAI’s support for model fine-tuning is ultimately a strategy to generate more B2B use cases. It’s no surprise that they are highlighting successful GPT-4o fine-tuning cases.
One notable success is Cosine’s AI software engineering assistant, Genie, which achieved top scores on the SWE-bench benchmark.
Genie’s training method is unique. Most AI models rely on trial and error, guessing randomly until the correct answer is found. However, Cosine's CEO, Alistair Pullen , emphasized that if you want the model to behave like a software engineer, you must show it how a software engineer works. By showing Genie real-world examples of how coders solve problems and helping them understand the logic behind each decision, Genie has started to learn how to solve coding problems independently. This is an excellent example of an AI agent.
Through such success stories, OpenAI is promoting the idea that fine-tuning its proprietary models can lead to far superior performance compared to fine-tuning open-source LLMs.
One of OpenAI's B2B-focused offerings is ChatGPT Enterprise. Besides corporate clients like Klarna and Asana, it also has its first federal agency client, the U.S. Agency for International Development (USAID). USAID plans to use ChatGPT Enterprise to reduce administrative burdens on staff and facilitate collaboration with new and local agencies. The use of generative AI through cloud-based SaaS in public institutions handling personal and confidential information might accelerate AI SaaS adoption by other public institutions.
OpenAI also announced a partnership with Condé Nast , a media company that owns well-known outlets such as Vogue, The New Yorker, GQ, Vanity Fair, and Wired . As concerns over hallucination in LLMs grow, the ability to disclose sources of search results becomes increasingly important. When OpenAI officially launches SearchGPT, it seems likely that it will use information from these media outlets as sources when combining conversational models with web information. This is also part of their effort to address copyright issues for better penetration of the B2B market.
According to Sensor Tower, the ChatGPT application has generated approximately $250 million in cumulative in-app purchase revenue since its launch. This figure only includes in-app purchases, so the actual revenue, including web payments, is expected to be higher. However, due to high operating costs, it’s predicted that OpenAI may record a loss of up to $5 billion this year, underscoring the need for a full-scale push into the B2B market in the second half of the year.
Anthropic: Dominating the B2B AI Coding Assistant Market! Anthropic’s Claude 3.5 Sonnet, launched in June, has surpassed GPT-4o in performance, attracting many free users. With the addition of OpenAI co-founder John Schulman to Anthropic on August 7, only two of the 11 co-founders remain at OpenAI.
On August 20, Claude introduced the "Prompt Caching" feature in beta. You might be familiar with the phrase "clearing the cache" when the internet slows down. Prompt Caching is similar, allowing the reuse of prompts across multiple API calls without reprocessing, significantly reducing time and costs for repetitive tasks.
For instance, in the provided example, the entire text of "Pride and Prejudice" is cached using the cache_control variable. By continually reusing this long text, various questions about the book can be answered more quickly and efficiently.
This feature can reduce time and costs when providing lengthy guidelines to conversational agents or answering questions from uploaded documents. It can also enhance auto-completion and Q&A performance in coding assistants by maintaining summaries of relevant sections or codebases in the prompt. Just like in the "Pride and Prejudice" example, it enables users to ask questions about long-form content such as books, papers, or documents. It can also be used as an agent tool for scenarios involving multiple tool calls and repetitive code changes.
Helping users take advantage of long prompts in this way can be particularly attractive to B2B users. This Prompt Caching feature was made possible through collaboration with Zed AI.
Zed AI, based on Anthropic’s Claude 3.5 Sonnet , launched an AI coding assistant service on August 20. Through collaboration with Anthropic, Zed AI now provides powerful and precise tools to experts at the forefront of AI development. The core Rust engineering team at Anthropic actively contributes to the open-source codebase.
While X has launched the Grok-2 beta to freely generate images and Google is trying to secure the voice assistant market through Gemini Live, competition is fierce in the B2B space to secure revenue.
For enterprise customers to effectively use generative AI, it is essential to reduce hallucinations, show transparent and reliable sources, and properly utilize the company’s extensive documents and databases. It seems that the competition among big tech companies will move in this direction.
If 2023 was the year to test generative AI, this year can be seen as the realization phase. As a B2B AI specialist, Allganize is creating industry-specific models through fine-tuning based on open-source models and excels in technologies that help companies effectively use generative AI through RAG.
B2B AI solutions are evolving toward combining LLMs in complex ways to solve problems and create sophisticated and accurate results in the workplace. Allganize’s technology for finding answers in complex tables within company documents outperforms OpenAI’s Retriever in this area.
Allganize’s Ali LLM App Market, which allows the selection and use of various LLMs tailored to our company’s work and the immediate use of over 100 work automation tools, is also evolving toward a full-stack AI tool.
If you are curious about AI-native workflow tools, feel free to contact Allganize here .
Visit www.allganize.ai