GenAI Weekly — Edition 25
Your Weekly Dose of Gen AI: News, Trends, and Breakthroughs
Stay at the forefront of the Gen AI revolution with Gen AI Weekly! Each week, we curate the most noteworthy news, insights, and breakthroughs in the field, equipping you with the knowledge you need to stay ahead of the curve.
OpenAI introduces Structured Outputs in the API
Last year at DevDay, we introduced JSON mode—a useful building block for developers looking to build reliable applications with our models. While JSON mode improves model reliability for generating valid JSON outputs, it does not guarantee that the model’s response will conform to a particular schema. Today we’re introducing Structured Outputs in the API, a new feature designed to ensure model-generated outputs will exactly match JSON Schemas provided by developers.
Generating structured data from unstructured inputs is one of the core use cases for AI in today’s applications. Developers use the OpenAI API to build powerful assistants that have the ability to fetch data and answer questions via function calling, extract structured data for data entry, and build multi-step agentic workflows that allow LLMs to take actions. Developers have long been working around the limitations of LLMs in this area via open source tooling, prompting, and retrying requests repeatedly to ensure that model outputs match the formats needed to interoperate with their systems. Structured Outputs solves this problem by constraining OpenAI models to match developer-supplied schemas and by training our models to better understand complicated schemas.
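To make this concrete, here is a minimal sketch of what a Structured Outputs request might look like, assuming the official openai Python SDK and the response_format shape from the announcement; the invoice schema, prompt, and field names are made up for illustration.

```python
# Sketch: requesting a schema-constrained response with Structured Outputs.
# The invoice schema and prompt are illustrative, not from the announcement.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

invoice_schema = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
    },
    "required": ["vendor", "invoice_number", "total"],
    "additionalProperties": False,
}

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Extract the invoice fields as JSON."},
        {"role": "user", "content": "ACME Corp, invoice #INV-042, total $1,250.00"},
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "invoice", "strict": True, "schema": invoice_schema},
    },
)

print(response.choices[0].message.content)  # parses against invoice_schema
```

Per the launch documentation, strict mode expects every property to be listed under required and additionalProperties to be false, which is why the schema above does both.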
[…]
Under the hood
We took a two-part approach to improving reliability for model outputs that match JSON Schema. First, we trained our newest model gpt-4o-2024-08-06 to understand complicated schemas and how best to produce outputs that match them. However, model behavior is inherently non-deterministic—despite this model’s performance improvements (93% on our benchmark), it still did not meet the reliability that developers need to build robust applications. So we also took a deterministic, engineering-based approach to constrain the model’s outputs to achieve 100% reliability.
Constrained decoding
Our approach is based on a technique known as constrained sampling or constrained decoding. By default, when models are sampled to produce outputs, they are entirely unconstrained and can select any token from the vocabulary as the next output. This flexibility is what allows models to make mistakes; for example, they are generally free to sample a curly brace token at any time, even when that would not produce valid JSON. In order to force valid outputs, we constrain our models to only tokens that would be valid according to the supplied schema, rather than all available tokens.
[…]
To do this, we convert the supplied JSON Schema into a context-free grammar (CFG). A grammar is a set of rules that defines a language, and a context-free grammar is a grammar that conforms to specific rules. You can think of JSON and JSON Schema as particular languages with rules to define what is valid within the language. Just as it’s not valid in English to have a sentence with no verb, it is not valid in JSON to have a trailing comma.
Thus, for each JSON Schema, we compute a grammar that represents that schema, and pre-process its components to make it easily accessible during model sampling. This is why the first request with a new schema incurs a latency penalty—we must preprocess the schema to generate this artifact that we can use efficiently during sampling.
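OpenAI does not publish the grammar format it generates, but a toy, hand-written example conveys the idea. For a schema describing an object with a single required string field, the precomputed artifact might amount to a set of productions like the following (illustrative only, not OpenAI's actual representation):

```python
# Illustrative only: a hand-written CFG for the schema
#   {"type": "object",
#    "properties": {"vendor": {"type": "string"}},
#    "required": ["vendor"],
#    "additionalProperties": false}
# Each nonterminal maps to the alternative sequences it may expand into.
toy_grammar = {
    "root":       [["{", "vendor_key", ":", "string", "}"]],
    "vendor_key": [['"vendor"']],
    "string":     [['"', "chars", '"']],
    "chars":      [[], ["char", "chars"]],  # zero or more characters
    "char":       [["<any character except an unescaped quote>"]],
}
```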
While sampling, after every token, our inference engine will determine which tokens are valid to be produced next based on the previously generated tokens and the rules within the grammar that indicate which tokens are valid next. We then use this list of tokens to mask the next sampling step, which effectively lowers the probability of invalid tokens to 0. Because we have preprocessed the schema, we can use a cached data structure to do this efficiently, with minimal latency overhead.
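Mechanically, the masking step can be pictured like this. The sketch below is framework-agnostic and assumes the grammar engine hands back a set of allowed token ids at each step; it is not OpenAI's inference code, just the standard logit-masking pattern.

```python
import math

def mask_logits(logits: list[float], allowed_token_ids: set[int]) -> list[float]:
    """Set the logit of every token the grammar disallows to -inf, so its
    probability becomes exactly 0 after softmax and it can never be sampled."""
    return [
        logit if token_id in allowed_token_ids else -math.inf
        for token_id, logit in enumerate(logits)
    ]

def softmax(logits: list[float]) -> list[float]:
    mx = max(l for l in logits if l != -math.inf)
    exps = [math.exp(l - mx) if l != -math.inf else 0.0 for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy example: a 5-token vocabulary where the grammar says only tokens 1 and 3
# may come next. Tokens 0, 2 and 4 end up with probability exactly 0.0.
probs = softmax(mask_logits([1.2, 0.3, 2.5, -0.4, 0.9], allowed_token_ids={1, 3}))
print(probs)
```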
My take on this: At Unstract, we’ve always believed that structured data extraction is one of the top monetizable, horizontal use cases for LLMs today. This is a shot in the arm for this use case.
10 years of Google TPUs
Just over a decade ago, a group of Googlers discovered that the company’s AI compute demand was going to outpace our infrastructure at the time. The discovery came as research teams began thinking seriously about launching speech recognition features at Google’s global scale.
“We did some back-of-the-napkin math looking at how much compute it would take to handle hundreds of millions of people talking to Google for just three minutes a day," Jeff Dean, Google's Chief Scientist, said in an interview. "In today's framing, that seems like nothing. But at the time, we soon realized it would take basically all the compute power that Google had deployed. Put another way, we'd need to double the number of computers in Google data centers to support these new features.”
“We thought surely there must be a better way.”
The team looked at different approaches that existed on the market, but ultimately realized they were not able to meet the sheer demand of even the basic machine learning workloads our products were running — let alone what might follow in the years to come.
Google's leadership realized we were going to need a whole new kind of chip. So, a team that had already been exploring custom silicon designs enlisted Googlers from other machine-learning teams and laid down the framework for what would ultimately be our first Tensor Processing Unit, or TPU.
My take on this: Google has been a pioneer in at-scale AI and TPUs made that possible.
Nvidia reportedly delays its next AI chip due to a design flaw
Nvidia has reportedly told Microsoft and at least one other cloud provider that its “Blackwell” B200 AI chips will take at least three months longer to produce than was planned, according to The Information. The delay is the result of a design flaw discovered “unusually late in the production process,” according to two unnamed sources, including a Microsoft employee, cited by the outlet.
B200 chips are the follow-up to the supremely popular and hard-to-get H100 chips that power vast swaths of the artificial intelligence cloud landscape (and helped make Nvidia one of the most valuable companies in the world). Nvidia expects production of the chip “to ramp in 2H,” according to a statement that Nvidia spokesperson John Rizzo shared with The Verge. “Beyond that, we don’t comment on rumors.”
When was the last time you saw chip delays make the news?
GitHub debuts GitHub Models
From Llama 3.1, to GPT-4o and GPT-4o mini, to Phi 3 or Mistral Large 2, you can access each model via a built-in playground that lets you test different prompts and model parameters, for free, right in GitHub. And if you like what you’re seeing on the playground, we’ve created a glide path to bring the models to your developer environment in Codespaces and VS Code. And once you are ready to go to production, Azure AI offers built-in responsible AI, enterprise-grade security & data privacy, and global availability, with provisioned throughput and availability in over 25 Azure regions for some models. It’s never been easier to develop and run your AI application.
Every piece of software is unique. And likewise, every model is unique in its capabilities, performance, and cost. Mistral offers low latency, while GPT-4o is excellent at building multimodal applications that might demand audio, vision, and text in real time. Some advanced scenarios might require the integration of different modes, such as an embeddings model for Retrieval Augmented Generation (RAG).
With the suite of models, developers will have all the options they need to stay in the flow, experiment more, and learn faster than ever before. And this is just the first wave. In the months ahead, as we approach the general availability of GitHub Models, we will continue to add more language, vision, and other models to our platform.
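For the curious, calling one of these models from Python looks close to a standard OpenAI-style request. The sketch below assumes the OpenAI-compatible inference endpoint GitHub described at launch and a personal access token exported as GITHUB_TOKEN; treat the endpoint URL, token scope, and model name as assumptions to verify against GitHub's current documentation.

```python
import os
from openai import OpenAI

# Assumption: GitHub Models exposes an OpenAI-compatible endpoint and accepts a
# GitHub personal access token as the API key (check the GitHub Models docs).
client = OpenAI(
    base_url="https://models.inference.ai.azure.com",
    api_key=os.environ["GITHUB_TOKEN"],
)

completion = client.chat.completions.create(
    model="gpt-4o-mini",  # any model listed in the GitHub Models catalog
    messages=[{"role": "user", "content": "Summarize what GitHub Models offers."}],
)
print(completion.choices[0].message.content)
```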
My take on this: There are some models here that Hugging Face might never have.
OpenAI won’t watermark ChatGPT text because its users could get caught
OpenAI has had a system for watermarking ChatGPT-created text and a tool to detect the watermark ready for about a year, reports The Wall Street Journal. But the company is divided internally over whether to release it. On one hand, it seems like the responsible thing to do; on the other, it could hurt its bottom line.
OpenAI’s watermarking is described as adjusting how the model predicts the most likely words and phrases that will follow previous ones, creating a detectable pattern. (That’s a simplification, but you can check out Google’s more in-depth explanation for Gemini’s text watermarking for more).
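OpenAI has not published its scheme, but published text-watermarking approaches give a feel for how a "detectable pattern" can work: the sampler seeds a pseudo-random "green list" of tokens from the preceding context and nudges their probabilities up slightly, and a detector that knows the seeding rule checks whether green tokens appear more often than chance. A toy sketch of the detection side, purely illustrative and not OpenAI's method:

```python
import hashlib
import random
from functools import lru_cache

VOCAB_SIZE = 50_000     # size of a hypothetical tokenizer vocabulary
GREEN_FRACTION = 0.5    # share of the vocabulary marked "green" at each step

@lru_cache(maxsize=None)
def green_list(prev_token_id: int) -> frozenset[int]:
    """Pseudo-randomly pick the 'green' half of the vocabulary, seeded by the
    previous token. Generator and detector share this rule, so the detector
    needs no access to the model, only to the text's token ids."""
    seed = int(hashlib.sha256(str(prev_token_id).encode()).hexdigest(), 16)
    rng = random.Random(seed)
    return frozenset(rng.sample(range(VOCAB_SIZE), int(VOCAB_SIZE * GREEN_FRACTION)))

def green_rate(token_ids: list[int]) -> float:
    """Fraction of tokens that land in their context's green list. Ordinary text
    hovers near GREEN_FRACTION; text whose sampler nudged green tokens upward
    scores noticeably higher, which is the detectable pattern."""
    pairs = list(zip(token_ids, token_ids[1:]))
    hits = sum(1 for prev, cur in pairs if cur in green_list(prev))
    return hits / max(len(pairs), 1)
```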
Offering any way to detect AI-written material is a potential boon for teachers trying to deter students from turning over writing assignments to AI. The Journal reports that the company found watermarking didn’t affect the quality of its chatbot’s text output. In a survey the company commissioned, “people worldwide supported the idea of an AI detection tool by a margin of four to one,” the Journal writes.
Hmm.
OpenAI co-founder John Schulman says he will leave and join rival Anthropic
OpenAI co-founder John Schulman said in a Monday X post that he would leave the Microsoft-backed company and join Anthropic, an artificial intelligence startup with funding from Amazon.
The move comes less than three months after OpenAI disbanded a superalignment team that focused on trying to ensure that people can control AI systems that exceed human capability at many tasks.
Schulman had been a co-leader of OpenAI’s post-training team that refined AI models for the ChatGPT chatbot and a programming interface for third-party developers, according to a biography on his website. In June, OpenAI said Schulman, as head of alignment science, would join a safety and security committee that would provide advice to the board. Schulman has only worked at OpenAI since receiving a Ph.D. in computer science in 2016 from the University of California, Berkeley.
[…]
Schulman and others chose to leave after the board pushed out Altman as chief last November. Employees protested the decision, prompting Sutskever and two other board members, Tasha McCauley and Helen Toner, to resign. Altman was reinstated and OpenAI took on additional board members.
[…]
Also on Monday, Greg Brockman, another co-founder of OpenAI and its president, announced that he was taking a sabbatical for the rest of the year.
Could be just politics.
AI Is Coming for India’s Famous Tech Hub
AI is threatening to disrupt most businesses around the world, not just India’s $250 billion outsourcing industry. The outsourcing boom in India over the past few decades created the “getting Bangalore-d” phenomenon in the U.S., a term often used for Americans who lost their jobs to more affordable Indian talent.
AI’s impact could have big repercussions as the industry employs 5.4 million people, according to tech-industry body Nasscom, and contributes about 8% of the country’s economy. More than 80% of companies in the S&P 500 outsource some operations to India, according to HSBC.
The most vulnerable operations employed more than 1.4 million people in 2021, according to the latest data from Nasscom. A third of these jobs are in call centers. “The prize is to move up the value chain and go after new processes,” said Murugesh.
AI might accelerate trends that have already made the industry less labor-intensive. About a decade ago, companies needed about 27 employees to earn $1 million in annual revenue. That number has now fallen to 21 employees, Nasscom data show.
That one hits pretty close to home for yours truly.
If you've made it this far and follow my newsletter, please consider exploring the platform we're currently building: Unstract, a no-code LLM platform that automates unstructured data workflows.