GenAI Weekly — Edition 26
Your Weekly Dose of Gen AI: News, Trends, and Breakthroughs
Stay at the forefront of the Gen AI revolution with Gen AI Weekly! Each week, we curate the most noteworthy news, insights, and breakthroughs in the field, equipping you with the knowledge you need to stay ahead of the curve.
Deep Live Cam: Real-time face swapping with just one input photo
One hallmark of recent AI models is that they need practically no task-specific training. In technical terms, this is called zero-shot inference. These “pretrained” models know so much about the world in general that they need very little new training or varied input. Deep Live Cam, an open source project, is both powerful and scary at the same time. See also: Deep-Live-Cam goes viral, allowing anyone to become a digital doppelganger
The AI summer
People forget this now, but the iPhone took time as well. Apple sold just 5.4m units in the first 12 months, and it took until 2010 for sales to really work (the iPod took even longer). The same, of course, applies to the enterprise. If you work in tech, cloud is old and boring and done, but it’s still only a third or so of enterprise workflows 25 years after Marc Benioff tried to persuade people to do software in the browser.
ChatGPT happened a lot faster. It exploded into our consciousness in late 2022, and it’s taken all the oxygen in tech almost immediately. If you’re building a startup today that isn’t focused on generative AI, all your friends will point at you and laugh, but much more importantly, ChatGPT got to 100m users in just 2 months. By this spring, unprecedented numbers of people had both heard of it and used it.
As with every observation about the acceleration of tech adoption, a lot of this is ‘standing on the shoulders of giants’ - OpenAI didn’t have to wait for people to buy devices or for telcos to build DSL or 3G. For consumers, ChatGPT is just a website or an app, and (to begin with) it could ride on all of the infrastructure we’ve built over the last 25 years. So a huge number of people went off to try it last year.
The problem is that most of them haven’t been back. If you ask what ‘used’ actually means, it turns out that most people played with it once or twice, or go back only every couple of weeks. This is a very glass half-empty / glass half-full kind of chart, as the caption points out. On one hand, getting a quarter to a third of the developed world’s population to try a new product in 18 months is very hard. But on the other, most people who tried it didn’t see how it was useful.
Of course, there’s a selection bias here: if you’ve bought a $650 smartphone, you’ve already decided that it’s useful, and you’re a lot less likely to abandon it than a website that you spent 5 minutes playing with. And you could also point out that the best versions of the models are often behind paywalls.
But if this is the amazing magical thing that will change everything, why do most people say, in effect, ‘very clever, but not for me’ and wander off, with a shrug? And why hasn’t there been much growth in the active users (as opposed to the vaguely curious) in the last 9-12 months, as shown in a bunch of similar surveys? The most revealing - possibly - is Google Trends, which must always be used with caution, but which seems to show a correlation with school holidays.
We can answer this $600B question only in hindsight.
OpenDevin: An Open Platform for AI Software Developers as Generalist Agents
Software is one of the most powerful tools that we humans have at our disposal; it allows a skilled programmer to interact with the world in complex and profound ways. At the same time, thanks to improvements in large language models (LLMs), there has also been a rapid development in AI agents that interact with and effect change in their surrounding environments. In this paper, we introduce OpenDevin, a platform for the development of powerful and flexible AI agents that interact with the world in similar ways to those of a human developer: by writing code, interacting with a command line, and browsing the web. We describe how the platform allows for the implementation of new agents, safe interaction with sandboxed environments for code execution, coordination between multiple agents, and incorporation of evaluation benchmarks. Based on our currently incorporated benchmarks, we perform an evaluation of agents over 15 challenging tasks, including software engineering (e.g., SWE-Bench) and web browsing (e.g., WebArena), among others. Released under the permissive MIT license, OpenDevin is a community project spanning academia and industry with more than 1.3K contributions from over 160 contributors and will improve going forward.
When this starts working well, we’ll be able to scale software engineering, something that has proven very hard, slow and expensive. Until then, we do have various GitHub Copilot-style helpers.
postgres.new: In-browser Postgres with an AI interface
Introducing postgres.new, the in-browser Postgres sandbox with AI assistance. With postgres.new, you can instantly spin up an unlimited number of Postgres databases that run directly in your browser (and soon, deploy them to S3).
Each database is paired with a large language model (LLM) which opens the door to some interesting use cases:
All while staying completely local to your browser. It's a bit like having Postgres and ChatGPT combined into a single interface.
A powerful combination of Wasm (to run real PostgreSQL right in your browser) and LLMs to analyze data and generate queries. Pure magic. It’s been a while since I saw a demo this good. Also see: PGlite.
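To make the "database paired with an LLM" idea a little more concrete, here is a minimal sketch of the pattern, not how postgres.new itself is built. It makes two loud assumptions: Python's built-in sqlite3 stands in for the in-browser Postgres, and generate_sql is a hypothetical placeholder for the model call (a canned lookup, so the sketch runs offline).

```python
import sqlite3


def generate_sql(question):
    """Hypothetical stand-in for an LLM call that turns a question into SQL.

    A real system would send the question and the table schema to a model;
    a fixed lookup keeps this sketch runnable without any network access.
    """
    canned = {
        "total sales per region": (
            "SELECT region, SUM(amount) AS total "
            "FROM sales GROUP BY region ORDER BY region"
        ),
    }
    return canned[question.lower()]


def ask(conn, question):
    """Generate SQL for the question and run it against the local database."""
    sql = generate_sql(question)
    return conn.execute(sql).fetchall()


# In-memory database: nothing leaves the process, loosely mirroring the
# "stays local to your browser" idea (sqlite3 here is an assumption, not Postgres).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("EU", 120.0), ("US", 200.0), ("EU", 80.0)],
)

rows = ask(conn, "Total sales per region")
print(rows)
```

The interesting design point is the split: the model only proposes a query, and the local database does the actual work on the data.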
xAI Announces Grok-2 Beta
We are excited to release an early preview of Grok-2, a significant step forward from our previous model Grok-1.5, featuring frontier capabilities in chat, coding, and reasoning. At the same time, we are introducing Grok-2 mini, a small but capable sibling of Grok-2. An early version of Grok-2 has been tested on the LMSYS leaderboard under the name "sus-column-r." At the time of this blog post, it is outperforming both Claude 3.5 Sonnet and GPT-4-Turbo.
Grok-2 and Grok-2 mini are currently in beta on 𝕏, and we are also making both models available through our enterprise API later this month.
API coming soon
We are also releasing Grok-2 and Grok-2 mini to developers through our new enterprise API platform later this month. Our upcoming API is built on a new bespoke tech stack that allows multi-region inference deployments for low-latency access across the world.
OpenAI reportedly leads $60M round for webcam startup Opal
The Information reported the ChatGPT developer’s investment plans today. OpenAI is expected to be joined by several of Opal’s existing backers in the new round, which is described as a Series B raise. Founders Fund and Kindred Ventures are among the returning investors that could reportedly take part.
Opal’s flagship product is a $149 webcam called Tadpole (pictured) that’s designed to clip onto laptop monitors. It takes the form of a 1.25-inch square that weighs about as much as an AA battery. Tadpole can record videos with a resolution of up to 3840 pixels by 2160 pixels, sharpness that Opal says is usually available only with pricier webcams.
Pretty curious link. With all the hoopla surrounding how AI companies are getting their training data, it does give me a good reason to pause should I consider buying a webcam from a company backed by a leading AI provider.
If you've made it this far and follow my newsletter, please consider exploring the platform we're currently building: Unstract, a no-code LLM platform that automates unstructured data workflows.