Is Devin on the path to Coding AGI?


Members of the Cognition AI team in New York. Credit: Levi Mandel for Bloomberg Businessweek


Hey Everyone,

The real AI bros just keep getting younger this year, and funding for AI startups of this variety appears limitless. To see this blog with proper formatting and images, go here.

In 2024 we are seeing more AI startups and coding assistants/agents coming to market with demos suggesting they can do a lot of things. I first wrote about Devin here.

Is Devin just another agent-GPT or something more sophisticated? Many are saying it uses GPT-4, which seems likely to me.

Cognition Labs, which was first backed by Peter Thiel, claims to have built a viable coding assistant called Devin.

Generative AI has meant that we are seeing an increasing number of Chinese Americans and American-educated Chinese founders start pretty fascinating startups. A lot of the builders and software engineers most interested in machine learning happen to be Asian. Open a random AI paper in 2024, and you will see what I mean.

Cognition Labs of course is making incredible claims like: “Devin can today in March, 2024 make thousands of decisions, recall relevant context, learn over time, and correct mistakes in code.”

The tool, called Devin, has rattled software engineers across the tech sector, and many are skeptical that it's much more than a GPT-4 skin.


Read the Blog



To start using Devin for engineering work, please reach out here or get in touch at [email protected].


What can Devin Do?

  1. Devin can build and deploy apps end to end.
  2. Devin can learn how to use unfamiliar technologies.
  3. Devin can autonomously find and fix bugs in codebases.
  4. Devin can train and fine tune its own AI models.
  5. Devin can address bugs and feature requests in open source repositories.
  6. Devin can contribute to mature production repositories.
  7. We even tried giving Devin real jobs on Upwork and it could do those too!

See the video demos.

"Devin is an autonomous agent that solves engineering tasks through the use of its own shell, code editor, and web browser," said the company on X.

Follow them on X. (all the following have demos on their blog announcement)

Cognition Labs is also trying to say Devin is a stark improvement in coding reasoning:

  • After reading a blog post, Devin runs ControlNet on Modal to produce images with concealed messages for Sara.
  • Devin makes an interactive website which simulates the Game of Life! It incrementally adds features requested by the user and then deploys the app to Netlify.
  • Devin helps Andrew maintain and debug his open source competitive programming book.
  • Given just a link to a GitHub issue, Devin does all the setup and context gathering that is needed.

And there are others.

The startup, Cognition Labs, is said to be about two months old, which would put its founding in January 2024.

Devin claims to do things GitHub Copilot hasn't managed to build in years. GitHub Copilot first launched as a technical preview in June 2021.

Cognition has raised $21 million for Devin so far, though the company appears rather secretive.

Cognition AI's founders, from left to right: Steven Hao, Scott Wu, and Walden Yan

The company claims Devin sets a new state of the art on the SWE-Bench coding benchmark, and that it has passed practical engineering interviews and completed real jobs on Upwork.

In the Bloomberg article: CEO Scott Wu explains, "Teaching AI to be a programmer is actually a very deep algorithmic problem that requires the system to make complex decisions and look a few steps into the future to decide what route it should pick. It's almost like this game that we've all been playing in our minds for years, and now there's this chance to code it into an AI system."

It's way too early to know what to make of Devin, but the demo has a lot of people on X saying great things about it.

Besides Peter Thiel and Elad Gil (who is in on nearly everything AI), it’s not clear to me how Devin got funding so far. The product is said to be slow and breaks often.

How different will Devin be from any variation or packaged version of Auto-GPT, Ollama, AgentGPT, GPT Engineer or Super AGI? I couldn't tell you.

Backers and angels include:

  • Peter Thiel
  • Elad Gil
  • Patrick and John Collison (Stripe)
  • Sarah Guo
  • Chris Re
  • Eric Glyman
  • Karim Atiyeh
  • Erik Bernhardsson
  • Tony Xu
  • Fred Ehrsam

All this, just a few months into 2024? Something very weird is going on.

Is the demo really as good as advertised? When it was evaluated on a benchmark asking AI to resolve issues found in real-world open-source projects on GitHub, Devin managed to fix 13.86% unassisted. That may seem low, but it's a huge leap from the 1.96% of issues a previous top model could correct.
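
To put those two numbers side by side, here is a minimal sketch of the arithmetic. It uses only the two percentages quoted above; the function name `compare_resolve_rates` is my own and is not from Cognition or the benchmark authors, and it says nothing about how the benchmark itself was run.

```python
def compare_resolve_rates(new_pct: float, old_pct: float) -> tuple[float, float]:
    """Return (gain in percentage points, ratio of new rate to old rate)."""
    if old_pct <= 0:
        raise ValueError("old_pct must be positive")
    return new_pct - old_pct, new_pct / old_pct


# Figures as quoted in this post: Devin vs. the previous top model.
gain_points, ratio = compare_resolve_rates(13.86, 1.96)

print(f"Absolute gain: {gain_points:.2f} percentage points")
print(f"Relative improvement: {ratio:.1f}x")
```

That works out to a gain of about 11.9 percentage points, or roughly 7 times the earlier rate. It also means that roughly 86% of the issues still went unresolved, which is worth keeping in mind when reading the headline claim.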

The founders of Cognition AI are Scott Wu, its chief executive officer; Steven Hao (ex Scale AI), the chief technology officer; and Walden Yan, the chief product officer.

It’s hard to know what’s going on here or how difficult it would be to clone (by the Chinese themselves for instance). While the technical details remain undisclosed, Wu hints at a unique combination of large language models and reinforcement learning techniques that enable Devin's advanced reasoning capabilities.


So what are we to Believe?

Independent testers, including prominent VCs and CEOs, have lauded Devin's ability to maintain coherence and stay on task through hundreds or thousands of steps, a notable improvement over existing AI coding assistants.

Patrick Collison of Stripe says these aren’t just cherry-picked demos.

Gold medal Coders? Wu, 27, is the brother of Neal Wu, who also works at Cognition AI. These two men are world-renowned for their coding prowess: The Wu brothers have been competing in, and often winning, international coding competitions since they were teenagers, and they have helped elevate the US national coding team to a more respectable position against its Chinese and Eastern European rivals in recent years.

The CEO of Perplexity, Aravind Srinivas, had this to say:

  • Aravind says Devin “crosses the threshold of what is human level and works reliably.”

There are other AI startups looking for “Coding AGI”, like Magic AI (sometimes called Magic.dev).

Devin’s Abilities for Software Engineers

So are programmers supposed to buy Devin to make them more productive? Cognition Labs says Devin can help with:

  1. Autonomous problem-solving
  2. Analogical reasoning
  3. Coding capabilities
  4. Human-AI collaboration

With the ability to scale to become more useful soon.

I don't know anything about sport coding, and had never heard of it, to be honest. Cognition AI is full of sport coders. Its staff has won a total of 10 gold medals at the top international competition, and Scott Wu says this background gives his startup an edge in the AI wars.

Cognition Labs thinks that "solving reasoning" can "unlock new possibilities in a wide range of disciplines", and that this is now possible with the evolution of Devin.

Team Wu wants Devin to be seen as a "tireless, skilled teammate" capable of building alongside humans, or independently if left to do that. I'm okay if that turns out to be true, but I'm naturally pretty skeptical.

Misc.

Cognition’s founding team has 10 IOI gold medals and includes leaders and builders who have worked at the cutting edge of applied AI at companies like Cursor, Scale AI, Lunchclub, Modal, Google DeepMind, Waymo, and Nuro.

Packy McCormick of Not Boring said:

To see this post with images go here.

Fred Ehrsam

I guess this is a pretty decent quote:

“For the first time I have seen AI take a complex task, break it down into steps, complete it, and show a human every step along the way - to a point where it can fully take a task off a human’s plate.”

Linas Beliūnas

Linas featured Cognition in a viral piece (probably sponsored) here on LinkedIn.

Kyle Shevlin

Per Business Insider, Kyle Shevlin, founder and software engineer at software development agency Agathist, expressed frustration on X about the industry "trying to aggressively replace one of the few remaining jobs that provides a legit middle-class income."

While young people have little reason to be critical of AI bro culture and its spam of AI tools, some older engineers and workers see this as a not-so-healthy development.

As more Magic AIs and Devins come to market, I wonder what exactly we will see. A new era of AI-human hybrid jobs and agency, or us literally teaching AI how to replace us?

The price of increased productivity might mean shorter work weeks for those lucky to still have great jobs. There will definitely be winners and losers.
