ACAD 32: Devin Dares- Ripe or Hype?
Devin by Cognition AI

Social media is flooded with news of Devin, billed as the first autonomous AI software engineer and launched by Cognition AI Inc. The program acts as an assistant for software engineering tasks such as writing code, debugging errors, and deploying applications in real time. Emerging startups in the “code assistant” arena are intent on turning everyone into a developer/programmer.

Cognition is a young startup incorporated in the Bay Area earlier in 2024. The founding trio and a band of 10 sport coders are shuttling between Airbnbs in Silicon Valley to teach Devin the ropes.

Cognition team

What can Devin do?

Devin has completed real jobs on platforms such as Upwork, including fixing issues and producing reports. Statistically speaking, Devin solved ~14% of the diverse code issues put to it as part of a Real World Software Engineering test. That is roughly 3X the score of the next-best large language model, Claude, which stands at 4.8% (#2). Is 14% good enough? Is the test truly representative of real-world scenarios?

At its heart, Devin is an autonomous LLM agent capable of drafting a detailed execution plan to achieve your stated software engineering goals. The agent can then execute the planned activities independently and iteratively: browsing the internet and reviewing API documentation to find the right data, scanning GitHub repositories for starter code, applying debugging techniques drawn from Stack Overflow, and so on.
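To make the plan-then-execute idea concrete, here is a minimal Python sketch. It is not Devin's actual implementation, only an illustration of the loop described above. Every name in it (`browse_docs`, `search_repos`, `run_and_debug`, `draft_plan`, `run_agent`) is hypothetical, and the planner returns a fixed plan where a real agent would ask an LLM.

```python
from typing import Callable, Dict, List

# Hypothetical tools: plain functions standing in for real capabilities
# such as web browsing, repository search, and debugging.
def browse_docs(goal: str, notes: List[str]) -> str:
    return f"read API docs relevant to '{goal}'"

def search_repos(goal: str, notes: List[str]) -> str:
    return f"found starter code for '{goal}'"

def run_and_debug(goal: str, notes: List[str]) -> str:
    return f"ran the code using {len(notes)} earlier notes and fixed errors"

TOOLS: Dict[str, Callable[[str, List[str]], str]] = {
    "browse_docs": browse_docs,
    "search_repos": search_repos,
    "run_and_debug": run_and_debug,
}

def draft_plan(goal: str) -> List[str]:
    """Stand-in for the LLM planner: returns an ordered list of tool names."""
    return ["browse_docs", "search_repos", "run_and_debug"]

def run_agent(goal: str) -> List[str]:
    """Execute the plan step by step, carrying observations forward."""
    notes: List[str] = []
    for tool_name in draft_plan(goal):
        observation = TOOLS[tool_name](goal, notes)
        notes.append(f"{tool_name}: {observation}")
    return notes

if __name__ == "__main__":
    for line in run_agent("build a website"):
        print(line)
```

The point of the design is that each step sees the notes from earlier steps, which is what lets an agent of this kind adjust as it goes instead of following a script blindly.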

Sample activities where Devin has demonstrated proficiency

  • Develop and refine existing open-source AI models (such as Llama from Meta)
  • Build and launch a website, handling both front-end and back-end autonomously
  • Build automated unit tests and integration tests for stated business scenarios
  • Detect security vulnerabilities within your code

Want to Access Devin?

Link to Google Form for getting access to Devin

You can fill out the form linked above to request access to Devin. Access is limited to a handful of developers with an active, real use case.


Concept of the day- LLM Agents

Models in the LLM realm are moving from a “predicting the next word” paradigm to an “advancing the reasoning” paradigm. Teaching AI to be a programmer is a deep algorithmic problem that requires making complex decisions while looking a few steps into the future to decide which route to pick next, quite like the game of chess! (#3). See a high-level process for setting up this agent below. I have covered this process for building a specific data mining LLM agent in one of my previous ACAD blogs.

  1. Preparation: Define scope (e.g., content generation, customer service), define autonomy level, and feed relevant data to tune the algorithm.
  2. Configure Decisioning Engine: Train the LLM to have conversations with itself and the user, such as asking follow-up clarifying questions or narrowing down to a final solution among a myriad of choices.
  3. Deployment: Deploy the app in a controlled environment with pilot users, monitoring user interactions and technical performance to form an active feedback loop and improve. The agent should be configured to learn and fine-tune from the data collected during every interaction.
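As a rough illustration of steps 1 and 3, the Python sketch below captures an agent's scope and autonomy level in a small config object and uses pilot-user ratings as the feedback signal. All of it is an assumption made for illustration: the `AgentConfig` and `PilotFeedback` names, the 1 to 5 rating scale, and the 3.5 threshold are hypothetical, not part of any particular framework.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List

@dataclass
class AgentConfig:
    scope: str     # e.g. "customer service" or "content generation"
    autonomy: str  # hypothetical levels: "suggest" (human approves) or "act"

@dataclass
class PilotFeedback:
    ratings: List[int] = field(default_factory=list)

    def record(self, rating: int) -> None:
        """Store one pilot-user rating on an assumed 1-5 scale."""
        if not 1 <= rating <= 5:
            raise ValueError("rating must be between 1 and 5")
        self.ratings.append(rating)

    def needs_tuning(self, threshold: float = 3.5) -> bool:
        """Flag the agent for another tuning round if ratings are low."""
        return bool(self.ratings) and mean(self.ratings) < threshold

if __name__ == "__main__":
    config = AgentConfig(scope="customer service", autonomy="suggest")
    feedback = PilotFeedback()
    for rating in (2, 3, 4):
        feedback.record(rating)
    print(config, "needs tuning:", feedback.needs_tuning())
```

In a real pilot the feedback would be richer than a single score, but even a simple threshold like this gives the team a trigger for when to revisit the tuning data.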

Example 1: Personalized Content Recommendation

Scenario: A recommendation system for a news app that uses an LLM to analyze user interests and reading habits to suggest relevant articles.

What does the agent do? The LLM processes user interactions and feedback on various articles to understand preferences. It then decides which new articles might be of interest to the user by:

  • Identifying topics, authors, or genres that the user prefers.
  • Analyzing the sentiment or engagement level of past interactions.
  • Recommending articles that match the user's profile, potentially adjusting the recommendations based on the user's feedback to improve over time.
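The three bullets above can be sketched in a few lines of Python. This is a simplified stand-in, not a production recommender: I assume the topics have already been extracted for each article (the part an LLM would handle), and the `engagement` score, `build_profile`, and `recommend` names are hypothetical.

```python
from collections import Counter
from typing import Dict, List

def build_profile(history: List[Dict]) -> Counter:
    """Weight each topic by how strongly the user engaged with it.

    Positive engagement means the user liked the article,
    negative means they dismissed it.
    """
    profile: Counter = Counter()
    for item in history:
        for topic in item["topics"]:
            profile[topic] += item["engagement"]
    return profile

def recommend(profile: Counter, candidates: List[Dict], k: int = 2) -> List[str]:
    """Return up to k article titles whose topics best match the profile."""
    def score(article: Dict) -> float:
        return sum(profile.get(topic, 0) for topic in article["topics"])

    ranked = sorted(candidates, key=score, reverse=True)
    # Only keep articles with a positive match to the user's interests.
    return [a["title"] for a in ranked[:k] if score(a) > 0]

if __name__ == "__main__":
    history = [
        {"topics": ["ai", "startups"], "engagement": 1.0},
        {"topics": ["sports"], "engagement": -0.5},
        {"topics": ["ai"], "engagement": 0.5},
    ]
    candidates = [
        {"title": "AI agents explained", "topics": ["ai"]},
        {"title": "Cricket final recap", "topics": ["sports"]},
        {"title": "Funding roundup", "topics": ["startups", "ai"]},
    ]
    print(recommend(build_profile(history), candidates))
```

Feeding each new click or dismissal back into `history` and rebuilding the profile is the simplest way to get the "improve over time" behavior from the last bullet.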

Example 2: Customer Support Chatbot

Scenario: A chatbot designed to handle customer service inquiries autonomously, using an LLM to understand and respond to customer requests.

What does the agent do? When a customer asks about tracking their order, the chatbot uses the LLM to comprehend the request and then consults the company's order tracking system to retrieve the specific order status. The decision-making process involves:

  • Interpreting the customer's inquiry to identify it as a tracking request.
  • Extracting order details (e.g., order number) from the conversation.
  • Querying the tracking system with these details.
  • Communicating the retrieved information back to the customer clearly.
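The four decision steps above map neatly onto a small pipeline. The Python sketch below is a simplification: keyword matching and a regular expression stand in for the LLM's language understanding, and an in-memory dictionary stands in for the company's order tracking system. The order number format (one letter followed by four digits) and all function names are assumptions made for illustration.

```python
import re
from typing import Optional

# Hypothetical in-memory stand-in for the company's order tracking system.
ORDER_STATUS = {"A1001": "shipped", "A1002": "out for delivery"}

# Simple keyword check standing in for LLM intent detection.
TRACKING_KEYWORDS = ("track", "where is my order", "order status")

def is_tracking_request(message: str) -> bool:
    text = message.lower()
    return any(keyword in text for keyword in TRACKING_KEYWORDS)

def extract_order_number(message: str) -> Optional[str]:
    # Assumed order number format: one letter followed by four digits.
    match = re.search(r"\b[A-Z]\d{4}\b", message.upper())
    return match.group(0) if match else None

def handle_message(message: str) -> str:
    if not is_tracking_request(message):
        return "I can help with order tracking. What would you like to know?"
    order_id = extract_order_number(message)
    if order_id is None:
        # Ask a follow-up question instead of guessing.
        return "Could you share your order number?"
    status = ORDER_STATUS.get(order_id)
    if status is None:
        return f"I could not find order {order_id}."
    return f"Order {order_id} is currently: {status}."

if __name__ == "__main__":
    print(handle_message("Can you track my order a1001?"))
    print(handle_message("Where is my order?"))
```

Note the follow-up question when the order number is missing: that is the "clarifying question" behavior from step 2 of the setup process, in miniature.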


What’s the buzz about?

As per Global Count, software engineering accounts for 26.3 million jobs globally, with average growth of 3.7% YoY over the last 5 years. India leads the pack with 17% YoY growth, followed by North America (#1). The “fear”? Are we reaching a “plateau of growth” or, even worse, “will the software engineering industry be turned on its head?”

Devin might seem like an enthusiastic teammate, but let's not downplay the limitations of benchmark tests such as the Real World Software Engineering test. We've had tools like Google DeepMind's AlphaCode for a while now, offering code solutions in multiple languages.

Most tech companies today embrace agile ways of working (over the traditional waterfall approach), implying that software requirements are not set in stone on day 1. By design, developing a meaningful product offering today requires a cross-functional team, with representation from product management, software development, analytics, and marketing, to make sense of customer requirements in real time. Such an environment often requires software engineers to devise hacky ways of accomplishing objectives, drawing on their prior experience of what drives success in a domain.

As François Chollet puts it brilliantly: if software engineering is fully automated, software engineers can move on to “high leverage” positions. In the end, software engineering is about “developing mental models of problems and their solutions”. Sam Altman (co-founder and CEO of OpenAI, touted as the Oppenheimer of AI) recently put it this way: “For me, AGI is the equivalent of a median human that you could hire as a co-worker” (#4).

In my view, the future software engineer will operate on a much bigger screen, with perhaps a decent mix of autonomous AI engineers and junior software engineers at their disposal. This dream team would help amplify the impact of the lead engineer and reduce the drudgery of mundane tasks, freeing up time for high-level thinking. AI is here not to take away jobs but to eventually shorten the runway for driving impact. One can only imagine how many more skilled software engineers we would need to graduate into a world where half of the GDP is driven by digital enterprises (#5).


Resources:

  1. https://www.dhirubhai.net/pulse/how-many-software-developers-world-codeninjainc/
  2. https://www.msn.com/en-in/money/news/meet-devin-worlds-first-ai-software-engineer-what-it-does-and-how-it-works/ar-BB1jSFxT
  3. https://www.bloomberg.com/news/articles/2024-03-12/cognition-ai-is-a-peter-thiel-backed-coding-assistant?accessToken=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzb3VyY2UiOiJTdWJzY3JpYmVyR2lmdGVkQXJ0aWNsZSIsImlhdCI6MTcxMDI0ODc3NCwiZXhwIjoxNzEwODUzNTc0LCJhcnRpY2xlSWQiOiJTQThLNFFUMEcxS1cwMCIsImJjb25uZWN0SWQiOiI5MTM4NzMzNDcyQkY0QjlGQTg0OTI3QTVBRjY1QzBCRiJ9.DZvx9NvMMQF0p-rA6xO3KKH0DxcVdAOWKaHXtW-3R6c&sref=nPlhheXZ
  4. https://analyticsindiamag.com/the-idea-was-never-to-replace-humans/
  5. https://startupgenome.com/article/the-state-of-the-global-startup-economy

Carlo Beltran

Integrating emerging technologies with jurassic methodologies.

8 months ago

Have they made any impact declaration on the human-to-AI engineer team ratio using Devin? Say for example the speed of 10 human software engineers versus 1 HSE and 1 Autonomous SE?
