My View On Agents

January 27, 2025

First through the Stargate

This last week, the new administration and some tech leaders, backed by Japanese cash, announced “Project Stargate”. (https://x.com/OpenAI/status/1881830103858172059)

First, unless this is a superpositional faster-than-light transport portal, I think they should rename it. Second, is this more hype than substance? Maybe. $500B is an awful lot of money, and that’s cash SoftBank doesn’t have liquid. So it’s more of a commitment to best effort over time than a guarantee right now. The people involved in the announcement, combined with the political winds of the moment, also tell part of the story about the game of chess being played. Third, is this really a requirement for advancing the current approach to AI? Given that DeepSeek has done nearly as well (trust those benchmarks, right?) with ostensibly a fraction of OpenAI’s resources, is there really a need for a $0.5T infrastructure investment to get where we need to go? Who knows, but Altman asked for something like $7 trillion, so maybe $0.5T is the best he could get.

And lastly, this AI investment is predicated on a bunch of assumptions. I did a breakdown of the costs of AI data centers last year, and if we follow that same logic, the amount of power required to run the whole shebang quite literally does not exist unless you are willing to put a few major US states into permanent blackout. https://x.com/energybants/status/1881860142108377412 So the power, the water, not to mention the ancillary hardware, are probably years away from being available. I have to say this is likely mostly posturing, and if any of Project Stargate is built, it will likely be a small fraction of the pie-in-the-sky vision. AI will advance, but it won’t be through the “Stargate”.

A bit about Deepseek R1

It’s bigger news in the MLverse than even OAI’s o3 release. Whether you believe it is the product of cracked Chinese quant traders optimizing a relatively straightforward concept (more on that later) or an effort by the Chinese government to jam a stick in the spokes of US AI companies using illicitly gathered NVIDIA GPUs, there is no doubt that it’s a capable model, and it’s fully open. As I stated in the 2025 predictions:

It looks like this prognostication is headed in the right direction, even if the government alignment isn’t yet apparent. Lots of companies are planning to fine-tune this model for on-prem and even on-device AI systems. As a competitive impetus in the space, I like it.

A bit about how it achieved such good performance on comparatively (hypothetically) inferior hardware: basically pure reinforcement learning. Base Model → RL → Finetune → RL → Finetune → RL, and so on. The paper is here: https://arxiv.org/abs/2501.12948 Pure reinforcement learning (RL) enables an LLM to learn to think and reflect on its own. It is a bit of a simple riposte to OAI’s o1 family of “reasoning” models, which require extensive chain-of-thought data. Simply incentivizing the model, like they did way back in the AlphaGo days, gets you similar performance.
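As I read the R1 recipe, the “incentive” is a simple rule-based reward rather than a learned reward model: one signal for getting a verifiable answer right, another for keeping the reasoning inside the expected tags. Here is a minimal sketch of that idea; the tag format is from the paper’s style, but the weights and exact matching logic are my own illustrative assumptions.

```python
import re

def r1_style_reward(completion: str, gold_answer: str) -> float:
    """Rule-based reward in the spirit of DeepSeek-R1: format + accuracy.

    Assumes completions look like '<think>...</think> <answer>...</answer>'.
    The weights (0.2 and 1.0) are illustrative, not the paper's values.
    """
    reward = 0.0
    # Format reward: reasoning must be wrapped in <think> tags.
    if re.search(r"<think>.+?</think>", completion, flags=re.DOTALL):
        reward += 0.2
    # Accuracy reward: the final answer must match a verifiable gold answer.
    match = re.search(r"<answer>(.+?)</answer>", completion, flags=re.DOTALL)
    if match and match.group(1).strip() == gold_answer.strip():
        reward += 1.0
    return reward

print(r1_style_reward("<think>2+2 is 4</think> <answer>4</answer>", "4"))  # 1.2
print(r1_style_reward("The answer is 4", "4"))  # 0.0
```

The appeal is that nothing here needs human chain-of-thought labels: the model is free to discover its own reasoning style as long as the checkable answer comes out right.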

It also highlights another interesting thought experiment. One diagram in the R1 paper shows a pretty linear relationship between inference and the response length.

This seems better than o1-based models, which now generate tens of thousands of tokens to solve hard problems. o3 is likely generating hundreds of thousands or millions of tokens, adding up to thousands of dollars, to solve complex problems like the ones on the ARC-AGI benchmark. Is R1 the optimal attainable solution? I don’t know, but this may well be a very important area of study. We know how to vertically fractionate models, basically big-model-to-small-model, very well. But getting the chain-of-thought process done more efficiently seems a cool place to explore.
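A quick back-of-the-envelope shows how long chains plus many samples per problem reach “thousands of dollars” territory. All the numbers here (tokens per sample, sample count, per-token price) are placeholder assumptions, not quoted rates:

```python
def attempt_cost(tokens_per_sample: int, samples: int, usd_per_million: float) -> float:
    """Cost of one problem attempt: total generated tokens times the token price."""
    total_tokens = tokens_per_sample * samples
    return total_tokens / 1_000_000 * usd_per_million

# Hypothetical: 55k reasoning tokens per sample, 1024 samples per problem,
# at $60 per million output tokens.
print(f"${attempt_cost(55_000, 1024, 60.0):,.2f} per problem")  # $3,379.20 per problem
```

If R1’s cost really does scale roughly linearly with response length for a single chain, trimming either the chain or the sample count is where the money is.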

This paper explores the theory: https://arxiv.org/abs/2310.07923 You ablate tokens from the chain of thought while training the model to mimic its fully-loaded base. That would definitely lower costs and latency.
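The core move, as I understand it, can be sketched in a few lines: randomly drop a fraction of the chain-of-thought tokens from the supervision target, and anneal that fraction down over training so the student learns to reach the teacher’s answer with ever-shorter reasoning. This is my own toy rendering of the idea, not the paper’s implementation:

```python
import random

def ablate_cot(cot_tokens: list[str], keep_fraction: float, seed: int = 0) -> list[str]:
    """Drop a random subset of chain-of-thought tokens, preserving order.

    During training, keep_fraction would be annealed toward 0 so the student
    matches the full-CoT teacher with progressively shorter reasoning.
    """
    rng = random.Random(seed)
    return [t for t in cot_tokens if rng.random() < keep_fraction]

cot = "first add the tens then add the ones then combine".split()
for frac in (1.0, 0.5, 0.0):
    print(frac, ablate_cot(cot, frac))
```

At `keep_fraction=1.0` you get the full chain back, and at `0.0` the model is being trained to jump straight to the answer.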

Warp Speed

Sure, you have your zsh theme dialed in just right. But here comes a cool offering to replace that outdated terminal that looks like a unicorn threw up neon-colored bash commands all over your screen.

https://www.warp.dev/

It’s an AI-enabled terminal with a freemium model to try out. The one thing I’ve found it really helps with is the Python dependency hell I constantly find myself in. It simply suggests the fixes and I press enter. Lovely.

My View on Agents

These days, it’s hard to miss the hype around AI Agents. They’re everywhere: booking flights, managing calendars, even fixing code. With any new technology comes some confusion around the terminology. But beyond defining what an Agent is, it’s also crucial to understand the technical nuances.

Let’s start with the definition. I define agents narrowly: they should have well-defined inputs and produce consistent, reasonable outputs. Think of it like functional programming, but with the added power of a Large Language Model (LLM) as the underlying code. While the agent’s “code” isn’t written line by line, its capabilities and processes are clear. I view agents as atomic operators that can be chained together into well-defined programming workflows.
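To make the “functional programming” framing concrete, here is a minimal sketch: each agent is an atomic operator with a typed input and a typed output, and a workflow is just composition. The `call_llm` stub and the ticket-triage scenario are hypothetical, stand-ins for whatever model and task you actually use.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class TicketSummary:
    title: str
    severity: str

# Hypothetical stub; in practice this would call an actual LLM.
def call_llm(prompt: str) -> str:
    return "severity: high" if "crash" in prompt else "severity: low"

def triage_agent(ticket_text: str) -> TicketSummary:
    """Atomic agent: well-defined input (raw text) and output (a typed record)."""
    verdict = call_llm(f"Classify this bug report: {ticket_text}")
    severity = verdict.split(":")[1].strip()
    return TicketSummary(title=ticket_text[:40], severity=severity)

def route_agent(summary: TicketSummary) -> str:
    """Atomic agent: routes based only on the structured output of the previous step."""
    return "on-call" if summary.severity == "high" else "backlog"

# A workflow is plain function composition over the agents.
workflow: Callable[[str], str] = lambda text: route_agent(triage_agent(text))
print(workflow("App crashes on login"))  # on-call
```

Because each step’s contract is explicit, you can test, swap, or audit agents independently, which is exactly the predictability argument below.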

Understanding the technical side of agent design is also important. When developers use AI for tasks like generating unit tests, interacting with a CLI, or any number of other capabilities, they expect predictability and consistency. Without this, the AI tool risks trapping the developer in a frustrating cycle of errors and prompt loops. Often, this ends up wasting more time than doing things the “old-fashioned way.” In short, the developer loses trust. This issue tends to be even more pronounced with experienced developers.

So agents should be atomic and tightly defined. Provide access to semantically consistent data, and you now have an auditable and consistent “agentic network”.

An Interesting Observation

A surprising outcome of the development of low-code tools is the revelation that few people outside of engineering teams can actually automate or connect business flows, even when the requirement to learn coding is removed. The assumption that plenty of people could accomplish simple automation tasks if only they didn’t have to learn Python, and the products designed around that assumption, have been proven incorrect.

I consistently run into people in the tech space who don’t seem to grok how to use AI in the most basic of helpful ways. Perhaps the ability to think “like” the technology you are using is what will separate the people skilled with AI from those who will never be able to use it as effectively. I have a good friend and reader of the newsletter who can literally speak SQL. I’ve never seen anyone as talented. But the average person on the street? They’ll simply never get the concept of an inner join (or be able to use it) no matter how much AI you give them.
