'Reasoning Tokens' and swarm-based agent applications
From $60 to less than a buck for 2 million tokens a minute


Today is an exciting time to be a product builder and AI applications consultant. Two of the biggest obstacles to building useful systems with generative AI have fallen away in the last month or two: the limited available capacity of state-of-the-art models, and high costs.

As of today (Sept 30, 2024) the default quotas are documented on the Azure OpenAI Service quotas and limits page.

Massive capacity increases of available Tokens Per Minute (TPM) per deployment

Azure OpenAI Service imposes quotas and rate limits to manage resources effectively. These limits are defined in terms of Tokens Per Minute (TPM) and Requests Per Minute (RPM).

  • Default TPM limits:
      • GPT-4o: 450,000 TPM (default), 30 million TPM (enterprise agreement)
      • GPT-4o-mini: 2 million TPM (default), 50 million TPM (enterprise agreement)
      • GPT-4 Turbo: 450,000 TPM (default), 2 million TPM (enterprise agreement)
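Because these quotas are enforced per minute, an application pushing a large parallel workload still needs to stay under the rolling window. Here is a minimal sketch of a client-side sliding-window budgeter; the class name and interface are my own invention for illustration, not part of any Azure SDK.

```python
from collections import deque
import time


class TokenBudget:
    """Client-side sliding-window tracker: tokens spent in the last 60
    seconds are counted against a per-deployment TPM quota."""

    def __init__(self, tpm_limit):
        self.tpm_limit = tpm_limit
        self.events = deque()  # (timestamp, tokens) pairs

    def _prune(self, now):
        # Drop spend records that have aged out of the 60-second window.
        while self.events and now - self.events[0][0] >= 60:
            self.events.popleft()

    def used(self, now=None):
        now = time.monotonic() if now is None else now
        self._prune(now)
        return sum(tokens for _, tokens in self.events)

    def try_spend(self, tokens, now=None):
        """Record a request if it fits in the current window; else refuse."""
        now = time.monotonic() if now is None else now
        if self.used(now) + tokens > self.tpm_limit:
            return False
        self.events.append((now, tokens))
        return True
```

A real system would also honor the service's 429 responses and `Retry-After` headers; this only shows the local bookkeeping side.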

... and steep price declines

Capacity versus cost

In terms of practical architecture and design, going from a quota of 80,000 tokens per minute to 2,000,000 changes things dramatically.

'Reasoning Tokens', swarms, and high TPM requirements

Recently OpenAI released o1-preview. The o1 models prioritize enhanced reasoning and extended thinking time for complex tasks.

OpenAI hasn't disclosed how the system works and is reportedly banning people who get too nosy. API billing for the model includes 'reasoning tokens'. Reasoning tokens are a key aspect of the o1 models, representing the model's internal thought process as it breaks down the prompt, considers various approaches, and formulates a response. These tokens are not visible in the API response, but they are still billed and counted as output tokens.
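You can see this split in the usage object the chat completions API returns, where reasoning tokens are reported under `completion_tokens_details`. The sketch below uses a made-up payload with illustrative numbers to show how billed output divides into visible text and hidden reasoning:

```python
# A usage payload shaped like the one the chat completions API returns for
# o1 models. The token counts here are invented for illustration.
usage = {
    "prompt_tokens": 100,
    "completion_tokens": 2_400,
    "completion_tokens_details": {"reasoning_tokens": 2_000},
}


def split_output_tokens(usage):
    """Split billed output tokens into visible text vs hidden reasoning."""
    total = usage["completion_tokens"]
    reasoning = usage.get("completion_tokens_details", {}).get("reasoning_tokens", 0)
    return total - reasoning, reasoning


visible, hidden = split_output_tokens(usage)
# Both parts are billed at the output-token rate, even though only the
# 'visible' part ever appears in the response.
```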

Thinking for yourself

While I don't know how OpenAI's chain-of-thought system works, I can guess that it is based on the same extension mechanism used for image generation, file search, code interpreter, and plugins, which in the Assistants API is called tools.

Tools allow a language model to know about and 'call' other programs. The language model itself has no ability to talk to other systems (+1 for humanity). The program using the model (such as a chatbot) has to do the work: invoke the tool with the input the model provides, collect the results however is appropriate, and continue the generation by including the tool call output in the next generation operation.
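That loop can be sketched in a few lines. The model below is a scripted stand-in (a real system would call a chat API); the point is that the *program*, not the model, executes the tool and feeds the result back:

```python
import json

# Stand-in for a chat model: returns either a tool call or a final answer.
# In a real system this would be an API request; it's scripted here so the
# orchestration loop itself is what's on display.
def fake_model(messages):
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        # The model can only *name* a tool and supply arguments as text.
        return {"role": "assistant",
                "tool_call": {"name": "get_time", "arguments": "{}"}}
    return {"role": "assistant",
            "content": f"The time is {tool_results[0]['content']}."}


# The program, not the model, owns and runs the actual tools.
TOOLS = {"get_time": lambda: "12:00"}


def run(prompt):
    messages = [{"role": "user", "content": prompt}]
    while True:
        reply = fake_model(messages)
        call = reply.get("tool_call")
        if call is None:
            return reply["content"]  # no tool requested: generation is done
        # Invoke the named tool with the model-provided arguments, then
        # continue the conversation with the tool's output included.
        result = TOOLS[call["name"]](**json.loads(call["arguments"]))
        messages.append(reply)
        messages.append({"role": "tool", "content": result})
```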

My work with AI ecosystems over the past couple of years has included a mix of Python, JavaScript/TypeScript, and .NET. A big reason I used C# and .NET for AntRunner instead of one of the others is the strength of .NET for asynchronous operations and parallel tasks. Simply put: it's easy to do a lot of things at once instead of one thing at a time.

Doing a lot of things at once is really important if you have a lot of work to do that can be done in parallel.
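The payoff of parallelism is easy to demonstrate. AntRunner itself uses .NET's task machinery, but the same idea in Python looks like this: many slow I/O-bound calls finish in roughly the time of one.

```python
import asyncio
import time


async def fetch(page):
    # Stand-in for a slow I/O call (web search, page load, model request).
    await asyncio.sleep(0.2)
    return f"result-{page}"


async def main():
    # 41 concurrent tasks complete in about 0.2s total, not 41 x 0.2s,
    # because they all wait on I/O at the same time.
    return await asyncio.gather(*(fetch(i) for i in range(41)))


start = time.monotonic()
results = asyncio.run(main())
elapsed = time.monotonic() - start
```

Run sequentially, the same work would take over eight seconds; the constraint shifts from wall-clock time to the quota's tokens-per-minute ceiling.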

Over the coming days, weeks, and months I plan to write a lot about building agentic AI applications. I'll be using AntRunner, which is an open source library and associated tools for building AI applications. AntRunner allows an ant to use tools, which might themselves be other ants. In those cases, a swarm of ants works together in parallel to create the answer.

This chart shows one such run, which took just over 60 seconds. During that time, a total of 41 ants searched the web, loaded pages, and extracted relevant content. It also shows a massive spike in usage, which the quotas for the deployments easily permit.


One swarm of 41 ants working in parallel for ~60 seconds on a job: more than 235,000 tokens
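The arithmetic behind that caption shows why the quota increase matters. The numbers below come from the run above and the quotas cited earlier:

```python
tokens_used = 235_000     # consumed by the swarm in the run above
window_seconds = 60       # the run took just over a minute
tpm_quota = 2_000_000     # gpt-4o-mini default quota
old_quota = 80_000        # the kind of per-minute quota I used to work with

# The swarm's effective burn rate, normalized to tokens per minute.
effective_tpm = tokens_used * 60 / window_seconds

# Headroom under the current quota: the deployment could absorb several
# such swarms at once, while the old quota couldn't fit even one.
headroom = tpm_quota / effective_tpm
exceeds_old_quota = effective_tpm > old_quota
```

Under an 80k TPM quota, this single swarm would have been throttled to a crawl; under 2 million TPM, roughly eight such swarms could run side by side.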

The prompt

Use python to get today's date and then use it to find things to do in Atlanta for each of the next four weekends with pictures and correct links from FindAnswers.

Don't forget - include only valid links!
        

Sample output

You can view the output of a sample run here.


Follow the link to see the report the ants built. They are very proud creatures and will know if you don't.

Cost breakdown

Using mostly gpt-4o-mini keeps costs down. The single call to gpt-4o cost almost as much as the 40 calls to gpt-4o-mini combined!


For just 12 cents, you can feed a group of hardworking ants

This same ant swarm using only gpt-4o produces similar, slightly better results, but costs around $1.20 instead of 12 cents. I can easily reduce that to 9 cents by replacing code interpreter with a local function.
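A small cost calculator makes the model-routing decision concrete. The per-million-token rates below are the published pay-as-you-go list prices at the time of writing; treat them as illustrative and check current pricing, and note the example token counts are invented:

```python
# Illustrative pay-as-you-go rates in USD per 1M tokens (late-2024 list
# prices; always verify against current pricing before relying on them).
PRICES = {
    "gpt-4o":      {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}


def call_cost(model, input_tokens, output_tokens):
    """Cost of one call in USD at the rates above."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


# A hypothetical call: 5,000 input tokens, 1,000 output tokens.
mini_cost = call_cost("gpt-4o-mini", 5_000, 1_000)
big_cost = call_cost("gpt-4o", 5_000, 1_000)
```

At these rates a gpt-4o call costs more than 16x the same-sized gpt-4o-mini call, which is exactly why routing most of the swarm to the mini model turns $1.20 into 12 cents.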

--Doug Ware September 30, 2024

P.S. Need to build AI applications but could use a guide? I'd love to hear from you!

The original content is from my blog: 'Reasoning Tokens' and swarm-based agent applications (elumenotion.com)
