My AI Rube Goldberg Machine
In yesterday’s post, I calculated the profitability of public software companies. To produce those figures, I built a little Rube Goldberg machine.
I didn’t download the data into Excel. Instead, I complexified things by sending the analysis to 4 AIs to see if they would agree.
The inspiration: many companies have used Amazon’s Mechanical Turk to crowdsource tasks, picking a consensus answer across three workers to improve accuracy.
Why not try this across 4 AI workers instead?
Prompt: “calculate the average net income margin and cash flow from ops margin from this data set,” plus the data set. Note that the cash-flow-from-ops margin isn’t a simple average of an existing column; it requires first dividing cash flow from ops by revenue for each company.
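The distinction can be sketched in a few lines of Python. The figures below are illustrative placeholders, not the actual data set: the net income margin is already a per-company ratio, so a simple average works, while the cash-flow-from-ops margin needs the extra division step first.

```python
# Illustrative sample rows — assumed values, not the real data set.
companies = [
    {"revenue": 100.0, "net_income_margin": 0.12, "cfo": 20.0},
    {"revenue": 250.0, "net_income_margin": 0.08, "cfo": 45.0},
    {"revenue": 80.0,  "net_income_margin": 0.15, "cfo": 18.0},
]

# Net income margin is already a ratio per company: a simple average suffices.
avg_nim = sum(c["net_income_margin"] for c in companies) / len(companies)

# Cash-flow-from-ops margin requires the extra step the prompt glosses over:
# divide each company's CFO by its revenue first, then average the ratios.
avg_cfo_margin = sum(c["cfo"] / c["revenue"] for c in companies) / len(companies)

print(round(avg_nim, 4))         # 0.1167
print(round(avg_cfo_margin, 4))  # 0.2017
```

Skipping that per-company division — averaging raw cash flow figures, or dividing totals in the wrong order — is exactly the kind of shortcut a model can take while still producing a plausible-looking number.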
Gemini scored top marks, tabulating both columns correctly. ChatGPT handled the net income margin well but “forgot” the additional division step; a follow-up prompt produced a correction, but still not the right figure. The other systems missed the mark altogether.
It would be a mistake to draw any broad conclusions from my little experiment.
But in this case, consensus doesn’t yet work as a strategy, which means I still need to double-check calculations myself.
At some point, AI will mechanize the illusory Mechanical Turk & I’ll restart my Rube Goldberg math machine with confidence.
Comments

Growth Consultant (3w): Very important, as more and more people take AI as the truth for critical business, legal, and other matters.
“Scaling startups: From Founder-Led Sales to Repeatable Growth” (4w): This is great and I'm stealing the word "complexified".
Builder of AI, Cloud & Smart Contract Factories (4w): Given the trend toward foundational LLM convergence over the past few releases, it's refreshing to see the divergence in these outputs ;) Having said that, this looks like an opportunity for more elaborate prompting via agentic approaches with Claude and Gemini, or OpenAI's inference scaling and Perplexity's Advanced Research modes.
Building a joyful, AI-driven CRM (4w): Tomasz Tunguz, I recently tried this for HubSpot's valuation. We fed it 10 years of 10-Ks and 10-Qs and asked it to compose a variety of valuation techniques. It got some of the basic financial stuff correct — especially those related to a balance sheet, where things balance! But when it came to making assumptions about growth, it failed miserably. For example, it would quote the growth platform from 2018, and then when probed about sources would cite it to 2022. There's still a lot of hallucinating and poor citation cognition at play. It worked really well, though, when we were super specific about the valuation technique and the specific formula, and we trimmed the data down to just 10-Ks vs. quarterlies. I think soon, though, financial operators will be able to quickly assess years' worth of public company records and have a pretty clear picture of how a company operates and its publicly stated strategy. This type of analysis was once reserved for savvy financial folks, or Wall Street. Soon it becomes a Main Street tool in founders' arsenals.
Have you tried applying AI workers to other problems too? Maybe as inspiration: recently, I asked 3 foundational models to each hold a group discussion among 3 agents and come to an overall conclusion afterwards. Then you can ask one model to synthesize a single answer from the three models.