OpenAI update: Strawberry is live, subscription fees may rise, and the hunt for cash is on
Marco van Hurne
Partnering with the most innovative AI and RPA platforms to optimize back office processes, automate manual tasks, improve customer service, save money, and grow profits.
So, OpenAI has been busy whipping up something new in its research kitchen. The codename is no surprise anymore: it's called "Strawberry", and since yesterday it has finally been released as o1.
The whole world has written about it, myself included (if you want to catch up: Insights into OpenAI's "Strawberry" project, focused on AI agents)
The fun thing about this model is that it can reason the way a human would. Though very, very slowly, and not always accurately.
The model apparently likes to take its strawberry sweet time to think before it acts.
In OpenAI's announcement, the company says this new "thought process" helps its models try new tactics and think through their mistakes. According to the company, o1 performs "similarly to PhD students" in biology, chemistry, and physics. Where GPT-4o solved 13% of the problems on a qualifying exam for the International Mathematics Olympiad, o1 reportedly solved 83%. The company also emphasized that the models are more effective for coding and programming. That "thinking" means o1 takes longer to respond than previous models.
Yeeeey!
I think...
As OpenAI research lead Jerry Tworek tells The Verge, o1 is trained through reinforcement learning. Rather than just imitating patterns from its training set, o1 learns through "rewards and penalties." OpenAI is keeping the exact methodology vague, but says this new thought model does hallucinate less than previous models—though it still does hallucinate.
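OpenAI is vague about the method, but the "rewards and penalties" idea can be sketched with a toy example: a preference score per candidate answer gets nudged up when a sampled answer earns a reward and down when it gets penalized. This is a minimal illustration of reinforcement-style updates, emphatically not how o1 is actually trained:

```python
# Toy "rewards and penalties" sketch: purely illustrative, NOT OpenAI's
# training method. A preference score per candidate answer is nudged up
# when the sampled answer earns a reward and down when it is penalized.
import math
import random

candidates = ["4", "5", "22"]            # possible answers to "what is 2 + 2?"
scores = {c: 0.0 for c in candidates}    # learned preference per answer

def sample(scores):
    # Softmax sampling: higher-scored answers get picked more often.
    weights = [math.exp(s) for s in scores.values()]
    return random.choices(list(scores), weights=weights, k=1)[0]

for _ in range(200):
    answer = sample(scores)
    reward = 1.0 if answer == "4" else -1.0  # reward correct, penalize wrong
    scores[answer] += 0.1 * reward           # nudge the preference

print(scores)  # "4" should end up with the highest score
```

The real thing presumably rewards useful intermediate reasoning steps rather than final answers alone, but the nudge-toward-reward flavor is the same.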
There are two versions of o1: o1-preview, the more capable of the two, and o1-mini, a lighter, cheaper version trained on a similar framework. The company is reportedly shipping these models earlier in development, and says that's the reason they don't include standard GPT features like web access and file and image uploading.
Strawberry (o1) AI: is it a juicy fruit or just a lemon?
So this thing is supposed to be a beast at reasoning. Now, reasoning is a pretty big deal because it means that the AI can actually think things through rather than just spit out pre-cooked answers from its training data.
Let this just sink in...
So it is not just pulling text from a list of possibilities. It is of course doing that, but it is also evaluating the context and the details, and figuring out the best answer based on all the pieces of the puzzle it has.
Now this makes it waaay better at handling complex questions and solving problems that need multiple steps. And it will avoid the random nonsense that the AI sometimes gives you.
Maybe an example, somewhat closer to home...
It's Friday, so my thoughts go out to the weekend, and that means snacks!
I need to decide what I will have for "dinner" (ahum). I have decided to ask Chad (that's what I call him) for help. Normally he would suggest pizza, because he knows that everyone likes pizza (that is the probability kicking in!).
But a Chad with reasoning capabilities knows (it has memory) that I had pizza last night (ahum, again), that I am low on groceries (yup), and that I mentioned somewhere in another chat that I wanted to eat healthier.
It would then suggest a recipe that uses what is left in my fridge and fits my health goals (as if...). So instead of me debating with myself or falling back on delivery again (probabilistically the most likely outcome), Chad would give me the smartest suggestion that actually makes sense for me.
Not that I would follow it anyway, but this is good!
That’s the power of reasoning: smarter decisions, fewer regrets, and maybe even a little less takeout!
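To make that concrete, here is a minimal sketch of the "Chad with memory" idea using the OpenAI Python SDK. The "memory" is just a handful of facts I paste into the prompt myself, and "o1-preview" is an assumption about which model your account actually exposes:

```python
# A minimal sketch, not OpenAI's actual memory feature: we fake "memory"
# by collecting facts from earlier chats and pasting them into the prompt.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

memory = [
    "Had pizza for dinner last night.",
    "Fridge contents: eggs, spinach, half an onion, feta.",
    "Stated goal: eat healthier this month.",
]

prompt = (
    "Here is what you know about me:\n- "
    + "\n- ".join(memory)
    + "\n\nSuggest one dinner I can make tonight and explain the reasoning."
)

# "o1-preview" is an assumption; swap in whatever model you have access to.
response = client.chat.completions.create(
    model="o1-preview",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```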
But hold on before you get too excited, because it takes its time. Reasoning costs about 10 to 20 seconds per answer.
You're probably thinking...“Naaah what is a few seconds?”
Well, apparently, that is long enough to make you contemplate the meaning of life while waiting for it to figure out 2+2 + the context and memory.
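If you want to feel the wait for yourself, timing the call takes only a few lines. A quick sketch comparing two models; both model names are assumptions about what your API key has access to:

```python
# Rough latency comparison between a standard chat model and o1-preview.
import time
from openai import OpenAI

client = OpenAI()
question = "What should I cook for dinner tonight? I want something healthy."

for model in ["gpt-4o", "o1-preview"]:  # assumes both are enabled for your key
    start = time.perf_counter()
    client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
    )
    print(f"{model}: answered in {time.perf_counter() - start:.1f} seconds")
```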
But is it worth it?
I managed to track down some early testers who have posted their findings online. Apparently the wait feels like a lifetime in internet years.
So if I ask it to prepare dinner, it will take up to 20 seconds to respond.
Annoying.
And for what? A “slightly better” answer than what GPT-4o would give you?
Talk about anti-climactic.
I hope they have been able to overrule the contemplation algorithm for simple questions.
And I also did some digging myself...
Does o1-preview think a hot dog is a sandwich?
I admit, I am not the best of programmers (I like Cursor, sorry my full-stack friends), nor do I have many advanced math problems to solve on a daily basis. That makes it difficult to properly test OpenAI's latest models on their proposed strengths and use cases.
But what I can appreciate is its thought process: when you prompt the new model, it displays a feedback message as it works through the question (e.g. "Thinking..."). When finished, it displays the results as you'd expect, but with a drop-down menu above.
When I used OpenAI's suggested prompt of "Is a hot dog a sandwich," its answer was preceded by a message that reads "Thought for 4 seconds." (Its answer, by the way, amounted to three paragraphs of "it depends.")
Anyway, when I clicked the "Thought for 4 seconds" drop-down, I got to see the model's reasoning: For this prompt, it broke its process into two parts. The first, "Analyzing the question," reads: "OK, let me see. The question about whether a hot dog is a sandwich involves understanding semantics and considering OpenAI's policies, focusing on accuracy and avoiding personal opinions or disallowed content." The second, "Examining definitions," reads: "I'm thinking through whether a hot dog is a sandwich by looking at definitions and cultural views. This shows the room for debate." I guess that's all the thinking it needed to answer the question.
What about a taco? Is that a sandwich?
I also asked o1 to weigh in on another controversial matter involving food: is a taco a sandwich? The model had a lot to say. After thinking for five whole seconds, the AI returned a 364-word response. Its thought process included focusing on definitions, clarifying definitions ("I’m defining a taco by its main ingredients: tortilla, filling, and sauce. This helps in understanding whether it fits the definition of a sandwich."), and examining perspectives ("I’m looking into the classification of tacos and sandwiches, underscoring their culinary distinctions: tacos use tortillas, sandwiches use bread; tacos rest on cultural roots from Mexican cuisine, while sandwiches stem from European influence.").
Admitting this is "a topic of debate," it reasoned the answer hinges on definitions from culinary traditions, cultural contexts, and even legal interpretations," weighed "key differences" (specifically, there's no bread in a taco, and while a sandwich involves placing ingredients between pieces of bread, a taco involves placing ingredients onto a tortilla).
All things considered, o1 concluded that a taco is not a sandwich, according to "most culinary experts and food enthusiasts"—even citing a legal case in which a judge ruled that a burrito isn't a sandwich.
But is a taco a hot dog?
As a followup, I asked o1 if it would classify a taco as a hot dog. After nine seconds, it delivered a definitive answer: "While both tacos and hot dogs involve placing fillings inside a form of bread or bread-like base, they are not the same and belong to different culinary categories." There you have it, internet. You can stop arguing this one.
o1 can handle more complex, non-sandwich-related tasks too
Let's try another. I chose a second OpenAI-suggested prompt: "Generate a 6x6 nonogram puzzle for me to solve, where the solved grid looks like the letter Q."
As you might expect from a more demanding request, o1-preview took longer to process this task—84 seconds, to be exact. It delivered just such a puzzle, with instructions on how to solve it. Clicking on the drop-down menu revealed 36 individual thought steps as it worked through the prompt. In "Formulating the puzzle," the bot said "I'm thinking through the process of creating a 6x6 nonogram where the solution reveals the letter Q. We need to design the grid, derive clues, and present the puzzle for solving." It then goes on to figure out how to incorporate the "tail" of the Q into the image. It decides it has to adjust the bottom row of its layout in order to add the tail, before continuing to figure out how to set up the puzzle.
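For a sense of what the model had to work out, here is the mechanical half of the job: deriving the row and column clues from a 6x6 grid shaped roughly like a Q. The grid below is my own quick sketch, not the puzzle o1-preview produced:

```python
# Derive nonogram clues (lengths of consecutive filled runs) from a grid.
# This "Q" is my own rough drawing, not the one o1-preview generated.
GRID = [
    ".####.",
    "#....#",
    "#....#",
    "#....#",
    ".####.",
    "....##",   # the "tail" of the Q poking out at the bottom right
]

def clues(lines):
    """Return the run-length clues for each row or column."""
    result = []
    for line in lines:
        runs = [len(run) for run in "".join(line).split(".") if run]
        result.append(runs or [0])
    return result

rows = [list(r) for r in GRID]
cols = [list(c) for c in zip(*GRID)]

print("Row clues:   ", clues(rows))   # e.g. [[4], [1, 1], [1, 1], ...]
print("Column clues:", clues(cols))
```

Generating the clues is the easy part; designing a grid whose clues have a unique solution is where the actual thinking goes, which is presumably what ate most of those 84 seconds.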
It's definitely interesting to scroll through each step o1-preview takes. OpenAI has apparently trained the model to use words and phrases like "OK," "hm," and "I'm curious about" when "thinking," perhaps in an effort to make the model sound more human. (Is that really what we want from AI?) If the request is too simple, however, and takes the model only a couple of seconds to solve, it won't show its work.
It's very early, so it's tough to know whether o1 represents a significant leap over previous AI models. We'll need to see whether this new "thinking" really irons out the usual quirks that clue you in that a piece of text was generated by AI.
OpenAI’s quest for cash
In a recent article called OpenAI can go broke in a year, I wrote about the enormous losses this company has been making: more than $1.5 billion per year. Without external cash, it would go bankrupt this year.
So OpenAI is looking to raise some cash: a serious $6.5 billion to pay the bills and invest in new stuff, in a round that would boost its valuation to a whopping $150 billion. Imagine the boardroom conversations. In the end, the winner of the AI foundation model race will be the one with the biggest pockets.
Now, where have I heard that before....
The dot-com bubble! Burn rate was a real thing back then.
But with big pocket investors like Microsoft, Apple, and Nvidia sniffing around, OpenAI seems to be the prom queen of the tech world right now.
At least, until the valuations of their investors start dropping. Read The generative AI bubble and AI bubble update - 2 Trillion market value lost
Everyone wants a dance, but who is ready to pay the price of admission when cash runs out?
The price of strawberry is more than just a few seeds
o1's highbrow thinking does not come cheap, people.
Because subscription revenue only covers a small portion of its total losses, OpenAI is floating the idea of jacking up the subscription fees for ChatGPT Plus.
I have seen articles floating around, stating that the price would potentially reach a nosebleed-inducing $2,000 per month.
Two grand for the privilege of waiting longer.
It seems OpenAI's new model requires more computing power, and who knew those digital neurons were such high-maintenance divas?
Of course, this price hike has led me to wonder whether it is worth selling a kidney just to access AGI-like intelligence.
Competitors in the fruit bowl
What makes the berry different from the others?
The answer is something called "chain-of-thought prompting."
This new feature is what makes Strawberry consider multiple responses before giving you what it thinks is the best one.
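OpenAI hasn't said how this works internally, so here is only a rough stand-in: a plain self-consistency loop that samples several chain-of-thought answers from an ordinary chat model and keeps the most common one. It illustrates the "pick the best of several responses" idea, not o1's actual mechanism:

```python
# Self-consistency sketch: sample several chain-of-thought answers and keep
# the most common final answer. Illustrative only; o1 does this kind of
# search internally and hides the intermediate reasoning.
from collections import Counter
from openai import OpenAI

client = OpenAI()

def ask_with_voting(question: str, samples: int = 5) -> str:
    finals = []
    for _ in range(samples):
        r = client.chat.completions.create(
            model="gpt-4o-mini",   # any ordinary chat model works for the sketch
            temperature=1.0,       # some randomness so the samples differ
            messages=[{
                "role": "user",
                "content": f"{question}\nThink step by step, then end your "
                           "reply with 'ANSWER: <your answer>'.",
            }],
        )
        text = r.choices[0].message.content
        finals.append(text.rsplit("ANSWER:", 1)[-1].strip())
    return Counter(finals).most_common(1)[0][0]

print(ask_with_voting("Is a hot dog a sandwich? Answer yes or no."))
```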
But OpenAI isn’t the only one rummaging around in the garden.
Google, with its DeepMind division, has been showing off AI models like AlphaProof and AlphaGeometry 2, which can apparently solve high-school-level math problems. And of course Anthropic is working on upgrading its Claude model. To be honest, I truly like Anthropic's Artifacts feature a lot (an article will appear in the coming week).
So clearly, it is getting crowded in the "AI with reasoning" club.
What is next? Patience, grasshopper, patience
What does the future hold after o1....
Well, if you are the impatient type like me, you might want to grab a snack before dinner. And OpenAI is probably going to take the slow and steady route on this one.
The initial version of o1 is text-only. That means it won't be handling images or videos. They are sticking with the basics for now. It seems OpenAI wants to play it safe with this rollout, especially given the rising competition from Google, Anthropic, and all the other kids in the AI playground.
o1 is limited to handling text, with no fancy vision or multimodal capabilities like understanding speech or video. That is a bit of a downer for users like me who were expecting a one-stop AI that handles just about everything.
But that focus on just text probably means OpenAI is trying to perfect the model's reasoning capabilities without overcomplicating things at the start.
And that makes a lot of sense
A model for the few, not the many
o1 is available to a small group of ChatGPT Plus subscribers.
No word yet on whether this will be an exclusive club or if there'll be heavy rate limits to manage the new load.
As usual... f*** rate limits.
I get it... the planet burns. Read Oracle has commissioned three small nuclear reactors to power its new AI data center
Pricing gymnastics ahead
How much will Strawberry set you back...
Well, it's still anyone's guess. But it definitely will be priced differently than the regular ChatGPT, possibly with rate limits that cap how many messages you can send per hour unless you are - of course - willing to pay moooah.
I just had a flashback thinking of those annoying in-app purchases in mobile games. You get the basics, but if you want the good stuff, better be ready to shell out. This kind of tiered pricing could push more users toward premium options, but it could also drive some away.
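For what it's worth, a per-hour message cap is mechanically just a sliding window. A toy sketch with made-up numbers (OpenAI's real limits are different, and not public in detail):

```python
# Toy sliding-window rate limiter: allow at most `limit` messages per hour.
# The numbers are made up for illustration; OpenAI's actual caps differ.
import time
from collections import deque

class HourlyCap:
    def __init__(self, limit: int = 30):
        self.limit = limit
        self.timestamps = deque()

    def allow(self) -> bool:
        now = time.time()
        # Drop requests that fell out of the one-hour window.
        while self.timestamps and now - self.timestamps[0] > 3600:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False

cap = HourlyCap(limit=30)
print(cap.allow())  # True until 30 messages have been sent this hour
```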
Rolling out slowly to avoid the AI apocalypse
Given the complexity of o1's new "thinking" feature, OpenAI will most likely adopt a gradual rollout.
It is all about reducing risks and learning from any hiccups along the way.
Just think of Meta's Galactica or the Rabbit R1....
Because when the launch goes sideways, at least only a small group of users will experience the chaos, and not the entire user base.
No more chain of thought chores
One of o1's supposed benefits is that it makes complex queries easier, because it handles multi-step reasoning on its own and doesn't need a bunch of "chain of thought" prompts from the user.
Because right now, if you want ChatGPT to think deeply, you have to guide it like a GPS for thoughts.
And that is why we have a shady prompt engineering industry, with their paid cheat sheets.
With o1, you are supposed to skip that chore. Just ask your question, and it will try to figure out the steps on its own.
No hand-holding required, momma.
Now this WILL be a game-changer for those who don't have the time or patience to babysit their AI, and a death sentence for that shady prompt engineering industry.
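The practical difference is mostly how much scaffolding you have to type yourself. A rough before/after sketch; the model names are assumptions about what your account exposes:

```python
# Before/after prompting sketch: explicit step-by-step scaffolding for a
# standard chat model vs. a plain question for o1, which is supposed to
# work out the intermediate steps internally.
from openai import OpenAI

client = OpenAI()
question = "A train leaves at 14:10 and arrives at 16:45. How long is the trip?"

# Before: spell out the reasoning scaffold yourself.
old_style = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": f"{question}\nLet's think step by step: first compute the "
                   "hours, then the minutes, then combine them.",
    }],
)

# After: just ask.
new_style = client.chat.completions.create(
    model="o1-preview",   # assumption about model availability
    messages=[{"role": "user", "content": question}],
)

print(old_style.choices[0].message.content)
print(new_style.choices[0].message.content)
```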
Mind the memory gaps
o1 is also said to be more mindful of past conversations with a user.
Awww... that's so sweet!
That could help it remember your preferences or topics it has discussed with you earlier. But early testers have pointed out that this feature still has its bugs.
Sometimes, o1 seems to forget what you just told it two minutes ago, much like that boyfriend who always needs to be reminded of your birthday.
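For context, the underlying chat API is stateless, so "memory" in a plain integration is just the developer re-sending the conversation history on every turn. Whatever OpenAI layers on top in ChatGPT, the baseline looks like this sketch (not OpenAI's built-in memory feature):

```python
# Minimal "memory" sketch: the API is stateless, so we keep the history
# ourselves and send it back with every new message. My own illustration,
# not OpenAI's built-in ChatGPT memory feature.
from openai import OpenAI

client = OpenAI()
history = []  # grows every turn; trim it in real code to stay under limits

def chat(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(
        model="o1-preview",  # assumption: swap in whichever model you have
        messages=history,
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

chat("My birthday is on the 14th of March.")
print(chat("When is my birthday?"))  # only works because we re-sent the history
```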
In any case, do mind what you tell it in the near future!
At the end of the day, o1 is yet another shiny new toy in OpenAI's toy box.
Sure, it is supposed to bring "human-like" reasoning to the table, but will it be enough to justify the hype and (more importantly) the dreaded price tag?
Only time (and a lot of waiting for it to "think") will tell.
Signing off, Marco
Well, that's a wrap for today. Tomorrow, I'll have a fresh episode of TechTonic Shifts for you. If you enjoy my writing and want to support my work, feel free to buy me a coffee.
Think a friend would enjoy this too? Share the newsletter and let them join the conversation. LinkedIn appreciates your likes by making my articles available to more readers.