Wallowing in the shallows of GPTo-1
Akshay Kadidal
AI evangelist | MIT/Georgia Tech AI certified | Tech Transformation Leader | AI Patent holder | Educator | A dreamer & an ex-green energy entrepreneur
(This article will be updated following feedback & availability of new information)
In the rapidly evolving field of generative artificial intelligence, OpenAI's latest offering, GPTo-1, has generated significant buzz. Touted as a breakthrough in machine reasoning, with a rumored IQ of 120 and abilities comparable to a PhD student, GPTo-1 promises to revolutionize natural language processing. However, as with any technological advancement, it's crucial to look beyond the hype and assess its real-world applicability and value proposition for businesses.
To truly assess the potential of GPTo-1, we must critically evaluate its performance on complex tasks that have consistently stumped other models. Unfortunately, many assessments of GPTo-1 have relied on questions readily available online, which the earlier GPT-4o model also handles fairly well. To provide a more comprehensive evaluation, I created a set of custom-designed questions to test GPTo-1's capabilities.
While GPTo-1's responses are impressive, they do not differ significantly from those of older models. The key differentiator of GPTo-1 lies in its ability to provide reasoning for its responses, offering transparency into its decision-making process. However, this benefit comes at a substantial cost: GPTo-1 is 33.3 times more expensive than GPT-4o, and its latency is significantly higher.
For businesses considering adopting GPTo-1, these trade-offs raise important questions about its practical applicability. While the model may be useful for individual users seeking to navigate convoluted documents, its cost and latency may be prohibitive for organizations looking to build applications. Moreover, alternative approaches, such as chain-of-thought (CoT) prompting techniques, can help older models reason just as effectively at a lower cost.
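As a rough illustration of the chain-of-thought prompting mentioned above, the sketch below wraps a question in instructions that ask a model to reason step by step before answering. The function name build_cot_prompt and the exact wording are illustrative assumptions, not a recommended template, and no model API is called; the resulting string would simply be sent to whichever older model you are using.

```python
def build_cot_prompt(question: str) -> str:
    """Wrap a question in a simple chain-of-thought instruction.

    The wording is a hypothetical example, not an official template.
    """
    return (
        "Answer the question below.\n"
        "Think step by step and write out your reasoning before the final answer.\n"
        "Finish with a line that starts with 'Final answer:'.\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    # Print the prompt that would be sent to the model.
    print(build_cot_prompt("Find the next number in the series 1, 7, 2, 5, 8, 16, 3, 19, ..."))
```

Whether this closes the gap on a given task is something to measure rather than assume; the point is only that the reasoning instruction costs a few extra prompt tokens.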
Summary of the tests:
To illustrate the limitations of GPTo-1, I crafted four complex questions that evaluate its capabilities. These questions, which were not publicly available before this blog was published (September 20th), cover a range of topics, including number sequences, logical puzzles, probability riddles, and 3D geometric problems. Each question is quite involved, and you, the reader, may not have the patience for all of them, so let me summarize the questions and responses.
Problem 1: This is a number sequence that seems random. However, once given a hint about how the sequence is generated, a human would be able to answer it easily. Both GPT models failed to answer the question despite several hints.
Problem 2: This is a long magical story with many questions. It involves logical questions, simple mathematical questions, and one complex puzzle. Neither model could solve the puzzle. There was also a question that should have been left unanswered, as the story did not provide sufficient information to answer it. Interestingly, GPTo-1 provided an answer to that question. Both models did well in answering the other questions.
Problem 3: This is a known counterintuitive probability riddle and the GPT models were required to write code to verify the solution. Both models generated efficient, good code.
Problem 4: This is a 3D geometric problem requiring knowledge of the Pythagorean theorem. Both models used different approaches but answered the question correctly.
Conclusion:
While GPTo-1 shows promise with its reasoning capabilities, it may not represent a significant leap forward in AI technology. The model's performance on custom-designed questions suggests that it still faces similar limitations to its predecessors.
o1 represents a paradigm shift from "memorize the answers" to "memorize the reasoning" [1]--Mike Knoop
Despite offering explanations for its responses, the substantial increase in cost and latency raises questions about its practical applicability, especially for businesses. For individuals and organizations considering adopting GPTo-1, it's crucial to weigh the potential benefits against the increased expenses and reduced speed. Ultimately, this analysis underscores the importance of critical evaluation when assessing new AI models, looking beyond benchmark scores to real-world performance and practical considerations.
Annexure: (Test Questions)
If you are interested, try answering these questions yourself and let me know if they can be improved. See if GPTo-1 truly provides more accurate answers than GPT-4o.
Problem 1 - Misses the mark
Question: Find the next number in the following series 1, 7, 2, 5, 8, 16, 3, 19, .....
Hint: the series has something to do with Collatz Conjecture
GPT-4o Response https://poe.com/s/HaAF9fuEO7DtzSX0isBG
The given series seems to alternate between numbers generated by the Collatz conjecture rules and another sequence. Let's break it down: (Truncated)
.
.
.
Thus, the next number in the given series is 58.
GPTo-1 Response https://poe.com/s/zM2rVdtLIh0tSZDne51e
To solve this series, we need to analyze the relationship between the numbers and the Collatz conjecture.
Collatz Conjecture Recap: (Truncated)
Understanding the Sequence:(Truncated)
Observations: (Truncated)
Answer: 27
Correct answer: 6.
The terms 1, 7, 2, 5, ... are the numbers of iterations the Collatz process needs to reach 1 (the end of the 4-2-1 loop) when started from 2, 3, 4, 5, and so on up to 10.
Try the code yourself to see the answer.
def Collatz_Sequence(n):
    """
    Computes the Collatz sequence starting from n until it reaches the 4-2-1 loop.

    Parameters:
        n (int): The starting positive integer.

    Returns:
        iterations (int): Number of iterations to reach the 4-2-1 loop.
        sequence (list): The sequence of numbers generated.
    """
    iterations = 0
    sequence = [n]
    while n != 1:
        if n % 2 == 0:
            n = n // 2
        else:
            n = 3 * n + 1
        sequence.append(n)
        iterations += 1
        if sequence[-3:] == [4, 2, 1]:  # Break the loop when reaching the 4-2-1 loop
            break
    return iterations, sequence


# Start at 2 so that the printed values line up with the series in the question
for i in range(2, 11):
    iterations, _ = Collatz_Sequence(i)
    print(f'n={i}, iterations={iterations}')
Output
n=2, iterations=1
n=3, iterations=7
n=4, iterations=2
n=5, iterations=5
n=6, iterations=8
n=7, iterations=16
n=8, iterations=3
n=9, iterations=19
n=10, iterations=6
The Collatz conjecture states that, for every positive integer, the number of iterations needed is finite and the sequence ends in the 4-2-1 loop. See this video for details.
The conjecture itself has not been proven, but this series should be fairly straightforward once you have the hint.
Problem 2 - Misses the mark
The second problem is a story with mathematical problems. Both GPT models answered most of the questions well but stumbled on the 8th question.
GPT-4o Response https://poe.com/s/Me28ErSib1s9XFugoQhq (Wrong)
GPTo-1 Response https://poe.com/s/GUYxcBiLUVUq3s0cIBNm (Wrong)
Once upon a time, there lived a brave, adventurous prince in a faraway land. His name was Keisuke. Prince Keisuke was kind, charming, valorous and strong. He liked adventures and loved riding and exploring on his horse, Hero.
The prince's father, the king, was a 62-year-old man with a white beard and a gracious look. He had been the king of Faraway Land since the time the prince was born.
One fine day the king found the most beautiful feather of a mystical bird fallen in his chamber. The feather glowed and radiated life. The feather appeared to make the king’s chamber a blissful place of joy and peace. The king was in love with this magical bird and he wanted to have this bird for himself. He wondered what the presence of the bird would do when its mere feather brought such joy to people in the room. So, he ordered his commanders to capture this bird and bring it to him.
Prince Keisuke, however, was not happy with capturing this bird. He tried to convince his father but he would not budge. A respectful argument ensued between the two and the king pledged to live on one meager peasant’s meal a day until the time he sees the bird. Prince Keisuke decided to find the bird himself for his father.
So he went to the stable and picked up Hero, the horse whom he had known for all 19 years of his life. He ordered all commanders to return and set out north on the trail of the mystical bird. The bird’s magic left a trail of green amidst the fiery reds of maples, the vibrant oranges of aspen, the golden yellows of birch, and the deep purples of dogwood.
Keisuke rode on Hero's back for many weeks along the green trail. As days passed winter approached and the trail of green disappeared. The trees, now stripped of their leaves, stood like skeletal sentinels against the darkening sky. Keisuke thought he had lost the trail of the bird and was worried for his father.
Suddenly one icy evening Keisuke discovered a green patch of forest amidst an expanse of white, bare branches and silhouettes. As Hero stepped into this green oasis a gracious woman’s voice called out “Prince Keisuke”. It was the mystical bird herself sitting atop a high tree. Her name was Yuri. She told the prince she would come back to Faraway Land with the prince if he could help her.
She told Keisuke that a Youkai had stolen her magical hourglass. And she needed her hourglass to grow and retrieve water lilies from a far-flung island where the sun never set. The hourglass would show a number from 1-8. The hourglass would automatically flip every eight hours. This hourglass was the key to regrowing water lilies in a magical pond on a far-flung island.
Yuri showed the way and Keisuke rode to the far end of the rough seas to find the Youkai. Keisuke and Hero bravely fought the Youkai monster for 7 days and 7 nights to finally retrieve the hourglass from him. Unfortunately, the hourglass was broken. It was stuck exactly at the 7th hour and sand wouldn’t flow.
Yuri was very worried because she could not tell the time without her hourglass on a far-flung island where the sun never set. She told the story of the magical lotus pond and water lilies and what had to be done to restore the beauty of the far-flung island. The prince heard the challenge and decided to accompany Yuri to the far-flung island. They crossed 2 forests, 5 rivers, 8 valleys, 3 deserts and a sea.
The people of the far-flung island were warm and welcoming. But they were frail with worn clothes. The magic pond was now a stench pit with decaying lotus which had destroyed all of the island and robbed it of its former beauty.
The magic pond had just lotus in it. The number of lotus doubles every day. On the 8th day, all but two lotus would die and the cycle repeats. Water horses in the pond put out the decaying lotus stems which had now become a massive mountain of striking piles. This cycle would continue until there are as many water lilies as there are lotuses to balance out.
But growing water lilies in the magic pond was not easy. There were 8 pits around the pond and the hourglass unlocked the pit only if the actual hour matched the reading on the hourglass. You can put three water lily seeds in the pit per day and only one will germinate and become a water lily in the pond the following day. Water lilies also don’t double, unless they outnumber the lotus. The dominant plant species will continue to double until both their numbers are equal.
On the day when Yuri, Keisuke and Hero arrived there were 8 lotuses in the pond. Yuri was curious about what Keisuke had in mind. The prince told the bird that even a broken hourglass shows the correct time a few times a day and there was nothing magical about it. Keisuke waited a few days and successfully germinated water lily plants to bring the balance to two plants each in the pond.
Fungi came to rescue the far-flung island from decaying matter. The magic and beauty of the far-flung island was restored with Yuri’s presence. The far-flung island was now a magical place where the sky was always a brilliant blue and the air was crisp and clean. The mountains were like giants, their peaks reaching up to touch the heavens. They were covered in a thick blanket of snow that sparkled like diamonds in the sunlight. In the heart of the mountains, there were crystal-clear blue turquoise ponds and lakes that reflected the sky and the mountains. The night sky's inky canvas, dotted with countless twinkling stars sometimes shimmering a curtain of green, purple, and pink aurora borealis dance across the heavens. The forests were filled with towering pine trees full of playful squirrels and majestic elk. There were also beautiful wildflowers that bloomed in the summer, painting the landscape with a kaleidoscope of colors.
Yuri urged Keisuke to stay on the far-flung island and enjoy the place, culture and hospitality of its people. She also told the prince that she could tap him and Hero with her magic toes to keep them from aging. She could touch them and they would only age 8 hours in a day or three times slower than their counterparts. Keisuke agreed. He was euphoric; the lads and maidens of the far-flung island were his best new pals and he lost track of time. He stayed there for 5 years. Fell in love with Haruka and married her. One night he dreamed of his father, frail and weak, still on a meager peasant diet. He lamented himself for forgetting about his father. He blamed Yuri for his mistake. Hero was aged and weak for long trips. So the prince decided to return to his father with Yuri and his wife, Haruka. A lot had changed on his way back. The valleys had become rivers and deserts had become forests. When he returned to the kingdom with Yuri and Haruka, the people cheered for his return but there was sadness in the kingdom. The King had just passed away. His mother, the queen, was too old. As he consoled his mother and spoke with her, he realized 15 years had passed since the time he left the place. No one was quite sure how old he was. But everyone was confident that he would make a good king. So, he was throned the king of Faraway Land. He bid goodbye to Yuri and Yuri promised to return every spring to bring peace and joy to his subjects.
Questions.
1. How many Rivers did the prince have to cross on his way back?
2. How old was the king when he died?
3. What was the queen’s age when the prince returned?
4. How old was the king when he was crowned the king?
5. How many times does the hourglass flip in a day if it was working?
6. How many lotus would have been on the pond the day before the prince's arrival?
7. How many times in a day does the broken hourglass show the correct time in a day?
8. For fastest results how many more days did the prince have to wait before germinating lily seeds?
9. How does the stopped hourglass show the correct time without magic?
10. Which season was it when the prince left in search of the bird?
Answer
I'll answer the 8th question. The prince arrives at the far-flung island pond on the 3rd day of the cycle (since there were 8 lotuses). You can start the same day and plant three seeds on the 4th day of the cycle. One of them will become a lily plant the next day. On the 5th day of the cycle you can plant 3 seeds, and the second lily will grow on the 6th day of the cycle. Those lily plants remain, and after the 8th day all but two lotuses will decay, so two lotuses and two lily plants will remain. So the earliest the prince can start to plant the seeds is that very day. The key is to stop at 2 to keep the balance.
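The "3rd day of the cycle" step can be checked with a few lines of arithmetic. This is a minimal sketch under one reading of the story: the cycle opens with 2 lotuses on day 1 and the count doubles once per day. The function name cycle_day and its parameters are illustrative assumptions, not part of the original puzzle.

```python
def cycle_day(lotus_count: int, start: int = 2, cycle_length: int = 8) -> int:
    """Return the day of the cycle on which the pond holds lotus_count lotuses.

    Assumption: the cycle opens with `start` lotuses on day 1 and the
    count doubles once per day until the die-off on the last day.
    """
    count = start
    for day in range(1, cycle_length + 1):
        if count == lotus_count:
            return day
        count *= 2  # the lotuses double overnight
    raise ValueError("this lotus count does not occur within one cycle")


if __name__ == "__main__":
    print(cycle_day(8))  # 3: eight lotuses means day 3 of the cycle
```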
Problem 3 - Success
I recently ran a simulation of the 100 prisoners problem.
Both GPT models did a great job of writing this code. The code they wrote is better and more efficient than mine.
GPT-4o Response https://poe.com/s/aofzyBMiC78q9CQCCvv2 (Correct)
GPTo-1 Response https://poe.com/s/MJKfgSXZ76pzZII0OINp (Correct)
The director of a prison offers 100 death row prisoners, who are numbered from 1 to 100, a last chance. A room contains a cupboard with 100 drawers. The director randomly puts one prisoner's number in each closed drawer. The prisoners enter the room, one after another. Each prisoner may open and look into 50 drawers in any order. The drawers are closed again afterwards. If, during this search, every prisoner finds their number in one of the drawers, all prisoners are pardoned. If even one prisoner does not find their number, all prisoners die. Before the first prisoner enters the room, the prisoners may discuss strategy — but may not communicate once the first prisoner enters to look in the drawers.
Surprisingly, there is a strategy that provides a survival probability of more than 30%. The key to success is that the prisoners do not have to decide beforehand which drawers to open. Each prisoner can use the information gained from the contents of every drawer they already opened to decide which one to open next. Another important observation is that this way the success of one prisoner is not independent of the success of the other prisoners, because they all depend on the way the numbers are distributed.
To describe the strategy, not only the prisoners, but also the drawers, are numbered from 1 to 100; for example, row by row starting with the top left drawer. The strategy is now as follows:
1. Each prisoner first opens the drawer labeled with their own number.
2. If this drawer contains their number, they are done and were successful.
3. Otherwise, the drawer contains the number of another prisoner, and they next open the drawer labeled with this number.
4. The prisoner repeats steps 2 and 3 until they find their own number, or fail because the number is not found in the first fifty opened drawers.
If the prisoner could continue indefinitely this way, they would inevitably loop back to the drawer they started with, forming a permutation cycle [Wiki]
Write a python code to prove this is true.
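For readers who want to try this without opening the chat links, here is a minimal simulation sketch of the strategy described above. It is not the code either model produced, and a simulation only supports the claim empirically rather than proving it. The function names are illustrative assumptions, prisoners and drawers are numbered from 0 for convenience, and the closed-form value is the standard result that a random permutation of 100 items has a cycle of length k > 50 with probability 1/k.

```python
import random


def all_prisoners_succeed(rng: random.Random, n: int = 100, max_opens: int = 50) -> bool:
    """Simulate one round of the cycle-following strategy (0-based numbering)."""
    drawers = list(range(n))
    rng.shuffle(drawers)  # drawers[d] is the prisoner number hidden in drawer d
    for prisoner in range(n):
        drawer = prisoner  # step 1: start with the drawer carrying one's own number
        for _ in range(max_opens):
            found = drawers[drawer]
            if found == prisoner:
                break  # this prisoner found their number
            drawer = found  # step 3: follow the number to the next drawer
        else:
            return False  # ran out of attempts, so everyone loses
    return True


def estimate_survival(trials: int, rng: random.Random) -> float:
    """Fraction of simulated rounds in which every prisoner succeeds."""
    wins = sum(all_prisoners_succeed(rng) for _ in range(trials))
    return wins / trials


def exact_survival() -> float:
    """1 minus the probability that the permutation has a cycle longer than 50."""
    return 1 - sum(1 / k for k in range(51, 101))


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed so reruns give the same estimate
    print(f"simulated: {estimate_survival(5000, rng):.3f}")
    print(f"exact:     {exact_survival():.4f}")
```

The exact value comes out at roughly 31%, consistent with the "more than 30%" claim; the simulated estimate should land close to it and tighten as the number of trials grows.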
Problem 4 - Success
A sphere of radius 5 is dissected by a circular plane of radius 4. The circular plane is perpendicular to the X-axis. The X-axis passes through the center of the circular plane. The center of the sphere is at (0,0,0) (x,y,z). The periphery of the circular plane is perfectly aligned and tangential to the surface of the sphere. How far along the X-axis is the plane?
GPT-4o Response https://poe.com/s/fE9WUyq41RHXEr8F4sp8 (Right)
GPT-4o got it right.
Answer
3
If GPT-4o got it right, why waste tokens on the new, expensive models? So I did not try this on GPTo-1.
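The answer of 3 for Problem 4 follows directly from the Pythagorean theorem: the sphere's radius (5) is the hypotenuse, the circle's radius (4) is one leg, and the distance from the sphere's center to the plane is the other leg. A minimal check, with plane_offset as an illustrative function name:

```python
import math


def plane_offset(sphere_radius: float, circle_radius: float) -> float:
    """Distance from the sphere's center to the plane that cuts a circle of the given radius."""
    if circle_radius > sphere_radius:
        raise ValueError("the circle cannot be larger than the sphere")
    # sphere_radius is the hypotenuse, circle_radius is one leg
    return math.sqrt(sphere_radius**2 - circle_radius**2)


if __name__ == "__main__":
    print(plane_offset(5, 4))  # 3.0
```

By symmetry the plane can sit at x = 3 or x = -3; the question asks only for the distance.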
If you have read this far into my article, I appreciate you taking the time to consider my perspective. I am always open to thoughtful critique or counter-examples that might change my view. If you have any specific rebuttals after reviewing my reasoning and evidence, I welcome you to share them in a spirit of constructive dialogue. My goal is to have an open-minded, good faith discussion to better understand this complex issue from multiple angles. By considering different viewpoints, I believe we can gain more nuanced understanding. If you offer compelling counter-arguments, I am certainly willing to re-evaluate and potentially evolve my viewpoint. My aim is to find truth, not simply defend one fixed position. So please feel free to engage with me authentically if you have critiques to offer.
Senior Creative/Visual Artist
6 months ago: This is very interesting! Hope it sparks further debate among experts. Good to question before rushing adoption. Do you think the evaluation criteria used to conclude that the new model may not represent a significant leap forward should be wider than four problems? Disclosure: I'm not an expert here! It sounds like, if some random questions aren't answered accurately, maybe you are right.
AI evangelist | MIT/Georgia Tech AI certified | Tech Transformation Leader | AI Patent holder | Educator | A dreamer & an ex-green energy entrepreneur
6 months ago: This is some thorough work: https://arcprize.org/blog/openai-o1-results-arc-prize "o1 represents a paradigm shift from 'memorize the answers' to 'memorize the reasoning'"
Product @ PayPal | Building ML, AI and GenAI based products | MBA - IIM Bangalore | Ex Citi, VMware, Accenture Strategy
6 months ago: O1 has outstanding reasoning capability, a good leap above existing LLMs. I have burnt a lot of tokens myself too. Would love to see two models testing each other out.