So, what is reasoning in AI?

Unless you’ve been living under a rock, you’ve heard about DeepSeek AI this past week. AI is all the rage at the moment, and this company is the latest darling.

DeepSeek released an open source “reasoning” model called R1 that outperformed OpenAI’s o1 model on several benchmarks, all while claiming to have achieved that breakthrough at a cost of about $5.6M using inexpensive chips. That sent the US stock market into a frenzy, wiping out $1T of value -- and specifically punishing big-name chipmakers like Nvidia, whose shares dropped 17%, erasing $600B in valuation … talk about a rough day!

(Keep in mind these performance results are heavily self-reported, and the media has been spending the week pulling apart the $5.6M cost claim, so who knows what the real number is.)

Anyways, regardless of whatever dance the financial markets do, there is still a fundamental question that many people don’t know the answer to: What exactly is AI reasoning? So, as with any question nowadays, who better to ask than AI itself?

Hey ChatGPT:

When you say you can reason, what exactly does that mean?

It “reasoned” for 55 seconds, and gave me a surprisingly concise answer:

When I (as a language model) say I can “reason,” it does not mean I experience human-style consciousness or possess an internal monologue. Instead, it refers to a specific kind of problem-solving and pattern-recognition process rooted in statistical and algorithmic operations on text. Below is a more detailed breakdown of how this process works:

Not too shabby, eh?

Breaking it down in plain English

To understand what reasoning means, you have to understand how LLMs work.

LLMs are really just big prediction engines. They are trained on billions and billions of texts -- documents, articles, books, websites, etc. -- and from that they build mathematical models that predict the next most likely word for a given input. All the AI stuff essentially boils down to that simple premise, because pictures, video and audio can all be represented (in some way) as text. So send over some stuff, have it broken down into text, and the model will give you back the words that are statistically most likely to appear together based on the texts it has trained on, then put the result back into the format you want.
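
To make that loop concrete, here’s a minimal sketch of next-word prediction, assuming you already have a model that assigns a raw score (a “logit”) to every word in its vocabulary. The vocabulary, scores, and prompt below are all invented for illustration:

```python
import numpy as np

def softmax(logits):
    # Turn raw scores into a probability distribution that sums to 1
    exp = np.exp(logits - np.max(logits))
    return exp / exp.sum()

# Invented vocabulary and model scores for the prompt "The cat sat on the"
vocab = ["mat", "roof", "dog", "moon"]
logits = np.array([4.2, 2.1, 0.3, -1.0])  # made-up numbers

probs = softmax(logits)
next_word = vocab[int(np.argmax(probs))]  # greedy pick: the most likely word
print(next_word)  # -> "mat"
```

A real LLM does this over a vocabulary of tens of thousands of tokens, appends the chosen token to the input, and repeats -- that’s the whole trick.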

(This isn’t meant to be a philosophical debate, but you could certainly argue that humans don’t fundamentally learn much differently ... but I’m just sticking to the math at the moment.)

The current iteration of LLMs we’ve been using for a few years performs this function in a way that’s referred to as chain-of-thought (CoT) prompting. It’s like having a conversation with another person: Ask a question, get a response, seek a more thorough answer, refine the question, ask it again, repeat, repeat, repeat. (Fortunately, computers don’t get annoyed by being asked the same question 500 times. And they have read every book on the planet and haven’t forgotten any parts of any of them, so they are know-it-alls, to boot.)
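
If you’ve never seen CoT in the wild, it’s mostly just prompt engineering -- you ask the model to show its steps. A tiny illustration (the prompts are made up; send them through whatever chat API you actually use):

```python
# Two ways to ask the same question. The second is chain-of-thought (CoT)
# prompting: it asks the model to write out its intermediate steps as text.
direct_prompt = "What is 17 * 24? Answer with just the number."

cot_prompt = (
    "What is 17 * 24?\n"
    "Think step by step, showing each intermediate result "
    "before giving the final answer."
)

# Sent to a real model, the CoT version tends to produce the working
# (17*24 = 17*20 + 17*4 = 340 + 68 = 408) before the answer, which
# measurably improves accuracy on multi-step problems.
```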

“Reasoning” is just that process super-charged. Instead of working through a single linear chain, these models are built to pause at various points in the process (i.e. “reason”) and feed the answers derived along the way back into the loop to help refine the answer they ultimately give. So it is not so much a linear process as a bunch of processes working in parallel, where each intermediate answer reinforces the question. There is a ton more math involved, but that’s the TL;DR.
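
Mechanically, you can caricature that feedback loop in a few lines. Everything below is a stand-in -- `generate` and `critique` represent model calls, not any real library’s API:

```python
def generate(prompt: str) -> str:
    """Stand-in for a model call; a real system would query an LLM here."""
    return "draft answer"

def critique(answer: str) -> str:
    """Stand-in for a self-evaluation pass; returns 'OK' or a complaint."""
    return "OK"

def reason(question: str, max_steps: int = 5) -> str:
    answer = generate(question)
    for _ in range(max_steps):
        feedback = critique(answer)   # "pause" and evaluate the draft
        if feedback == "OK":          # good enough -- stop early
            break
        # Fold the feedback back into the prompt and try again
        prompt = f"{question}\n\nPrevious attempt: {answer}\nIssues: {feedback}"
        answer = generate(prompt)
    return answer

print(reason("How many r's are in 'strawberry'?"))
```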

As you can imagine, doing all of this requires a crapload more training (because it isn’t so much about individual word associations anymore) and A LOT more computational power -- think an order of magnitude more, to put a rough number on it. That’s why the DeepSeek breakthrough was such big news. If someone could figure out a way to do all of that without buying a billion dollars of Nvidia GPUs and putting them in a billion-dollar datacenter, it’s kind of a big deal.

Giving it a whirl

Here's a fun game to play if you want to understand “reasoning”. Fire up your favorite AI reasoning chatbot and ask it this question:

How many words are in your response to this question?

Now, if I asked another individual this question, they would definitely question my sanity. It’s more of a word puzzle than a question, but it’s a puzzle that most early LLMs got hung up on because it requires a bit of self-referential iteration … meaning you have to form the answer, count the words in it, then modify the answer -- which changes the count -- and repeat until the stated count actually matches. Get it?
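
You can see why this trips models up by writing the puzzle down as code. Here’s a minimal sketch of the fixed-point search involved, assuming the numeral itself counts as a word (the response template is invented):

```python
def word_count(text: str) -> int:
    return len(text.split())

def self_counting_response(template: str, max_tries: int = 20) -> str:
    """Find an n such that plugging n into the template yields exactly n words."""
    guess = 1
    for _ in range(max_tries):
        candidate = template.format(n=guess)
        actual = word_count(candidate)
        if actual == guess:   # the stated count matches the real count
            return candidate
        guess = actual        # otherwise, try the real count next time
    raise ValueError("no fixed point found for this template")

print(self_counting_response("This response contains exactly {n} words."))
# -> "This response contains exactly 6 words."
```

A plain next-word predictor has no natural way to run that check-and-revise loop; a reasoning model effectively does.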

I’ve tried this game before on OpenAI’s models, and I think they have since been patched up to answer it correctly, so I decided to see how DeepSeek handled it this time around. For the moment, DeepSeek is free, while OpenAI o1 requires a Pro license, so let’s try this on the cheap.

Along with the aforementioned cost innovations, DeepSeek also has a subtle UX innovation: It shows you its reasoning, in all its glory, while it is working. o1, by contrast, hides the detailed reasoning, and while you can expand it to see what it did at a high level, you don’t get the granular details. DeepSeek has decided to expose all of it:

Okay, the user is asking, "How many words are in your response to this question?" Let me break this down. First, I need to figure out exactly what they're asking for. They want the word count of my entire response to their query. Wait, but when they say "response," does that include just the current answer I'm generating, or all the previous interactions too? Probably just the current answer because the question is in the present tense. So, my task is to compose a response that not only answers their question but also provides the word count of that very response. That's a bit meta. Let me outline the steps.

[Screenshot: the remainder of DeepSeek’s reasoning trace]
Final Jeopardy answer time (drumroll please):

[Screenshot: DeepSeek’s final answer, stating its own word count]
Which is correct! At least from my perspective … and it only took 172 seconds to do it.

Unpacking this query

I absolutely love the fact that DeepSeek gives you the full outline of its reasoning. I think this goes a long way to building trust in the tool, because you can see how it's formulating its response, the assumptions it is making, and the processes it goes through. It may not be the slick solution you want in some scenarios, but in this early-days-research-playground phase, it is great.

As you can see from the start, it first tries to understand the context of the question:

Okay, the user is asking, "How many words are in your response to this question?" Let me break this down. First, I need to figure out exactly what they're asking for. They want the word count of my entire response to their query. Wait, but when they say "response," does that include just the current answer I'm generating, or all the previous interactions too? Probably just the current answer because the question is in the present tense.

I started a new chat for this question, and my question was the only thing in it -- so, bingo, it got the context correct. If I were 30 questions into a session, or had provided more context within the prompt, maybe it would have gone in a different direction, but this one was the right one.

Next, a bit of cheekiness, which I love seeing:

So, my task is to compose a response that not only answers their question but also provides the word count of that very response. That's a bit meta. Let me outline the steps.

“That’s a bit meta.” I feel like I’m talking to one of my kids!

It then begins to break down the question and reason its way through an answer. I won’t go through it all (since it took 172 seconds and you can read it line-by-line above), but the core concepts it tried to reason its way through were:

  • It has to come up with a response to the question.
  • It has to then count the number of words in the response.
  • It then has to take that count and put it back into the response.
  • It then iterates through multiple responses to the question, and does the same process for each, to get the final answer it gives me.

It is fascinating to watch it go through this process. I love how it figures out that there are multiple valid responses to the question (shorter one, longer one, etc.) and that it has to determine if it has to count the numerical answer as a word, or keep it as a number (it eventually chose the former). All of this happens in some sort of quasi computer “stream-of-thought” on the screen. It's a bit surreal to watch.

If you think about it, it is roughly doing the same thing that I think I would do to answer the question -- probably much more thoroughly, if I’m honest. If someone really asked me that question, I wouldn’t have worked through multiple candidate answers of different lengths, or debated the merits of answer length and whether to treat a number as a word. I would have just given the quick/easy answer and reminded the person asking that it was a stupid question to begin with.

But obviously these AIs don’t have that level of consciousness or awareness of human interactions and idiosyncrasies. They are just solving math problems, really fast, from a wide array of answers they've learned. This is a goofy example, but when you do other, more interesting things, you realize it is pretty cool stuff.

Finding applications in travel

So, the next obvious question (for me at least) is how can this be used in travel?

Anyone who knows me knows that I have an odd fascination with the inability of booking engines to provide reasonable options for common situations. The best example is the good old-fashioned day trip: I leave home (Boston), have a meeting, and return home the same day. Pretty much every booking engine fails to give reasonable options.

Since it requires a bit of brains to properly answer this question, let's see how DeepSeek does:

If you are a business traveler who lives in Boston, would it be possible to do a day-trip to Austin, Texas for a meeting?

(Austin is usually the city I use for this example, not only because of the famous scene from the movie Road Trip, but because of the great people I work with who live there: Daniel Senyard, Rey Garcia, Layne Cassidy, Geoff S. and Brett Copeland, to name a few. They also have pretty good barbecue that is close to the airport on the way back.)

I was very pleased to watch DeepSeek reason its way to an answer:

Okay, so the user is asking if a business traveler from Boston can do a day trip to Austin, Texas for a meeting. Let me start by thinking about the logistics involved here. First, the distance between Boston and Austin is pretty significant. I know they're both in the US, but Boston is on the East Coast, and Austin is in Texas, which is more central/south. So flying is the only practical option here.

Final Jeopardy answer time (drumroll please):

Yes, a business traveler from Boston can feasibly do a day trip to Austin, Texas, for a meeting with careful planning, though it will be a long and tightly scheduled day. Here's a breakdown:

[Screenshot: the rest of DeepSeek’s answer, with its point-by-point breakdown]
Which is the answer I was looking for! Yes, it is possible, but it is risky, and you’ll have limited time on the ground even in perfect conditions, so buyer beware.

As a frequent traveler, I intrinsically know this answer, but most booking engines do not. Seriously, try searching a round-trip Boston <> Austin day trip on Expedia, Google or even an Airline.com, and see for yourself. You’ll get back a list of flights that make logical sense (i.e. they are valid tickets to purchase) but not reasonable sense (i.e. unless your meeting is for 30 minutes at gate A32 in the Austin airport).
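
To make “reasonable” concrete: a day-trip search really just needs to filter flight pairs by the time they actually leave you on the ground. Here’s a toy sketch -- every flight and time below is invented for illustration:

```python
from datetime import datetime, timedelta

def t(hhmm: str) -> datetime:
    # Parse "HH:MM" onto an arbitrary fixed date, purely for arithmetic
    return datetime.strptime(f"2025-03-01 {hhmm}", "%Y-%m-%d %H:%M")

# Invented BOS<->AUS schedules (local times), for illustration only
outbounds = [("Flight 1 arrives AUS", t("10:15")), ("Flight 3 arrives AUS", t("13:40"))]
returns   = [("Flight 2 departs AUS", t("16:30")), ("Flight 4 departs AUS", t("19:55"))]

meeting = timedelta(hours=3)   # time you actually need on the ground
buffer  = timedelta(hours=2)   # both airport transfers + security, combined

# A "logical" engine returns every valid pairing; a "reasonable" one
# keeps only the pairs that leave enough usable time for the meeting.
for out_name, arrive in outbounds:
    for ret_name, depart in returns:
        usable = depart - arrive - buffer
        verdict = "reasonable" if usable >= meeting else "not enough time"
        print(f"{out_name} + {ret_name}: {usable} usable -> {verdict}")
```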

Why? Because booking engines are geared towards selling you stuff (or, in corporate travel, selling you stuff the company approves of), not towards making sure you have enough time for your meeting. Sure, there are fancy ones that plot it out step by step, or use your preferences to prioritize the results, but the results are usually still not reasonable.

It is the same flaw with hotel searches that send you to a property 20 blocks away in Manhattan when there is one a block away that costs $10/night more. They can’t understand that 20 blocks of travel usually isn’t worth saving $10/night. Logical and reasonable options are very different things, and most tools haven’t figured that out yet ... which is why people who take these kinds of trips (like me!) still use travel agents.
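
A “reasonable” hotel ranker just needs a cost model that prices the inconvenience. A toy version, with a made-up value for the traveler’s time:

```python
# Toy "reasonableness" score: nightly rate plus the cost of getting there,
# using an invented dollar value for the traveler's time.
WALK_MIN_PER_BLOCK = 1.0    # minutes per Manhattan block -- rough guess
TIME_VALUE_PER_MIN = 1.50   # $/minute -- invented; ideally learned per traveler

def effective_cost(nightly_rate: float, blocks_away: int, trips_per_day: int = 2) -> float:
    commute_minutes = blocks_away * WALK_MIN_PER_BLOCK * trips_per_day
    return nightly_rate + commute_minutes * TIME_VALUE_PER_MIN

near = effective_cost(nightly_rate=310, blocks_away=1)    # 310 + 3  = $313
far  = effective_cost(nightly_rate=300, blocks_away=20)   # 300 + 60 = $360
print(f"near: ${near:.0f}/night, far: ${far:.0f}/night")  # the closer hotel wins
```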

Conclusions

It is still early days with these AI reasoning models, but I expect them to get better and faster as the weeks and months roll on. DeepSeek already proved wrong the assumption that this would be ridiculously expensive to do (or maybe it didn’t, who knows), and it should only get more affordable from here. DeepSeek is open source, and there are already hundreds of models being built off its core -- all of which happened in about 10 days. Everyone can access this stuff, too, which I think is the true potential of AI.

In travel, logical vs. reasonable has been an age-old debate. Lots of trips sound good on paper, until you actually have to take them. Logically, the fastest way for me to get to Singapore is to take the direct flight out of New York, but I’ve gotten sick both times I’ve done it (true story). So while that is logical, maybe it isn’t reasonable to sit on a plane for 19 hours and expect that a person can land, grab a shower, and function normally.

Expect booking tools and other travel recommendation solutions to start embedding these reasoning capabilities. The UX will take time to adapt, since reasoning still takes real processing time (minutes, not milliseconds), and people nowadays expect instant responses or assume the tool is broken. So I imagine first movers will use it behind the scenes to smarten up results post-transaction, then find ways to feed them back to the user to perhaps change what they picked before they travel.

What I’m most excited for are models that can be trained specifically on YOU. I’ve done hundreds of trips, and you can probably infer what is “reasonable” to me by comparing the options I ultimately selected to those I didn’t. In my choices, I have a set of rules I’ve built in my mental model based on travel schedules, content availability, personal convenience, and, yes, cost. Not all of these show up in a travel profile.

But imagine an AI reasoning agent that understands your mindset from your past behaviors and perhaps questions it asks along the way, then applies that to the 200 options that the systems spit out, to give you the one that is the most reasonable for you. Wouldn’t that be cool?
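
Under the hood, that’s a preference-learning problem: every past booking gives you “chosen vs. rejected” examples. Here’s a minimal sketch using logistic regression over invented trip features (scikit-learn assumed; the features, history, and options are all made up):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each option as features: [price in $100s, ground time in hours, departs before 7am]
# Invented booking history: y=1 marks the option this traveler actually picked.
X = np.array([
    [4.5, 5.0, 0], [3.0, 1.5, 1],   # picked the pricier flight with real ground time
    [5.0, 4.0, 0], [4.0, 4.5, 1],   # picked the later departure over the cheaper one
    [6.0, 6.0, 0], [3.5, 2.0, 0],
])
y = np.array([1, 0, 1, 0, 1, 0])

model = LogisticRegression().fit(X, y)

# Score today's options and surface the most "reasonable" one for *you*
options = np.array([[4.2, 4.5, 0], [2.9, 1.0, 1], [5.5, 5.5, 0]])
best = options[np.argmax(model.predict_proba(options)[:, 1])]
print(best)  # the option this traveler is most likely to choose
```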

It's almost as if we’re working on that already ...
