A language model is not enough

I am interested in using AI to help people learn. With recent advances, I wondered how much closer we were to off-the-shelf tutoring using language models. In what follows, it becomes clear that current language models know enough to be fantastic tutors, but also that they are not very good at reasoning, staying on task, or otherwise pursuing purposeful dialog.

Some think we're on the verge of superhuman artificial intelligence, known as AGI, for artificial general intelligence. So I took one of the better AI models for a spin on some fairly simple instruction. As a rusty pilot, I thought I'd brush up by chatting with one of the latest and greatest language models.

I told the AI that I am a private pilot needing to be well-versed in the Federal Aviation Regulations, known as the FAR. The AI responded by telling me what it takes to earn a private pilot certificate. OK, but not exactly responsive or helpful in mastering the FAR. It would have been better to recognize that I had already told it I was a private pilot. Thankfully, it brought up Part 91, which is what I wanted to review.

I asked it to focus on Part 91, which it informed me is "General Operating and Flight Rules". Exactly what I had in mind. It generated a helpful outline and picked out a few seemingly random subparts having to do with speed limits, alcohol and drugs, and drone operations, saying I should pay special attention to them. That didn't make sense to me.

Annoyingly, at every turn it tacked on good-citizenship boilerplate that added little value: "Remember..." reminders, cautions to keep up to date and consult the complete regulations, and a variety of "Let me know if you have more questions" sign-offs.

I asked it to ask me some questions on Part 91, which it did nicely. That's what I was looking for: a conversation in which it helped me master the material. Unfortunately, it asked three questions at once! The questions were each somewhat open-ended, each allowing for multiple possible answers. They were not specific enough, and one was ambiguous. One could try to guess which of several possible answers the AI was looking for, or engage in more open-ended dialog, but I was looking for a more focused and productive review session.

To reduce some of the ambiguity in its questions, I asked it to focus on visual flight rules (VFR). It did a nice job telling me what VFR entails, which was not productive. (Verbosity, if not bloviation, is a common problem with language models.) It then asked an ambiguous question about minimum visibility requirements. I asked it again to focus on VFR, mentioning that I'm already instrument rated. It welcomed my credentials, cautioned me to stay sharp on VFR, and asked a better, more closed question about the minimum visibility required for day VFR operations outside of controlled airspace.

I gave it the answer and received congratulations. Nice! Then I got another one of those "let me know if I can do anything else for you" things, and another prod from "the good citizen" to stay current. Ugh. It's an AI, not my mother (or Uncle Sam), right? Folks are taking this safety stuff too far...

So, I had to ask it to keep quizzing me. It told me again what VFR is. Ugh. Then it asked for minimum visibility for takeoff or landing at an airport in controlled airspace. Good! I said a nautical mile, which it corrected, both in number and unit. Excellent!

Again, I had to prod it to continue quizzing me. Previously, it ended a response with a continuation question, but here it just stopped cold. It consistently lost track of the task or instruction it had received.

It told me that the forthcoming question addressed some "important aspect" (another little bit of apple pie) and asked about horizontal clearance from clouds under day VFR. Good!

I answered 500 feet, which it appropriately indicated wasn't quite right. Very nice. It gave me the correct answer and told me that I had answered with the minimum vertical separation.
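
For reference, here are the day-VFR minimums behind these last few questions, as I recall them from FAR 91.155; this is a minimal sketch, and the numbers deserve verification against the current regulation:

```python
# Day-VFR minimums from FAR 91.155, as I recall them -- verify against
# the current regulation before relying on these numbers.
VFR_DAY_MINIMUMS = {
    # Outside controlled airspace (Class G), below 1,200 ft AGL, day:
    "class_g_low_day": {"visibility_sm": 1, "cloud_clearance": "clear of clouds"},
    # Controlled airspace (Class E) below 10,000 ft MSL:
    "class_e_below_10k": {
        "visibility_sm": 3,
        "below_clouds_ft": 500,             # the figure I mistakenly gave...
        "above_clouds_ft": 1000,
        "horizontal_from_clouds_ft": 2000,  # ...when this was the answer
    },
}
```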

Again, I had to prod it to continue, and it asked me the question I had just answered (about minimum vertical separation)! Not fatal, but not productive. I gave it the same answer and it congratulated me for being "exactly right!" It said "excellent job" and asked yet again if there was anything else.

I emphasized that I needed to master the whole thing and asked it to continue. It then asked me for an important aspect of VFR planning covered by FAR 91.103. A human being would realize that almost no one could recall what regulation 103 might concern. And there are many aspects of planning for a VFR flight, many of them important.

So, I told/asked it, "There are several, right?", meaning important aspects of flight planning, not limited to regulation 103. I gave it two examples, concerning fuel and landing sites. Turns out, I was right on the money with 91.103(a). It acknowledged these as important and added weather and airspace. It, too, was right on the money, but it added NOTAMs (notices to airmen) and the AIM (aeronautical information manual), which are not mentioned in 91.103. It concluded its response nicely, without platitudes, but also without asking anything further.
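
For context, here is roughly what 91.103 covers, paraphrased from memory; treat the regulation itself as the authoritative wording:

```python
# Rough paraphrase of FAR 91.103 ("Preflight action"), from memory --
# the authoritative text is the regulation itself.
FAR_91_103 = {
    "(a)": [  # for IFR flights, or flights not in the vicinity of an airport
        "weather reports and forecasts",
        "fuel requirements",
        "alternatives if the flight cannot be completed as planned",
        "known traffic delays advised by ATC",
    ],
    "(b)": [  # for any flight
        "runway lengths at airports of intended use",
        "takeoff and landing distance data for the aircraft",
    ],
}
```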

So, I asked it to continue quizzing. Here it asked me to identify instruments required for VFR. Unlike prior questions with multiple answers, it acknowledged there were multiple but asked for any one of them. Good!

I told it a compass and airspeed and altitude indicators are required. It acknowledged the indicators, implying I was somehow not quite right with "you're on the right track". And, thankfully, it asked me for the acronym used to remember the basic VFR instruments. Perhaps it was beginning to attend to my repeated requests to continue!

I indicated I didn't know, and it told me about the acronym "ATOMATOFLAMES". Unfortunately, it messed up the explanation, not too badly but materially. It output only 10 bullets, in an odd order. I was confused by its output for the oil and manifold gauges, so I asked when they were required.

It output the acronym again, dropping the initial 'A'. It also dropped one of the oil gauges in this answer, but its explanation of the other made sense. Its explanation of the manifold pressure gauge requirement helpfully defined the term 'altitude engine' but was not quite right.
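
For comparison, here is the expansion of ATOMATOFLAMES as I learned it, covering the day-VFR instruments and equipment of FAR 91.205(b); thirteen items, not ten:

```python
# The ATOMATOFLAMES mnemonic for FAR 91.205(b) day-VFR equipment,
# as I learned it -- one item per letter, thirteen in all.
ATOMATOFLAMES = [
    ("A", "Airspeed indicator"),
    ("T", "Tachometer, for each engine"),
    ("O", "Oil pressure gauge, for each engine using a pressure system"),
    ("M", "Manifold pressure gauge, for each altitude engine"),
    ("A", "Altimeter"),
    ("T", "Temperature gauge, for each liquid-cooled engine"),
    ("O", "Oil temperature gauge, for each air-cooled engine"),
    ("F", "Fuel quantity gauge, for each tank"),
    ("L", "Landing gear position indicator, if retractable"),
    ("A", "Anti-collision lights"),
    ("M", "Magnetic compass"),
    ("E", "Emergency locator transmitter (ELT)"),
    ("S", "Safety belts"),
]
```

Note that the M for manifold pressure applies only to each "altitude engine", which is exactly the nuance that came up next.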

So I asked whether a manifold pressure gauge was needed if the plane had a constant-speed propeller but no turbocharger or supercharger (which can maintain power at high altitude). It realized my point and apologized. Then it gave a bit of a long-winded explanation of material I clearly already knew. To its credit, it pointed out that the FAR does not specifically state the requirement! It wrapped this turn up pretty nicely, but asked me another "is there anything else" rather than quizzing me.

I followed up, asking if it was indeed the case that the FAR missed the need for a manifold pressure gauge given a constant-speed propeller, and it nicely confirmed the omission as well as the critical importance thereof. Regrettably, again, it left the dialog standing cold.

I asked it for a picture of a manifold pressure gauge and it obliged. I asked for a picture of a dashboard containing one and it again complied. Very nice. Multimodal AI is coming along... It even explained the gauge's placement near other engine-performance instruments for easy reference!

Changing gears a bit, I asked if the emergency locator transmitter (ELT) needs certification every 2 years. It nicely confirmed this and cited the applicable regulation, FAR 91.207(a). Very nice. It threw in a bunch of details that could have been omitted. Worse, it left the dialog cold again.

I asked if any of the gauges require certification, and it said "no". It messed up some of the details in a long-winded response that was nonetheless generally quite good and accurate. It wrapped up without platitudes and without offering to continue, which seemed OK given my direct question.

In its long-winded response, it mentioned pre-flight checks. That got me thinking about some things related to other gauges. So I asked it, "What pre-flight checks are required in order to ensure accurate instrument readings?" It generated a long response containing useless (or at least unproductive) generalizations and platitudes more than useful information. It came up with some creative ideas for checking instruments, which were interesting and somewhat misguided. For example, checking that the airspeed indicator works on takeoff could be too late.

An airspeed gauge works by comparing the air pressure ramming into the end of a pitot tube against the pressure at a static port, which is more or less impervious to airspeed changes. Thus, it's important to check that no bugs or debris are in the pitot tube during pre-flight checks. I asked it about that. It again acknowledged and apologized, told me a bunch of stuff I obviously knew, and concluded with "nice catch!" So, I asked about the static ports, too. It responded similarly, but even more verbosely, congratulating my "comprehensive understanding of the airspeed indicating system", after telling me all about it!
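
The underlying relationship is simple. A minimal sketch, assuming incompressible flow (a fair approximation at light-aircraft speeds):

```python
from math import sqrt

RHO_SEA_LEVEL = 1.225  # standard sea-level air density, kg/m^3

def indicated_airspeed(pitot_pa: float, static_pa: float) -> float:
    """Airspeed (m/s) from the pitot-minus-static (dynamic) pressure,
    via Bernoulli: q = 0.5 * rho * v^2, so v = sqrt(2 * q / rho)."""
    dynamic_pressure = max(pitot_pa - static_pa, 0.0)
    return sqrt(2 * dynamic_pressure / RHO_SEA_LEVEL)

# A bug in the pitot tube freezes pitot_pa, so indicated airspeed stops
# responding; a blocked static port skews this reading and the altimeter.
print(indicated_airspeed(102_158.0, 101_325.0))  # ~36.9 m/s (~72 knots)
```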

The language model "knows" enough about this stuff, which is impressive. It just doesn't "think" well or practically, which is imperative. Hence the title...

I noticed that it had pointed out possible consequences of blocked static ports but not that altitude readings could be impacted, so I asked about that. Again, it acknowledged the point and apologized, concluding with a compliment on my "solid understanding of the pitot-static system". Great.

Skipping forward a bit in our conversation, we came upon the specifics of pre-flight checks being covered in the pilot's operating handbook (POH) for the aircraft. The POH is required on board the aircraft, along with other things, which I asked about. It appropriately brought up the mnemonic ARROW (expanded below) and went through the initials. I was surprised that a radio license was among the requirements, so I asked it to focus on airplanes (e.g., excluding balloons) and whether they require a radio license.
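
As a reference point for the confusion that follows, here is ARROW as I learned it; note the two distinct Rs, and that the radio station license matters only for international operations:

```python
# The ARROW mnemonic for documents required aboard the aircraft,
# as I learned it -- two distinct Rs.
ARROW = [
    ("A", "Airworthiness certificate"),
    ("R", "Registration certificate"),
    ("R", "Radio station license (international operations only)"),
    ("O", "Operating limitations (POH and placards)"),
    ("W", "Weight and balance data"),
]
```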

Again, it acknowledged and apologized. It explained its mistake as a "nuance" and went on to point out some exceptions. So I followed up, asking whether ARROW should have just one R.

To my surprise, it suggested the acronym could be AOW and appended a bunch of redundant and superfluous stuff to its wordy response. It "wisely" suggested that it's worth keeping the R in the acronym in the event of international flight. As you can see, though, it dropped both Rs! So I asked about that, saying, "What's the other R for in ARROW?"

Ugh. It said radio license and repeated stuff already covered in the last moment or two. It forgot all about the registration! So, I asked it to spell out ARROW again... It did so nicely, after which I said, "So, the other R is registration?"

Again, it acknowledged and apologized. But its output was mixed up, with some omissions and letters given the wrong meanings. Then it repeated the AOW acronym and expressed appreciation for my pointing out its mistake, saying, "I am still under development."

I pointed out that it had messed up the Rs and the O, which it acknowledged and apologized for. This was now a losing game. It introduced new acronyms and confused AWO with AOV.

Knowing I was nearing the end of a losing game, I asked for a summary of the parts of the FAR we had covered. It did a reasonable job and recommended further study.
