I’d Like a Chat…
What a six months it’s been – AIs and chatbots have hit the headlines and hijacked the zeitgeist!
It’s hard to go a week without seeing a spruiker or a detractor (or several of each) claiming everything is about to change for the absolute best or the absolute worst.
And the “silicon entities of the hour” seem to be ChatGPT and Bard, Large Language Model (LLM) AIs which are causing quite a stir with their convincing responses to plain English prompts.
I’ve been around IT long enough to be able to say with certainty that AIs are nothing new.
While my exceptionally switched-on IT teacher in the ’80s was focussed on the integration of various devices such as phones, computers, cameras, TVs/VCRs, etc. into multi-purpose devices (which we now carry in our pockets), AIs were already a “buzzy” topic of conversation in IT circles.
But the running joke has been that “true AI is only five years away” – no matter when you ask.
Now, between image generators like DALL·E (which came to prominence in 2021), website and call centre chatbots, and now LLMs, it seems like the five years is up, and the “Age of True AI” is upon us…
Or is it?
OK, I admit it – I’m one of the skeptics…at least in relation to the utility in strata (and many other technical fields) of currently available LLMs. But let’s step back a little.
AI, “Artificial Intelligence”, is a loaded term – the first part is really easy, but just what are we going to claim or accept as an “intelligence” born not of flesh?
Personally, I prefer AGI: “Artificial General Intelligence”, which, according to Wikipedia, is “a hypothetical intelligent agent which can understand or learn any intellectual task that human beings or other animals can” (emphasis on “hypothetical” is mine). I can live with that definition.
And, based on that definition, what we’re seeing with DALL·E, ChatGPT and Bard is anything but such an “artificial general intelligence” – it’s not even a semblance of a basic intelligence!
These latest chatbots seem convincing, sure – but all they’re really doing is providing something that looks like the right answer. Sometimes the similarity to the right answer is striking and you’d be hard pressed to notice a difference. Sometimes…not so much.
Here’s an exercise I’d like you to try: ask one of these chatbots a detailed question about a topic you know inside and out, then mark the answer as the expert you are.
What you’re going to see is one of three things: an answer that’s essentially right, an answer that’s only partly right, or an answer that’s flat-out wrong.
Part of the problem is that for the second and third cases, these LLMs will rarely say “I don’t know” or “maybe” – they will usually give just as convincing-sounding an answer as the one they provide in the first grade of response. And, for reasons I touch on below, the first type of response is just a fluke!
And even on the odd occasion ChatGPT says it can’t answer a question, it can often be tricked into doing so, and once again, the result of that tricking will fall into one of the three categories above.
When I first heard of companies considering ChatGPT use in strata, I asked it “What is the process for getting renovations approved in a strata building in NSW?” – this was, I expected, a straightforward prompt given we’re over 7 years into the new regulatory regime in NSW.
I won’t quote the whole answer on this page (you can find it here), but my rough assessment at the time was that it was about 10-20% right. Pretty atrocious, frankly. I’d be embarrassed if anyone I managed sent that response to a client.
The next day, after mulling on it further and wondering just where it all went wrong (I sorta suspected, actually), I amended the prompt with a simple tail-end addition: “What is the process for getting renovations approved in a strata building in NSW, with reference to the Act?”. The first paragraph of the new response was telling:
“In New South Wales, the process for getting renovations approved in a strata building is regulated by the Strata Schemes Management Act 1996 (NSW) and its associated regulations.”
Bingo!
No wonder it was up to 90% wrong: it was referencing 27-year-old legislation! But when you sign up or log in it says its knowledge base stops in late 2021 – how could it be so wrong/out of date?!
This is where modelling, prediction and weighting come in. What ChatGPT did, based on the information it was trained on, was provide what it calculated/predicted was the most likely/suitable response from weighted information sources. My amendment just got it to admit where that weighting lay – on information sources tipped towards the 1996 Act.
And the prompt can actually affect the weighted selection of sources, so framing my question differently may well have resulted in the 2015 Act being used, with its “cosmetic”, “minor” and “all other” (aka major) renovation definitions. Maybe. This is why I said above that these systems may fluke the right answer.
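To make that weighting idea a little more concrete, here’s a deliberately crude toy sketch in Python. It is absolutely not how ChatGPT works under the hood (these models predict text token by token rather than picking documents), and the “sources” and prevalence bonus are invented for the illustration, but it shows the gist: when the available material skews towards the 1996 Act, a generic prompt lands there, and only a prompt loaded with 2015-era terminology tips the balance.

```python
# A deliberately crude toy, invented for this post. It is NOT how ChatGPT works
# internally; it only illustrates the weighting point: if the available material
# skews towards the 1996 Act, a generic prompt lands there, and only 2015-era
# terminology tips the balance the other way.
import re

SOURCES = {
    "SSMA 1996 material": "strata schemes management act 1996 renovations approval by-laws owners corporation",
    "SSMA 2015 material": "strata schemes management act 2015 renovations approval cosmetic minor major work",
}

# Pretend the older Act is over-represented in the training data.
PREVALENCE_BONUS = {"SSMA 1996 material": 2, "SSMA 2015 material": 0}

def tokens(text):
    """Lower-case word tokens, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

def score(prompt, name):
    """Word overlap between prompt and source, plus that source's prevalence bonus."""
    prompt_tokens = tokens(prompt)
    overlap = sum(prompt_tokens.count(word) for word in set(tokens(SOURCES[name])))
    return overlap + PREVALENCE_BONUS[name]

def pick_source(prompt):
    """Return whichever source scores highest for this prompt."""
    return max(SOURCES, key=lambda name: score(prompt, name))

print(pick_source("What is the process for getting renovations approved in a strata building in NSW?"))
# -> SSMA 1996 material  (the generic prompt falls where the weighting already lies)
print(pick_source("What is the process for getting renovations approved in a strata building in NSW, with reference to the Act?"))
# -> SSMA 1996 material  (still the old Act; the prompt just made it name which one)
print(pick_source("How are cosmetic, minor and major renovations approved under the Strata Schemes Management Act 2015?"))
# -> SSMA 2015 material  (2015-specific terminology finally tips the weighting)
```

Again, a caricature – but it’s enough to see why the same question phrased two ways can land on two different Acts.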
I tried again by asking for a letter to an owner regarding the major renovation approval process, and, to “force its hand”, my prompt referenced the 2015 Act explicitly, but it still took several refinements to get to the point of being (maybe generously) 75% acceptable (you decide) and ready for me to consider editing and sending it to someone…perhaps. It might have been quicker for me to just type something up myself.
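For the curious, the mechanics of that kind of prompt are simple enough to sketch. The example below assumes an OpenAI-style chat completions endpoint, the “gpt-3.5-turbo” model name and an API key in an OPENAI_API_KEY environment variable (none of which are part of the original experiment), so treat it as an illustration of the prompting approach rather than a recipe:

```python
# A minimal sketch only. Assumptions (not from the original post): the OpenAI
# chat completions HTTP endpoint, the "gpt-3.5-turbo" model name, and an API key
# in the OPENAI_API_KEY environment variable. The prompt names the 2015 Act
# explicitly to "force its hand"; a human expert still has to check the output.
import os
import requests

prompt = (
    "Draft a letter to a lot owner explaining how 'major' renovations (those that "
    "are neither 'cosmetic' nor 'minor') are approved under the Strata Schemes "
    "Management Act 2015 (NSW)."
)

response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,  # keep the output as steady as possible between runs
    },
    timeout=60,
)
response.raise_for_status()
draft = response.json()["choices"][0]["message"]["content"]
print(draft)  # review against the Act before this goes anywhere near an owner
```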
Don’t get me wrong – these models will have continuing improvements (the above examples used ChatGPT-3.5; ChatGPT-4 has since been rolled out), and these sorts of “gotchas” will become less and less prominent, even in deeply technical fields such as strata. If I started these tests in five years’ time, the results would be very, very different.
Additionally, it may be possible to train the systems now by providing a body of knowledge before you ask your “real question”, but you’d have to know the right type of training material to provide, then find or write it in a way the AI can ingest. At that stage I’d have to ask, “Just what are you really gaining?”
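If you did want to try that “provide the knowledge first” approach, the usual pattern is to paste the reference material into the conversation ahead of the real question. A rough sketch, with my own placeholder wording rather than anything tested:

```python
# A rough sketch of the "provide the knowledge first" idea. The system instruction
# and the placeholder excerpt are mine, purely for illustration; you would paste in
# the actual provisions you want the model to rely on, and still verify the answer.
messages = [
    {
        "role": "system",
        "content": (
            "Answer using ONLY the reference material provided. "
            "If the material does not cover the question, say so."
        ),
    },
    {
        "role": "user",
        "content": (
            "REFERENCE MATERIAL:\n"
            "<paste the relevant sections of the Strata Schemes Management Act 2015 "
            "and the scheme's by-laws here>\n\n"
            "QUESTION: What is the process for getting renovations approved in a "
            "strata building in NSW?"
        ),
    },
]
# 'messages' would then be sent to the same chat completions endpoint as the
# earlier sketch, in place of the single user message.
```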
But what I’m really trying to get across is that we’re nowhere near the stage where we can rely on these bots in strata (and other deeply niche technical fields), and in addition to that, there are several other important issues with using these AIs to keep in mind:
So, after all that, do I actually see any utility here? Sure – but you just need to assess these systems’ suitability for your use cases.
For generalist writing tasks, these bots may produce better output than some people can. Great, I say go for it for such non-technical use cases.
But if I were running a strata management or other technical field company/consultancy, I’d be testing a heap of prompts to see what sort of responses I might expect before going anywhere near a general rollout, and I’d certainly want my staff to be protecting the company by:
And that’s after giving all team members prompt generation training (including assessments), determining best use cases (and use cases to avoid), and creating an auditing framework to allow periodic review of usage and suitability.
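To give a feel for what that testing and auditing might look like in practice, here’s a bare-bones sketch: a fixed set of questions you already know the answers to, every response logged with a timestamp, and a column left for an expert to grade before anything approaches a general rollout. The function and sample questions are placeholders of mine, not a ready-made framework:

```python
# A bare-bones sketch of the testing/auditing idea: run a fixed set of questions
# you already know the right answers to, log every response with a timestamp, and
# leave a column for a subject-matter expert to grade before any general rollout.
# ask_chatbot() is a stand-in for whichever model or endpoint you are evaluating,
# and the sample questions are just that, samples.
import csv
from datetime import datetime, timezone

TEST_PROMPTS = [
    "What is the process for getting renovations approved in a strata building in NSW?",
    "Who is responsible for repairing common property in a NSW strata scheme?",
    # ...add the questions your team actually gets asked...
]

def ask_chatbot(prompt):
    """Stand-in: call whichever model you are evaluating and return its reply."""
    raise NotImplementedError

def run_prompt_audit(path="prompt_audit.csv"):
    """Record every prompt/response pair so an expert can grade it later."""
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for prompt in TEST_PROMPTS:
            writer.writerow([
                datetime.now(timezone.utc).isoformat(),
                prompt,
                ask_chatbot(prompt),
                "UNGRADED",  # to be replaced with the expert's assessment
            ])
```

Even a simple spreadsheet of graded responses like this gives you something concrete to review when you revisit your use cases later.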
We are going through a time of rapid awareness and uptake of these technologies, and they’re being applied in new and interesting ways (e.g., meeting summaries in Microsoft Teams, aiding internet search engines to generate responses, automating computer programming) – but all of these uses come with caveats similar to the above, and we are about to see a wave of partially (or completely) incorrect information flooding our infosphere.
I believe our organisations and management teams have a lot of “checking sources” and “trust assessments” ahead of us externally, and a lot of “use case soul searching” internally.
Despite the length of this post, this is all just the tip of the iceberg, both in the use of these “AIs” and in the critical assessment of them. I would love to hear your thoughts – what are you using these AIs for, and where have you seen things go off the rails?
Please comment below, message me on LinkedIn, or reach out directly if you’d like to…well…chat. I promise it won’t be a chatbot replying!
And don’t forget to check out my introduction and About page!
[Originally posted on Strata, Meet Data]