Game Over for BAs? Part 5: Hey ChatGPT, I need assistance!
Up to now we have been discussing the impact of AI on the field of Business Analysis, using ChatGPT as an example. We started by getting an AI to generate acceptance criteria and UML diagrams for a requirement with only basic information provided (Game Over for BAs?), and then discussed what this means for the BA profession and how it is impacted (Game Over for BAs? - Part 2). We even had ChatGPT generate a whole article by itself (Game Over for BAs? - Part 3), and then came to one of the most important experiments, where I asked ChatGPT to act as a BA and gather requirements from me (Game Over for BAs? - Part 4: A conversation with ChatGPT). In that last article we discussed whether AIs could evolve to a point where they have logical reasoning and deduction capabilities and can derive answers from the information we provide. Obviously, my next step was to go and try this out :)
The reason to explore this capability with regards to the BA function was very simple: as BAs we produce large amounts of information relating to system functionality. This can come in the form of stories, epics, BRS documents, UML diagrams, user manuals and so on. Once we start delivering the software, we also get a lot of questions from customers. These can range from "how do I do this?" to "given this scenario, what is the system going to do?". In this light, can an AI-based system go through all our documentation and start answering the customer's questions?
A prelude
For this experiment, I took a comprehensive document that I knew to be accurate and that covered an area where I was comfortable asking questions in different ways: the Colombo Stock Exchange's trading rules (Trading participant rules applicable to stockbrokers and stock dealers). This is a public, regulatory document that is very prescriptive about functionality and rules. Usually, AIs fare better on this type of information.
Before we go through what happened, let's try to understand how someone would answer questions given a set of data or information. Over the course of my meddling with AI, I have come to categorise the ways I can ask questions. Let's take an example:
Given this information, we can ask a direct question: "How many wheels does a bus have?". The question is very prescriptive and relates directly to the information we have provided. This is similar to giving someone a list of names and phone numbers and asking "what is X's phone number?". No particular derivation is required to answer these types of questions.
We can also ask a direct but inverse question: "How many vehicle types have 4 wheels?" Note that here, one has to go through each of the vehicle types, answer a direct question for each, and then collate all the answers. Quite a few of our day-to-day questions are of this kind: "How many brands of biscuits are there?", "Which books talk about philosophy?", "Tell me a few names of romantic movies", and so on. They all refer to a particular type of categorisation where the relationship is n:1 and we are asking someone to find the "n" answers given the "1" we are interested in. These inverse questions also cover summarisation questions, as they require you to cluster the data into a small number of categories.
There is also an indirect/derivative question type that we can ask: "If we have 8 people to travel and we don't want to share the ride with anyone else, what is the most economical way to travel?" To answer this, we need to look into multiple factors: the number of vehicles needed to accommodate everyone, and the cost per head based on the total cost per km. You need to read through the data to extract the combination of vehicles that gives the best cost (note that we are leaving out factors such as comfort and some other sociability aspects).
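To make the three question types concrete, here is a minimal sketch in Python. The vehicle figures (wheel counts, capacities and costs per km) are illustrative assumptions standing in for the example table, not the exact numbers from the original.

```python
import math

# Hypothetical dataset standing in for the vehicle table (illustrative figures only).
vehicles = {
    "motorcycle": {"wheels": 2, "capacity": 2,  "cost_per_km": 25,  "carries": "people"},
    "car":        {"wheels": 4, "capacity": 4,  "cost_per_km": 60,  "carries": "people"},
    "van":        {"wheels": 4, "capacity": 10, "cost_per_km": 90,  "carries": "both"},
    "bus":        {"wheels": 6, "capacity": 50, "cost_per_km": 150, "carries": "people"},
    "lorry":      {"wheels": 6, "capacity": 2,  "cost_per_km": 120, "carries": "goods"},
}

# Type 1 - direct question: "How many wheels does a bus have?"
print(vehicles["bus"]["wheels"])  # 6

# Type 2 - direct but inverse: "How many vehicle types have 4 wheels?"
four_wheelers = [name for name, v in vehicles.items() if v["wheels"] == 4]
print(len(four_wheelers), four_wheelers)  # 2 ['car', 'van']

# Type 3 - indirect/derivative: "Cheapest way to move 8 people without sharing the ride?"
# Work out how many of each vehicle type are needed and compare the total cost per km.
best = None
for name, v in vehicles.items():
    if v["carries"] == "goods":
        continue  # cannot carry people
    count = math.ceil(8 / v["capacity"])
    total_cost = count * v["cost_per_km"]
    if best is None or total_cost < best[1]:
        best = (name, total_cost, count)

print(best)  # ('van', 90, 1) with these made-up numbers
```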
(Of course, this categorisation is purely something I made up and there are grey areas in it. For example, if you ask a question like "What is that movie about a bunch of guys stealing something? A heist type of movie? Brad Pitt was in it?", are we talking about type 2 or type 3? :) )
In my opinion, if we want an AI to be good at answering questions from a customer, then it needs to be good at the second and third types of questions. In other words, the AI should be good at identifying and building relationships between different entities based on what we tell it. The problem is even more complicated because, in real life, the data will not be presented in an orderly manner. Imagine the above example being explained as follows:
"A motorcycle has two wheels while cars and vans have four wheels. On the other hand, buses and lorries have six wheels each. When it comes to transporting people, we would use motorcycles, cars, buses and vans. If you need to transport goods, you can use lorries or vans...."
The AI then needs to identify the key categories (wheels? vehicle types? people? goods?) and build up ordered data from this description, for example as sketched below.
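This is the kind of ordered structure the AI would need to recover from that prose; only the facts stated in the description above are encoded, and the field names are my own.

```python
# The same prose, reorganised into ordered data. Only facts from the description are encoded.
vehicles = {
    "motorcycle": {"wheels": 2, "used_for": ["people"]},
    "car":        {"wheels": 4, "used_for": ["people"]},
    "van":        {"wheels": 4, "used_for": ["people", "goods"]},
    "bus":        {"wheels": 6, "used_for": ["people"]},
    "lorry":      {"wheels": 6, "used_for": ["goods"]},
}

# Once the data is in this shape, inverse questions become simple lookups:
print([name for name, v in vehicles.items() if "goods" in v["used_for"]])  # ['van', 'lorry']
```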
The experiment
To start off the experiment, I gave ChatGPT-4o explicit instructions to read the document and answer questions based on that document and nothing else. I wanted to ensure it wasn't using any pre-learned information available off the web to answer my questions.
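For anyone who wants to reproduce a similar setup outside the ChatGPT UI, here is a minimal sketch using the OpenAI Python client. The grounding instruction mirrors what I gave ChatGPT; the file path and wording are illustrative, and in the actual experiment I simply attached the document in the chat interface.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative path; in the experiment the document was attached in the ChatGPT UI instead.
with open("cse_trading_participant_rules.txt", encoding="utf-8") as f:
    rules_text = f.read()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": (
                "Answer questions strictly based on the document below. "
                "Do not use any other knowledge. If the document does not "
                "contain the answer, say so.\n\n" + rules_text
            ),
        },
        {"role": "user", "content": "What is a Market Order and how is it handled?"},
    ],
)
print(response.choices[0].message.content)
```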
Direct questions
I asked a few questions on areas the document explicitly talks about, and it responded as follows:
The description from ChatGPT was accurate and descriptive. I tried this on one more item and it again responded accurately. I think we can safely conclude that ChatGPT can in fact answer direct questions very well.
Inverse questions
When it came to the "direct but inverse" question type I had defined, ChatGPT again responded in a somewhat satisfactory manner. However, it sometimes needed to be prompted more than once to give the complete answer. As this is particularly interesting (i.e. the AI going through the document and trying to cluster information), let's analyse it a bit:
If you take the document published by the CSE, there is no single place where all the market protection mechanisms are listed. The reason is that this is a rule book rather than a marketing document where someone might want to see all such features in one place. In a rule book, it is more important to know the protections that apply to each feature. In this particular case, the CSE lists the protection mechanisms available for Market Orders when explaining that feature, the constraints on Crossings when explaining that feature, and so on.
Now if we look at the first answer ChatGPT compiled, it has gone through each of the sections and collated them as we expected. However, as I was familiar with the field, I knew that some information was missing. Hence, I prompted it to "tell me more".
The final answer it gave, shown on the right side of the image above, is comprehensive and covers all the scenarios. However, it seems to have answered more than what was required and included irrelevant information as well. For those not familiar with the field: the question I asked was "Does this system have a control to protect the market from trades executing at drastically different prices?", and the follow-up asked the same thing but left out some of the words. In the answer it gave, if you check the lower right corner, you can see there's a section on Order Validation Checks. However, if you look at the description, it talks about trading lot sizes and security codes as well. These have nothing to do with price controls.
When answering questions like this, an AI system needs to be concise and accurate for it to be useful. A lack of conciseness will push away most users, as they now have to sift through the information to find what is relevant to them.
Indirect/derivative questions
Indirect and derivative questions are important because, in real life, most of the questions that come to a person supporting a client are of this nature. Therefore, I decided to shoot a standard "I did this, the system did that, what is going on?" type of question.
In this particular case, ChatGPT answered correctly and spectacularly well. However, if you look at the reasoning, i.e. the quotation it referred to, we can see that it has nothing to do with this scenario. So why do I say that it did spectacularly well?
The document has no direct description of stop order election behaviour, i.e. which rules are evaluated when electing a stop order based on the last traded price. Instead, the information comes from the very detailed scenarios that the CSE has included in the document. It seems (or we hope) that ChatGPT went through all the scenarios and identified the common theme among them to answer our question. It is also important to note that it could even give a creative answer to the "at which price should my order get elected?" question by actually coming up with a number.
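For readers unfamiliar with the term, here is a minimal sketch of what stop order election typically looks like in a trading system: a buy stop order is elected (becomes active) once the last traded price reaches or exceeds its stop price, and a sell stop order when it falls to or below it. This reflects general stop-order semantics, not necessarily the CSE's exact rules.

```python
# General stop-order election semantics (illustrative; not the CSE's exact rule wording).
def is_elected(side: str, stop_price: float, last_traded_price: float) -> bool:
    """Return True if a stop order should be elected given the last traded price."""
    if side == "buy":
        return last_traded_price >= stop_price
    if side == "sell":
        return last_traded_price <= stop_price
    raise ValueError(f"unknown side: {side}")

print(is_elected("buy", stop_price=105.0, last_traded_price=105.5))  # True
print(is_elected("sell", stop_price=98.0, last_traded_price=99.0))   # False
```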
It also managed to reason out a scenario that deals with a sequence of events:
In the system we are talking about, regular trading hours start after the open auction is done, and the pre-open session occurs before the open auction. While the document nowhere explicitly states the sessions in which OPG orders can be entered, it does provide the principle of operation. ChatGPT seems to have correctly reasoned that, given the sequence in which the sessions are ordered and by applying the principle of operation, the answer to the question should be "no".
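The deduction involved can be sketched as follows. The session sequence and the rule (an OPG order only makes sense while the open auction has not yet run) are my paraphrase of the principle above, not the rule book's wording, and the session names are illustrative.

```python
# Sessions in the order they occur (illustrative names; sequence as described above).
SESSION_ORDER = ["pre_open", "open_auction", "regular_trading", "close"]

def can_enter_opg_order(current_session: str) -> bool:
    """An OPG order targets the open auction, so it only makes sense
    while the open auction has not yet run."""
    return SESSION_ORDER.index(current_session) < SESSION_ORDER.index("open_auction")

print(can_enter_opg_order("pre_open"))         # True
print(can_enter_opg_order("regular_trading"))  # False - the deduction ChatGPT made
```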
Having said that, the AI failed in more complex scenarios. In one particular case I tried to get the right answer out of it by repeating the question in different ways and pointing out specific pieces of information, but the answers weren't getting any better. There is a lot of room for improvement here.
The knowledge graph
When talking with my friends about how AI works and how it can impact day-to-day life in a major way, we always seem to gravitate towards this idea of knowledge graph building. At least as we understood it, that is how we as humans build and infer knowledge.
Towards the end of the experiment, I wanted to see whether ChatGPT has a concept of a knowledge graph and, if so, what it looks like. Unfortunately, I realised that I hadn't asked it to generate one as the very first thing after reading the document I submitted. Purely for the sake of comparison, I ran another experiment where I asked it to generate a graph right after reading the document. Here's how they looked:
We can see here that it has identified some major keywords and categories and the relationships among some of them. We can also see that there are some "floating" nodes where the connections have not been identified yet.
What is interesting is what it generated after the original conversation ended:
We can see now that there are more interconnections between nodes, as well as new nodes. There is a new topic of "Auction Participation", and it has identified the connection between that and the order types. It has also identified a new topic called "Circuit Breakers", as I was asking some questions about that. There is also a new "Trading Sessions" topic that has turned up.
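To make the idea concrete, such a graph is just a set of nodes and their connections. The sketch below encodes a few of the nodes visible in the two graphs; the exact node names and edge set are illustrative rather than a faithful copy of what ChatGPT drew.

```python
# A tiny adjacency-set representation of the evolving knowledge graph (illustrative edges).
graph_before = {
    "Market Orders":    {"Market Protections"},
    "Crossings":        set(),  # a "floating" node: no connections identified yet
    "Trading Sessions": set(),
}

graph_after = {
    "Market Orders":         {"Market Protections", "Auction Participation"},
    "Crossings":             {"Market Protections"},
    "Trading Sessions":      {"Auction Participation", "Circuit Breakers"},
    "Circuit Breakers":      {"Market Protections"},
    "Auction Participation": {"Order Types"},
}

# New connections discovered over the course of the conversation:
new_edges = {
    (node, other)
    for node, others in graph_after.items()
    for other in others
    if other not in graph_before.get(node, set())
}
print(sorted(new_edges))
```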
As always, the experiment link is here: https://chatgpt.com/share/fc1e7ee3-db8d-4d53-bc1d-8b2f161dfabd
Conclusion
Running this experiment has been an absolute delight. While it shows that an AI is very good at answering direct questions, it also showed that it wasn't bad at indirect questions and at deducing answers from the available information.
While there are issues with some of the answers given, I think the last part of the experiment shows where systems like ChatGPT can go. While it has not been specifically trained for the domain it was asked questions about (financial services and systems), it has shown evidence of its capability to "learn" by building interconnections. In my opinion, if we train a system like this together with a group of experts, it can definitely reach a much better level at answering questions where it has to derive answers through more complex reasoning.
The challenge, as always, will be whether we can get consistent answers out of an AI system. Perhaps the way to resolve that is to do what we do ourselves: devise evaluations of its answers that continuously feed back into its performance ;)
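As a hypothetical illustration of that idea, an evaluation loop could be as simple as a list of questions with reference answers that get scored on every run; the grading function and the question set here are placeholders, not something from the experiment.

```python
# A bare-bones evaluation harness (hypothetical questions and a placeholder grader).
eval_set = [
    ("Does the system protect against trades at drastically different prices?", "yes"),
    ("Can OPG orders be entered during regular trading hours?", "no"),
]

def grade(expected: str, actual: str) -> bool:
    """Placeholder grader: in practice this could be an expert review or another model."""
    return expected.lower() in actual.lower()

def run_eval(ask) -> float:
    """`ask` is any callable that takes a question and returns the system's answer."""
    results = [grade(expected, ask(question)) for question, expected in eval_set]
    return sum(results) / len(results)

# Example with a dummy system that always answers "no":
print(run_eval(lambda q: "No."))  # 0.5
```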
So what do you think? Where do you think we are going?