Two experiments with ChatGPT in generating algorithms
I have been experimenting with ChatGPT (#chatgpt) for a few weeks now, and I am impressed by its capabilities.
Knowing a bit about how ChatGPT works and about the presence of many opportunities in integrating it with other online systems such as information retrieval systems, it seems almost obvious to me that ChatGPT will keep getting better. So I am excited about its future and many ways we will be able to use it directly or indirectly as part of another system.
Yet judging from my limited set of experiments, I think ChatGPT is still work in progress. One of the biggest issues I see with it is lack of consistency and assurances about the quality of its results. Here by results I mean those whose quality can be judged more objectively; I do not mean results such as poems or essays. Of course, the results quality issue is no secret; it is already acknowledged as a limitation by OpenAI on the ChatGPT opening page.
For now, if you need to use ChatGPT for "real work", it seems like a good idea to be more careful and use ChatGPT for the cases where you will be able to validate the quality of the generated results. Maybe use it as a seed for your own iterations of creating a better solution.
Interestingly this observation on the results quality is also an opportunity for ChatGPT to get better at because if the results are such that they may be validated somewhat objectively, ChatGPT may actually be able to do it on its own, say, with the help of other systems for validation, before presenting the results to the user. This not only will improve the results quality but, if ChatGPT pairs the presentation with the validation information, this will also increase users' confidence in ChatGPT.
Here I want to share my experience with two computer science focused experiments I ran with ChatGPT. Both experiments are about generating an algorithm for a problem.
Getting an algorithm for generating the Fibonacci sequence
The first experiment was about getting "the" fastest algorithm for generating the Fibonacci sequence: 0, 1, 1, 2, 3, 5, 8, 13, 21, etc. This sequence consists of integers where every number after the first two, 0 and 1, is the sum of the two preceding numbers.
There are many algorithms to generate this sequence. A presentation of twelve known algorithms is given in [1] with their implementation in the Python programming language in [2]. Each of these algorithms take an integer n as the input and generate the nth Fibonacci number in the sequence.
If there is no bound on how big n can be and only integer arithmetic is allowed, the fastest algorithms for computing the sequence run in time logarithmic in n. Using floating-point arithmetic and allowing potentially approximate answers, there are faster algorithms as discussed in [1].
In this first experiment, I asked ChatGPT to show me the fastest algorithm in Python for generating the Fibonacci sequence. The session is shown below. The answer was returned in a few seconds with a complete program in Python and a good and short explanation.
Unfortunately, the returned program had a simple bug. I pointed out the bug to ChatGPT. I think what ChatGPT did in response was amazing: It acknowledged the bug, corrected it, commented the line with the correction, produced the correct program and even apologized for the bug. Running the corrected program then generated the Fibonacci sequence correctly.
Now imagine what ChatGPT could have done better here: It could have run Python on its own, discovered the bug by itself, and corrected it before presenting it to me. This is an example of what I mean by "integrating with other systems" or "validating its results." It seems like a no-brainer that these kinds of auto-validation will or can be integrated into ChatGPT soon.
Getting an algorithm for computing the minimum cycle mean of a weighted directed graph
This problem is a more technical and/or specialized one. Let us start with the definitions first. If you know what a graph is, these definitions should be easy for you.
I first asked ChatGPT the definition of the minimum cycle mean, the generated text had the correct definition though I had difficulty to figure out the vertices and edges in the example graph shown using ASCII characters.
领英推荐
Next I asked ChatGPT to give me an algorithm to compute the minimum cycle mean of a graph. Here I was unsuccessful to get a correct algorithm: ChatGPT answered using one of the shortest-paths algorithms: Bellman-Ford, Floyd-Warshall, and Johnson. These algorithms can compute single-source or all-pairs shortest paths but not the minimum cycle mean of the input graph.
Also note in the returned result ChatGPT defines the minimum cycle mean incorrectly even though it gave the correct definition when the query was asking for the definition.
I then started a conversation with ChatGPT to see if it could correct its answer. When I pointed out the error in the previous answer, ChatGPT apologized for the mistake, acknowledged that Bellman-Ford and Floyd-Warshall were indeed not the correct algorithms for the problem but incorrectly presented Johnson algorithm as the correct algorithm.
A simple way to see that the returned algorithm is incorrect without running it on your computer is to note that the algorithm immediately returns when it detects negative weight cycles in the input graph. The presence of such cycles in the input graph is a headache for shortest paths problems because shortest paths hitting such cycles will diverge to negative infinity; however, their presence is not a headache for the minimum cycle mean problem because in the presence of such cycles, the minimum cycle mean of the input graph will simply be negative.
I warned ChatGPT again that Johnson algorithm was not the correct algorithm to compute the minimum cycle mean in a graph. ChatGPT again apologized for the mistake, acknowledged the mistake, and presented an unnamed "cycle mean algorithm" that I do not recall having seen before. Once I find some time, I will actually test this algorithm in comparison to other algorithms for the problem [3]. Interestingly, in this new algorithm, there is at least a division operation that tries to compute the average weight of a path.
How could ChatGPT have done better for this question? As presented in [3] and many other references in the technical literature, there are many algorithms to compute the minimum cycle mean in a given graph. ChatGPT could have easily presented one of these algorithms as the correct solution.
Summary
As I mentioned in the introduction, ChatGPT is an impressive system and I strongly believe that it will keep getting better. One immediate opportunity I see is to integrate ChatGPT with other systems so that its results quality can be made more consistent and can potentially be validated. For additional confidence in ChatGPT, it may present its results together with how it validated them.
For now, I love using ChatGPT and I thank #openai for making it available for free. I am looking forward to how it will be used to enhance existing systems and create new ones. It is an exciting time to be alive indeed.
References
[1] A. Dasdan, "Twelve Simple Algorithms to Compute Fibonacci Numbers", https://arxiv.org/abs/1803.07199
[2] A. Dasdan, Code for the algorithms presented in [1], https://github.com/alidasdan/fibonacci-number-algorithms
[3] A. Dasdan, "Experimental analysis of the fastest optimum cycle ratio and mean algorithms", ACM Transactions on Design Automation of Electronic Systems (TODAES), 9(4), 2004.
[4] A. Dasdan, Code for the algorithms presented in [3], https://github.com/alidasdan/optimum-cycle-mean-algorithms
Along the lines of integrating ChatGPT with another system, please see this article on ChatGPT+Wolfram|Alpha by S. Wolfram on handling computational queries far better. There is also a related discussion on Hacker News. https://writings.stephenwolfram.com/2023/01/wolframalpha-as-the-way-to-bring-computational-knowledge-superpowers-to-chatgpt/
Thank you for such an insightful and detailed analysis. I am also very excited about everything related to GPT these days. The complexity involved in accurately providing answers for problems like the ones you mentioned is quite significant. Given that GPT can struggle with relatively simple math problems like solving quadratic equations, it is not surprising that it may also make mistakes in more complex problems that require reasoning. However, there are also cases where GPT is very useful and can be trusted right away. For example, check out this tweet: https://twitter.com/leonovco/status/1599581238687563776 Even now, with the right approach, GPT can provide a competitive edge. Instead of looking for definitive answers, GPT can help "unstuck" us by providing alternative opinions and smoothing the learning curve in domains that require aggregating information. For example, I recently used GPT to help my teenager learn Latin and it was great because all the information we needed was in one place, along with clarifications when needed.
Founder / CEO | Execify
1 年Very interesting Ali Dasdan - learned a lot from this. Eye opening.