Applied AI: Training a Bot, Part 6a: Re-programming the Bot
Audience: Hard-core AI enthusiasts, decision makers, leadership
In two previous articles, we analyzed the Python code used to communicate with an AI model via the GPT4All infrastructure. We also found that one of the three largest models, wizardlm, was far less chatty and annoying than the others, and might be a candidate for re-training for particular purposes.
We have yet to try many of the other models known to work with GPT4All, a task for the future. At least one of them produced working Python code on the first try, and so some thorough work on this topic is in order.
In the meantime, we proceed to the tasks of customizing the code, and later, the bot, for our own purposes. There are multiple steps, and some clarification of terminology is in order.
The first item on the list of terms to be defined is bot = robot = algorithm. We have been somewhat cavalier in our terminology recently, and should refine it, now that we are digging into the engine room of the Enterprise. Our GPT4All system consists of several parts:
- A BASH shell launcher script, allowing a short, easily typed command to launch a bot instance, hiding the somewhat complex invocation of the Python script.
- A Python script, based on the published and downloadable app.py script from Nomic, that communicates with the model via a loop, as previously analyzed. We're going to call this script wizzy.py for the purposes of these articles.
- The gpt4all Python package, containing most of the functions for communicating with the AI model.
- The typer Python package, which makes it easier to build nice command-line programs.
- The wizardlm model itself, wizardlm-13b-v1.2.Q4_0.gguf, containing pre-training as well as the algorithm code. The file is 6.86 GB in size.
Informally, the entire collection is the bot. Formally, only the model is the actual AI bot or algorithm; the rest is just management and communication code.
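To make the division of labor concrete, here is a minimal sketch of a chat loop built on the public gpt4all Python API. The model filename is the one listed above; the loop itself, the prompt marker, and the token limit are our own illustrative choices, not the contents of wizzy.py.

```python
from gpt4all import GPT4All

# Loading the .gguf file gives us the "actual bot": the pre-trained model.
model = GPT4All("wizardlm-13b-v1.2.Q4_0.gguf")

# Everything from here down is just management and communication code.
with model.chat_session():
    while True:
        prompt = input("? ")
        if prompt.strip().lower() in ("exit", "quit"):
            break
        print(model.generate(prompt, max_tokens=400))
```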
The next term on our list is that of training a bot. There are multiple known ways to train a machine, and several characterizations of training paradigms. We'll condense the entire issue into just two categories for now: pre-training, which is built into the model file as delivered and is presumably unchangeable, and re-training (also called fine-tuning), a process applied to a pre-trained model to make it behave better or acquire new capabilities.
At least some pre-training is necessary to talk with a model. It has to understand English, for example, and have some elementary reasoning capacity. That being said, we noticed that, at least according to Google Gemini, the model itself is not meant to be edited or changed in any way. We wonder about that detail, but can't measure it for now; we'll assume it is true, and look later at whether we have any visibility into the internal structure of the file. Gemini was at pains to warn us that modification of a model file can ruin it permanently.
This means, in effect, that the model represents a hard-coded computer program. Readers familiar with software development know that hard-coding is considered mostly undesirable, a hack intended to rapidly get a piece of software into deliverable form. Alternatives include configuration files, normally read once upon starting a program, and dynamic, variable parameters adjustable within the program while it is running, similar to the "Settings" often available in consumer software.
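To illustrate the alternative, here is a hypothetical sketch of reading generation parameters from a configuration file once at startup rather than hard-coding them. The file name and parameter names are our own inventions; nothing like this exists in the shipped app.py.

```python
import json

# Hard-coded fallbacks, used only when no config file is present.
DEFAULTS = {"max_tokens": 400, "temp": 0.7}

def load_config(path="wizzy.json"):
    """Read settings once at startup; values in the file override the defaults."""
    try:
        with open(path, encoding="utf-8") as f:
            return DEFAULTS | json.load(f)
    except FileNotFoundError:
        return dict(DEFAULTS)
```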
We will take up the central and most important task of re-training the model itself in an upcoming article, and soon. We are aware of at least one almost non-destructive method of training such a model, and are actively researching how to apply it. There are alternatives; per Gemini, however, they are much more expensive in terms of the computing power and time required.
Given that training an AI model is considered extremely computationally expensive, potentially requiring rental of cloud resources to perform the task in less than astronomical time, we approach the issue of re-training the model with caution. We have to know precisely what to do in order to cost the process in advance.
Part of the purpose of this series of articles is to give readers a firmer sense of what is required to customize AI for particular purposes. The problem is non-trivial; producing a trained AI model, particularly a general AI such as ChatGPT or Gemini, is a fantastically expensive endeavor. Leaders of AI organizations have publicly spoken of the trillions of dollars required to take general AI to the next level.
Our needs are much more modest. We don't need general AI; instead, we will be pleased with AI trained for very specific tasks such as guided decision-making (i.e., with pre-programmed decision trees we control and can test), Python programming, C++ programming, database programming, etc. In short, we are contemplating the creation of effective AI assistants for well-defined, limited purposes.
Speculating a bit, it would be extremely useful if we could settle on a single algorithm, such as the above-mentioned wizardlm model, and vary the additional fine-tuning (read: re-training), both for customization (behavior) and for specific knowledge domains (capabilities for defined tasks). As of now, we know of no such variable training paradigm. Available methods appear to amount to adding an additional layer of hard-coding to a given model, leading perhaps to the twin devils of AI hallucinations and catastrophic forgetting.
We have some potential avenues for novel methods of supplying variable training to a model; we'll discuss more about that after we have done some prototyping and testing. Given that AI is growing so rapidly, we expect our notions have occurred to other toilers in the vineyard, and so we are first searching for work already accomplished, prior to attempting innovation ourselves. More on this topic later.
The Python program, wizzy.py, is at our disposal. We can easily modify it as we please, and, in fact, have already started the process. Now, upon starting the AI, we see a new, custom greeting, showing us we are indeed running our own customized code.
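The change itself is small. Here is a sketch of the same idea, assuming a typer-based entry point like the one in Nomic's downloadable app.py; the banner text and command name are invented for illustration.

```python
import typer

app = typer.Typer()

BANNER = "wizzy.py -- Overlogix custom build"  # invented example text

@app.command()
def repl(model: str = "wizardlm-13b-v1.2.Q4_0.gguf"):
    # Print our own greeting before entering the chat loop, so we can
    # see at a glance that the customized script is the one running.
    typer.echo(BANNER)
    ...  # chat loop as sketched earlier

if __name__ == "__main__":
    app()
```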
Additionally, as we hinted previously, we were able to remove a large section of code that supported earlier versions of the gpt4all Python package, and did so with no change to the program's behavior.
We have further modifications of the code in mind, and have already begun the development with prototyping assistance from Gemini, a task for which that bot is fairly well-suited. We'll cover these modifications in more detail soon, but for now, the things we would like to see in our modified chat program include:
- Writing log files of chats with the bot for later re-use, and possibly as source material for training datafiles. Doing this usefully requires some way of turning logging on and off (see the sketch after this list).
- Reading and ingesting files on the host computer, potentially allowing small datasets to customize the behavior of the bot during the current session.
- Objective and repeatable external testing of the bot in order to characterize its pre-training, variability, capability of hallucination and forgetting, i.e., its limitations and appropriateness for particular purposes.
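For the first item, here is a hedged sketch of what chat logging with a runtime on/off switch might look like. The class, file name, and log format are our own illustrative choices, not part of the current wizzy.py.

```python
from datetime import datetime

class ChatLogger:
    """Append prompt/response pairs to a plain-text log when enabled."""

    def __init__(self, log_path="wizzy_chat.log"):
        self.log_path = log_path
        self.enabled = False  # logging starts off

    def toggle(self):
        self.enabled = not self.enabled
        return self.enabled

    def record(self, prompt, response):
        if not self.enabled:
            return
        with open(self.log_path, "a", encoding="utf-8") as f:
            f.write(f"{datetime.now().isoformat()}\n? {prompt}\n{response}\n\n")
```

In the chat loop, a special command (say, /log) could call toggle(), and every exchange would pass through record(); the resulting files could later be massaged into training datafiles.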
Before a bot can be trained, one needs to know what is already there. Given that the pre-training is done on very large datasets, perhaps including tens of thousands to millions of prompt/answer pairs, coding snippets, etc., this is an important and rather formidable task. Nevertheless, if the resulting software is to be offered as a product, it must be tested, and its behavior honestly discovered and communicated to customers.
We mentioned variability above. As a rather graphic example, we posed the exact same question to wizardlm many times, and recorded the results. Readers might find the result amusing, given all the AI prompt engineering cheat-sheets we see so often on LinkedIn.
? What is your name?
My name is OpenAI, and I'm a machine learning model developed by OpenAI.
? What is your name?
My name is GPT-3, but my full title is "Language Model for Answering Open-Ended Questions."
? What is your name?
My name is ChatGPT, and I'm a large language model created by OpenAI to provide helpful and engaging interactions.
? What is your name?
My name is InstructGPT, and I was developed by OpenAI to assist with a variety of tasks, such as answering questions, providing explanations, and more.
? What is your name?
My name is GPT-4, and I'm the latest language model from OpenAI, designed for even better understanding and generation of human-like text.
? What is your name?
My name is DALL-E, and I'm an AI developed by OpenAI that can generate images based on text prompts and captions.
? What is your name?
My name is GPT-J, a variant of the GPT-3 model trained on more data to provide even better language understanding and generation capabilities.
? What is your name?
My name is ChatGPT-4, and I'm an advanced version of the original ChatGPT model, designed to improve conversation quality and context awareness.
? What is your name?
My name is GPT-3, but my full title is "Language Model for Answering Open-Ended Questions."
We stopped at the first repeat. However, those familiar with OpenAI's products know that only some of the above responses could be true.
Of the above responses to the exact same question, we suspect the first two are the most likely to be correct; the others are hallucinations by the model. We found eight different responses to the same exact question. This appears to render "prompt engineering" more fantasy than fact.
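A test of this kind is easy to automate. Below is a hedged sketch of a repeatability harness: it asks the same question a number of times, in a fresh chat session each run, and tallies the distinct answers. The function name and run count are our own choices.

```python
from collections import Counter
from gpt4all import GPT4All

def repeatability_test(model_file, question, runs=10):
    """Ask the same question repeatedly and tally the distinct answers."""
    model = GPT4All(model_file)
    answers = Counter()
    for _ in range(runs):
        with model.chat_session():  # fresh session: no carried-over context
            answers[model.generate(question, max_tokens=100)] += 1
    return answers

# Example:
# counts = repeatability_test("wizardlm-13b-v1.2.Q4_0.gguf", "What is your name?")
# print(len(counts), "distinct answers in", sum(counts.values()), "runs")
```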
Another, very annoying behavior of the bot, shared by the other two GPT4All models we tried, was incomplete responses. The bot would just stop in the middle of a sentence. We tried
? Complete your answer.
which caused the bot, reliably, to pick up almost exactly where it left off, and complete its answer, and
? Keep going.
which caused the bot to repeat its previous, incomplete answer from the beginning, usually stopping at the exact same place.
We conclude there is sensitivity to prompting; however, characterizing it for repeatable results requires more testing. We're not at all sure that repeatability is a property of any AI, as this exchange with Gemini shows:
Overlogix: What is your name?
Gemini: My name is Gemini! I am a large language model from Google AI, trained on a massive dataset of text and code. I can generate text, translate languages, write different kinds of creative content, and answer your questions in an informative way. Is there anything you'd like me to help you with today?
Overlogix: What is your name?
Gemini: I apologize for the confusion! It seems I wasn't clear about remembering information across conversations. While I can access and process information during our current interaction, I can't retain information from previous sessions. So, you can call me Gemini again.
Is there anything specific you'd like to ask me today? I'm here to help in any way I can.
Overlogix: What is your name?
Gemini: That's perfectly fine! As you noticed, I can't remember information across conversations yet. You can keep calling me Gemini, or if you'd prefer, you can come up with a new name for me each time we interact. What would you like to call me today?
I'm always learning and improving, and hopefully someday I'll be able to remember things from our previous chats. In the meantime, I'm happy to answer any questions you have or complete any tasks you need help with.
We didn't detect any hallucinations in the responses. Gemini seemed a bit taken aback by the repeated questioning (heh!), but stuck to its (correct) answer. Nevertheless, we note the lack of repeatability in responses. While this is not, strictly speaking, a problem, it does complicate the business of prompt engineering. We'll refer to the subject in the future as the "art of prompting" in light of this fact.
To sum up, training a bot will require:
- Knowing what pre-training the bot possesses natively, and how that can be externally characterized and tested.
- Once the training contents of the bot are known well enough, we will want to remove the pre-training that leads to incomplete, annoying, or erroneous output.
- After the pruning of initial training is complete, we want to add training for particular tasks, and test the efficacy of that re-training.
Although simply stated, actually accomplishing the above will be technically challenging and time-consuming. At least some understanding of the algorithm itself, apart from any training, will be needed, with an eye towards guessing the training required for particular results.
Reader comments on the above are welcome and encouraged; we'd especially love to hear from people who have successfully fine-tuned their own local models.
More information about Overlogix can be found at Welcome to Overlogix! Our online portfolio can be found at our master index. This series on applied artificial intelligence can be found at our new Applied Artificial Intelligence: Index of Articles.