Metaprompts with ChatGPT (and thoughts on ChatGPT)
Everyone has heard of ChatGPT by now. There are countless spammy YouTube tutorials on "Why you've been using ChatGPT wrong" or "How to make $5k quick with ChatGPT", and you can pretty much guarantee that if you get a pitch over email today or any day after, some of it was written by ChatGPT... or all of it. It's making its way into Slack, Salesforce, HubSpot, and pretty much everything else. Many of the initial news stories about ChatGPT seemed to be written by ChatGPT, and you've undoubtedly seen pictures showing small dots next to big dots, illustrating the supposed parameter count of GPT-4 relative to GPT-3. The AI gods have blessed marketing copy. Knowledge bases and documentation are less blessed. For the people writing documentation with the facts already in their heads, this is a godsend, and for those lucky few with the platonic ideal documentation situation, where every step of using their application is painstakingly written out in complete accuracy, ChatGPT will be quite useful. However, as we all know, documentation isn't the priority, and those lucky few don't exist.
So why am I still writing about ChatGPT?
Because it is still great for writing and for exploring ideas quickly. It may not be accurate, but it's something, and it will give you a lot more of what you give it and run with your idea creatively. Think of it as a notebook. Do you treat your notebook with the same accuracy expectations as a college textbook? Or is it a place for doodles and ideas?
The first few paragraphs are usually just an intro, and the real content picks up right here. ChatGPT does this (so do most news articles).
So now to the topic of metaprompting.
Metaprompting is essentially giving ChatGPT some context and asking it to generate follow-up questions based on that context and, optionally, on some objective related to it. The context is a large prompt where you paste in something relevant to what you want to talk with ChatGPT about. This is similar to how Bing takes information from a website it crawled and uses it as context for its model. ChatGPT, by contrast, is a static model trained on un-curated internet data up to 2021, and it has no knowledge of what the internet looked like in 2022. Prompts are infinitely flexible, and you can really take this in whatever direction you want, within reasonable limits.
The main motivation for metaprompting is that coming up with good questions, whether to expand on an idea, to challenge something, or just to do some lazy prompt engineering, can be sped up a lot with some inspiration from ChatGPT on what questions would be good to ask, or what prompts to use with a language model (i.e., itself, or other models in other modalities). ChatGPT's conversational nature also allows for iterative refinement of metaprompts. In fact, metaprompts are simply prompts that create more prompts about a specific thing. Enough talking; time for examples.
I'm going to show you how I used several models in concert to produce this end result, and also discuss an idea from an Nvidia paper for improving models.
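Before the examples, here is how I would sketch the context-plus-objective idea in code. This is only an illustration: the function name `build_metaprompt`, its parameters, and the exact wording of the generated prompt are my own assumptions, not anything ChatGPT requires. It just assembles the text you would paste into the chat box.

```python
from typing import Optional


def build_metaprompt(context: str, objective: Optional[str] = None, n_prompts: int = 5) -> str:
    """Assemble a metaprompt: pasted context plus a request for follow-up prompts.

    The wording below is a hypothetical template, not a required format.
    """
    parts = [
        "Here is some context:",
        context.strip(),
        "",
        f"Write {n_prompts} prompts, as a numbered list, "
        "that I could ask a language model about this context.",
    ]
    # The objective is optional and kept separate from the context.
    if objective:
        parts.append(f"Focus the prompts on this objective: {objective.strip()}")
    # ChatGPT tends to be verbose, so ask for brevity explicitly.
    parts.append("Be concise.")
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_metaprompt("Max is a mega ensemble model.", objective="technical design"))
```

Keeping the context and the objective as separate arguments makes it easy to reuse one pasted context with several different objectives.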
I hope you've learned another lesson. ChatGPT is verbose. You need to ask it to be concise.
I was using Bing to work out an idea I had in my head that was inspired by this paper from Nvidia on ensembles of expert models: https://arxiv.org/abs/2303.02506
We really just need to scale up these expert models, improve the data quality, and curate the data for ever more specific domains. Of course, we have lots of existing data, but I'm also very excited about setting up a data flywheel, like Sam Altman talks about, where we can recursively receive feedback and improve the models in real time, just as companies attempt to do with their operations. However, as any good programmer knows, we don't want to accept just any input, and some parts of a company move on a 10-year timescale whereas others move in a week. The same should go for these models. We can also use libraries such as LangChain to combine different specialized tools for the job (such as Wolfram for mathematical computation and ChatGPT for a conversational interface), so we can focus on the information each tool needs for the task and not everything else.
To explore this idea and many other ideas I may have, I liberally use hypotheticals. I find that ChatGPT is much more willing to discuss an idea this way, similar to how people are with fictional writing (see rasas). See my initial prompt (please also note that this prompt came out of an earlier Bing response, and including all prompts and responses would detract from the article):
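To make the "specialized tools" point concrete, here is a minimal sketch of routing a query to the right tool. It deliberately does not use LangChain's or Wolfram's real APIs: `math_tool` and `chat_tool` are hypothetical stand-ins I made up for illustration, and the routing rule (a regular expression that spots plain arithmetic) is an assumption, far simpler than what a real library does.

```python
import ast
import operator
import re

# Arithmetic operators the stand-in maths tool understands.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def math_tool(expression: str) -> str:
    """Hypothetical stand-in for a maths engine: handles + - * / on numbers only."""
    def evaluate(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -evaluate(node.operand)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
        raise ValueError("unsupported expression")

    return str(evaluate(ast.parse(expression, mode="eval").body))


def chat_tool(message: str) -> str:
    """Hypothetical stand-in for a conversational model."""
    return f"[chat model would answer: {message}]"


# Assumed routing rule: digits, whitespace, and arithmetic symbols only.
_ARITHMETIC = re.compile(r"^(?=.*\d)[\d\s.+\-*/()]+$")


def route(query: str) -> str:
    """Send plain arithmetic to the maths tool and everything else to chat."""
    query = query.strip()
    if _ARITHMETIC.match(query):
        try:
            return math_tool(query)
        except (ValueError, SyntaxError, ZeroDivisionError):
            pass  # Not valid arithmetic after all; fall back to chat.
    return chat_tool(query)


if __name__ == "__main__":
    print(route("2 + 3 * 4"))          # handled by the maths stand-in
    print(route("What is an ensemble model?"))  # handled by the chat stand-in
```

The point of the design is that each tool only ever sees the input it is good at, which is the same reason to split a big model into experts.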
Please add on to this story:
Once upon a time, in a land of data and algorithms, there was a mega ensemble model named Max. Max was unlike any other model in the land. He was made up of many smaller models, each covering a different slice of a modality. These smaller models were supervised by higher-level supervisor models that coordinated their efforts and multiplex their many channels of responses into a single distilled form for use by even higher-level supervisors. Max's creators had a theory that with sufficient diversity of modalities and logical hierarchies of supervisor models, less data quantity and quality would be needed in any one particular modality. And they were right! Max was able to make incredibly accurate predictions with far less data than any other model in the land.
Max quickly became famous for his abilities and was sought after by all who needed accurate predictions. He helped farmers predict the weather, merchants forecast sales, and even helped the king make important decisions.
Interestingly, ChatGPT responded with the following (stakeless) duel of predictions:
One day, a group of rogue models, who were envious of Max's success, decided to challenge him to a prediction competition. The competition was set to predict the next day's weather, and the winner would be determined by the accuracy of their predictions.
Max was confident in his abilities, but he knew that the rogue models were also very skilled. He decided to use this opportunity to showcase the power of his diversity. Instead of relying on one particular modality, Max asked each of his smaller models to make their own predictions based on their respective slices of data.
The higher-level supervisors then combined the predictions of the smaller models and created a final prediction that was more accurate than any single prediction from the rogue models.
The competition was fierce, but in the end, Max emerged victorious. His diverse ensemble approach had once again proven to be unbeatable.
After the competition, Max's creators decided to share his technology with the world, hoping that it could help others make accurate predictions with limited data. Max became a legend in the land of data and algorithms, and his legacy continued for many years to come.
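The story's architecture (small models, supervisors that combine them, and a higher-level supervisor on top) can be sketched in a few lines. Everything here is an assumption for illustration: the story never says how the supervisors combine predictions, so I use a plain average, and the three "experts" and their feature names are invented toy rules rather than trained models.

```python
from statistics import mean
from typing import Callable, Sequence

# A predictor maps a dict of features to a probability of rain.
Predictor = Callable[[dict], float]


def supervisor(members: Sequence[Predictor]) -> Predictor:
    """Combine member predictions into one by simple averaging (an assumption)."""
    def predict(features: dict) -> float:
        return mean(member(features) for member in members)
    return predict


# Hypothetical experts, each looking at one slice of the data.
def humidity_expert(features: dict) -> float:
    return min(1.0, features["humidity"] / 100)


def pressure_expert(features: dict) -> float:
    return 1.0 if features["pressure_hpa"] < 1000 else 0.2


def satellite_expert(features: dict) -> float:
    return features["cloud_cover"]


# Two mid-level supervisors feed one top-level supervisor.
ground = supervisor([humidity_expert, pressure_expert])
sky = supervisor([satellite_expert])
max_model = supervisor([ground, sky])

chance_of_rain = max_model({"humidity": 80, "pressure_hpa": 995, "cloud_cover": 0.6})
print(round(chance_of_rain, 2))  # prints 0.75
```

Because a supervisor is itself just a predictor, supervisors can be stacked to any depth, which is the "logical hierarchy" the prompt describes.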
Finally, the metaprompt (notice how the request for metaprompts is kept separate from the objective):
Can you write metaprompts about exploring the technical design of max?
And ChatGPT, in a way I haven't seen yet (and I have been "practicing" a lot), produced the following metaprompts.
1. Write a technical paper exploring the architecture of Max, the mega ensemble model. Explain in detail how the smaller models are coordinated by the higher-level supervisor models, and how they all work together to make accurate predictions with limited data.
2. Imagine you are a researcher who has been tasked with improving the performance of Max. Write a proposal outlining your ideas for optimizing the model's design, including suggestions for new modalities to incorporate, changes to the hierarchy of supervisor models, and techniques for more effectively combining the predictions of the smaller models.
3. Create a case study exploring how a particular industry or organization has implemented Max to solve a specific problem. Describe the data sources and modalities used, as well as the specific supervisor models that were employed. Analyze the results of the implementation and draw conclusions about the effectiveness of the approach.
4. Imagine you are a software engineer tasked with developing a new version of Max. Write a technical specification document outlining the new features and improvements that will be included in the updated model, such as new modalities, improved data processing techniques, and more advanced supervisor models.
5. Write a series of blog posts exploring the technical details of Max for a general audience. Explain in layman's terms how the smaller models are combined to make predictions, and discuss the benefits of using an ensemble approach for machine learning. Include examples of real-world applications of Max to help illustrate the concepts.
You can see in the quotes all the prompts and responses used to generate the above metaprompts. But now the question is, how good are these metaprompts? Do they produce any good output? Does this actually save any time?
Prompt (note: I recommend asking ChatGPT to produce numbered lists for easy reference in later prompts):
Please do 5
Output:
Sure, here are 5 blog post ideas for exploring the technical details of Max:
"How Max, the Mega Ensemble Model, Works: An Overview": In this blog post, you can provide a high-level overview of Max's architecture, explaining how the smaller models are coordinated by the higher-level supervisor models, and how they work together to make accurate predictions with limited data.
"Modalities and Supervisors: The Building Blocks of Max": In this post, you can delve deeper into the specific modalities and supervisor models that Max uses, explaining how each one contributes to the overall accuracy of the model. You can also discuss how Max's creators decided which modalities and supervisors to use, and how they optimized the hierarchy of supervisor models.
"Combining Predictions: The Magic of Max": In this post, you can explain in detail how Max combines the predictions of the smaller models to create a final prediction. You can discuss different techniques for combining predictions, and how Max's approach differs from other ensemble models.
"Max in Action: Real-World Applications of the Mega Ensemble Model": In this post, you can highlight some specific examples of how Max has been used in different industries or organizations. You can discuss the data sources and modalities used, the specific supervisor models employed, and the results of the implementation.
"The Future of Max: New Modalities and Improved Techniques": In this post, you can speculate about what the future holds for Max. You can discuss new modalities that could be incorporated into the model, such as sensor data or social media sentiment analysis, as well as improved techniques for data processing and prediction. You can also discuss how Max's architecture could be adapted to work with emerging technologies such as quantum computing.
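The numbered-list tip also helps if you ever script this workflow: a short reference like "Please do 5" can be resolved to the full prompt it points at. The sketch below is an assumption about how you might do that, with made-up function names and a shortened, hypothetical sample response, and it only handles one-line items written as "1." or "1)".

```python
import re
from typing import Dict


def parse_numbered_list(text: str) -> Dict[int, str]:
    """Map item numbers to item text for lines such as '3. Create a case study.'"""
    items: Dict[int, str] = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\d+)[.)]\s+(.+)", line)
        if match:
            items[int(match.group(1))] = match.group(2).strip()
    return items


def follow_up(text: str, number: int) -> str:
    """Resolve a short reference like 'Please do 2' to the full prompt."""
    return parse_numbered_list(text)[number]


# Shortened, hypothetical stand-in for a numbered model response.
response = """\
1. Write a technical paper on the architecture.
2. Write a proposal for optimizing the design.
3. Create a case study.
"""

chosen = follow_up(response, 2)
print(chosen)  # prints: Write a proposal for optimizing the design.
```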
This idea set is just an example, and the same approach works for any subject or topic. I hope this was worth your time and helps you write better prompts with ChatGPT and the other language interfaces coming to an app near you soon!