The Foundation: Understanding LLMs and Prompt Engineering, And Why It All Matters
Lee Boonstra
Software Engineer Tech Lead, Google, Office of the CTO | Keynote Speaker | Published Author | AI Strategist | Innovator |
Prompt engineering for business applications is complex and requires careful planning and refinement to achieve desired results from AI models. Lee Boonstra, a software engineer @Google with experience in prompt engineering for major businesses, will share practical learnings in a blog series to help others unlock the power of AI beyond simple tasks.
Let’s get down to basics and talk about how Large Language Models (LLMs) actually work. Think of them as prediction machines. There’s nothing factual about what they do; it’s all statistical. An LLM generates text one word at a time (well, technically not a word but a token; multiple tokens may form a word), each time guessing what the next token should be. LLMs are trained on massive amounts of data, so they get pretty good at figuring out how words relate to each other.
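To make that token-by-token loop concrete, here’s a toy sketch in Python. It’s purely illustrative: predict_next_token_probs is a made-up stand-in for a real model, which would return probabilities over its entire vocabulary.

```python
# Toy illustration of autoregressive generation: the "model" scores possible
# next tokens, one is sampled and appended, and the loop repeats.
# predict_next_token_probs is a hypothetical stand-in, not a real LLM call.
import random

def predict_next_token_probs(tokens):
    # A real LLM returns a probability for every token in its vocabulary,
    # conditioned on everything generated so far. These numbers are made up.
    if tokens[-1] == "a":
        return {"pizza": 0.6, "pasta": 0.3, "salad": 0.1}
    if tokens[-1] in ("pizza", "pasta", "salad"):
        return {"please": 0.7, "<end>": 0.3}
    return {"<end>": 1.0}

def generate(prompt_tokens, max_tokens=20):
    tokens = list(prompt_tokens)
    for _ in range(max_tokens):
        probs = predict_next_token_probs(tokens)
        candidates, weights = zip(*probs.items())
        next_token = random.choices(candidates, weights=weights, k=1)[0]
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return " ".join(tokens)

print(generate(["I", "would", "like", "a"]))  # e.g. "I would like a pizza please"
```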
When you give an LLM a prompt, you’re basically giving it instructions on how to do this word prediction. Good prompt engineering is all about crafting those instructions really well. It’s like giving someone directions — the more precise and more specific you are, the better the chances they’ll end up where you want them to.
In the world of Generative AI and natural language processing, a prompt is the input you give to the model to get a response. You can use prompts to make an LLM do all sorts of things, such as summarizing text, answering questions, classifying content, or generating code.
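In code, “giving the model a prompt” is usually just one API call. Here’s a minimal sketch assuming the google-generativeai Python SDK; the API key placeholder, model name, and exact parameter names are my assumptions and may differ in your setup.

```python
# Minimal sketch: send a prompt to an LLM and read back the response.
# Assumes the google-generativeai Python SDK; names may differ per SDK version.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")            # placeholder, not a real key
model = genai.GenerativeModel("gemini-1.5-flash")  # assumed model name

prompt = "Summarize the following customer review in one sentence: ..."
response = model.generate_content(prompt)
print(response.text)
```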
That's all for the basics; now let's move on!
The Challenges: It’s Not Always Simple
Anyone can write a prompt, but writing a good prompt? That’s where things get complex. To get the best response, you’ll have to consider a bunch of factors: the model you use and how it’s configured, plus the wording, tone, structure, and context of the prompt itself.
Plus, even with the same prompt, the response can sometimes be different. It’s not like a calculator, where you always get the same answer, so you can’t simply string-compare two responses to check that the model behaved the same way.
And let’s not forget the technical challenges.
So yeah, there are many reasons why your output isn’t what you expected. For a consumer using a chat interface like Gemini or ChatGPT, that’s fine; they just rephrase and ask again. For a business application, this can be a severe issue. A food-ordering bot that gets your order wrong likely won’t be used a second time. Worse, a wrongly summarized medical report or a wrongly explained legal contract has serious consequences.
And this is why it’s super important to keep track of your prompts, test them thoroughly, and get feedback from subject matter experts and other prompt engineers on your team, or even from another LLM used as an automated evaluator.
Don’t worry; it’s not all doom and gloom. You can fine-tune your AI model, tweak your prompts, or even try a different model altogether. This guide is all about helping you master the art of prompt engineering, so stick around, and we’ll dive into the nitty-gritty details!
How to Choose and Configure Your LLM for Maximum Impact
Here are some considerations for choosing a model that fits your use case: the quality of its output for your task, cost and latency, the configuration options it exposes, and its safety and content-filtering controls.
Fine-Tuning Your LLM: It’s Not Just About the Model
Once you pick the right model for your use case, you need to tinker with the LLM’s various configuration options, such as the output length and sampling controls like temperature, Top-K, and Top-P. Most LLMs come with multiple configuration options that control the output, and effective prompt engineering requires setting these configurations optimally for your task.
Output Token Length
One of the key settings is the output token length. This controls how many tokens (roughly words) your LLM spits out in its response. Now, here’s the thing: more tokens mean more computing power, which translates to higher costs and potentially slower response times. And guess what? Making the output shorter doesn’t magically make your LLM more concise. It just causes the LLM to stop predicting more tokens once the limit is reached.
TIP: If you’re dealing with JSON output, be extra careful with the token limit. The JSON formatting itself can eat up a lot of tokens, so you don’t want to end up with a broken response, which makes the JSON invalid (and therefore, you can’t chain API calls).
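Here’s what that looks like in practice, again sketched with the google-generativeai SDK (the model name and parameter names are assumptions and may vary by SDK version and model): cap the output with max_output_tokens, and if you need JSON, request it explicitly so a truncated response is easier to detect.

```python
# Sketch: capping output length and requesting JSON output.
# Assumes the google-generativeai Python SDK; names may differ per version.
import json
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")            # placeholder
model = genai.GenerativeModel("gemini-1.5-flash")  # assumed model name

response = model.generate_content(
    "List three pizza toppings as a JSON array of strings.",
    generation_config=genai.GenerationConfig(
        max_output_tokens=256,                  # a hard stop, not "be concise"
        response_mime_type="application/json",  # ask for machine-readable output
    ),
)

try:
    toppings = json.loads(response.text)
    print(toppings)
except json.JSONDecodeError:
    # If the token limit cut the response short, the JSON may be invalid.
    print("Truncated or malformed JSON; consider raising max_output_tokens.")
```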
Sampling Controls: Let’s Get Creative (or Not)
LLMs don’t just predict one word at a time. They actually calculate probabilities for all the words in their vocabulary and then sample from those probabilities to choose the next word. This is where things like temperature, Top-K, and Top-P come in. They control how random and creative (or not) your LLM gets.
Temperature
Temperature controls the degree of randomness in token selection. Higher temperature means more random and unexpected results, while lower temperature makes your LLM stick closer to the expected output. Think of it like this: crank up the temperature if you want your LLM to write a wild marketing blog post. But if you need it to extract medical info from a patient report, keep it low and factual.
NOTE: Don’t go overboard with the temperature. Above 1, things start to get weird and nonsensical; as the temperature keeps increasing, all tokens approach being equally likely to be picked as the next token.
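As a sketch (same assumed SDK and model name as above), here’s how you might dial temperature down for factual extraction and up for creative writing:

```python
# Sketch: contrasting low vs. high temperature for different tasks.
# Assumes the google-generativeai Python SDK; names may differ per version.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")            # placeholder
model = genai.GenerativeModel("gemini-1.5-flash")  # assumed model name

# Low temperature: stick close to the most likely tokens; good for extraction.
factual = model.generate_content(
    "Extract the medication names mentioned in this patient report: ...",
    generation_config=genai.GenerationConfig(temperature=0.1),
)

# Higher temperature: more varied, surprising wording; good for marketing copy.
creative = model.generate_content(
    "Write a playful tagline for a pizza-ordering bot.",
    generation_config=genai.GenerationConfig(temperature=0.9),
)

print(factual.text)
print(creative.text)
```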
Top-K and Top-P
Top-K and Top-P (also known as nucleus sampling) are two sampling settings that restrict the predicted next token to the tokens with the highest predicted probabilities. Like temperature, these sampling settings control the randomness and diversity of the generated text.
Top-K keeps only the K most likely tokens, while Top-P keeps the smallest set of most likely tokens whose combined probability reaches the threshold P.
The best way to choose between Top-K and Top-P is to experiment with both methods (or with both together) and see which produces the results you’re looking for. A low temperature (e.g., 0.1) tends to work well with a high Top-P (e.g., 0.95).
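To see what these settings actually do, here’s a toy, framework-free sketch of Top-K and Top-P filtering over a made-up probability distribution. Real models apply this over their full vocabulary, but the mechanics are the same.

```python
# Toy illustration of Top-K and Top-P filtering before sampling the next token.
# The probabilities below are made up for the example.
import random

probs = {"pizza": 0.40, "pasta": 0.25, "salad": 0.15, "sushi": 0.10,
         "soup": 0.06, "stew": 0.04}

def top_k_filter(probs, k):
    """Keep only the k most likely tokens."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:k])

def top_p_filter(probs, p):
    """Keep the smallest set of most likely tokens whose cumulative probability reaches p."""
    kept, total = {}, 0.0
    for token, prob in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept[token] = prob
        total += prob
        if total >= p:
            break
    return kept

def sample(filtered):
    """Renormalize the surviving tokens and sample one of them."""
    tokens, weights = zip(*filtered.items())
    return random.choices(tokens, weights=weights, k=1)[0]

print(top_k_filter(probs, k=3))    # pizza, pasta, salad survive
print(top_p_filter(probs, p=0.8))  # 0.40 + 0.25 + 0.15 = 0.80, same three survive
print(sample(top_p_filter(probs, p=0.8)))
```

With k=3 or p=0.8, only pizza, pasta, and salad survive the filter; the long tail of unlikely tokens can never be picked, which is exactly how these settings tame randomness.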
Safety Settings
Many large language models have safety settings or content-filtering controls. For instance, Gemini comes equipped with safety settings designed to filter model output and prevent the generation of harmful, unsafe, biased, or unfair content. These settings can be configured to align with your specific requirements and risk tolerance: they can be turned off so no filtering is applied, set to moderate so most unsafe content is removed (though potentially harmful content might still slip through), or set to strict, which filters rigorously to minimize the risk of unsafe content.
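With Gemini, for example, these thresholds can be passed per request. A hedged sketch with the google-generativeai SDK follows; the category and threshold enum names are assumptions based on that SDK and may differ in your version.

```python
# Sketch: tightening safety thresholds for a single request.
# Assumes the google-generativeai Python SDK; enum names may differ per version.
import google.generativeai as genai
from google.generativeai.types import HarmCategory, HarmBlockThreshold

genai.configure(api_key="YOUR_API_KEY")            # placeholder
model = genai.GenerativeModel("gemini-1.5-flash")  # assumed model name

response = model.generate_content(
    "Summarize this user forum thread: ...",
    safety_settings={
        HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
        HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
    },
)
print(response.text)
```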
NOTE: Safety settings depend on the model, are not foolproof, and might not catch all instances of unsafe content. Human oversight and additional safeguards are still necessary.
Coming up next in our series, we’re diving into a topic that’s often overlooked but very important: documenting your prompts. I know, it might not sound as exciting as playing around with LLMs, but trust me, it’s a total game-changer. So stay tuned for our next post, where I’ll spill all the details and save you from future headaches!
#promptengineering #gemini #llm #generativeai #gpt