The Foundation: Understanding LLMs and Prompt Engineering, And Why It All Matters

Prompt engineering for business applications is complex and requires careful planning and refinement to achieve desired results from AI models. Lee Boonstra, a software engineer @Google with experience in prompt engineering for major businesses, will share practical learnings in a blog series to help others unlock the power of AI beyond simple tasks.

  1. Demystifying Prompt Engineering for the Enterprise
  2. The Foundation: Understanding LLMs and Prompt Engineering, And Why It All Matters
  3. The Art of Prompt Engineering: Mastering Documentation and Effective Prompt Writing
  4. Best Practices for Prompt Engineering in the Enterprise

Let’s get down to basics and talk about how Large Language Models (LLMs) actually work. Think of them as prediction machines: there’s nothing factual going on, everything is statistical. An LLM generates text one word at a time (well, technically it’s not a word but a token; multiple tokens may form a word) and tries to guess what the next one should be. These models are trained on massive amounts of data, so they get pretty good at figuring out how words relate to each other.

When you give an LLM a prompt, you’re basically giving it instructions on how to do this word prediction. Good prompt engineering is all about crafting those instructions really well. It’s like giving someone directions — the more precise and more specific you are, the better the chances they’ll end up where you want them to.
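
To make that concrete, here is a toy sketch of that prediction loop. The tiny hand-written “model” below is purely illustrative; a real LLM computes this distribution over a vocabulary of tens of thousands of tokens using billions of learned weights, but the generate-one-token-at-a-time idea is the same.

```python
import random

# A toy "model": given the last token, return a probability distribution
# over possible next tokens. A real LLM does this over its whole vocabulary,
# conditioned on everything generated so far.
TOY_MODEL = {
    "the": {"cat": 0.6, "dog": 0.3, "<end>": 0.1},
    "cat": {"sat": 0.7, "ran": 0.2, "<end>": 0.1},
    "dog": {"sat": 0.5, "ran": 0.4, "<end>": 0.1},
    "sat": {"<end>": 1.0},
    "ran": {"<end>": 1.0},
}

def generate(prompt, max_new_tokens=10):
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        probs = TOY_MODEL.get(tokens[-1], {"<end>": 1.0})
        choices, weights = zip(*probs.items())
        # Sample the next token according to its predicted probability.
        next_token = random.choices(choices, weights=weights, k=1)[0]
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return " ".join(tokens)

print(generate("the"))  # e.g. "the cat sat" -- different runs can differ
```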

In the world of Generative AI and natural language processing, a prompt is the input you give to the model to get a response. You can use these prompts to make the LLM do all sorts of actions:

  • Summarizing large documents
  • Extracting key information from a speech
  • Answering questions and reasoning over a contract
  • Classifying stuff (like, is this email spam or not?)
  • Translating a document into another language
  • Generating code or explaining code
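
To make the classification case from this list concrete, here’s a minimal sketch using the google-generativeai Python SDK. Treat the SDK calls, the model name, and the API-key placeholder as illustrative (names change between SDK versions); any Gemini, Vertex AI, or OpenAI-style client would look similar. The point is the shape of the prompt: instructions, the input to work on, and the output you expect.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # assumes a key from Google AI Studio
model = genai.GenerativeModel("gemini-1.5-flash")

# The prompt is just text: an instruction, the input, and the expected output format.
prompt = """Classify the following email as SPAM or NOT_SPAM.
Respond with exactly one word.

Email:
Congratulations! You've won a free cruise. Reply now to claim your prize.
"""

response = model.generate_content(prompt)
print(response.text)  # e.g. "SPAM"
```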

That's all for the basics, now let's move on!

The Challenges: It’s Not Always Simple

Anyone can write a prompt, but writing a good prompt? That’s where things get complex. You’ll have to consider a bunch of factors to get the best response from your model:

  1. Teamwork: Get a subject matter expert on board. They know the ins and outs of your topic and can help you evaluate and rate generated answers or provide examples of what “perfect” looks like. As engineers, we often overlook this point, but frankly it’s the most important factor.
  2. Model configuration: You need to pick the right AI model and tweak its settings. Things like how creative it should be (Temperature), how safe its answers need to be (Safety and Filtering Settings), and how it samples which words to use (Top-K / Top-P) all play a role.
  3. Prompt Perfection: The way you write your prompt matters: the words you choose, the order you put them in, how you phrase things, how you provide instructions, the role and style you take, the context you pass in, the examples you give, and the constraints and output expectations you set. All of it influences the LLM’s response.

Plus, even with the same prompt, the response can differ from run to run. It’s not like a calculator, where you always get the same answer. So you can’t simply string-compare two responses to check whether they say the same thing.
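
Here’s a small illustration, assuming you asked the model for structured JSON output: the raw strings differ even though the content is identical, so compare the parsed content instead. Free-form answers usually need a rubric, a human rater, or another LLM acting as a judge.

```python
import json

# Two runs of the same extraction prompt can produce different text that
# carries the same information, so a raw string comparison is too strict.
response_a = '{"patient": "J. Doe", "diagnosis": "Type 2 diabetes"}'
response_b = '{\n  "diagnosis": "Type 2 diabetes",\n  "patient": "J. Doe"\n}'

print(response_a == response_b)                           # False: formatting differs
print(json.loads(response_a) == json.loads(response_b))   # True: same content
```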

And let’s not forget the technical challenges.

  • Sometimes the LLM’s answer is too long and therefore breaks the formatting. This happens a lot when you work with an output format like JSON (see the defensive-parsing sketch after this list).
  • Responses might get blocked if they try to say something harmful, copyrighted, or inappropriate.
  • You can run into quota issues.
  • And all kinds of security challenges!
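
A defensive pattern that helps with the first three problems is to validate the output and retry. The sketch below assumes a hypothetical call_llm() wrapper around whatever SDK you’re using; the point is the validate-and-retry loop, not the specific error types.

```python
import json
import time

def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around whichever SDK you use.
    Assumed to return raw response text, or raise on quota/safety errors."""
    raise NotImplementedError

def get_json_response(prompt: str, max_attempts: int = 3) -> dict:
    for attempt in range(1, max_attempts + 1):
        try:
            raw = call_llm(prompt)
            return json.loads(raw)  # fails if the output was cut off mid-JSON
        except json.JSONDecodeError:
            # Nudge the model and try again with a stricter instruction.
            prompt += "\nReturn ONLY complete, valid JSON. No extra text."
        except Exception:
            # Blocked responses and quota errors: back off and retry.
            time.sleep(2 ** attempt)
    raise RuntimeError(f"No valid JSON after {max_attempts} attempts")
```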

So yeah, there are many reasons why your output isn’t what you expected. For a consumer using a chat interface like Gemini or ChatGPT, that’s fine; they just type another question. For a business application, this can be a severe issue. A food ordering bot that takes your order wrong likely won’t be used a second time. Worse, a medical report that’s wrongly summarized or a legal contract that’s wrongly explained has serious consequences.

And this is why it’s super important to keep track of your prompts, test them thoroughly, and get feedback from real people, like subject matter experts and other prompt engineers on your team, or even from another LLM acting as an automated reviewer.
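
One lightweight way to do that tracking and testing is a small regression suite: a handful of inputs with expert-approved expectations that you rerun every time the prompt, the model, or a configuration value changes. A minimal sketch (call_llm is again a hypothetical wrapper around your SDK, and the golden cases are made up):

```python
# Golden examples, ideally written or approved by a subject matter expert.
GOLDEN_CASES = [
    {"input": "Two large pepperoni pizzas please", "must_contain": ["2", "pepperoni"]},
    {"input": "Cancel my order from yesterday",    "must_contain": ["cancel"]},
]

def evaluate_prompt(prompt_template: str, call_llm) -> list:
    """Run every golden case through the prompt and report which ones failed."""
    failures = []
    for case in GOLDEN_CASES:
        answer = call_llm(prompt_template.format(user_input=case["input"])).lower()
        missing = [term for term in case["must_contain"] if term not in answer]
        if missing:
            failures.append({"input": case["input"], "missing": missing})
    return failures  # an empty list means this prompt version passed
```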

Don’t worry; it’s not all doom and gloom. You can fine-tune your AI model, tweak your prompts, or even try a different model altogether. This guide is all about helping you master the art of prompt engineering, so stick around, and we’ll dive into the nitty-gritty details!

How to Choose and Configure Your LLM for Maximum Impact

Here are some considerations for choosing a model that fits your use case:

  • Small vs. Large Model: The model’s size can significantly impact its performance and the quality of its responses. Smaller models may be faster and more cost-effective, but they lack the complexity and nuance of larger models. For instance, a smaller model could be sufficient for basic text classification tasks, while a larger model might be necessary for complex question answering or creative text generation.
  • Industry-Specific Models: In some cases, specialized models trained on domain-specific data can offer superior performance. For example, Med-PaLM and Sec-PaLM are tailored for medical and cybersecurity applications. If your use case falls within a specific industry, it’s worth considering whether a specialized model could provide more accurate and relevant results.
  • Open-Source vs. Commercial: The decision between open-source models (e.g., Gemma, LLaMA) and commercial ones (e.g., models on Vertex AI, GPT, Claude) should be based on factors such as customization needs, model size, architecture, access to computational resources, library support, and cost. Open-source models offer flexibility and potential cost savings but may (or may not) require more setup and fine-tuning. Commercial offerings, like Gemini on Vertex AI, provide a managed environment with pre-trained models and seamless integration with other (Google) Cloud services.
  • Context Window Size & Output Token Limit: The context window refers to the maximum amount of text the model can use when generating a response. The output token limit determines the response length the model can produce. The limits on these parameters are essential to consider, especially when working with long documents or complex prompts. For instance, if you need to summarize a lengthy legal contract, you’ll need a model with a large enough context window to process the entire document. When you choose JSON as an output format, the JSON format itself might eat up half of your output tokens, so the output token limit is equally important.
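
For the context-window point in particular, it pays to measure before you send. Here’s a sketch using the google-generativeai SDK’s token counter (method and attribute names per the SDK versions I’ve used; if you can’t count exactly, a rough rule of thumb for English text is about four characters per token).

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")

with open("contract.txt") as f:
    contract_text = f.read()

prompt = "Summarize the key obligations in this contract:\n\n" + contract_text

# Count tokens before sending, so you know whether the document actually
# fits in the model's context window (limits differ per model -- check its docs).
token_count = model.count_tokens(prompt).total_tokens
print(f"Prompt uses {token_count} tokens")
```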

Fine-Tuning Your LLM: It’s Not Just About the Model

Once you pick the correct model for your use case, you must tinker with the various configurations of an LLM, such as the output length and the sampling controls (temperature, Top-K/Top-P). Most LLMs come with multiple configuration options that control the output. Effective prompt engineering requires setting these configurations optimally for your task.

Output Token Length

One of the key settings is the output token length. This controls how many tokens (roughly words) your LLM spits out in its response. Now, here’s the thing: more tokens mean more computing power, which translates to higher costs and potentially slower response times. And guess what? Making the output shorter doesn’t magically make your LLM more concise. It just causes the LLM to stop predicting more tokens once the limit is reached.

TIP: If you’re dealing with JSON output, be extra careful with the token limit. The JSON formatting itself can eat up a lot of tokens, so you don’t want to end up with a broken response, which makes the JSON invalid (and therefore, you can’t chain API calls).
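
Here’s a sketch of guarding against that with the google-generativeai SDK (parameter and attribute names per the SDK as I’ve used it): set the limit explicitly, then check that the response still parses before passing it downstream.

```python
import json
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")

response = model.generate_content(
    "Return this order as JSON with fields: items, quantities, total_price. "
    "Order: two large pepperoni pizzas and a cola.",
    generation_config={"max_output_tokens": 256, "temperature": 0.0},
)

try:
    order = json.loads(response.text)
    print(order)
except json.JSONDecodeError:
    # Truncated or otherwise malformed output, e.g. the model hit the token limit mid-JSON.
    finish_reason = response.candidates[0].finish_reason
    print(f"Invalid JSON (finish_reason={finish_reason}); "
          "raise max_output_tokens or ask for a more compact format.")
```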

Sampling Controls: Let’s Get Creative (or Not)

LLMs don’t just predict one word at a time. They actually calculate probabilities for all the words in their vocabulary and then sample from those probabilities to choose the next word. This is where things like temperature, Top-K, and Top-P come in. They control how random and creative (or not) your LLM gets.

Temperature

Temperature controls the degree of randomness in token selection. Higher temperature means more random and unexpected results, while lower temperature makes your LLM stick closer to the expected output. Think of it like this: crank up the temperature if you want your LLM to write a wild marketing blog post. But if you need it to extract medical info from a patient report, keep it low and factual.

NOTE: Don’t go overboard with the temperature. Above 1, things start to get weird and nonsensical. As the temperature increases, all tokens become more and more equally likely to be picked as the next token.
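
Here’s a small numeric sketch of what temperature actually does (the four logits are invented, not from a real model): it rescales the model’s raw scores before they’re turned into probabilities, so low values make the top token dominate and high values flatten the distribution, which is exactly why very high temperatures produce incoherent text.

```python
import numpy as np

def apply_temperature(logits, temperature):
    """Convert raw model scores (logits) into probabilities at a given temperature."""
    scaled = np.array(logits, dtype=float) / temperature
    exps = np.exp(scaled - scaled.max())   # subtract max for numerical stability
    return exps / exps.sum()

logits = [4.0, 2.0, 1.0, 0.5]              # toy scores for four candidate tokens

print(apply_temperature(logits, 0.2))  # ~[1.00, 0.00, 0.00, 0.00] -> near-deterministic
print(apply_temperature(logits, 1.0))  # ~[0.82, 0.11, 0.04, 0.02] -> the model's own spread
print(apply_temperature(logits, 5.0))  # ~[0.37, 0.25, 0.20, 0.18] -> nearly uniform
```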

Top-K and Top-P

Top-K and Top-P (also known as nucleus sampling) are two sampling settings used in LLMs to restrict the predicted next token from tokens with the top predicted probabilities. Like temperature, these sampling settings control the randomness and diversity of generated text.

Top-K keeps only the K most likely tokens, while Top-P (nucleus sampling) keeps the smallest set of most likely tokens whose combined probability reaches the value P.

The best way to choose between Top-K and Top-P is to experiment with both methods (or both together) and see which one produces the results you are looking for. As a starting point, a low temperature (e.g., 0.1) often works well with a high Top-P (e.g., 0.95).
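
If you want to see what these filters do to the distribution, here’s a small numpy sketch with made-up probabilities; real inference stacks apply these filters (often combined with temperature) before sampling the next token.

```python
import numpy as np

def top_k_filter(probs, k):
    """Keep only the k most likely tokens, then renormalize."""
    probs = np.array(probs, dtype=float)
    cutoff = np.sort(probs)[-k]                 # k-th largest probability
    filtered = np.where(probs >= cutoff, probs, 0.0)
    return filtered / filtered.sum()

def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability reaches p."""
    probs = np.array(probs, dtype=float)
    order = np.argsort(probs)[::-1]             # indices from most to least likely
    cumulative = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cumulative, p) + 1]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

probs = [0.50, 0.25, 0.15, 0.07, 0.03]          # toy next-token probabilities

print(top_k_filter(probs, 2))    # ~[0.67, 0.33, 0, 0, 0]    -- only the top 2 remain
print(top_p_filter(probs, 0.9))  # ~[0.56, 0.28, 0.17, 0, 0] -- top 3 reach 90%
```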

Safety Settings

Many large language models have safety settings or content-filtering controls. For instance, Gemini comes equipped with safety settings designed to filter model output, preventing the generation of harmful, unsafe, biased, or unfair content. These settings can be configured to align with your specific requirements and risk tolerance: they can be turned off so that no filtering is applied, set to moderate to remove most unsafe content (though potentially harmful content might still slip through), or set to strict, which filters rigorously to minimize the risk of unsafe content.
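
For Gemini specifically, a configuration sketch with the google-generativeai SDK might look like this (category and threshold names per the SDK as I’ve used it; check the current documentation for the exact enums and defaults).

```python
import google.generativeai as genai
from google.generativeai.types import HarmCategory, HarmBlockThreshold

genai.configure(api_key="YOUR_API_KEY")

model = genai.GenerativeModel(
    "gemini-1.5-flash",
    safety_settings={
        HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
        HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
    },
)

response = model.generate_content("Summarize this patient report: ...")
try:
    print(response.text)  # raises if the candidate was blocked by the filters
except ValueError:
    print("Blocked by safety filters:", response.prompt_feedback)
```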

NOTE: Safety settings depend on the model, are not foolproof, and might not catch all instances of unsafe content. Human oversight and additional safeguards are still necessary.

Coming up next in our series, we’re diving into a topic that’s often overlooked but very important: documenting your prompts. I know, it might not sound as exciting as playing around with LLMs, but trust me, it’s a total game-changer. So stay tuned for the next post, where I’ll spill all the details to save you from future headaches!

#promptengineering #gemini #llm #generativeai #gpt
