Prompting

07 - Controlling LLM Output

Techniques for fine-tuning LLM responses

This chapter focuses on techniques to fine-tune the responses generated by LLMs, allowing for more precise and tailored outputs.

7.1 Temperature and top-k sampling

Temperature and top-k sampling are parameters that control the randomness and diversity of the LLM's outputs.

Key points:

  • Temperature: A higher value (e.g., 0.8) increases randomness by flattening the next-token probability distribution, while a lower value (e.g., 0.2) makes outputs more deterministic.
  • Top-k sampling: Restricts sampling of the next token to the k most probable candidates.

Best practices:

  • Use lower temperature for factual or precise tasks
  • Use higher temperature for creative or diverse outputs
  • Experiment with different values to find the right balance

Example prompt:

Instructions: Generate three unique marketing slogans for a new eco-friendly water bottle. Be creative and catchy.

Parameters:
- Temperature: 0.7
- Top-k: 50

Slogans:
1.
2.
3.
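Under the hood, both parameters act on the model's next-token logits before a token is sampled. The following stand-alone sketch (a simplified illustration, not any particular library's API) shows how temperature scaling and top-k filtering combine:

```python
import numpy as np

def sample_next_token(logits, temperature=0.7, top_k=50):
    """Sample a token id from raw logits using temperature and top-k."""
    logits = np.asarray(logits, dtype=np.float64)
    # Temperature scaling: values < 1 sharpen the distribution, > 1 flatten it.
    logits = logits / temperature
    # Top-k filtering: discard everything below the k-th highest logit.
    if top_k < len(logits):
        cutoff = np.sort(logits)[-top_k]
        logits = np.where(logits < cutoff, -np.inf, logits)
    # Softmax over the surviving candidates (-inf entries get probability 0).
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))
```

Setting top_k=1 reduces this to greedy decoding regardless of temperature; raising either parameter widens the pool of plausible continuations.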

7.2 Repetition penalties

Repetition penalties help prevent the LLM from repeating the same phrases or ideas too frequently.

Best practices:

  • Use higher penalties for tasks requiring diverse language
  • Adjust penalties based on the desired level of repetition
  • Be cautious not to set penalties too high, as overly aggressive penalties can degrade coherence and grammar

Example prompt:

Task: Write a short paragraph describing a sunset over the ocean. Use vivid language and avoid repeating descriptive words.

Parameters:
- Repetition penalty: 1.2

Paragraph:
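Mechanically, a repetition penalty rescales the logits of tokens that have already been generated. One common convention (introduced with the CTRL model and adopted by several open-source inference libraries) divides positive logits by the penalty and multiplies negative ones by it; a minimal sketch:

```python
import numpy as np

def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Make previously generated tokens less likely to be sampled again.

    Positive logits are divided by the penalty, negative logits are
    multiplied by it, so the penalized token always loses probability mass.
    """
    logits = np.asarray(logits, dtype=np.float64).copy()
    for tok in set(generated_ids):
        if logits[tok] > 0:
            logits[tok] /= penalty
        else:
            logits[tok] *= penalty
    return logits
```

A penalty of 1.0 leaves the logits unchanged; values around 1.1-1.3 are typical starting points before re-sampling as in the previous section.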

7.3 Output formatting instructions

Clear instructions on output format can help structure the LLM's responses in a desired way.

Best practices:

  • Specify the exact format you want (e.g., bullet points, numbered list, JSON)
  • Provide examples of the desired format
  • Use delimiters to clearly mark different sections of the output

Example prompt:

Generate a recipe for a vegetarian lasagna. Format the output as follows:

Name: [Recipe Name]

Ingredients:
- [Ingredient 1]
- [Ingredient 2]
...

Instructions:
1. [Step 1]
2. [Step 2]
...

Cooking Time: [Time in minutes]
Servings: [Number of servings]

Please ensure all sections are included and properly formatted.
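Because models occasionally drop or reorder sections even when the template is explicit, it is worth validating replies programmatically before using them downstream. A minimal check for the recipe template above (the helper name and section list are illustrative):

```python
REQUIRED_SECTIONS = ["Name:", "Ingredients:", "Instructions:",
                     "Cooking Time:", "Servings:"]

def missing_sections(reply: str) -> list[str]:
    """Return the required section headers absent from the model's reply."""
    return [s for s in REQUIRED_SECTIONS if s not in reply]
```

If the returned list is non-empty, a simple retry strategy is to re-prompt with the missing headers named explicitly.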

7.4 Hands-on exercise: Fine-tuning LLM responses

Now, let's practice controlling LLM outputs:

  1. Create a prompt that generates a short story, experimenting with different temperature settings to observe how it affects creativity and coherence.

  2. Design a prompt for a factual Q&A task, using a low temperature and appropriate repetition penalty to ensure accurate and concise responses.

  3. Develop a prompt that outputs a structured dataset (e.g., a list of books with titles, authors, and publication years) in a specific format like JSON or CSV.

  4. Craft a prompt for generating diverse solutions to a problem, using a combination of high temperature and top-k sampling to encourage varied outputs.

Example solution for #3:

Task: Generate a list of 5 classic science fiction novels, including their titles, authors, and publication years. Format the output as a JSON array.

Parameters:
- Temperature: 0.3
- Repetition penalty: 1.1

Output format:
[
  {
    "title": "Book Title",
    "author": "Author Name",
    "year": YYYY
  },
  ...
]

Please ensure the data is accurately formatted as JSON and includes 5 unique entries.
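A companion validator makes this exercise concrete: parse the reply as JSON and assert the expected shape. This sketch assumes the reply contains only the JSON array (in practice you may first need to strip surrounding prose or code fences):

```python
import json

def validate_books(reply: str) -> list[dict]:
    """Parse the reply and check it is a JSON array of 5 unique book records."""
    books = json.loads(reply)
    assert isinstance(books, list) and len(books) == 5, "expected 5 entries"
    for book in books:
        assert set(book) == {"title", "author", "year"}, "unexpected keys"
        assert isinstance(book["year"], int), "year must be an integer"
    assert len({b["title"] for b in books}) == 5, "titles must be unique"
    return books
```

Structured tasks like this pair well with a low temperature, since you want the model to commit to the format rather than improvise around it.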

By mastering these techniques for controlling LLM output, you can fine-tune the responses to better suit your specific needs, whether you're looking for creative variety, factual precision, or structured data outputs.

In the next chapter, we'll explore prompt optimization techniques, including methods for iterative refinement and A/B testing of prompts.
