Temperature and Sampling Trade-offs
How deterministic vs. creative should generation be? Temperature, top-p (nucleus), top-K, and repetition penalty parameters control randomness vs. determinism.
Intent & Description
🎯 Intent
Control the randomness vs. determinism of LLM generation based on use case requirements. Creative tasks need higher randomness; factual tasks need determinism.
📋 Context
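LLM generation is controlled by sampling parameters. Temperature rescales the token probability distribution: lower values sharpen it toward the most likely tokens (more deterministic), while higher values flatten it (more varied). Top-p (nucleus sampling) restricts sampling to the smallest set of tokens whose cumulative probability reaches p. Top-K restricts sampling to the K highest-probability tokens. A repetition penalty lowers the probability of tokens that have already been generated, which encourages variety.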
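The parameters described above can be sketched on a toy list of logits. This is a minimal illustration using only the standard library, not how any particular model or API implements sampling; the function names (`softmax`, `filter_top_k`, `filter_top_p`, `sample_token`) are hypothetical, and real implementations differ in detail (for example, in the order filters are applied).

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def _renormalize(probs, kept):
    """Zero out tokens outside `kept` and rescale the rest to sum to 1."""
    filtered = [p if i in kept else 0.0 for i, p in enumerate(probs)]
    total = sum(filtered)
    return [p / total for p in filtered]


def filter_top_k(probs, k):
    """Keep only the k highest-probability tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return _renormalize(probs, set(order[:k]))


def filter_top_p(probs, p):
    """Keep the smallest set of top tokens whose cumulative mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return _renormalize(probs, kept)


def sample_token(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Pick a token index; temperature <= 0 is treated as greedy decoding."""
    if temperature <= 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax(logits, temperature)
    if top_k is not None:
        probs = filter_top_k(probs, top_k)
    if top_p is not None:
        probs = filter_top_p(probs, top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With logits such as `[2.0, 1.0, 0.1]`, lowering the temperature concentrates probability on the first token, while `top_k` and `top_p` remove the low-probability tail before sampling.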
💡 Solution
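Use temperature 0 for code generation and other outputs that should be reproducible. Use temperature 0-0.3 for factual Q&A and 0.7-1.0 for creative writing. Avoid combining a high temperature with a permissive top-K or top-p, since both widen the pool of candidate tokens and together they can make output incoherent. Use best-of-N sampling (generate N candidates and keep the highest-scoring one) to balance accuracy and creativity.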
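One way to apply these recommendations is to keep them in a small lookup table keyed by task type. The sketch below is illustrative only: `SamplingConfig`, `PRESETS`, `sampling_for`, and the task names are hypothetical, and the specific values are assumptions picked from within the ranges above, so they should be tuned per model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplingConfig:
    """Sampling settings for one kind of task (hypothetical helper type)."""
    temperature: float
    top_p: float = 1.0


# Assumed presets drawn from the ranges in the text; adjust per model and task.
PRESETS = {
    "code": SamplingConfig(temperature=0.0),
    "factual_qa": SamplingConfig(temperature=0.2),
    "assistant": SamplingConfig(temperature=0.3, top_p=0.9),
    "creative": SamplingConfig(temperature=0.8, top_p=0.9),
}


def sampling_for(task, default="assistant"):
    """Return the preset for a task, falling back to the default preset."""
    return PRESETS.get(task, PRESETS[default])
```

A caller could then pass `sampling_for("creative").temperature` and `.top_p` to whatever generation API is in use, which keeps the per-task choices in one place.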
Real-world Use Case
📌 TL;DR
Temperature controls randomness: 0 = deterministic (code, facts), 0.7-1.0 = creative (writing), 0.3 = balanced (general assistant). Use low temperature for accuracy, high for creativity. Best-of-N improves quality.
Advantages
- Precise control over output characteristics
- Different parameter combinations for different use cases
- Temperature provides intuitive randomness control
- Best-of-N sampling improves quality without changing model
Disadvantages
- Optimal parameters vary by model and task
- High temperature increases hallucination risk
- Low temperature can produce repetitive outputs
- Parameter tuning requires experimentation
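The repetition issue noted above is what a repetition penalty is meant to counter. The sketch below shows one common formulation, in which the logits of already-generated tokens are pushed down before sampling; the function name `apply_repetition_penalty` and the default penalty of 1.2 are assumptions for illustration, and individual APIs may define their penalties differently.

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Return a copy of logits with previously generated tokens made less likely.

    Positive logits are divided by the penalty and negative logits are
    multiplied by it, so a penalty above 1 always lowers the score.
    """
    adjusted = list(logits)
    for token_id in set(generated_ids):
        value = adjusted[token_id]
        adjusted[token_id] = value / penalty if value > 0 else value * penalty
    return adjusted
```

For example, with `penalty=2.0`, a token already emitted with logit 2.0 drops to 1.0, while tokens that have not appeared yet are left unchanged.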
# Temperature and Sampling Trade-offs
import openai  # Uses the legacy openai<1.0 interface (openai.ChatCompletion)

# Deterministic output (code generation, factual Q&A)
deterministic_response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Write a Python function to sort a list"}],
    temperature=0,  # Greedy decoding; near-deterministic in practice
    top_p=1.0,
)
# Creative output (creative writing, brainstorming)
creative_response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Write a short story about AI"}],
    temperature=0.8,  # More creative and varied
    top_p=0.9,  # Nucleus sampling
)
# Balanced output (general assistant, explanations)
balanced_response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    temperature=0.3,  # Low temperature for accuracy
    top_p=0.9,
)
# Best-of-N sampling for higher quality
def best_of_n_sampling(prompt, n=5, temperature=0.7):
    """Generate n candidate responses and return the highest-scoring one."""
    responses = []
    for _ in range(n):
        response = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            temperature=temperature,
        )
        responses.append(response)
    # Score and rank responses (simplified)
    return max(responses, key=score_response)


def score_response(response):
    # Placeholder: length is only a stand-in for real scoring logic (coherence, correctness, etc.)
    return len(response.choices[0].message.content)