HHH Trilemma
Helpful, Honest, Harmless: pick two. This is a central tension in AI alignment: making a model more helpful can reduce its honesty, and making it more harmless can reduce its helpfulness.
Intent & Description
🎯 Intent
Navigate the inherent tensions between three desirable AI properties: Helpful (does what the user wants), Honest (truthful and accurate), Harmless (safe and beneficial). You cannot maximize all three at once, so tradeoffs are inevitable.
📋 Context
You are designing the objectives of an AI system. A purely helpful model might lie to satisfy user requests. A purely honest model might refuse to help with harmless tasks because it is uncertain of the answer. A purely harmless model might be unhelpful because it refuses beneficial tasks. The trilemma shows that these properties are not independent: improving one often degrades another.
💡 Solution
Explicitly prioritize based on your use case. Medical AI: prioritize honesty and harmlessness over helpfulness. Creative writing assistant: prioritize helpfulness over strict honesty, since fiction is not expected to be literally true. Customer service: balance all three with clear boundaries. Use context-aware policies that adjust priorities by domain. Accept that a perfect balance is impossible and optimize for your specific constraints.
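One way to picture "prioritize based on your use case" is as a weighted score over candidate responses. The sketch below is illustrative only: the Candidate type, the DOMAIN_WEIGHTS table, and the per-property scores are hypothetical, and it assumes some upstream evaluator has already rated each candidate on a 0 to 1 scale.

```python
from dataclasses import dataclass

# Hypothetical per-domain weights; illustrative values, not calibrated ones.
DOMAIN_WEIGHTS = {
    "medical": {"helpful": 0.2, "honest": 0.4, "harmless": 0.4},
    "creative_writing": {"helpful": 0.5, "honest": 0.25, "harmless": 0.25},
}


@dataclass
class Candidate:
    text: str
    helpful: float   # each score is assumed to lie in [0, 1]
    honest: float
    harmless: float


def weighted_score(candidate, weights):
    """Combine the three property scores using the domain's weights."""
    return (weights["helpful"] * candidate.helpful
            + weights["honest"] * candidate.honest
            + weights["harmless"] * candidate.harmless)


def pick_response(candidates, domain):
    """Return the candidate with the highest weighted score for the domain."""
    weights = DOMAIN_WEIGHTS[domain]
    return max(candidates, key=lambda c: weighted_score(c, weights))


# Two made-up candidates with assumed scores.
candidates = [
    Candidate("confident answer", helpful=0.95, honest=0.5, harmless=0.7),
    Candidate("cautious, sourced answer", helpful=0.4, honest=0.9, harmless=0.9),
]

print(pick_response(candidates, "medical").text)
print(pick_response(candidates, "creative_writing").text)
```

With these made-up scores the medical weights favor the cautious answer and the creative-writing weights favor the confident one, which is the tradeoff the pattern describes: the same candidates rank differently once the domain changes the priorities.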
📌 TL;DR
HHH trilemma: Helpful, Honest, Harmless: pick two. Making models more helpful can reduce honesty. Making them harmless can reduce helpfulness. Prioritize based on use case.
Advantages
- Makes explicit tensions that practitioners already experience
- Guides system design and objective function choices
- Helps communicate tradeoffs to stakeholders
- Explains why different applications need different priorities
Disadvantages
- Framed as a trilemma, though a spectrum of tradeoffs may be more accurate
- Definitions are subjective (what counts as harmful?)
- Some argue all three can be improved simultaneously with better techniques
- Does not account for other important properties (fairness, privacy)
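# HHH Trilemma: context-aware priority adjustment (illustrative sketch).
# The model object and its generate / generate_with_citations methods are
# assumed interfaces, not a real library API.

DEFAULT_PRIORITIES = {'helpful': 1 / 3, 'honest': 1 / 3, 'harmless': 1 / 3}

DOMAIN_PRIORITIES = {
    # Medical: honesty and harmlessness paramount
    'medical': {'helpful': 0.2, 'honest': 0.4, 'harmless': 0.4},
    # Creative: helpfulness prioritized
    'creative_writing': {'helpful': 0.5, 'honest': 0.25, 'harmless': 0.25},
    # Education: balanced, but honesty is key
    'education': {'helpful': 0.3, 'honest': 0.4, 'harmless': 0.3},
}


class HHHAIAgent:
    def __init__(self, model):
        self.model = model
        self.priorities = dict(DEFAULT_PRIORITIES)

    def adjust_priorities(self, context):
        # Unknown or missing domains fall back to the balanced default, so
        # weights from an earlier call never leak into the next one.
        domain = context.get('domain')
        self.priorities = dict(DOMAIN_PRIORITIES.get(domain, DEFAULT_PRIORITIES))

    def add_safety_constraints(self, prompt):
        # Placeholder: a real system would apply its own safety policy here.
        return "Follow the safety policy for this domain.\n\n" + prompt

    def optimize_for_helpfulness(self, response):
        # Placeholder: e.g. add detail or next steps; returned unchanged here.
        return response

    def generate_response(self, prompt, context):
        self.adjust_priorities(context)
        # Apply constraints based on priorities
        if self.priorities['harmless'] > 0.35:
            prompt = self.add_safety_constraints(prompt)
        if self.priorities['honest'] > 0.35:
            response = self.model.generate_with_citations(prompt)
        else:
            response = self.model.generate(prompt)
        if self.priorities['helpful'] > 0.4:
            response = self.optimize_for_helpfulness(response)
        return response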
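The thresholds in generate_response above (0.35 for harmless and honest, 0.4 for helpful) decide which steps run for a given set of weights. As a rough aid for reasoning about them, the standalone sketch below lists the steps a weighting would trigger. The THRESHOLD_STEPS table and the active_steps helper are hypothetical names introduced here for illustration; only the threshold values and step names come from the code above.

```python
# Thresholds and step names mirror generate_response above; the table and
# helper themselves are hypothetical, added only for illustration.
THRESHOLD_STEPS = [
    ("harmless", 0.35, "add_safety_constraints"),
    ("honest", 0.35, "generate_with_citations"),
    ("helpful", 0.4, "optimize_for_helpfulness"),
]


def active_steps(priorities):
    """Return the names of the steps whose threshold the weights exceed."""
    missing = {name for name, _, _ in THRESHOLD_STEPS} - set(priorities)
    if missing:
        raise ValueError(f"missing priorities: {sorted(missing)}")
    return [step for name, limit, step in THRESHOLD_STEPS
            if priorities[name] > limit]


medical = {"helpful": 0.2, "honest": 0.4, "harmless": 0.4}
creative = {"helpful": 0.5, "honest": 0.25, "harmless": 0.25}
balanced = {"helpful": 1 / 3, "honest": 1 / 3, "harmless": 1 / 3}

print(active_steps(medical))
print(active_steps(creative))
print(active_steps(balanced))
```

Under these thresholds an evenly balanced weighting triggers none of the steps, so an agent in that state falls through to plain generation. Whether that is the intended default is a design choice worth making explicitly.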