Language Models
general
Retrieval-Augmented Generation (RAG)
Enhances LLM responses by retrieving relevant information from external databases before generating text.
Intent & Description
RAG is a pattern in which the user's query is first used to search a vector database or search engine. The retrieved results are then inserted into the prompt context, so the LLM can ground its answer in them and produce responses that are more factual, up to date, and specific to the source material.
Real-world Use Case
Question-answering systems over internal company wikis, technical manuals, or PDF reports.
Advantages
- Reduces LLM hallucinations by grounding answers in retrieved text, though it does not eliminate them.
- Enables real-time data updates without retraining or fine-tuning.
- Provides source attribution/citations for generated answers.
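The source-attribution point can be sketched as follows. This is a minimal illustration, not a prescribed API: the `Document` class, the `build_cited_prompt` and `format_sources` helpers, and the file names are all hypothetical. The idea is to keep a source identifier with each retrieved passage and number the passages in the prompt, so the model can cite them as [n] and the application can show a matching reference list.

```python
from dataclasses import dataclass


@dataclass
class Document:
    source: str  # hypothetical identifier, e.g. a wiki page title or file name
    text: str


def build_cited_prompt(question, docs):
    """Number each retrieved document so the model can cite it as [n]."""
    numbered = "\n".join(
        f"[{i}] ({doc.source}) {doc.text}" for i, doc in enumerate(docs, start=1)
    )
    return (
        "Answer using only the numbered context below and cite sources as [n].\n"
        f"Context:\n{numbered}\n"
        f"Question: {question}"
    )


def format_sources(docs):
    """Build a reference list to display next to the generated answer."""
    return [f"[{i}] {doc.source}" for i, doc in enumerate(docs, start=1)]


# Illustrative data only
docs = [
    Document("patterns.md", "Design patterns are reusable solutions to common software problems."),
    Document("glossary.md", "A singleton restricts a class to a single instance."),
]
prompt = build_cited_prompt("What are design patterns?", docs)
sources = format_sources(docs)
print(prompt)
print(sources)
```

Whether the model actually follows the citation instruction depends on the model, so a production system would likely also validate the [n] markers in the answer against the list of sources.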
Disadvantages
- Response latency increases due to the retrieval step.
- Retrieval accuracy bounds generation quality: if the relevant passage is not retrieved, the LLM cannot use it.
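Because generation can only be as good as what is retrieved, it can help to measure retrieval on its own before judging the answers. The sketch below computes a hit rate at k over a small hand-labeled set. The `hit_rate_at_k` function, the `keyword_retriever` stand-in, and the corpus are hypothetical and exist only to make the example self-contained.

```python
def hit_rate_at_k(retriever, labeled_queries, k=3):
    """Fraction of queries whose expected document id appears in the top-k results."""
    if not labeled_queries:
        return 0.0
    hits = 0
    for query, expected_id in labeled_queries:
        if expected_id in retriever(query, k):
            hits += 1
    return hits / len(labeled_queries)


# Hypothetical corpus and retriever: ranks documents by the number of shared words.
CORPUS = {
    "doc-patterns": "design patterns are reusable solutions",
    "doc-rag": "rag retrieves context before generation",
    "doc-latency": "caching lowers response latency",
}


def keyword_retriever(query, k):
    words = set(query.lower().split())
    scored = [
        (len(words & set(text.split())), doc_id) for doc_id, text in CORPUS.items()
    ]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    # Drop documents that share no words with the query
    return [doc_id for score, doc_id in ranked[:k] if score > 0]


labeled = [
    ("what are design patterns", "doc-patterns"),
    ("how does rag work", "doc-rag"),
    ("why is my answer slow", "doc-latency"),  # no shared vocabulary, so retrieval misses
]
score = hit_rate_at_k(keyword_retriever, labeled, k=1)
print(f"hit rate@1 = {score:.2f}")
```

The third query fails because it shares no words with the document that answers it. No prompt wording can recover from that miss, which is the bound described above.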
Implementation Example
# A simple Python RAG pipeline

class VectorDatabase:
    def similarity_search(self, query, k=1):
        # Simulated vector search: always returns the same document
        return ["Design patterns are reusable solutions to common software problems."]


class LLMClient:
    def generate(self, prompt):
        # Simulated LLM call: prints the prompt and returns a canned answer
        print(f"Sending prompt to LLM:\n---\n{prompt}\n---")
        return "Based on the context, design patterns are reusable solutions..."


class RAGSystem:
    def __init__(self, db, llm):
        self.db = db
        self.llm = llm

    def query(self, user_prompt):
        # 1. Retrieve context
        context_docs = self.db.similarity_search(user_prompt, k=1)
        context = "\n".join(context_docs)

        # 2. Augment prompt
        augmented_prompt = (
            f"Use the following context to answer the question.\n"
            f"Context: {context}\n"
            f"Question: {user_prompt}"
        )

        # 3. Generate response
        return self.llm.generate(augmented_prompt)


# Usage
rag = RAGSystem(VectorDatabase(), LLMClient())
print(rag.query("What are design patterns?"))
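The `similarity_search` method above is only simulated. As a rough idea of what a real one does, the sketch below ranks documents by cosine similarity between vectors. It rests on clearly labeled assumptions: the `embed` function is a toy bag-of-words counter standing in for a real embedding model, and `InMemoryVectorDatabase` is a hypothetical class that scans every entry instead of using an index.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (an assumption, not a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryVectorDatabase:
    def __init__(self, documents):
        # Embed every document once, at indexing time
        self.entries = [(doc, embed(doc)) for doc in documents]

    def similarity_search(self, query, k=1):
        # Embed the query, then return the k most similar documents
        query_vec = embed(query)
        ranked = sorted(
            self.entries, key=lambda entry: cosine(query_vec, entry[1]), reverse=True
        )
        return [doc for doc, _ in ranked[:k]]


db = InMemoryVectorDatabase([
    "Design patterns are reusable solutions to common software problems.",
    "Vector databases index embeddings for nearest-neighbor search.",
    "Fine-tuning updates model weights on task-specific data.",
])
top = db.similarity_search("What are design patterns?", k=1)
print(top)
```

Because it exposes the same `similarity_search(query, k)` signature, this class could be passed to `RAGSystem` in place of `VectorDatabase`. Word counts only match exact vocabulary, so a learned embedding model would be needed to retrieve paraphrased or synonymous text.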