What is RAG?
Retrieval-Augmented Generation (RAG) combines two stages: retrieval (finding relevant documents in a knowledge base) and generation (producing an answer grounded in the retrieved material). Rather than relying only on what it memorized during training, the model answers from current data supplied at query time.
How does a RAG pipeline work?
1. The user asks a question.
2. The system embeds the question and runs a similarity search against a vector database to find relevant document fragments.
3. The retrieved fragments are added to the prompt as context.
4. The model generates a response that cites its sources.
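The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production pipeline: the in-memory DOCUMENTS dictionary stands in for a vector database, the bag-of-words embed function stands in for a learned embedding model, and all function names (embed, retrieve, build_prompt) are hypothetical. Step 4 is left out because it depends on whichever language model you call.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

# Hypothetical in-memory knowledge base (assumption: a real system
# would store embeddings of these fragments in a vector database).
DOCUMENTS = {
    "policy.txt": "Employees may work remotely up to three days per week.",
    "expenses.txt": "Travel expenses must be submitted within 30 days of the trip.",
    "security.txt": "Passwords must be rotated every 90 days and never shared.",
}


def embed(text: str) -> Counter:
    """Toy embedding: word counts, standing in for a learned embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question: str, top_k: int = 2) -> List[Tuple[str, str]]:
    """Step 2: rank fragments by similarity to the question, keep the best top_k."""
    query_vec = embed(question)
    ranked = sorted(
        DOCUMENTS.items(),
        key=lambda item: cosine_similarity(query_vec, embed(item[1])),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question: str, fragments: List[Tuple[str, str]]) -> str:
    """Step 3: place the retrieved fragments in the prompt, labeled by source."""
    context = "\n".join(f"[{source}] {text}" for source, text in fragments)
    return (
        "Answer using only the context below and cite the source in brackets.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    question = "When must travel expenses be submitted?"  # Step 1
    fragments = retrieve(question, top_k=2)               # Step 2
    prompt = build_prompt(question, fragments)            # Step 3
    # Step 4 would send `prompt` to a language model of your choice.
    print(prompt)
```

Labeling each fragment with its source name in the prompt is what lets the model cite where an answer came from. Swapping the toy embed function for a real embedding model changes the quality of retrieval but not the shape of the pipeline.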
RAG vs fine-tuning
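Use RAG when the underlying data changes over time (knowledge base, documentation, regulations). Use fine-tuning when you want to change how the model behaves (response style, output format, domain specialization). In enterprise practice, the two approaches are often combined: for example, a model fine-tuned for tone and format that pulls current facts through RAG.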