RAG (Retrieval-Augmented Generation)

What is RAG?

Retrieval-Augmented Generation (RAG) combines two stages: retrieval (finding relevant documents in a knowledge base) and generation (producing an answer grounded in the retrieved material). Instead of relying only on what it memorized during training, the model answers from current data supplied with the query.

How does a RAG pipeline work?

1. The user asks a question.
2. The system embeds the question and runs a similarity search over a vector database to find relevant document fragments.
3. The retrieved fragments are added to the prompt as context.
4. The model generates a response that cites its sources.
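
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: the `DOCUMENTS` dictionary stands in for a vector database, the `embed` function is a toy bag-of-words counter rather than a learned embedding model, and all names (`embed`, `retrieve`, `build_prompt`, the file names) are hypothetical. The final generation step is left out, because it depends on whichever model API is in use.

```python
import math
import re
from collections import Counter

# Hypothetical in-memory knowledge base; a real system would use a vector database.
DOCUMENTS = {
    "policy.txt": "Employees may work remotely up to three days per week.",
    "travel.txt": "Travel expenses must be submitted within 30 days of the trip.",
    "security.txt": "Passwords must be rotated every 90 days.",
}


def embed(text):
    """Toy embedding: word counts (an assumption; stands in for a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, documents, top_k=2):
    """Step 2: rank fragments by similarity to the question, keep the best top_k."""
    query_vec = embed(question)
    scored = [
        (cosine_similarity(query_vec, embed(text)), source, text)
        for source, text in documents.items()
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    # Drop fragments that share nothing with the question.
    return [(source, text) for score, source, text in scored[:top_k] if score > 0]


def build_prompt(question, fragments):
    """Step 3: place the retrieved fragments in the prompt, labeled by source."""
    context = "\n".join(f"[{source}] {text}" for source, text in fragments)
    return (
        "Answer using only the context below and cite the source in brackets.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )


question = "How many days per week can employees work remotely?"
fragments = retrieve(question, DOCUMENTS)
prompt = build_prompt(question, fragments)
print(prompt)  # Step 4 would send this prompt to a language model.
```

Labeling each fragment with its source in the prompt is what lets the model cite where an answer came from. Swapping the toy `embed` for a real embedding model changes the quality of retrieval but not the shape of the pipeline.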

RAG vs fine-tuning

Use RAG when the underlying data changes often (knowledge bases, documentation, regulations). Use fine-tuning when you want to change how the model behaves (response style, output format, domain specialization). In enterprise practice, the two approaches are often combined.
