RAG stands for “Retrieval-Augmented Generation”.
RAG can be simplified to a two-stage process:
- Retrieval – A mechanism that searches a large database or corpus (e.g. documents, knowledge base) to find relevant information based on a query.
- Generation – A generative model (usually an LLM) that uses the retrieved information to generate a coherent and relevant response.
In RAG, you direct the LLM to answer from a specific subset of information, rather than relying only on its own internal knowledge.
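As a rough illustration of the retrieval stage, here is a minimal sketch that scores a tiny hard-coded corpus by term overlap with the query. The corpus, the `retrieve` function, and the scoring rule are all simplifying assumptions; production retrievers normally use vector embeddings and a dedicated search index rather than word matching.

```python
# Toy retrieval stage: score documents by term overlap with the query.
# Real systems typically use vector embeddings plus a dedicated search index.
import string

corpus = {
    "doc1": "Ottawa is the capital city of Canada.",
    "doc2": "The Great Wall of China is visible from low Earth orbit.",
    "doc3": "Canada has ten provinces and three territories.",
}

def _terms(text: str) -> set[str]:
    """Lowercase a string, strip punctuation, and return its set of terms."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return set(cleaned.split())

def retrieve(query: str, documents: dict[str, str], top_k: int = 2) -> list[str]:
    """Return the top_k documents sharing the most terms with the query."""
    query_terms = _terms(query)
    ranked = sorted(
        documents.values(),
        key=lambda text: len(query_terms & _terms(text)),
        reverse=True,
    )
    return ranked[:top_k]

print(retrieve("What is the capital of Canada?", corpus))
```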
How RAG Works
The steps taken during retrieval-augmented generation (a minimal end-to-end sketch follows the list):
- User Query: A question or prompt is submitted (e.g., "What is the capital of Canada?").
- Retriever: Instead of relying only on its internal training, the system searches the specified source(s) to find passages or documents relevant to the query.
- Reader/Generator: The retrieved documents are passed to a generative model, which synthesises an answer based on both the query and the retrieved content.
- Response: The final, grounded answer is returned to the user.
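Continuing the toy retriever from the earlier sketch, the pipeline below wires these steps together. The `build_prompt`, `generate_answer`, and `rag_pipeline` names are illustrative, and `generate_answer` is a stand-in for whatever LLM API a real system would call.

```python
# End-to-end sketch of the steps above, reusing the toy `retrieve`
# function and `corpus` from the earlier retrieval example.

def build_prompt(query: str, passages: list[str]) -> str:
    """Combine the retrieved passages and the user query into a grounded prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\n"
        "Answer:"
    )

def generate_answer(prompt: str) -> str:
    """Placeholder for the generative model; swap in a real LLM call here."""
    return f"[LLM answer based on a {len(prompt)}-character grounded prompt]"

def rag_pipeline(query: str) -> str:
    passages = retrieve(query, corpus)      # Retriever
    prompt = build_prompt(query, passages)  # Reader/Generator input
    return generate_answer(prompt)          # Response

print(rag_pipeline("What is the capital of Canada?"))  # User query
```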
Why RAG Is Useful
- Up-to-date Knowledge: Unlike static LLMs that are trained on fixed data, RAG can reference more current and/or dynamic sources.
- Factual Accuracy: Grounding generation in retrieved documents reduces hallucinations and the risk of answering from unrelated material.
- Smaller Model Footprint: Because much of the knowledge can live in the external corpus rather than in the model's weights, a smaller model paired with retrieval can be more economical while still performing well on domain-specific queries.
Common Use Cases
A non-exhaustive list of scenarios where a RAG approach may be more appropriate:
- AI assistants with access to a knowledge base
- Customer support bots referencing relevant documentation
- Legal, medical, or financial Q&A systems using trusted data sources
