What is RAG?

Overview of the workings & uses of Retrieval-Augmented Generation

· AI,Overview,RAG

RAG stands for “Retrieval-Augmented Generation”.

A RAG system can be simplified to a two-stage process:

  • Retrieval – A mechanism that searches a large database or corpus (e.g. documents, knowledge base) to find relevant information based on a query.
  • Generation – A generative model (usually an LLM) that uses the retrieved information to generate a coherent and relevant response.

In RAG, you direct the LLM to answer from a specific subset of information, rather than relying only on what it learned during training.
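
One common way to direct the model is to place the retrieved text in the prompt itself. The snippet below is a minimal sketch of that idea, not a prescribed template: the function name `build_grounded_prompt`, the instruction wording, and the two example passages are all assumptions made for illustration.

```python
def build_grounded_prompt(question: str, passages: list[str]) -> str:
    """Combine retrieved passages and the user's question into one prompt."""
    # Number each passage so the model's answer can be traced back to a source.
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical passages; in a real system these come from the retrieval stage.
prompt = build_grounded_prompt(
    "What is the capital of Canada?",
    ["Ottawa is the capital city of Canada.", "Toronto is Canada's largest city."],
)
print(prompt)
```

The resulting string would then be sent to whichever LLM you use; that call is left out here because it depends on the provider.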

How RAG Works

The steps taken during retrieval-augmented generation:

  1. User Query: A question or prompt is submitted (e.g., "What is the capital of Canada?").
  2. Retriever: Instead of relying only on its internal training, the system searches the specified source(s) to find passages or documents relevant to the query.
  3. Reader/Generator: The retrieved documents are passed to a generative model, which synthesises an answer based on both the query and the retrieved content.
  4. Response: A final result is returned.
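
The four steps above can be sketched end to end in a few lines. This is a toy illustration under stated assumptions, not a production design: the corpus is hard-coded, retrieval is plain word overlap (real systems typically use a search index or embeddings), and `stub_llm` is a hypothetical placeholder that stands in for an actual model call.

```python
import re
from typing import Callable

# Hypothetical stand-in for the "specified source(s)"; a real system would
# query a search index or vector database instead.
CORPUS = [
    "Ottawa is the capital of Canada.",
    "Canberra is the capital of Australia.",
    "The maple leaf appears on the flag of Canada.",
]

# Very small, illustrative stop-word list so common words do not drive ranking.
STOP_WORDS = {"a", "an", "is", "of", "on", "the", "what"}


def tokenize(text: str) -> set[str]:
    """Lower-case the text and return its distinct non-stop words."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOP_WORDS


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Step 2: return up to k passages sharing at least one word with the query."""
    query_terms = tokenize(query)
    scored = [(len(query_terms & tokenize(p)), p) for p in corpus]
    # Sort by overlap only; the stable sort keeps corpus order for ties.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for score, p in scored[:k] if score > 0]


def generate(query: str, passages: list[str], llm: Callable[[str], str]) -> str:
    """Step 3: pass the query and the retrieved passages to the model."""
    context = "\n".join(f"- {p}" for p in passages)
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return llm(prompt)


def answer(query: str, llm: Callable[[str], str]) -> str:
    """Steps 1 to 4: take a query, retrieve, generate, and return the response."""
    return generate(query, retrieve(query, CORPUS), llm)


def stub_llm(prompt: str) -> str:
    """Placeholder model: echoes the first context passage instead of calling an LLM."""
    return prompt.splitlines()[1].removeprefix("- ")


print(answer("What is the capital of Canada?", stub_llm))
```

Passing the model in as a callable keeps the retrieval logic independent of any particular LLM provider, so `stub_llm` can be swapped for a real client without touching `retrieve`.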

Why RAG Is Useful

  • Up-to-date Knowledge: Unlike static LLMs that are trained on fixed data, RAG can reference more current and/or dynamic sources.
  • Factual Accuracy: Reduces hallucinations by grounding generation in real documents & reducing the risk of using unrelated material.
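  • Smaller Model Footprint: Because the knowledge lives in the external source rather than in the model's weights, a smaller model can often be used, which is more economical to run.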
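
To make the "Up-to-date Knowledge" point concrete, the sketch below shows that adding a document to the source changes what can be retrieved straight away, with no model retraining involved. `DocumentStore`, its methods, and the release notes are hypothetical; a real deployment would use a search index or database.

```python
class DocumentStore:
    """Hypothetical in-memory source of documents for the retrieval stage."""

    def __init__(self) -> None:
        self._docs: list[str] = []

    def add(self, text: str) -> None:
        """Make a new document available to all later searches."""
        self._docs.append(text)

    def search(self, keyword: str) -> list[str]:
        """Return every document containing the keyword (case-insensitive)."""
        needle = keyword.lower()
        return [d for d in self._docs if needle in d.lower()]


store = DocumentStore()
store.add("Release 1.0 supports PDF export.")
before = store.search("export")

# New information arrives after the model was trained; only the store changes.
store.add("Release 1.1 adds CSV export.")
after = store.search("export")

print(len(before), len(after))  # prints "1 2"
```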

Common Use Cases

A non-exhaustive list of scenarios where RAG may be the more appropriate choice:

  • AI assistants with access to a knowledge base
  • Customer support bots referencing relevant documentation
  • Legal, medical, or financial Q&A systems using trusted data sources

Figure: A flow chart picturing the numbered steps listed in the "How RAG Works" section of this post.