RAG stands for “Retrieval-Augmented Generation”.
RAG can be simplified to a two-stage process:
- Retrieval – A mechanism that searches a large database or corpus (e.g. documents, knowledge base) to find relevant information based on a query.
- Generation – A generative model (usually an LLM) that uses the retrieved information to generate a coherent and relevant response.
In RAG, you direct the LLM to answer from a specific subset of information, rather than relying only on its own internal knowledge.
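As a rough illustration of the retrieval stage, here is a minimal sketch that scores a tiny hard-coded corpus by term overlap with the query. The corpus, the `retrieve` function, and the scoring rule are all simplifying assumptions; production retrievers normally use vector embeddings and a dedicated search index rather than word matching.

```python
# Toy retrieval stage: score documents by term overlap with the query.
# Real systems typically use vector embeddings plus a dedicated search index.
import string

corpus = {
    "doc1": "Ottawa is the capital city of Canada.",
    "doc2": "The Great Wall of China is visible from low Earth orbit.",
    "doc3": "Canada has ten provinces and three territories.",
}

def _terms(text: str) -> set[str]:
    """Lowercase a string, strip punctuation, and return its set of terms."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return set(cleaned.split())

def retrieve(query: str, documents: dict[str, str], top_k: int = 2) -> list[str]:
    """Return the top_k documents sharing the most terms with the query."""
    query_terms = _terms(query)
    ranked = sorted(
        documents.values(),
        key=lambda text: len(query_terms & _terms(text)),
        reverse=True,
    )
    return ranked[:top_k]

print(retrieve("What is the capital of Canada?", corpus))
```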
How RAG Works
The steps taken during retrieval-augmented generation (a minimal end-to-end sketch follows the list):
- User Query: A question or prompt is submitted (e.g., "What is the capital of Canada?").
- Retriever: Instead of relying only on its internal training, the system searches the specified source(s) to find passages or documents relevant to the query.
- Reader/Generator: The retrieved documents are passed to a generative model, which synthesises an answer based on both the query and the retrieved content.
- Response: The final, grounded answer is returned to the user.
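Continuing the toy retriever from the earlier sketch, the pipeline below wires these steps together. The `build_prompt`, `generate_answer`, and `rag_pipeline` names are illustrative, and `generate_answer` is a stand-in for whatever LLM API a real system would call.

```python
# End-to-end sketch of the steps above, reusing the toy `retrieve`
# function and `corpus` from the earlier retrieval example.

def build_prompt(query: str, passages: list[str]) -> str:
    """Combine the retrieved passages and the user query into a grounded prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\n"
        "Answer:"
    )

def generate_answer(prompt: str) -> str:
    """Placeholder for the generative model; swap in a real LLM call here."""
    return f"[LLM answer based on a {len(prompt)}-character grounded prompt]"

def rag_pipeline(query: str) -> str:
    passages = retrieve(query, corpus)      # Retriever
    prompt = build_prompt(query, passages)  # Reader/Generator input
    return generate_answer(prompt)          # Response

print(rag_pipeline("What is the capital of Canada?"))  # User query
```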
Why RAG Is Useful
- Up-to-date Knowledge: Unlike static LLMs that are trained on fixed data, RAG can reference more current and/or dynamic sources.
- Factual Accuracy: Grounding generation in retrieved documents reduces hallucinations and the risk of answering from unrelated material.
- Smaller Model Footprint: Because much of the knowledge can live in the external corpus rather than in the model's weights, a smaller model paired with retrieval can be more economical while still performing well on domain-specific queries.
Common Use Cases
A non-exhaustive list of scenarios where a RAG approach may be more appropriate:
- AI assistants with access to a knowledge base
- Customer support bots referencing relevant documentation
- Legal, medical, or financial Q&A systems using trusted data sources
