Retrieval-Augmented Generation (RAG) is the pattern of retrieving relevant context from a knowledge base and including it in the LLM prompt at inference time. Retrieval typically uses vector search. RAG lets the model answer questions or perform tasks using information it was not trained on, including information more recent than its training data. RAG has been the dominant pattern for knowledge-grounded AI applications since 2023. It is the bridge between general-purpose LLMs and organization-specific knowledge.
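To make the definition concrete, here is a minimal sketch of the retrieve-then-prompt idea in Python. Everything in it is an illustrative assumption rather than a reference implementation: the `KNOWLEDGE_BASE` contents, the function names, and especially `embed`, which is a toy bag-of-words counter standing in for a real embedding model. The similarity search is a brute-force cosine comparison, not a vector database, and the sketch stops at building the prompt without calling any LLM.

```python
import math
import re
from collections import Counter

# Hypothetical toy knowledge base; a real system would store document chunks.
KNOWLEDGE_BASE = [
    "The refund window for annual plans is 30 days from purchase.",
    "Support is available by email on weekdays from 9am to 5pm.",
    "The office cafeteria serves lunch between noon and 2pm.",
]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query (brute-force search)."""
    query_vec = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, docs: list[str], k: int = 1) -> str:
    """Place the retrieved context in the prompt ahead of the user's question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    # The resulting string is what would be sent to the LLM at inference time.
    print(build_prompt("What is the refund window for annual plans?", KNOWLEDGE_BASE))
```

The model never needs to have seen the refund policy during training. The policy reaches it through the prompt, which is why updating the knowledge base changes the answers without retraining.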
How RAG works (the pipeline):