
Retrieval and Grounded Chatbots (RAG)

A bot that answers from a model's general knowledge will confidently invent your refund policy. Retrieval-augmented generation fixes that: before answering, you fetch the most relevant chunks of your own documents and put them in the prompt, so the model answers from your facts, not its imagination.

How RAG works

1. Split your docs into chunks and create an embedding (a vector) for each.
2. Store the vectors in a vector database (Pinecone, Supabase pgvector, Qdrant).
3. At query time, embed the user question and find the closest chunks.
4. Put those chunks in the prompt and ask the model to answer only from them.
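
To make the retrieval steps concrete, here is a minimal sketch in plain Python. It is not how n8n or any vector database does it internally. The `embed` function is a toy word-count stand-in for a real embedding model, the in-memory list stands in for a vector database, and the sample documents are made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding (assumption): word counts instead of a real model's vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks: list[str]) -> list[tuple[str, Counter]]:
    """Steps 1 and 2: embed every chunk and keep it next to its vector."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def search(index: list[tuple[str, Counter]], question: str, top_k: int = 5) -> list[str]:
    """Step 3: embed the question and return the top_k closest chunks."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


# Made-up sample chunks, for illustration only.
docs = [
    "Refund requests are accepted within 30 days of purchase.",
    "Shipping takes 3 to 5 business days.",
    "Support is open Monday to Friday.",
]
index = build_index(docs)
top = search(index, "What is the refund window after purchase?", top_k=1)
print(top[0])
```

Swapping `embed` for a call to an embedding model and `index` for a vector database keeps the same shape: embed, store, embed the question, take the nearest chunks.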

n8n - RAG pipeline

[ Question ] --> [ Embed ] --> [ Vector search (top 5) ] --> [ Build prompt with chunks ] --> [ LLM: answer from context only ]

Retrieve first, then generate from what you found.

grounded answer prompt

Answer the question using ONLY the context below.
If the context does not contain the answer, say
"I do not have that information" and offer to connect a human.
Do not use outside knowledge.

Context:
{{ retrievedChunks }}

Question: {{ userQuestion }}
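
If you assemble the prompt in code rather than with n8n expressions, the same template can be filled in as below. This is a sketch under assumptions: `build_prompt` and `PROMPT_TEMPLATE` are illustrative names, and numbering the chunks is one possible way to keep them distinguishable in the context.

```python
PROMPT_TEMPLATE = """Answer the question using ONLY the context below.
If the context does not contain the answer, say
"I do not have that information" and offer to connect a human.
Do not use outside knowledge.

Context:
{context}

Question: {question}"""


def build_prompt(chunks: list[str], question: str) -> str:
    """Join the retrieved chunks into one context block and fill the template."""
    # Number each chunk so individual passages stay distinguishable.
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return PROMPT_TEMPLATE.format(context=context, question=question)


prompt = build_prompt(
    ["Refund requests are accepted within 30 days of purchase."],  # made-up chunk
    "Can I still get a refund?",
)
print(prompt)
```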

Chunking is half the battle

Too-large chunks bury the answer in noise; too-small chunks lose context. Aim for coherent passages of a few hundred tokens with a little overlap, and keep a source reference on each chunk so the bot can cite where the answer came from.
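
A minimal sketch of overlapping chunks with a source reference might look like this. It counts whitespace-separated words as a rough stand-in for tokens, and the `Chunk` class, `chunk_text` function, and default sizes are illustrative assumptions rather than recommended values.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str  # e.g. a help-center URL or file name, kept so the bot can cite it
    index: int   # position of the chunk within its source


def chunk_text(text: str, source: str, size: int = 200, overlap: int = 40) -> list[Chunk]:
    """Split text into windows of `size` words that share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks: list[Chunk] = []
    for start in range(0, len(words), step):
        window = words[start:start + size]
        chunks.append(Chunk(" ".join(window), source, len(chunks)))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


# Tiny example: 10 words, windows of 4 words, 1 word of overlap.
sample = " ".join(f"w{i}" for i in range(10))
pieces = chunk_text(sample, source="help/refunds", size=4, overlap=1)
for piece in pieces:
    print(piece.index, piece.source, piece.text)
```

Splitting on headings or paragraphs first, then applying a window like this only to long passages, tends to keep chunks coherent.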

Tell it to admit ignorance

The single most important instruction in a grounded bot is permission to say "I do not know". Without it, the model fills gaps with plausible fiction, which is exactly the failure RAG was supposed to prevent.
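
Once the model is allowed to decline, your workflow can react to it. The sketch below assumes the model repeats the decline phrase from the grounded answer prompt more or less verbatim. `needs_human` is a hypothetical helper, and a simple substring check like this will miss paraphrased refusals.

```python
DECLINE_PHRASE = "I do not have that information"


def needs_human(answer: str) -> bool:
    """Return True when the answer contains the agreed decline phrase."""
    return DECLINE_PHRASE.lower() in answer.lower()


reply = "I do not have that information. Would you like me to connect you with a human?"
if needs_human(reply):
    print("route to human handoff")
```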

Result: a support bot that answers from your actual help center, cites the source, and declines gracefully when the answer is not in the docs.


Hands-on tasks