Recipe 06, RAG document Q&A
Goal: answer a user's question by retrieving relevant chunks from a corpus of PDFs and grounding an LLM on them.
Difficulty: Advanced · Time: ~60 minutes (one-time index) + ongoing queries · Connectors: OpenAI / Anthropic, a vector database (Pinecone / Weaviate / pgvector), File Operations
This recipe has two workflows:
- Indexing workflow, runs once (or whenever your corpus changes). Ingests documents into the vector DB.
- Query workflow, runs per question. Retrieves and answers.
Prerequisites
- LLM provider credential (Anthropic or OpenAI).
- Embedding provider credential (OpenAI text-embedding-3-small is a sensible default).
- Vector database. Pinecone or pgvector is easiest. Create an index named flero-docs with the right dimension count (1536 for OpenAI small).
- A folder of PDF documents accessible to Flero (S3, local disk, or uploaded via file ops).
Workflow 1, Indexing
┌──────────┐    ┌────────────┐    ┌───────────────┐    ┌────────────┐    ┌──────────────┐
│ Start    │───▶│ File Ops:  │───▶│ PDF Ops:      │───▶│ Document   │───▶│ Embeddings   │
│ (manual) │    │ list files │    │ extract text  │    │ Chunker    │    │              │
└──────────┘    └────────────┘    └───────────────┘    └────────────┘    └──────┬───────┘
                                                                                ▼
                                                                       ┌─────────────────┐
                                                                       │ Vector DB:      │
                                                                       │ upsert chunks   │
                                                                       └─────────────────┘
Step-by-step
- Start, manual trigger. Run when the corpus changes.
- File Operations: list, list every PDF in your source folder. Outputs an array.
- Loop over the list.
- Inside the loop: File Operations: read the PDF.
- PDF Operations: extract text → outputs the text plus per-page metadata.
- Document Chunker, chunk size 1,000 chars, overlap 200, strategy paragraph. Carry forward metadata fields: filename, pageNumber.
- Embeddings, provider OpenAI, model text-embedding-3-small, input from the chunker. Outputs a vector per chunk.
- Vector DB: upsert, index flero-docs, with the chunk text + metadata + the embedding.
Run once. Wait. Verify the vector DB shows N chunks.
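To make the chunker settings concrete, here is a minimal Python sketch of paragraph chunking with a 1,000-character limit and a 200-character overlap. It is not the Document Chunker node's implementation; the `Chunk` class and `chunk_page` function are hypothetical names, and splitting paragraphs on blank lines is an assumption about what the paragraph strategy means.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    filename: str      # metadata carried forward from the source PDF
    page_number: int   # metadata carried forward from PDF extraction


def chunk_page(text: str, filename: str, page_number: int,
               size: int = 1000, overlap: int = 200) -> list[Chunk]:
    """Pack paragraphs into chunks of at most `size` characters.

    Each chunk after the first starts with the last `overlap` characters of
    the previous chunk, so text near a boundary appears in both.
    """
    # Room left for new text once the overlap prefix and separator are added.
    step = size - overlap - 2
    if overlap < 0 or step <= 0:
        raise ValueError("overlap must be >= 0 and leave room for new text")

    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        # Hard-split any paragraph that could not fit in a chunk on its own.
        for start in range(0, len(para), step):
            piece = para[start:start + step]
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= size:
                current = candidate
            else:
                chunks.append(current)
                tail = current[-overlap:] if overlap else ""
                current = f"{tail}\n\n{piece}" if tail else piece
    if current:
        chunks.append(current)
    return [Chunk(c, filename, page_number) for c in chunks]
```

Calling `chunk_page(page_text, "policies.pdf", 7)` once per extracted page keeps the filename and page number attached to every chunk, which is what the citation step in the query workflow relies on.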
Workflow 2, Query
┌─────────────┐    ┌──────────────┐    ┌───────────────┐    ┌───────────┐    ┌────────────┐    ┌──────────┐    ┌────────────┐
│ Webhook     │───▶│ Embeddings   │───▶│ Vector DB:    │───▶│ RAG       │───▶│ Transform  │───▶│ AI Agent │───▶│ Guardrails │
│ trigger     │    │ (embed Q)    │    │ query top-10  │    │ Reranker  │    │ (format    │    │ (answer  │    │ (PII filter│
└─────────────┘    └──────────────┘    └───────────────┘    │ top-3     │    │ context)   │    │ with     │    │ + topic)   │
                                                            └───────────┘    └────────────┘    │ context) │    └─────┬──────┘
                                                                                               └──────────┘          │
                                                                                                                     ▼
                                                                                                               ┌────────────┐
                                                                                                               │ Terminal:  │
                                                                                                               │ return     │
                                                                                                               │ answer     │
                                                                                                               └────────────┘
Step-by-step
- Webhook trigger, POST { "question": "..." }.
- Embeddings, embed $trigger.body.question with the same model used for indexing.
- Vector DB: query, index flero-docs, top K = 10, optional metadata filter (e.g. tenant == $trigger.body.tenant).
- RAG Reranker, narrow to top 3.
- Transform, build a context block:
    map:
      context: |
        {{ $node["RAG Reranker"].json.results[0].text }}
        [Source: {{ $node["RAG Reranker"].json.results[0].metadata.filename }}, page {{ $node["RAG Reranker"].json.results[0].metadata.pageNumber }}]
        {{ $node["RAG Reranker"].json.results[1].text }}
        [Source: ...]
        {{ $node["RAG Reranker"].json.results[2].text }}
        [Source: ...]
- AI Agent:
  - System instructions: "Answer the user's question using only the provided context. If the answer isn't in the context, say 'I don't know'. Always cite the source (filename + page) for each claim."
  - Prompt:
      Context: {{ $node["Transform"].json.context }}
      Question: {{ $trigger.body.question }}
  - Temperature: 0.2
  - Max tokens: 700
- AI Guardrails, PII filter, topic guard.
- Terminal, return { answer: $node["Guardrails"].json.output, sources: <list of filenames cited> }.
Set the workflow's response mode (on the Webhook Trigger) to On completion so the answer goes back to the caller synchronously.
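The retrieval and context-formatting steps above can be sketched in plain Python to show what the nodes are doing conceptually. This is an illustration under assumptions, not Flero's or any vector database's API: the record shape (`vector`, `text`, `metadata`) and all function names are hypothetical, and a real vector DB would use an approximate index rather than scanning every record.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], records: list[dict], k: int = 10) -> list[dict]:
    """Brute-force nearest neighbours: the k records most similar to the query."""
    ranked = sorted(records, key=lambda r: cosine(query_vec, r["vector"]),
                    reverse=True)
    return ranked[:k]


def build_context(results: list[dict]) -> str:
    """Format each result as its text followed by a source line."""
    blocks = []
    for r in results:
        meta = r["metadata"]
        source = f'[Source: {meta["filename"]}, page {meta["pageNumber"]}]'
        blocks.append(f'{r["text"]}\n{source}')
    return "\n\n".join(blocks)


def build_prompt(context: str, question: str) -> str:
    """Assemble the user prompt in the same Context / Question layout."""
    return f"Context:\n{context}\n\nQuestion: {question}"
```

Looping over the results, rather than indexing results[0] to results[2] by hand, also copes with the case where the reranker returns fewer than three chunks.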
Try it
curl -X POST <query-workflow webhook URL> \
  -H "Content-Type: application/json" \
  -d '{"question": "What is our refund policy for products bought during a sale?"}'
Response (if the corpus has the answer):
{
  "answer": "Products bought during a promotional sale carry a 14-day refund window... [Source: policies.pdf, page 7]",
  "sources": ["policies.pdf"]
}
If the corpus doesn't cover the question, the answer should be "I don't know". That is the test that the system isn't hallucinating.
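One way to build the sources list and to automate the "I don't know" check is to parse the citation markers out of the answer text. The sketch below assumes the model follows the `[Source: filename, page N]` format from the context block exactly; `cited_sources` and `is_refusal` are hypothetical helper names, not Flero functions.

```python
import re

# Matches citations of the form "[Source: policies.pdf, page 7]".
SOURCE_RE = re.compile(r"\[Source:\s*([^,\]]+),\s*page\s+(\d+)\]")


def cited_sources(answer: str) -> list[str]:
    """Return the distinct filenames cited in the answer, in first-seen order."""
    seen: list[str] = []
    for filename, _page in SOURCE_RE.findall(answer):
        name = filename.strip()
        if name not in seen:
            seen.append(name)
    return seen


def is_refusal(answer: str) -> bool:
    """True when the whole answer is the agreed fallback phrase."""
    return answer.strip().rstrip(".").lower() == "i don't know"
```

A simple regression test is to send a question the corpus cannot answer and assert that `is_refusal` is true and `cited_sources` is empty.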
Variations
Per-user / per-tenant isolation
Add a tenant field to chunk metadata at index time. Filter on it at query time so different customers' documents don't bleed into each others' answers.
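As a rough sketch of the idea, the snippet below tags metadata at index time and applies a fail-closed filter to query results. The function names and record shape are hypothetical. In practice the filter belongs inside the vector DB query itself; a post-filter like this is only a second line of defence.

```python
def with_tenant(metadata: dict, tenant: str) -> dict:
    """Return a copy of the chunk metadata with the tenant field set."""
    return {**metadata, "tenant": tenant}


def filter_by_tenant(matches: list[dict], tenant: str) -> list[dict]:
    """Keep only matches whose metadata names this tenant.

    Records with no tenant field are dropped as well (fail closed), so an
    untagged chunk can never leak into another customer's answer.
    """
    return [m for m in matches
            if m.get("metadata", {}).get("tenant") == tenant]
```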
Hybrid search
Some vector DBs (Weaviate, pgvector with TS) support BM25 + vector. Configure the Vector DB node to use hybrid mode for better recall.
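If your vector DB has no built-in hybrid mode, one common way to combine a keyword ranking with a vector ranking is reciprocal rank fusion (RRF). The sketch below is a generic illustration of RRF, not a description of how the Vector DB node or any particular database fuses results; the function name is hypothetical and k = 60 is the conventional default constant.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in
    (rank starts at 1); documents ranked well by several lists rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

For example, fusing a keyword ranking `["a", "b", "c"]` with a vector ranking `["b", "c", "d"]` puts `b` and `c` ahead of `a` and `d`, because both retrievers agree on them.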
Multi-modal
Include images in the index: embed image captions and store image references in metadata. Allow the AI Agent to reference image filenames in its answers.
Streaming responses
For chat-style UI, set the AI Agent to stream: true and front this workflow with the Chat trigger instead of a webhook.
Re-index automatically
Hook the indexing workflow to a Schedule trigger (nightly) and add a "files changed since last index" filter.
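The "files changed since last index" filter can be sketched as a timestamp comparison. This assumes the file listing yields one record per file with a `path` and an ISO 8601 `modifiedAt` field carrying an explicit UTC offset; both field names are hypothetical, so adapt them to whatever your File Operations list step actually outputs.

```python
from datetime import datetime, timezone


def changed_since(files: list[dict], last_indexed_at: datetime) -> list[dict]:
    """Return PDF records modified after the last successful indexing run."""
    changed = []
    for f in files:
        if not f["path"].lower().endswith(".pdf"):
            continue  # only PDFs are part of the corpus
        modified = datetime.fromisoformat(f["modifiedAt"])
        if modified > last_indexed_at:
            changed.append(f)
    return changed


# Example: the timestamp of the previous run, stored wherever you keep state.
last_run = datetime(2024, 5, 1, tzinfo=timezone.utc)
```

Store the run's start time only after the upsert step succeeds, so a failed run is retried in full on the next schedule tick.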
Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| Vector search returns irrelevant chunks | Embedding model mismatch (used different models for index vs query) | Use the same embedding model in both workflows |
| Answers cite the wrong source | Metadata not propagating through chunking | In Document Chunker, explicitly list metadata fields to carry |
| Reranker barely helps | Top-K too low (only 10 candidates leaves little to rerank) | Increase Vector DB top-K to 20–30 |
| LLM hallucinates despite "use only context" | Context too large; model is using its own knowledge | Lower top-K and reranker top-N so less context reaches the model; sharpen the system prompt |
| Indexing takes forever | Big corpus, sequential loop | Use the Distributed Execution node to parallelise indexing |
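The embedding-mismatch row is easy to guard against in code. A minimal sketch, assuming you record the model name used at index time somewhere the query workflow can read it; `check_embedding` and `EXPECTED_DIMS` are hypothetical names, and only the 1536 dimension for text-embedding-3-small comes from this recipe.

```python
# Known output dimensions per embedding model; extend for the models you use.
EXPECTED_DIMS = {"text-embedding-3-small": 1536}


def check_embedding(index_model: str, query_model: str,
                    vector: list[float]) -> None:
    """Raise ValueError if the query embedding cannot match the index."""
    if index_model != query_model:
        raise ValueError(
            f"index was built with {index_model!r} but query uses {query_model!r}"
        )
    expected = EXPECTED_DIMS.get(query_model)
    if expected is not None and len(vector) != expected:
        raise ValueError(
            f"{query_model!r} should produce {expected} dimensions, "
            f"got {len(vector)}"
        )
```

Running this check before the Vector DB query turns a silent relevance problem into an explicit error.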