Recipe 06, RAG document Q&A

Goal: answer a user's question by retrieving relevant chunks from a corpus of PDFs and grounding an LLM on them.

Difficulty: Advanced · Time: ~60 minutes (one-time index) + ongoing queries · Connectors: OpenAI / Anthropic, a vector database (Pinecone / Weaviate / pgvector), File Operations

This recipe has two workflows:

Indexing workflow, runs once (or whenever your corpus changes). Ingests documents into the vector DB.
Query workflow, runs per question. Retrieves and answers.

Prerequisites

LLM provider credential (Anthropic or OpenAI).
Embedding provider credential (OpenAI text-embedding-3-small is a sensible default).
Vector database, Pinecone or pgvector are the easiest options. Create an index named flero-docs whose dimension count matches your embedding model (1536 for OpenAI text-embedding-3-small).
A folder of PDF documents accessible to Flero (S3, local disk, or uploaded via file ops).

Workflow 1, Indexing

┌──────────┐    ┌────────────┐    ┌───────────────┐    ┌────────────┐    ┌──────────────┐
│  Start   │───▶│ File Ops:  │───▶│   PDF Ops:    │───▶│  Document  │───▶│  Embeddings  │
│ (manual) │    │  list files│    │ extract text  │    │   Chunker  │    │              │
└──────────┘    └────────────┘    └───────────────┘    └────────────┘    └──────┬───────┘
                                                                                ▼
                                                                       ┌─────────────────┐
                                                                       │  Vector DB:     │
                                                                       │  upsert chunks  │
                                                                       └─────────────────┘

Step-by-step

Start, manual trigger. Run when the corpus changes.
File Operations: list, list every PDF in your source folder. Outputs an array.
Loop over the list.
Inside the loop: File Operations: read the PDF.
PDF Operations: extract text → outputs the text plus per-page metadata.
Document Chunker, chunk size 1,000 characters, overlap 200 characters, strategy paragraph. Carry forward metadata fields: filename, pageNumber.
Embeddings, provider OpenAI, model text-embedding-3-small, input from the chunker. Outputs a vector per chunk.
Vector DB: upsert, index flero-docs, with the chunk text + metadata + the embedding.

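Run the workflow once and wait for it to finish. Then verify that the vector DB reports the number of chunks you expect (N, the total produced by the Document Chunker).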
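To make the chunking settings concrete, here is a minimal Python sketch of paragraph-based chunking with a character overlap. It is not the Document Chunker's actual implementation: the function name chunk_page, the blank-line paragraph split, and the output shape are assumptions for illustration, and the sketch assumes it is called once per extracted page so that pageNumber is known.

```python
def chunk_page(text, filename, page_number, chunk_size=1000, overlap=200):
    """Pack paragraphs greedily into chunks and repeat the tail of each
    chunk at the start of the next one (hypothetical helper)."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    bodies = []
    current = ""
    for para in paragraphs:
        # Paragraphs longer than chunk_size are hard-split into pieces.
        for start in range(0, len(para), chunk_size):
            piece = para[start:start + chunk_size]
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= chunk_size:
                current = candidate
                continue
            # The piece does not fit: close the current chunk and seed the
            # next one with the last `overlap` characters for continuity.
            bodies.append(current)
            tail = current[-overlap:] if overlap > 0 else ""
            current = f"{tail}\n\n{piece}" if tail else piece
    if current:
        bodies.append(current)
    # Carry the metadata fields forward onto every chunk.
    return [
        {"text": body, "metadata": {"filename": filename, "pageNumber": page_number}}
        for body in bodies
    ]
```

In this sketch a chunk can exceed chunk_size by up to overlap + 2 characters, because the overlap is prepended to a piece that may already be full-size. Counting the returned chunks across all pages gives the N to compare against the vector DB.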

Workflow 2, Query

┌─────────────┐    ┌──────────────┐    ┌───────────────┐    ┌───────────┐    ┌────────────┐    ┌──────────┐    ┌────────────┐
│   Webhook   │───▶│  Embeddings  │───▶│  Vector DB:   │───▶│   RAG     │───▶│  Transform │───▶│ AI Agent │───▶│ Guardrails │
│   trigger   │    │  (embed Q)   │    │  query top-10 │    │ Reranker  │    │  (format   │    │ (answer  │    │ (PII filter│
└─────────────┘    └──────────────┘    └───────────────┘    │  top-3    │    │  context)  │    │  with    │    │  + topic)  │
                                                            └───────────┘    └────────────┘    │  context)│    └─────┬──────┘
                                                                                               └──────────┘          │
                                                                                                                     ▼
                                                                                                              ┌────────────┐
                                                                                                              │ Terminal:  │
                                                                                                              │  return    │
                                                                                                              │  answer    │
                                                                                                              └────────────┘

Step-by-step

Webhook trigger, POST { "question": "..." }.
Embeddings, embed $trigger.body.question with the same model used for indexing.
Vector DB: query, index flero-docs, top K = 10, optional metadata filter (e.g. tenant == $trigger.body.tenant).
RAG Reranker, narrow to top 3.

Transform, build a context block:

map:
  context: |
    {{ $node["RAG Reranker"].json.results[0].text }}
    [Source: {{ $node["RAG Reranker"].json.results[0].metadata.filename }}, page {{ $node["RAG Reranker"].json.results[0].metadata.pageNumber }}]

    {{ $node["RAG Reranker"].json.results[1].text }}
    [Source: ...]

    {{ $node["RAG Reranker"].json.results[2].text }}
    [Source: ...]

AI Agent:
- System instructions: "Answer the user's question using only the provided context. If the answer isn't in the context, say 'I don't know'. Always cite the source (filename + page) for each claim."
- Prompt:
```
Context:
{{ $node["Transform"].json.context }}

Question: {{ $trigger.body.question }}
```
- Temperature: 0.2
- Max tokens: 700
AI Guardrails, PII filter, topic guard.
Terminal, return { answer: $node["Guardrails"].json.output, sources: <list of filenames cited> }.

Set the workflow's response mode (on the Webhook Trigger) to On completion so the answer goes back to the caller synchronously.
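The Transform step addresses results 0, 1 and 2 by index. If you would rather build the context in a code step, or want to handle the case where fewer than three chunks come back, the same block can be produced with a loop. The following Python sketch assumes the result shape used in the Transform expressions (text plus metadata.filename and metadata.pageNumber); the function names build_context and build_prompt are hypothetical.

```python
def build_context(results, top_n=3):
    """Format reranked chunks as text followed by a source line."""
    blocks = []
    for hit in results[:top_n]:
        meta = hit.get("metadata", {})
        source = (
            f"[Source: {meta.get('filename', 'unknown')}, "
            f"page {meta.get('pageNumber', '?')}]"
        )
        blocks.append(f"{hit['text']}\n{source}")
    # Blank line between chunks, matching the Transform layout.
    return "\n\n".join(blocks)


def build_prompt(context, question):
    """Assemble the prompt in the same layout the AI Agent step uses."""
    return f"Context:\n{context}\n\nQuestion: {question}"
```

With an empty result list, build_context returns an empty string, so the agent receives no context and the system instructions should lead it to answer "I don't know".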

Try it

curl -X POST "<query-workflow webhook URL>" \
  -H "Content-Type: application/json" \
  -d '{"question": "What is our refund policy for products bought during a sale?"}'

Response (if the corpus has the answer):

{
  "answer": "Products bought during a promotional sale carry a 14-day refund window... [Source: policies.pdf, page 7]",
  "sources": ["policies.pdf"]
}

If the corpus doesn't cover the question, the answer should be "I don't know". That is the test that the system isn't hallucinating.
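Both checks can be automated against the response body. The sketch below is one possible smoke test in Python, assuming the response shape shown above and the "[Source: filename, page N]" citation format requested in the system instructions; cited_sources and check_response are hypothetical helper names.

```python
import re

# Matches citations such as "[Source: policies.pdf, page 7]".
CITATION = re.compile(r"\[Source:\s*([^,\]]+),\s*page\s*(\d+)\]")


def cited_sources(answer):
    """Return the cited filenames, unique, in order of first appearance."""
    seen = []
    for filename, _page in CITATION.findall(answer):
        name = filename.strip()
        if name not in seen:
            seen.append(name)
    return seen


def check_response(response):
    """Return a list of problems; an empty list means the response is consistent."""
    answer = response.get("answer", "")
    sources = response.get("sources", [])
    cited = cited_sources(answer)
    problems = []
    if "i don't know" in answer.lower():
        # A refusal should not cite anything.
        if cited or sources:
            problems.append("refusal carries sources")
        return problems
    if not cited:
        problems.append("answer cites no source")
    if set(cited) != set(sources):
        problems.append("sources list does not match citations in the answer")
    return problems
```

Running check_response on the example response above yields no problems. An answer that makes a claim without any "[Source: ...]" marker is flagged, which is a cheap signal that the model may have answered from its own knowledge.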

Variations

Per-user / per-tenant isolation

Add a tenant field to chunk metadata at index time. Filter on it at query time so different customers' documents don't bleed into each other's answers.
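The important detail is that the tenant filter is applied before the top-K cut, not after it. The following in-memory Python sketch illustrates the ordering; a real vector DB applies the filter inside the query, and the chunk shape and the function name query_for_tenant are assumptions for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def query_for_tenant(chunks, query_vec, tenant, top_k=10):
    """Filter by tenant first, then rank the remaining chunks by similarity."""
    allowed = [c for c in chunks if c["metadata"].get("tenant") == tenant]
    ranked = sorted(
        allowed,
        key=lambda c: cosine(c["embedding"], query_vec),
        reverse=True,
    )
    return ranked[:top_k]
```

Filtering after the top-K cut would still hide other tenants' chunks, but it could return fewer than K results, or none at all, whenever another tenant's documents happen to score higher.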

Hybrid search

Some vector DBs (Weaviate, or pgvector combined with PostgreSQL full-text search) can combine keyword (BM25-style) search with vector search. Configure the Vector DB node to use hybrid mode for better recall.
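Hybrid mode has to merge two ranked lists into one. Reciprocal rank fusion (RRF) is one common way to do that, shown here as a Python sketch. Whether the Vector DB node or your database uses RRF or a different fusion method is not stated in this recipe, so treat this as an illustration of the idea rather than a description of the node's behaviour.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into one list.

    Each list contributes 1 / (k + rank) per item, so chunks that rank
    well in both keyword and vector search rise to the top.
    k = 60 is a commonly used default.
    """
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda chunk_id: scores[chunk_id], reverse=True)


keyword_hits = ["a", "b", "c"]  # hypothetical keyword-search order
vector_hits = ["b", "d", "a"]   # hypothetical vector-search order
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Here "b" and "a" appear in both lists and end up ahead of "d" and "c", which each appear in only one.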

Multi-modal

Include images in the index: embed image captions and store image references in metadata. Allow the AI Agent to reference image filenames in its answers.

Streaming responses

For a chat-style UI, set the AI Agent to stream: true and front this workflow with the Chat trigger instead of a webhook.

Re-index automatically

Hook the indexing workflow to a Schedule trigger (nightly) and add a "files changed since last index" filter.
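One way to implement the "files changed since last index" filter is to keep a manifest of content hashes from the previous run and re-index only the PDFs whose hash differs. The Python sketch below assumes a local source folder and a JSON manifest file; the function names and the manifest format are hypothetical, not part of Flero.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path):
    """SHA-256 of a file's contents, read in 1 MiB blocks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def changed_pdfs(folder, manifest_path):
    """Return (PDFs that are new or modified, current digests by filename)."""
    previous = {}
    if manifest_path.exists():
        previous = json.loads(manifest_path.read_text())
    current = {}
    changed = []
    for pdf in sorted(folder.glob("*.pdf")):
        current[pdf.name] = file_digest(pdf)
        if previous.get(pdf.name) != current[pdf.name]:
            changed.append(pdf)
    return changed, current


def save_manifest(manifest_path, digests):
    """Persist the digests; call this only after indexing succeeded."""
    manifest_path.write_text(json.dumps(digests, indent=2))
```

Saving the manifest only after a successful run means a failed nightly job retries the same files the next night. The sketch does not handle deleted files: chunks from a PDF that was removed from the folder stay in the vector DB until you delete them separately.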

Troubleshooting

Symptom	Likely cause	Fix
Vector search returns irrelevant chunks	Embedding model mismatch (used different models for index vs query)	Use the same embedding model in both workflows
Answers cite the wrong source	Metadata not propagating through chunking	In Document Chunker, explicitly list metadata fields to carry
Reranker barely helps	Top-K too low (10 candidates leave the reranker little to choose from)	Increase Vector DB top-K to 20–30
LLM hallucinates despite "use only context"	Context too large; model is using its own knowledge	Lower the reranker top-N so less context reaches the model, and sharpen the system prompt
Indexing takes forever	Big corpus, sequential loop	Use the Distributed Execution node to parallelise indexing
