RAG nodes

RAG stands for Retrieval-Augmented Generation. Instead of asking an LLM to answer from its training data alone, you first retrieve relevant passages from your own documents and then have the model answer from those passages.

The four-node pipeline:

[Documents]
    │
    ▼
[Document Chunker] ──→ [Embeddings] ──→ [Vector Database (store)]
                                             │
                                             ▼
[User question] ──→ [Embeddings] ──→ [Vector Database (search)] ──→ [Reranker] ─→ [AI Agent]
                                                                    (optional)

Document Chunker

What: Splits a long document into smaller chunks suitable for embedding.

Inputs: Document text plus metadata.

Outputs: Array of chunks, each with text, metadata, and chunk index.

Configuration:

Field	Notes
Chunk size	Target characters per chunk (default 1000)
Overlap	Characters of context shared between adjacent chunks (default 200)
Strategy	`fixed-size`, `sentence`, `paragraph`, `semantic`
Metadata fields	Fields from the source document to carry into every chunk (e.g. `documentId`, `source`, `author`)
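
To make the chunk size and overlap settings concrete, here is a minimal sketch of the `fixed-size` strategy in Python. It is an illustration only, not the node's actual implementation: the `Chunk` class and the `chunk_fixed_size` function are hypothetical names, and the sketch assumes chunk size and overlap are both measured in characters, as the table above describes.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    index: int
    metadata: dict = field(default_factory=dict)


def chunk_fixed_size(text, metadata, chunk_size=1000, overlap=200):
    """Split text into character windows; adjacent windows share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(
            Chunk(
                text=text[start:start + chunk_size],
                index=len(chunks),
                metadata=dict(metadata),  # copy so every chunk carries its own metadata
            )
        )
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the document
        start += step
    return chunks
```

With the defaults, each window advances by 800 characters, so a 2,500-character document yields three chunks. The other strategies (`sentence`, `paragraph`, `semantic`) cut on content boundaries instead of fixed offsets and are not covered by this sketch.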

Example: Ingest a 50-page PDF.

[File Operations: read PDF]
   ↓
[PDF Operations: extract text]
   ↓
[Document Chunker: 1000-char chunks, 200 overlap]
   ↓
[Embeddings: turn each chunk into a vector]
   ↓
[Vector DB: upsert chunks]

Embeddings

What: Converts text into a vector embedding (a numeric array).

Providers: OpenAI (text-embedding-3-large, text-embedding-3-small), Anthropic, Google, Cohere, local (sentence-transformers via Ollama).

Inputs: A string or array of strings.

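Outputs: A vector or array of vectors. Dimensionality depends on the model, from a few hundred for small local models up to 3072 (for example, 1536 for text-embedding-3-small and 3072 for text-embedding-3-large).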

Configuration:

Field	Notes
Provider	Embedding provider
Model	Specific embedding model
Credential	API key (or none for local models)

Tips:

Use the same embedding model for indexing and for querying. Vectors from different models live in different spaces, so mismatched models produce meaningless similarity scores.
Embedding cost is small per call but scales with corpus size. Cache aggressively.
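
One way to cache embeddings is to key them by a hash of the model name plus the text, so unchanged text is never sent to the provider twice. The sketch below is a minimal in-memory version under stated assumptions: `EmbeddingCache` is a hypothetical helper, and `embed_fn` stands in for whatever call actually produces vectors (it is assumed to take a list of strings and return one vector per string).

```python
import hashlib


class EmbeddingCache:
    """Memoise embeddings by (model, text) so repeated text is embedded only once."""

    def __init__(self, embed_fn, model):
        self._embed_fn = embed_fn  # assumed signature: list[str] -> list[list[float]]
        self._model = model
        self._store = {}

    def _key(self, text):
        # Include the model name: vectors from different models must never be mixed.
        return hashlib.sha256(f"{self._model}\x00{text}".encode("utf-8")).hexdigest()

    def embed(self, texts):
        keys = [self._key(t) for t in texts]
        missing = {}
        for key, text in zip(keys, texts):
            if key not in self._store and key not in missing:
                missing[key] = text
        if missing:
            # One batched call for everything not yet cached.
            vectors = self._embed_fn(list(missing.values()))
            for key, vector in zip(missing.keys(), vectors):
                self._store[key] = vector
        return [self._store[k] for k in keys]
```

Because the model name is part of the key, switching models naturally misses the cache instead of returning vectors from the old model. A production cache would persist `_store` somewhere durable; the dictionary here only illustrates the idea.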

Vector Database

What: Stores and queries embeddings for nearest-neighbour search.

Backends: Pinecone, Weaviate, Chroma, Milvus, pgvector (Postgres).

Operations:

Operation	What it does
`upsert`	Store / update vectors + metadata
`query`	Top-K nearest neighbours to a query vector
`delete`	Remove vectors by ID or filter
`count`	Count vectors matching a filter

Inputs (for query): A query vector (from an Embeddings node).

Outputs (for query): Array of { id, score, metadata, text } results.

Configuration:

Field	Notes
Backend	Vector DB to use
Credential	DB connection
Index / collection	Logical namespace
Top K	How many results to return (default 5)
Filter	Metadata filter applied before similarity (e.g. `documentType == "policy"`)
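
To illustrate what a `query` with a metadata filter computes, here is a minimal in-memory sketch. It is not how any of the listed backends works internally (they rely on approximate nearest-neighbour indexes); the `query` function, the record layout, and the equality-only filter are assumptions made for this example, and cosine similarity is used as one common choice of score.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def query(records, query_vector, top_k=5, metadata_filter=None):
    """Return the top_k records most similar to query_vector.

    records: list of dicts with keys id, vector, metadata, text.
    metadata_filter: optional dict of field -> required value (equality only).
    """
    # Apply the metadata filter first, so excluded records are never scored.
    candidates = [
        r for r in records
        if metadata_filter is None
        or all(r["metadata"].get(k) == v for k, v in metadata_filter.items())
    ]
    scored = [
        {
            "id": r["id"],
            "score": cosine_similarity(query_vector, r["vector"]),
            "metadata": r["metadata"],
            "text": r["text"],
        }
        for r in candidates
    ]
    scored.sort(key=lambda r: r["score"], reverse=True)
    return scored[:top_k]
```

Filtering before scoring is what makes the filter cheap: records that fail the filter are never compared against the query vector at all.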

RAG Reranker

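What: Takes the top-K results from the vector DB and re-orders them with a more expensive cross-encoder model that scores each result against the query directly. This usually improves retrieval precision, because the model reads the query and the passage together instead of comparing two independently computed vectors.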

Inputs: Query text plus list of candidate results.

Outputs: Re-ordered list, optionally truncated to top-N.

When to use: Vector search alone often retrieves "topically similar" passages that aren't actually answers. Reranking pushes the actual answers to the top.

When to skip: If precision is already good (small corpus, well-chunked), the reranker adds an extra model call per query, and with it latency and cost, for little gain.
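
The re-ordering step itself is simple once a relevance score is available. The sketch below shows the shape of it under stated assumptions: `rerank` and `token_overlap` are hypothetical names, and `token_overlap` is only a toy stand-in for the score a real cross-encoder would produce, used here so the example runs without a model.

```python
def rerank(query, candidates, score_fn, top_n=None):
    """Re-order candidates by score_fn(query, text), highest first.

    candidates: list of dicts that each contain a "text" key.
    top_n: if given, keep only the best top_n results.
    """
    rescored = [dict(c, rerank_score=score_fn(query, c["text"])) for c in candidates]
    rescored.sort(key=lambda c: c["rerank_score"], reverse=True)
    return rescored if top_n is None else rescored[:top_n]


def token_overlap(query, passage):
    """Toy scorer: fraction of query words that appear in the passage.

    A placeholder only. A real reranker would call a cross-encoder model here.
    """
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0


candidates = [
    {"id": 1, "score": 0.9, "text": "Vacation policy overview and history"},
    {"id": 2, "score": 0.8, "text": "Employees get 25 vacation days per year"},
]
best = rerank("how many vacation days", candidates, token_overlap, top_n=1)
```

In this example the vector score ranks passage 1 first, but the rescoring step promotes passage 2, which is the one that actually answers the question.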

End-to-end RAG pattern

[Question trigger]
    ↓
[Embeddings: embed question]
    ↓
[Vector DB: query top-10 chunks]
    ↓
[RAG Reranker: keep top-3]
    ↓
[Data Transform: format chunks as context block]
    ↓
[AI Agent: answer with context + question]
    ↓
[AI Guardrails: PII filter, topic guard]
    ↓
[Response]

See Recipe 06: RAG document Q&A for the full walkthrough.
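
The Data Transform step in the pattern above is where retrieved chunks become prompt text. As a rough sketch of what that formatting could look like: `format_context` and `build_prompt` are hypothetical helpers, the metadata keys `source` and `page` are assumed names, and the instruction wording is only an example.

```python
def format_context(chunks):
    """Render chunks as a numbered context block with a source label per passage."""
    blocks = []
    for n, chunk in enumerate(chunks, start=1):
        meta = chunk.get("metadata", {})
        source = meta.get("source", "unknown source")  # assumed metadata key
        page = meta.get("page")                        # assumed metadata key
        label = f"{source}, p. {page}" if page is not None else source
        blocks.append(f"[{n}] ({label})\n{chunk['text'].strip()}")
    return "\n\n".join(blocks)


def build_prompt(question, chunks):
    """Combine instructions, the context block, and the user question."""
    return (
        "Answer the question using only the context below. "
        "Cite passages by their [number]. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{format_context(chunks)}\n\n"
        f"Question: {question}"
    )
```

Numbering the passages and carrying the source label into the prompt gives the model something concrete to cite, which is why storing source metadata at ingest time pays off.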

Tips & gotchas

Chunk size matters. Too small → fragmented context. Too large → diluted similarity scores. Start with 1000 chars + 200 overlap and tune.
Always store the source URL / page number as chunk metadata so you can cite back to it in the final answer.
Filter at query time when possible (e.g. tenant == "acme"). It's cheaper than re-ranking and prevents data leaks across tenants.
Re-embedding is expensive. Plan for it: when you change embedding models, you have to re-index your whole corpus.
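Hybrid search (vector + keyword BM25) often outperforms vector-only search, particularly for queries that hinge on exact terms such as names or product codes. The Vector DB node supports hybrid search on backends that allow it (Weaviate, pgvector with TS, presumably Postgres full-text search).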
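
Hybrid search needs a way to merge a vector ranking and a keyword ranking into one list. One common technique is reciprocal rank fusion (RRF), sketched below. This is an illustration of the general idea, not a description of how the Vector DB node or any backend merges results; the function name and the constant `k=60` (a conventional default for RRF) are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    so documents ranked well by several retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


vector_hits = ["a", "b", "c"]   # IDs ordered by vector similarity
keyword_hits = ["b", "d", "a"]  # IDs ordered by keyword (BM25) score
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

RRF works on ranks only, which sidesteps the problem that vector similarity scores and BM25 scores are on different scales.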

Related

AI & LLM nodes: the AI Agent that consumes retrieved context
Storage & database: pgvector lives here
Recipe 06: RAG document Q&A

Found something out of date? This page lives in the Flero docs content set.
