RAG nodes
RAG = Retrieval-Augmented Generation. Instead of asking an LLM to answer from its general training, you first retrieve relevant passages from your own documents and then ground the model on those passages.
The four-node pipeline:
[Documents]
│
▼
[Document Chunker] ──→ [Embeddings] ──→ [Vector Database (store)]
│
▼
[User question] ──→ [Embeddings] ──→ [Vector Database (search)] ──→ [Reranker (optional)] ──→ [AI Agent]
Document Chunker
What: Splits a long document into smaller chunks suitable for embedding.
Inputs: Document text plus metadata.
Outputs: Array of chunks, each with text, metadata, and chunk index.
Configuration:
| Field | Notes |
|---|---|
| Chunk size | Target characters per chunk (default 1000) |
| Overlap | Characters of context shared between adjacent chunks (default 200) |
| Strategy | fixed-size, sentence, paragraph, semantic |
| Metadata fields | Fields from the source document to carry into every chunk (e.g. documentId, source, author) |
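To make the chunk size and overlap settings concrete, here is a minimal sketch of the fixed-size strategy. It is not the node's actual implementation: the `Chunk` class and `chunk_fixed_size` function are hypothetical names used only for illustration, and the defaults simply mirror the table above.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    index: int
    metadata: dict = field(default_factory=dict)


def chunk_fixed_size(text: str, metadata: dict,
                     size: int = 1000, overlap: int = 200) -> list[Chunk]:
    """Split text into windows of `size` characters sharing `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # how far the window advances each time
    chunks: list[Chunk] = []
    for index, start in enumerate(range(0, len(text), step)):
        # Copy the metadata so every chunk carries its own documentId, source, etc.
        chunks.append(Chunk(text[start:start + size], index, dict(metadata)))
        if start + size >= len(text):
            break  # this window already reached the end of the text
    return chunks


chunks = chunk_fixed_size("x" * 2500, {"documentId": "doc-1"})
print([len(c.text) for c in chunks])  # [1000, 1000, 900]
```

With the defaults, each window starts 800 characters after the previous one, so a 2,500-character document yields three chunks and the last 200 characters of each chunk reappear at the start of the next.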
Example: Ingest a 50-page PDF.
[File Operations: read PDF]
↓
[PDF Operations: extract text]
↓
[Document Chunker: 1000-char chunks, 200 overlap]
↓
[Embeddings: turn each chunk into a vector]
↓
[Vector DB: upsert chunks]
Embeddings
What: Converts text into a vector embedding (a numeric array).
Providers: OpenAI (text-embedding-3-large, text-embedding-3-small), Anthropic, Google, Cohere, local (sentence-transformers via Ollama).
Inputs: A string or array of strings.
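Outputs: A vector or array of vectors. Dimensionality depends on the model, from a few hundred to a few thousand (e.g. 1536 for text-embedding-3-small, 3072 for text-embedding-3-large).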
Configuration:
| Field | Notes |
|---|---|
| Provider | Embedding provider |
| Model | Specific embedding model |
| Credential | API key (or none for local models) |
Tips:
- Use the same embedding model for indexing and for querying. Mismatched models produce useless similarity scores.
- Embedding cost is small per call but scales with corpus size. Cache aggressively.
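Both tips can be combined in a small cache keyed by model name and text. The sketch below is an illustration only: `EmbeddingCache` and `fake_embed` are hypothetical names, and `fake_embed` stands in for whatever provider call you actually use.

```python
import hashlib
from typing import Callable

Vector = list[float]


class EmbeddingCache:
    """Memoise embeddings by (model, text) so unchanged text is not re-embedded."""

    def __init__(self, model: str, embed_fn: Callable[[list[str]], list[Vector]]):
        self.model = model
        self.embed_fn = embed_fn  # hypothetical provider call: texts -> vectors
        self._store: dict[str, Vector] = {}

    def _key(self, text: str) -> str:
        # The model name is part of the key, so vectors from different models never mix.
        return hashlib.sha256(f"{self.model}\x00{text}".encode("utf-8")).hexdigest()

    def embed(self, texts: list[str]) -> list[Vector]:
        # Only texts that are not cached yet are sent to the provider, once each.
        missing = [t for t in dict.fromkeys(texts) if self._key(t) not in self._store]
        if missing:
            for text, vector in zip(missing, self.embed_fn(missing)):
                self._store[self._key(text)] = vector
        return [self._store[self._key(t)] for t in texts]


calls: list[list[str]] = []


def fake_embed(batch: list[str]) -> list[Vector]:
    """Stand-in for a real embedding API; records each batch it receives."""
    calls.append(list(batch))
    return [[float(len(t)), 1.0] for t in batch]


cache = EmbeddingCache("demo-model", fake_embed)
cache.embed(["refund policy", "shipping times"])
cache.embed(["refund policy", "returns"])
print(calls)  # [['refund policy', 'shipping times'], ['returns']]
```

The second call only sends "returns" to the provider because "refund policy" is already cached. A real deployment would persist the store rather than keep it in memory.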
Vector Database
What: Stores and queries embeddings for nearest-neighbour search.
Backends: Pinecone, Weaviate, Chroma, Milvus, pgvector (Postgres).
Operations:
| Operation | What it does |
|---|---|
| upsert | Store / update vectors + metadata |
| query | Top-K nearest neighbours to a query vector |
| delete | Remove vectors by ID or filter |
| count | Count vectors matching a filter |
Inputs (for query): A query vector (from an Embeddings node).
Outputs (for query): Array of { id, score, metadata, text } results.
Configuration:
| Field | Notes |
|---|---|
| Backend | Vector DB to use |
| Credential | DB connection |
| Index / collection | Logical namespace |
| Top K | How many results to return (default 5) |
| Filter | Metadata filter applied before similarity (e.g. documentType == "policy") |
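As a rough picture of what a query does, the sketch below runs a brute-force search in memory: apply the metadata filter first, score the survivors by cosine similarity, and return the top K. Real backends use approximate indexes instead of a full scan, and the `query` and `cosine` functions and the record layout here are assumptions for illustration, not the node's API.

```python
import math
from typing import Optional


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def query(records: list[dict], query_vector: list[float],
          top_k: int = 5, where: Optional[dict] = None) -> list[dict]:
    """records: dicts with id, vector, metadata, text. `where` is an equality filter."""
    where = where or {}
    # Filter before similarity so excluded records are never scored or returned.
    candidates = [r for r in records
                  if all(r["metadata"].get(k) == v for k, v in where.items())]
    scored = [{"id": r["id"],
               "score": cosine(query_vector, r["vector"]),
               "metadata": r["metadata"],
               "text": r["text"]} for r in candidates]
    scored.sort(key=lambda r: r["score"], reverse=True)
    return scored[:top_k]


records = [
    {"id": "a", "vector": [1.0, 0.0], "metadata": {"documentType": "policy"}, "text": "Refund policy"},
    {"id": "b", "vector": [0.9, 0.1], "metadata": {"documentType": "faq"}, "text": "Refund FAQ"},
    {"id": "c", "vector": [0.0, 1.0], "metadata": {"documentType": "policy"}, "text": "Travel policy"},
]
hits = query(records, [1.0, 0.0], top_k=2, where={"documentType": "policy"})
print([h["id"] for h in hits])  # ['a', 'c']
```

Record "b" is closer to the query than "c", but the filter removes it before scoring, which is why it never appears in the results.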
RAG Reranker
What: Takes the top-K results from the vector DB and re-orders them with a more expensive cross-encoder model that scores each result against the query directly. This typically improves retrieval precision.
Inputs: Query text plus list of candidate results.
Outputs: Re-ordered list, optionally truncated to top-N.
When to use: Vector search alone often retrieves "topically similar" passages that aren't actually answers. Reranking pushes the actual answers to the top.
When to skip: If precision is already good (small corpus, well-chunked documents), the reranker can roughly double latency and cost for little gain.
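The re-ordering step itself is simple once a scoring function is available. The sketch below shows the shape of it with a pluggable scorer. The `rerank` function is a hypothetical name, and `token_overlap` is a deliberately crude stand-in for a cross-encoder, used only so the example runs without a model.

```python
from typing import Callable, Optional


def rerank(query: str, candidates: list[dict],
           score_fn: Callable[[str, str], float],
           top_n: Optional[int] = None) -> list[dict]:
    """Re-score each candidate against the query text and sort by the new score."""
    rescored = [{**c, "rerank_score": score_fn(query, c["text"])} for c in candidates]
    rescored.sort(key=lambda c: c["rerank_score"], reverse=True)
    return rescored if top_n is None else rescored[:top_n]


def token_overlap(query: str, passage: str) -> float:
    """Toy stand-in for a cross-encoder: fraction of query tokens found in the passage."""
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0


candidates = [
    {"id": "a", "score": 0.91, "text": "Our refund policy covers all hardware purchases"},
    {"id": "b", "score": 0.84, "text": "Refunds take five business days to reach your account"},
    {"id": "c", "score": 0.80, "text": "Shipping times vary by region"},
]
top = rerank("how long do refunds take", candidates, token_overlap, top_n=2)
print([c["id"] for c in top])  # ['b', 'a']
```

Passage "a" had the highest vector score because it is topically similar, but "b" is the one that answers the question, and the second pass moves it to the top.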
End-to-end RAG pattern
[Question trigger]
↓
[Embeddings: embed question]
↓
[Vector DB: query top-10 chunks]
↓
[RAG Reranker: keep top-3]
↓
[Data Transform: format chunks as context block]
↓
[AI Agent: answer with context + question]
↓
[AI Guardrails: PII filter, topic guard]
↓
[Response]
See Recipe 06: RAG document Q&A for the full walkthrough.
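The Data Transform step in the pattern above turns the retrieved chunks into a single context block for the agent. A minimal sketch of that formatting follows, assuming each chunk carries `source` and `page` metadata. The `build_prompt` function, the metadata field names, and the prompt wording are all illustrative choices, not the product's built-in template.

```python
def build_prompt(question: str, chunks: list[dict]) -> str:
    """Format retrieved chunks as numbered, citable context followed by the question."""
    blocks = []
    for n, chunk in enumerate(chunks, start=1):
        meta = chunk.get("metadata", {})
        source = meta.get("source", "unknown source")
        page = meta.get("page")
        label = f"{source}, p. {page}" if page is not None else source
        blocks.append(f"[{n}] ({label})\n{chunk['text'].strip()}")
    context = "\n\n".join(blocks)
    return (
        "Answer the question using only the context below. "
        "Cite sources by their bracketed number. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


prompt = build_prompt(
    "How long do refunds take?",
    [
        {"text": "Refunds take five business days.", "metadata": {"source": "handbook.pdf", "page": 12}},
        {"text": "Refunds are issued to the original payment method.", "metadata": {"source": "faq.md"}},
    ],
)
print(prompt)
```

Numbering the chunks and including the source label gives the model something specific to cite, which only works if that metadata was stored at indexing time.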
Tips & gotchas
- Chunk size matters. Too small → fragmented context. Too large → diluted similarity scores. Start with 1000 chars + 200 overlap and tune.
- Always store the source URL / page number as chunk metadata so you can cite back to it in the final answer.
- Filter at query time when possible (e.g. tenant == "acme"); it's cheaper than re-ranking and prevents data leaks across tenants.
- Re-embedding is expensive. Plan for it: when you change embedding models, you have to re-index your whole corpus.
- Hybrid search (vector + keyword BM25) outperforms vector-only for many use cases. The Vector DB node supports hybrid on backends that allow it (Weaviate, pgvector combined with Postgres full-text search).
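For the hybrid search tip, one common way to merge a vector ranking with a keyword ranking is reciprocal rank fusion (RRF). This page does not say which fusion method the Vector DB node or its backends use, so treat the sketch below as an illustration of the general idea. The function name and the constant `k = 60` are conventional choices, not product settings.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked ID lists: each list contributes 1 / (k + rank) per document."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


vector_hits = ["c1", "c2", "c3"]   # IDs ordered by vector similarity
keyword_hits = ["c3", "c1", "c4"]  # IDs ordered by keyword (BM25) score
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))  # ['c1', 'c3', 'c2', 'c4']
```

Chunks that appear in both lists ("c1" and "c3") rise above chunks found by only one method, and because RRF works on ranks it does not need the two score scales to be comparable.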
Related
- AI & LLM nodes: the AI Agent that consumes retrieved context
- Storage & database: pgvector lives here
- Recipe 06: RAG document Q&A
Found something out of date? This page lives in the Flero docs content set.