
RAG nodes

RAG = Retrieval-Augmented Generation. Instead of asking an LLM to answer from its general training alone, you first retrieve relevant passages from your own documents and then ground the model's answer in those passages.

The four-node pipeline:

[Documents]
     │
     ▼
[Document Chunker] ──→ [Embeddings] ──→ [Vector Database (store)]
                                                  │
                                                  ▼
[User question] ──→ [Embeddings] ──→ [Vector Database (search)] ──→ [Reranker] ──→ [AI Agent]
                                                                    (optional)

Document Chunker

What: Splits a long document into smaller chunks suitable for embedding.

Inputs: Document text plus metadata.

Outputs: Array of chunks, each with text, metadata, and chunk index.

Configuration:

Field            Notes
Chunk size       Target characters per chunk (default 1000)
Overlap          Characters of context shared between adjacent chunks (default 200)
Strategy         fixed-size, sentence, paragraph, semantic
Metadata fields  Fields from the source document to carry into every chunk (e.g. documentId, source, author)

Example: Ingest a 50-page PDF.

[File Operations: read PDF]
      ↓
[PDF Operations: extract text]
      ↓
[Document Chunker: 1000-char chunks, 200 overlap]
      ↓
[Embeddings: turn each chunk into a vector]
      ↓
[Vector DB: upsert chunks]
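
To make the chunk size and overlap settings concrete, here is a minimal Python sketch of the fixed-size strategy. It is an illustration only, not the node's actual implementation: the `Chunk` class and `chunk_fixed` function are hypothetical names, and the sketch assumes the window simply advances by chunk size minus overlap characters.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Chunk:
    """Hypothetical chunk record: text, chunk index, and copied metadata."""
    text: str
    index: int
    metadata: dict = field(default_factory=dict)


def chunk_fixed(text: str, metadata: Optional[dict] = None,
                chunk_size: int = 1000, overlap: int = 200) -> list:
    """Split text into fixed-size character windows sharing `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(Chunk(text=text[start:start + chunk_size],
                            index=len(chunks),
                            metadata=dict(metadata or {})))  # copy per chunk
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks
```

With the defaults, each chunk starts 800 characters after the previous one, so the last 200 characters of one chunk are repeated at the start of the next.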

Embeddings

What: Converts text into a vector embedding (a numeric array).

Providers: OpenAI (text-embedding-3-large, text-embedding-3-small), Anthropic, Google, Cohere, local (sentence-transformers via Ollama).

Inputs: A string or array of strings.

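Outputs: A vector or array of vectors. Dimensionality depends on the model, typically from a few hundred up to 3072 (for example, 1536 for text-embedding-3-small and 3072 for text-embedding-3-large by default).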

Configuration:

Field       Notes
Provider    Embedding provider
Model       Specific embedding model
Credential  API key (or none for local models)

Tips:

  • Use the same embedding model for indexing and for querying. Vectors from different models live in different spaces, so mismatched models produce useless similarity scores.
  • Embedding cost is small per call but scales with corpus size. Cache embeddings for text you have already embedded.
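
Both tips can be combined in a small cache keyed by model and text. The Python sketch below is an assumption-laden illustration: `EmbeddingCache` is a hypothetical helper, and `embed_fn` stands in for whatever call actually reaches your embedding provider, taking a model name and a list of strings and returning one vector per string.

```python
import hashlib
from typing import Callable


class EmbeddingCache:
    """Memoises embeddings per (model, text) so repeated text is embedded once."""

    def __init__(self, embed_fn: Callable):
        # Hypothetical provider call: embed_fn(model, texts) -> list of vectors.
        self._embed_fn = embed_fn
        self._store = {}

    @staticmethod
    def _key(model: str, text: str) -> str:
        # The model name is part of the key: vectors from different models
        # are not comparable, so they must never share a cache entry.
        return hashlib.sha256(f"{model}\x00{text}".encode("utf-8")).hexdigest()

    def embed(self, model: str, texts: list) -> list:
        # Collect texts that are not cached yet, without duplicates.
        missing = []
        for t in texts:
            if self._key(model, t) not in self._store and t not in missing:
                missing.append(t)
        if missing:
            # One provider call for everything that is missing.
            for t, vec in zip(missing, self._embed_fn(model, missing)):
                self._store[self._key(model, t)] = vec
        return [self._store[self._key(model, t)] for t in texts]
```

This in-memory version only helps within a single process. A persistent store would be needed to avoid re-embedding across runs.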

Vector Database

What: Stores and queries embeddings for nearest-neighbour search.

Backends: Pinecone, Weaviate, Chroma, Milvus, pgvector (Postgres).

Operations:

Operation  What it does
upsert     Store / update vectors + metadata
query      Top-K nearest neighbours to a query vector
delete     Remove vectors by ID or filter
count      Count vectors matching a filter

Inputs (for query): A query vector (from an Embeddings node).

Outputs (for query): Array of { id, score, metadata, text } results.

Configuration:

Field               Notes
Backend             Vector DB to use
Credential          DB connection
Index / collection  Logical namespace
Top K               How many results to return (default 5)
Filter              Metadata filter applied before similarity (e.g. documentType == "policy")
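
As a mental model for the query operation, the Python sketch below does a brute-force top-K search in memory, applying the metadata filter before scoring. It is not how any of the listed backends work internally (they use approximate indexes), and the `records` layout and function names are assumptions made for illustration. Cosine similarity is assumed as the score.

```python
import math
from typing import Optional


def cosine(a: list, b: list) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def query(records: list, query_vector: list, top_k: int = 5,
          metadata_filter: Optional[dict] = None) -> list:
    """Brute-force top-K search; each record has id, vector, metadata, text."""
    wanted = metadata_filter or {}
    # Apply the metadata filter first, so excluded records are never scored.
    candidates = [r for r in records
                  if all(r["metadata"].get(k) == v for k, v in wanted.items())]
    scored = [{"id": r["id"],
               "score": cosine(query_vector, r["vector"]),
               "metadata": r["metadata"],
               "text": r["text"]} for r in candidates]
    scored.sort(key=lambda hit: hit["score"], reverse=True)  # best match first
    return scored[:top_k]
```

Because the filter runs before scoring, a record from another tenant or document type can never appear in the results, however similar its vector is.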

RAG Reranker

What: Takes the top-K results from the vector DB and re-orders them with a more expensive cross-encoder model that scores each result against the query directly. This usually improves retrieval precision.

Inputs: Query text plus list of candidate results.

Outputs: Re-ordered list, optionally truncated to top-N.

When to use: Vector search alone often retrieves "topically similar" passages that aren't actually answers. Reranking pushes the actual answers to the top.

When to skip: If precision is already good (small corpus, well-chunked), the reranker adds latency and cost for little gain.
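
The shape of a rerank step can be sketched in Python as below. The `rerank` function and its pluggable `score_fn` are hypothetical names. The included `overlap_score` is a deliberately crude word-overlap stand-in, not a cross-encoder, and is there only so the sketch runs without a model. In practice `score_fn` would call a real cross-encoder.

```python
from typing import Callable, Optional


def rerank(query: str, candidates: list, score_fn: Callable,
           top_n: Optional[int] = None) -> list:
    """Re-score each candidate against the query text and sort by the new score."""
    # Copy each candidate and attach the new score; the input list is not mutated.
    rescored = [{**c, "rerank_score": score_fn(query, c["text"])} for c in candidates]
    rescored.sort(key=lambda c: c["rerank_score"], reverse=True)
    return rescored if top_n is None else rescored[:top_n]


def overlap_score(query: str, passage: str) -> float:
    """Stand-in scorer (NOT a cross-encoder): fraction of query words in the passage."""
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0
```

Note that the reranker sees the query text, not the query vector, which is why its inputs differ from those of the Vector Database query.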


End-to-end RAG pattern

[Question trigger]
      ↓
[Embeddings: embed question]
      ↓
[Vector DB: query top-10 chunks]
      ↓
[RAG Reranker: keep top-3]
      ↓
[Data Transform: format chunks as context block]
      ↓
[AI Agent: answer with context + question]
      ↓
[AI Guardrails: PII filter, topic guard]
      ↓
[Response]

See Recipe 06: RAG document Q&A for the full walkthrough.
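
The "format chunks as context block" step in the pattern above could look roughly like the following Python sketch. The function names, the prompt wording, and the `source` and `page` metadata keys are assumptions for illustration. Use whatever metadata fields you actually carried through from the chunker.

```python
def format_context(chunks: list) -> str:
    """Render retrieved chunks as a numbered context block with source labels."""
    parts = []
    for n, chunk in enumerate(chunks, start=1):
        meta = chunk.get("metadata", {})
        source = meta.get("source", "unknown source")  # assumed metadata key
        page = meta.get("page")                        # assumed metadata key
        label = f"{source}, p. {page}" if page is not None else source
        parts.append(f"[{n}] ({label})\n{chunk['text'].strip()}")
    return "\n\n".join(parts)


def build_prompt(question: str, chunks: list) -> str:
    """Combine the context block and the question into a single prompt string."""
    return (
        "Answer using only the context below. "
        "Cite passages by their [number]. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{format_context(chunks)}\n\n"
        f"Question: {question}"
    )
```

Numbering the passages gives the model something stable to cite, and the source label lets the final answer point back to the original document.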


Tips & gotchas

  • Chunk size matters. Too small → fragmented context. Too large → diluted similarity scores. Start with 1000 chars + 200 overlap and tune.
  • Always store the source URL / page number as chunk metadata so you can cite back to it in the final answer.
  • Filter at query time when possible (e.g. tenant == "acme"). It is cheaper than reranking and prevents data from leaking across tenants.
  • Re-embedding is expensive. Plan for it: when you change embedding models, you have to re-index your whole corpus.
  • Hybrid search (vector + keyword BM25) outperforms vector-only for many use cases. The Vector DB node supports hybrid on backends that allow it (Weaviate, pgvector with TS).
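
For intuition on the hybrid search tip, one common way to merge a vector ranking with a keyword ranking is reciprocal rank fusion (RRF). The Python sketch below shows the idea only. It is not a description of how the Vector DB node or any particular backend fuses results, and the function name and the constant k = 60 (a conventional default) are assumptions.

```python
def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Fuse several ranked ID lists; each list contributes 1 / (k + rank) per ID."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


vector_hits = ["a", "b", "c"]   # IDs ordered by vector similarity
keyword_hits = ["c", "a", "d"]  # IDs ordered by keyword (e.g. BM25) score
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

RRF works on ranks rather than raw scores, so it avoids having to normalise similarity scores and keyword scores onto a common scale.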

