AI & LLM
Nodes for working with large language models, running prompts, evaluating outputs, guarding against unsafe content, and routing semantically.
AI Agent
What: Sends prompts to an LLM provider and returns the response.
Providers: OpenAI (GPT-4 / 3.5), Anthropic (Claude Opus / Sonnet / Haiku), Google (Gemini Pro), Ollama (local).
Inputs: A prompt string, or a structured messages array for multi-turn.
Outputs:
response, the LLM's text output.usage, token counts (input, output, total).cost, estimated cost based on the provider's per-token pricing.finishReason,stop,length,tool_calls, etc.
Configuration:
| Field | Notes |
|---|---|
| Provider | One of OpenAI / Anthropic / Google / Ollama |
| Credential | API key for the provider |
| Model | Specific model name (e.g. claude-opus-4-7) |
| System instructions | Role definition that prepends the conversation |
| Prompt / Messages | Either a single prompt or an array of {role, content} objects |
| Max tokens | Output cap |
| Temperature | Sampling randomness, 0–1 |
| Top-p, Top-k, Presence/Frequency penalty | Other sampling controls |
| Stop sequences | Strings that, if generated, halt generation |
| Stream | If true, returns chunks incrementally (useful for long outputs) |
| Enable memory | Persist conversation state across runs |
| Memory key | Conversation identifier |
| Max memory messages | Context window cap |
| Tools | Function definitions the LLM can call |
| Tool choice | auto, none, or force a specific tool |
| Response format | text (default) or json_object with optional JSON Schema |
Example (simple summarisation):
Provider: anthropic
Model: claude-opus-4-7
System: You are a concise news summarizer.
Prompt: Summarise this article in 2 sentences: {{ $node["Fetch article"].json.body.content }}
Temperature: 0.3
Max tokens: 300Example (multi-turn with memory):
Provider: openai
Model: gpt-4
System: You are a helpful CRM assistant.
Messages: [
{ role: "user", content: {{ $trigger.body.message }} }
]
Enable memory: true
Memory key: {{ $trigger.body.conversationId }}Vision input (images):
Prompt: Describe what's in this product photo.
Images: [ {{ $node["Download image"].json.base64 }} ]Prompt Template
What: Builds a prompt from a reusable template with placeholders. Use this when the same prompt structure repeats across nodes or workflows.
Configuration: A template string with {{ variable }} placeholders and a variables object that fills them.
Built-in templates: summarize, code_review, translate, extract_json, qa, creative_writing, data_analysis, email_draft.
AI Eval
What: Uses an LLM to score another LLM's output against criteria. Output is a structured score / verdict.
Use case: Quality checks (was the AI's answer correct?), compliance evaluation (did the response stay on-topic?), A/B testing prompts.
Outputs: score (number), passed (bool), rationale (text).
AI Guardrails
What: Filters AI outputs against safety / compliance rules before they leave the workflow.
Checks:
- PII redaction (emails, phone numbers, credit cards, SSNs).
- Toxicity / safety classifier.
- Topic guard (must / must not mention specific topics).
- Length and format constraints.
Outputs: output (the filtered text) and blocked (true if a guard tripped, with details on which rule).
Semantic Router
What: Routes data by semantic similarity. Compute an embedding of the input, compare to known intents, route to the closest match's output port.
Use case: Intent classification ("is the user asking for help, sales info, or refund?"), smart fan-out in support chatbots.
Learning
What: Persists workflow signals to a feedback store that can later fine-tune prompts or routing decisions. Advanced.
Workflow Assistant
What: Powers the in-editor AI builder. Available as a node for advanced cases (e.g. building a chatbot that itself can build workflows). Most users interact with this via the AI builder, not directly.
Token Cost Guard
What: A hard ceiling on LLM token spend per run. Tracks usage across AI Agent nodes and aborts the workflow if the budget is exceeded.
Configuration:
| Field | Notes |
|---|---|
| Budget (tokens) | Max input + output tokens for this run |
| Budget (USD) | Alternative: ceiling in cost |
| Action | abort (fail), route to error port, warn-only |
Use case: Production workflows where a runaway LLM could rack up significant cost.
Tips & gotchas
- Model names change frequently. Use the provider's docs for the current model ID. Flero updates its picklist regularly but isn't authoritative.
- JSON output mode is not guaranteed parseable even when the model says "json_object", wrap with a schema in
responseFormatand follow up with aData Parserif you need a hard guarantee. - Temperature 0 is not deterministic across providers, there's still some variability. Use seeded sampling where the provider supports it (some OpenAI models).
- Memory has a cost. Every prior message is re-sent on every turn. Cap
maxMemoryMessagesto a reasonable number (5–10 for support bots). - Vision input is large. Images count for many tokens; budget accordingly. Resize before embedding when possible.
- Tool calling is provider-specific. Tool definitions written for OpenAI may need tweaking for Anthropic or Google. Test before going live.
- Cost guard runs after each AI node, so a single oversized prompt can still exceed the budget by one node's worth. Set the guard slightly under your real cap.
Related
- RAG nodes, pair with AI Agent for retrieval-grounded answers
- AI builder
- Settings → Billing & usage, track AI spend
Found something out of date? This page lives in the Flero docs content set.