Debugging failures
A systematic approach to "my workflow failed; now what?"
Screenshot needed: execution__failed-node-error.png. Execution detail with a red node selected and the error panel showing the error type, message, stack trace summary, and "Open node in editor" button.
The five-minute investigation
When you see a failed run:
- Open the execution detail. From /logs, click the failed row.
- Find the red node. The canvas snapshot will highlight it; the timeline table will rank it near the bottom of the success column.
- Read the error. Click the red node; the inspector on the right shows the error type and message. Common patterns are listed below.
- Inspect the input. Expand the input column for that node. Is it what you expected? If not, the problem is upstream; look at the previous node's output.
- Reproduce locally. Open the workflow in the editor, then click Test on the failed node with the same input. Iterate.
Common errors and what they mean
TimeoutError
The node took longer than its timeout to complete. Usually an HTTP node waiting on a slow upstream.
Fix:
- Increase the node's timeout field.
- Add retries with backoff.
- Investigate why the upstream is slow.
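To make "retries with backoff" concrete, here is a minimal Python sketch of the idea. It is not how the node implements retries internally; `backoff_delays` and `call_with_retries` are hypothetical names, and the base delay, factor, and cap are illustrative values only.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(attempts: int, base: float = 0.5,
                   factor: float = 2.0, cap: float = 30.0) -> List[float]:
    """Delay before each retry: base, base*factor, ... never above cap."""
    return [min(cap, base * factor ** i) for i in range(attempts)]


def call_with_retries(fn: Callable[[], T], retries: int = 3, base: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn; on TimeoutError, wait and try again up to `retries` times."""
    for delay in backoff_delays(retries, base):
        try:
            return fn()
        except TimeoutError:
            sleep(delay)  # wait longer after each consecutive failure
    return fn()  # final attempt; any error now propagates to the caller
```

The growing delay gives a slow upstream time to recover instead of hitting it again immediately. Retrying is only safe when the call is idempotent.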
NetworkError / ConnectionRefused
DNS failure, transient network blip, or the upstream service is genuinely down.
Fix:
- Add retries (1–3 attempts).
- For mission-critical services, gate with an Error Handling node and an alternate path.
AuthenticationError / 401 Unauthorized / 403 Forbidden
Credential is expired, revoked, missing required scopes, or the token is for the wrong account.
Fix:
- Test the credential under Settings > Credentials.
- Re-authenticate if expired.
- Check the connector's catalog page for the scopes it needs.
RateLimitExceeded / 429 Too Many Requests
You're calling the service faster than your quota allows.
Fix:
- Add a Delay node or use the API Pagination & Rate Limit wrapper.
- Cache responses you call repeatedly.
- Reduce workflow concurrency.
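As an illustration of "cache responses you call repeatedly", the following Python sketch shows a small time-to-live cache. It is a generic pattern, not a built-in feature of the product; `TTLCache`, `get_or_fetch`, and the injected `clock` are hypothetical names chosen for this example.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Remember fetched values for ttl_seconds so repeat lookups skip the API."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_fetch(self, key: Hashable, fetch: Callable[[], Any]) -> Any:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: no outbound request
        value = fetch()  # missing or stale: call the service once
        self._entries[key] = (now, value)
        return value
```

Every cache hit is one request that does not count against the quota. Pick a TTL that matches how stale the data is allowed to be.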
ValidationError
Some field didn't match what the node expected. Could be your config (a misshapen request body) or the upstream data (a field came back as null when you expected a string).
Fix:
- Read the message; it almost always names the offending field.
- Add a Schema Validator Guard earlier in the chain to catch bad data before it reaches the failing node.
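The idea behind an early guard can be sketched in a few lines of Python. This is only an illustration of the check, not the Schema Validator Guard's actual behavior or configuration; `validate` and the example field names are hypothetical.

```python
from typing import Any, List, Mapping


def validate(record: Mapping[str, Any], required: Mapping[str, type]) -> List[str]:
    """Return one message per bad field; an empty list means the record passes."""
    errors: List[str] = []
    for field, expected in required.items():
        value = record.get(field)
        if value is None:
            errors.append(f"{field}: missing or null")
        elif not isinstance(value, expected):
            errors.append(
                f"{field}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return errors


# Hypothetical record: the null email and the string age are both reported by name.
problems = validate({"email": None, "age": "41"}, {"email": str, "age": int})
```

Failing here, with the field named, is easier to debug than a ValidationError three nodes later.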
[EXPR-ERROR: ...]
An expression failed to resolve. For specific guidance, see Debugging expressions.
4xx from the external service
A genuine business error (item not found, invalid state transition). Read the response body in the node's output; most APIs return useful error JSON.
5xx from the external service
The external service had a problem on its end. Retry; if persistent, raise a ticket with the service.
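The status-code guidance in the sections above can be summarized as a small triage function. This Python sketch is only a reading aid that restates the advice on this page; `triage` and its return strings are hypothetical and not part of any API.

```python
def triage(status: int) -> str:
    """Map an HTTP status code to the first debugging step suggested above."""
    if status in (401, 403):
        return "check the credential"
    if status == 429:
        return "slow down: add a delay, cache, or reduce concurrency"
    if 400 <= status < 500:
        return "read the response body"
    if 500 <= status < 600:
        return "retry, then raise a ticket if it persists"
    return "not an error"
```

The specific codes are checked before the general 4xx range, because 401, 403, and 429 have their own fixes.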
When the error is in your data, not your workflow
Sometimes the workflow is correct and the data is malformed (missing field, wrong type). Distinguish:
- Stable, known shape: guard with a Schema Validator early.
- Sometimes-missing optional fields: use the default pattern (Data Transform map, or a Code node).
- Occasionally-wrong type: convert defensively with Data Transform's convert op.
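The default pattern and defensive conversion look roughly like this when written out by hand. This is a plain Python sketch of the logic, not the syntax of a Code node or of Data Transform; `normalize_user` and the `plan` and `seats` fields are made up for illustration.

```python
from typing import Any, Dict, Mapping

# Hypothetical defaults for optional fields.
DEFAULTS: Dict[str, Any] = {"plan": "free", "seats": 1}


def normalize_user(raw: Mapping[str, Any]) -> Dict[str, Any]:
    """Fill in missing or null optional fields, then coerce seats to an int."""
    present = {key: value for key, value in raw.items() if value is not None}
    user = {**DEFAULTS, **present}  # values from the input win over defaults
    try:
        user["seats"] = int(user["seats"])  # accepts 5, "5", 5.0
    except (TypeError, ValueError):
        user["seats"] = DEFAULTS["seats"]  # unusable value: fall back
    return user
```

Falling back silently hides bad data, so consider logging whenever the `except` branch runs.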
Replay and iterate
After fixing, Replay the failed run from the execution detail. The fix runs against the original input; you can see if you got it right without waiting for a new trigger.
For more invasive changes:
- Note the input that caused the failure (Copy from the execution detail).
- Open the workflow in the editor.
- Pin that input on the trigger node (see Pinning data).
- Test the failing node repeatedly while you iterate.
When you can't reproduce
Sometimes a run fails once and never again: a transient network error, a race condition, a momentary auth blip. To rule out a real issue:
- Check the monitoring dashboard. Is the failure rate spiking?
- Search /logs for similar errors across all workflows. Is it a workspace-wide problem?
- Replay a few times. Does it reliably reproduce, or is it actually transient?
If it's a one-off, log the execution ID somewhere for the record and move on. If it recurs, dig deeper.
When the workflow is stuck (not failed)
pending for a long time: workers are busy or down. Check Settings > Workspace > Workers (Admin only). Also see the monitoring dashboard for queue depth.
running for longer than expected: the current node is hanging. Click into the execution detail; the timeline will show which node is still "running". Cancel via the Cancel action.
waitingForInput: an approval is pending. Resolve it from /approvals.
Logging strategically
Before things go wrong, scatter Log nodes at meaningful checkpoints with info level. When something does go wrong, those logs are the first place you'll look.
Message: After fetch, got {{ $node["Fetch users"].json.length }} users for tenant {{ $trigger.body.tenantId }}
Level: info
Tips & gotchas
- Read the full error message, not just the type. The stack trace often points exactly at the field.
- One failure can mask another. When a node fails, downstream nodes don't run, so you might fix the first node and discover a second failure on replay. Iterate.
- Sandbox mode is your friend for reproducing failures involving external services without retriggering side effects.
- Don't add retries to non-idempotent operations (POST a charge to Stripe, send an email). Two retries = two charges. Use idempotency keys when retries are needed.
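One way to build an idempotency key is to derive it from values that stay the same across retries of the same step. The Python sketch below assumes the execution ID and node name are available and stable across retries, which this page does not confirm; `idempotency_key` is a hypothetical helper.

```python
import hashlib


def idempotency_key(execution_id: str, node_name: str) -> str:
    """Same execution and node give the same key, so a retry is a repeat."""
    raw = f"{execution_id}:{node_name}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()
```

Send the key with the request in whatever form the external service expects; Stripe, for example, reads it from an `Idempotency-Key` header. The service can then recognize a retried request and return the original result instead of charging twice.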