How to Add Error Handling and Retries to n8n AI Workflows

Protect AI workflows from rate limits and flaky responses with node retries, an error workflow, and alerts so failures never go silent.

8 min · Intermediate

LLM APIs rate-limit, time out, and occasionally return junk. A workflow that ignores this looks fine in testing and quietly drops data in production. This guide adds three layers of protection: per-node retries, a dedicated error workflow, and an alert so you always know when something failed. A final step validates the model's output before anything downstream uses it.

What you need

An existing AI workflow that calls an LLM
A notification destination such as Slack, Telegram, or email
A few minutes to configure node and workflow settings

Step 1: Turn on retries for the LLM node

Open the OpenAI or chat model node, go to its Settings tab, and enable Retry On Fail. Set Max Tries to 3 and Wait Between Tries to a few seconds. Transient rate-limit errors often clear on a later attempt, so this alone removes a large share of failures.

n8n - Node settings

Settings (OpenAI node)
Retry On Fail: [x]
Max Tries: 3
Wait Between Tries: 5000 ms
Continue On Fail: [ ]

Per-node retry settings under the Settings tab.
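To make the effect of these settings concrete, here is a minimal JavaScript sketch of the same idea: try a call up to a fixed number of times and pause between attempts. It is an illustration only, not n8n's internal code. The names runWithRetries, maxTries, waitMs, and sleep are hypothetical and chosen to mirror the settings above.

```javascript
// Hypothetical helper that mimics "Max Tries" and "Wait Between Tries"
// around any async call. Not n8n's implementation, only a sketch of the idea.
async function runWithRetries(task, { maxTries = 3, waitMs = 5000, sleep = defaultSleep } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= maxTries; attempt++) {
    try {
      // Return as soon as one attempt succeeds.
      return await task(attempt);
    } catch (error) {
      lastError = error;
      // Only wait if another attempt is still allowed.
      if (attempt < maxTries) await sleep(waitMs);
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
}

// Default wait based on setTimeout; can be swapped out in tests.
function defaultSleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}
```

A call that fails once with a rate-limit error and then succeeds returns on the second attempt. A call that fails on every attempt rethrows the last error, which is the case the error workflow in the following steps is meant to catch.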

Continue On Fail hides errors

Continue On Fail lets the workflow proceed past a failed node, but it can mask real problems. Use it only when you handle the empty result downstream, not as a blanket fix.
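If you do enable Continue On Fail, the downstream handling could look something like the sketch below, which separates usable items from failed ones so each can be routed on purpose. It assumes a failed item exposes an error field on its json object; check the actual output of your node before relying on that shape. The function name partitionByError is hypothetical.

```javascript
// Splits items into usable results and failures.
// Assumption: a failed item carries an `error` field on `json`;
// adjust the check to match what your node really emits.
function partitionByError(items) {
  const ok = [];
  const failed = [];
  for (const item of items) {
    const data = item.json ?? {};
    if (data.error !== undefined && data.error !== null) {
      failed.push(item);
    } else {
      ok.push(item);
    }
  }
  return { ok, failed };
}

// In an n8n Code node the input would typically come from $input.all(), e.g.:
//   const { ok, failed } = partitionByError($input.all());
```

Sending the failed list to an alert or fallback branch keeps the error visible instead of letting an empty result flow silently through the rest of the workflow.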

Step 2: Build a dedicated error workflow

Create a new workflow whose first node is an Error Trigger. This workflow runs automatically whenever a linked workflow fails, and it receives details about which node broke and why.

Error workflow - Slack message

Workflow failed: {{ $json.workflow.name }}
Node: {{ $json.execution.lastNodeExecuted }}
Error: {{ $json.execution.error.message }}
Time: {{ $now.toISO() }}
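If you would rather build the message in a Code node than in the Slack node's expression field, a sketch along these lines could work. It assumes the Error Trigger payload carries the workflow and execution fields used in the template above, and it falls back to "unknown" when one is missing, since not every failure necessarily supplies all of them. The function name formatAlert is hypothetical, and the timestamp is the time the alert is built rather than a value read from the payload.

```javascript
// Builds the alert text from the data an Error Trigger passes along.
// Assumption: the payload has the fields used in the message template;
// any missing field is reported as "unknown" instead of breaking the alert.
function formatAlert(payload, now = new Date()) {
  const workflowName = payload?.workflow?.name ?? "unknown";
  const node = payload?.execution?.lastNodeExecuted ?? "unknown";
  const message = payload?.execution?.error?.message ?? "unknown";
  return [
    `Workflow failed: ${workflowName}`,
    `Node: ${node}`,
    `Error: ${message}`,
    `Time: ${now.toISOString()}`,
  ].join("\n");
}
```

The fallbacks matter because an alert that itself throws on a missing field would make the original failure silent again.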

Step 3: Link the error workflow

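Open your main AI workflow, go to its Settings, and set Error Workflow to the one you just built. Now any unhandled failure in the main workflow fires the error workflow and posts an alert. Error workflows fire for production executions only, so a manual test run in the editor will not trigger the alert.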

Step 4: Validate model output before using it

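Even a successful API call can return malformed content. Add an IF node after the LLM that checks the result is present and well-formed: for example, that the text is not empty and, if you asked for JSON, that it parses and contains the fields you expect. Route bad output to a fallback or alert instead of passing garbage downstream.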
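When the check is more than an IF node can comfortably express, a Code node placed before it can do the validation and hand the IF node a simple flag. The sketch below assumes the model was asked to reply with a JSON object. The function name validateModelOutput and the requiredKeys parameter are hypothetical.

```javascript
// Hypothetical validator for a model reply that is expected to be a JSON object.
// Returns { valid: true, data } on success or { valid: false, reason } otherwise.
function validateModelOutput(text, requiredKeys = []) {
  // Reject missing or whitespace-only replies.
  if (typeof text !== "string" || text.trim() === "") {
    return { valid: false, reason: "empty response" };
  }
  let parsed;
  try {
    parsed = JSON.parse(text);
  } catch {
    return { valid: false, reason: "not valid JSON" };
  }
  // JSON.parse also accepts numbers, strings, arrays, and null; require an object.
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    return { valid: false, reason: "expected a JSON object" };
  }
  const missing = requiredKeys.filter((key) => !(key in parsed));
  if (missing.length > 0) {
    return { valid: false, reason: `missing keys: ${missing.join(", ")}` };
  }
  return { valid: true, data: parsed };
}
```

The IF node can then branch on the valid flag, and the reason string gives the fallback or alert branch something specific to report.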

n8n - execution log

Simulated rate limit on first attempt
OpenAI node: 429 rate_limit_exceeded
Retry 1/3 in 5s...
OpenAI node: success
Workflow finished without firing error trigger

Result

Transient errors now self-heal through retries, genuine failures trigger an instant alert with the exact node and message, and malformed model output is caught before it spreads. Your AI workflows fail loudly and recover quietly.