How to Fix OpenAI 429 Rate Limit Errors With Backoff
Stop your app from crashing on 429s by adding exponential backoff and reading the retry headers.
A 429 means you sent requests faster than your tier allows, or you ran out of tokens-per-minute headroom. The wrong reaction is to retry instantly in a tight loop: failed requests still count against your per-minute limit, so hammering the API keeps you throttled for longer. The right fix is to back off, respect the retry hint the API gives you, and spread requests over time.
Prerequisites
- A working OpenAI API key
- Node 18+ or Python 3.9+
- The ability to read response headers from your HTTP client
Step 1: Tell apart the two kinds of 429
There are two distinct causes. A rate limit means too many requests or tokens in a short window, and it clears on its own once the window rolls over. An insufficient_quota error also returns 429 but means your account is out of credits, and no amount of retrying helps. Read the code field in the JSON error body to know which one you have: rate_limit_exceeded is safe to retry, insufficient_quota is not.
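The check above can be sketched as a small classifier. This is a minimal sketch, not the SDK's own logic: classify429 is a hypothetical helper, and it assumes the error object exposes status and code directly (as errors thrown by the official Node SDK generally do) or carries the parsed response body at err.error.

```javascript
// Hypothetical helper: decide whether a failed call is worth retrying.
// Assumption: the error exposes `status` plus either `code` or a parsed
// body at `err.error.code`. Adjust the lookups for your HTTP client.
function classify429(err) {
  if (!err || err.status !== 429) return "not_429";
  const code = err.code ?? err.error?.code ?? null;
  if (code === "insufficient_quota") return "out_of_credits"; // retrying will not help
  return "rate_limited"; // transient: back off and retry
}

console.log(classify429({ status: 429, code: "rate_limit_exceeded" })); // rate_limited
console.log(classify429({ status: 429, error: { code: "insufficient_quota" } })); // out_of_credits
console.log(classify429({ status: 500 })); // not_429
```

Only the rate_limited result should enter a retry loop. Treat out_of_credits as a billing problem and surface it to whoever owns the account.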
Step 2: Read the retry-after and ratelimit headers
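The API ships headers that tell you how long to wait and how much budget remains. retry-after gives the delay to wait before the next attempt, while x-ratelimit-remaining-requests, x-ratelimit-remaining-tokens, x-ratelimit-reset-requests, and x-ratelimit-reset-tokens report what is left in the current window and when it resets. Honor retry-after when present instead of guessing a delay.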
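As a rough illustration of reading those hints, the sketch below pulls them out of a fetch-style Headers object (built into Node 18+). The helper names retryAfterMs and readRateLimitHints are hypothetical, and the sample header values are made up. The reset headers are returned as raw strings because their exact format is not covered here.

```javascript
// Hypothetical helper: turn a Retry-After value into milliseconds.
// Retry-After may be a number of seconds or an HTTP date.
function retryAfterMs(value, now = Date.now()) {
  if (value == null || value === "") return null;
  const seconds = Number(value);
  if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000);
  const date = Date.parse(value);
  return Number.isNaN(date) ? null : Math.max(0, date - now);
}

// Hypothetical helper: collect the rate limit hints from a Headers object.
// Missing headers come back as null.
function readRateLimitHints(headers) {
  return {
    retryAfterMs: retryAfterMs(headers.get("retry-after")),
    remainingRequests: headers.get("x-ratelimit-remaining-requests"),
    remainingTokens: headers.get("x-ratelimit-remaining-tokens"),
    resetRequests: headers.get("x-ratelimit-reset-requests"),
    resetTokens: headers.get("x-ratelimit-reset-tokens"),
  };
}

// Example with made-up header values.
const sample = new Headers({
  "retry-after": "2",
  "x-ratelimit-remaining-requests": "0",
  "x-ratelimit-reset-requests": "1s",
});
console.log(readRateLimitHints(sample));
```

If your HTTP client exposes headers as a plain object rather than a Headers instance, swap headers.get(name) for a property lookup.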
Step 3: Add exponential backoff with jitter
Wrap your call in a retry loop that doubles the delay each attempt and adds a small random jitter so many clients do not retry in lockstep. Cap the number of attempts so a real outage does not hang forever.
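    // Retry fn() on rate-limit 429s. Assumes the thrown error exposes
    // status, code, and headers, as errors from the OpenAI Node SDK do.
    async function withBackoff(fn, max = 5) {
      for (let attempt = 0; attempt < max; attempt++) {
        try {
          return await fn();
        } catch (err) {
          // Give up on non-429s, on exhausted quota, and on the last attempt.
          const retryable = err.status === 429 && err.code !== "insufficient_quota";
          if (!retryable || attempt === max - 1) throw err;
          // Headers may be a fetch Headers object or a plain object.
          const h = err.headers;
          const raw = typeof h?.get === "function" ? h.get("retry-after") : h?.["retry-after"];
          const retryAfter = Number(raw) || 0;
          // Prefer the server's hint; otherwise 500 ms doubled per attempt, plus jitter.
          const wait = retryAfter * 1000 || (2 ** attempt) * 500 + Math.random() * 250;
          await new Promise((r) => setTimeout(r, wait));
        }
      }
    }

Step 4: Lower your request pressure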
Backoff treats the symptom. To stop hitting the limit at all, batch where you can, use a smaller model for cheap calls, and add a concurrency limit so you never have more than a handful of in-flight requests at once.
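A concurrency limit can be sketched without any dependencies. The helper below, mapWithConcurrency, is a hypothetical name and only one of several ways to do this. It starts a fixed number of runners that each pull the next unclaimed item, so no more than limit calls are ever in flight. The fakeCall worker stands in for a real API call.

```javascript
// Hypothetical helper: run `worker` over `items` with at most `limit`
// calls in flight at any moment. Results keep the input order.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  async function runner() {
    // Each runner claims the next unprocessed index until none remain.
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }

  const count = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: count }, () => runner()));
  return results;
}

// Demo: a fake worker that records how many calls overlap.
let inFlight = 0;
let peak = 0;
async function fakeCall(n) {
  inFlight++;
  peak = Math.max(peak, inFlight);
  await new Promise((r) => setTimeout(r, 10));
  inFlight--;
  return n * 2;
}

mapWithConcurrency([1, 2, 3, 4, 5, 6, 7, 8], 5, fakeCall).then((out) => {
  console.log(out, "peak in flight:", peak);
});
```

In this sketch the first rejected worker call rejects the whole batch, so wrap each call in your retry logic inside worker if individual failures should be retried rather than abort the run.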
Result
With backoff plus a concurrency cap of 5, a batch job that previously failed on roughly one in ten calls now finishes cleanly. The job is a little slower during peak windows but never drops a request.