How to Fix OpenAI 429 Rate Limit Errors With Backoff

Stop your app from crashing on 429s by adding exponential backoff and reading the retry headers.

A 429 usually means you sent requests faster than your tier allows, or you ran out of tokens-per-minute headroom. The wrong reaction is to retry instantly in a tight loop: every retry counts against the same limit, so you stay over it for longer. The right fix is to back off, respect the retry hint the API gives you, and spread requests over time.

Prerequisites

A working OpenAI API key
Node 18+ or Python 3.9+
The ability to read response headers from your HTTP client

Step 1: Tell apart the two kinds of 429

A 429 has two distinct causes. A rate limit means too many requests or tokens in a short window, and it clears on its own once the window resets. An insufficient_quota error also returns 429 but means your account is out of credits, and no amount of retrying helps. Read the code field inside the error object of the response body to know which one you have.

$ curl -i https://api.openai.com/v1/chat/completions ...
HTTP/2 429
retry-after: 2
x-ratelimit-remaining-requests: 0

{ "error": { "code": "rate_limit_exceeded" } }

Quota is not a rate limit

If the code is insufficient_quota, add credits or a payment method. Backoff will never clear it.
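
As a minimal sketch of that check, the helper below sorts a failed response into "rate limit" or "quota". The function name classify429 and its return labels are hypothetical, and it assumes you already have the parsed JSON body in the shape shown in the sample response.

```javascript
// Hypothetical helper: decide whether a failed response is worth retrying.
// Assumes `body` is the parsed JSON error body from the API response.
function classify429(status, body) {
  if (status !== 429) return "not_429";
  const code = body?.error?.code;
  // Out of credits: retrying cannot help.
  if (code === "insufficient_quota") return "quota";
  // Any other 429 is treated here as a transient rate limit.
  return "rate_limit";
}

console.log(classify429(429, { error: { code: "rate_limit_exceeded" } })); // rate_limit
console.log(classify429(429, { error: { code: "insufficient_quota" } })); // quota
```

Only the "rate_limit" result should ever reach a retry loop.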

Step 2: Read the retry-after and ratelimit headers

The API ships headers that tell you exactly how long to wait and how much budget remains. Honor retry-after when present instead of guessing a delay.

Response headers

x-ratelimit-limit-requests: 500
x-ratelimit-remaining-requests: 0
x-ratelimit-reset-requests: 1.2s
x-ratelimit-remaining-tokens: 9540
retry-after: 2

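When x-ratelimit-remaining-requests or x-ratelimit-remaining-tokens drops to 0, wait for the matching reset window before sending again. The retry-after value is in seconds, while the reset headers are duration strings such as 1.2s.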
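
Turning those header values into a wait time can be sketched as follows. The helper names parseResetMs and waitFromHeaders are hypothetical, the headers argument is assumed to be a plain object with lowercase keys, and the set of duration units (h, m, s, ms) is an assumption extrapolated from the 1.2s sample, so check it against the values you actually receive.

```javascript
// Convert a duration string such as "1.2s", "20ms" or "6m0s" to milliseconds.
// The unit set (h, m, s, ms) is an assumption based on the "1.2s" sample.
function parseResetMs(value) {
  const unitMs = { h: 3600000, m: 60000, s: 1000, ms: 1 };
  let total = 0;
  // "ms" is listed first so that "20ms" is not read as 20 minutes.
  const parts = String(value ?? "").matchAll(/(\d+(?:\.\d+)?)(ms|h|m|s)/g);
  for (const [, num, unit] of parts) {
    total += Number(num) * unitMs[unit];
  }
  return Math.round(total);
}

// Pick a wait in ms: retry-after (seconds) wins, else the request reset window.
// `headers` is a plain object with lowercase keys (hypothetical shape).
function waitFromHeaders(headers) {
  const retryAfter = Number(headers["retry-after"]);
  if (Number.isFinite(retryAfter) && retryAfter > 0) return retryAfter * 1000;
  return parseResetMs(headers["x-ratelimit-reset-requests"]);
}

console.log(waitFromHeaders({ "retry-after": "2", "x-ratelimit-reset-requests": "1.2s" })); // 2000
console.log(waitFromHeaders({ "x-ratelimit-reset-requests": "1.2s" })); // 1200
```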

Step 3: Add exponential backoff with jitter

Wrap your call in a retry loop that doubles the delay each attempt and adds a small random jitter so many clients do not retry in lockstep. Cap the number of attempts so a real outage does not hang forever.

retry.js

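// Retry fn on 429 rate limits with exponential backoff and jitter.
async function withBackoff(fn, max = 5) {
  for (let attempt = 0; attempt < max; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Assumes the client exposes the API error code as err.code.
      // Quota errors never clear on retry (see Step 1), so rethrow them.
      const outOfQuota = err.code === "insufficient_quota";
      if (err.status !== 429 || outOfQuota || attempt === max - 1) throw err;
      // err.headers may be a Headers object or a plain object, depending on the client.
      const h = err.headers;
      const raw = typeof h?.get === "function" ? h.get("retry-after") : h?.["retry-after"];
      const retryAfter = Number(raw) || 0;
      // Prefer the server's hint; otherwise double the delay each attempt and add jitter.
      const wait = retryAfter * 1000 || (2 ** attempt) * 500 + Math.random() * 250;
      await new Promise((r) => setTimeout(r, wait));
    }
  }
}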
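
To see what that formula produces, here is a small worked example of the delay schedule. backoffDelay is a hypothetical helper that mirrors the wait calculation in retry.js; the random source is passed in so the schedule can be printed with jitter pinned to zero.

```javascript
// Delay in ms for a zero-based attempt, mirroring the formula in retry.js.
// `rand` is injectable so the schedule can be inspected without randomness.
function backoffDelay(attempt, retryAfterSec = 0, rand = Math.random) {
  // A server-provided retry-after (seconds) takes priority.
  if (retryAfterSec > 0) return retryAfterSec * 1000;
  // Otherwise: 500 ms doubled per attempt, plus up to 250 ms of jitter.
  return (2 ** attempt) * 500 + rand() * 250;
}

// With max = 5 there are at most four waits before the final attempt.
const schedule = [0, 1, 2, 3].map((a) => backoffDelay(a, 0, () => 0));
console.log(schedule); // [ 500, 1000, 2000, 4000 ]
```

With the default of 5 attempts, the worst case adds about 7.5 seconds of waiting, plus up to 1 second of jitter, before the error is rethrown.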

Step 4: Lower your request pressure

Backoff treats the symptom. To stop hitting the limit at all, batch where you can, use a smaller model for cheap calls, and add a concurrency limit so you never have more than a handful of in-flight requests at once.
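
A concurrency limit does not need a library. The sketch below is one way to do it, under the assumption that each task is a function returning a promise; runWithLimit is a hypothetical name, and the default of 5 is only an example value.

```javascript
// Run async task functions with at most `limit` in flight at once.
// Results come back in input order; the first rejection rejects the whole run.
async function runWithLimit(tasks, limit = 5) {
  const results = new Array(tasks.length);
  let next = 0;
  async function worker() {
    // Each worker claims the next unprocessed index until none remain.
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  }
  // Start up to `limit` workers that share the same queue position.
  const workers = Array.from({ length: Math.min(limit, tasks.length) }, () => worker());
  await Promise.all(workers);
  return results;
}
```

Each task can itself be wrapped in a retry helper, so a throttled call waits and retries without freeing its slot for yet another request.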

Watch tokens, not just requests

Most surprise 429s come from the tokens-per-minute limit, not requests-per-minute. Long prompts burn the token budget fast even at a low request count.
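
A quick bit of arithmetic shows why. The helper below is a hypothetical sketch that takes the remaining request and token budgets from the rate limit headers plus your own estimate of tokens per call; the 3000-token figure and the 100 remaining requests are made-up inputs for illustration.

```javascript
// How many more requests fit in the current window under both limits.
// `tokensPerRequest` is an estimate you supply (hypothetical input).
function requestsThatFit(remainingRequests, remainingTokens, tokensPerRequest) {
  const byTokens = Math.floor(remainingTokens / tokensPerRequest);
  return Math.max(0, Math.min(remainingRequests, byTokens));
}

// 9540 tokens left (the sample header in Step 2), an assumed 3000 tokens per call,
// and an assumed 100 requests still available:
console.log(requestsThatFit(100, 9540, 3000)); // 3
```

Even with 100 requests still available, only three calls of that size fit before the token limit triggers a 429.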

Result

With backoff plus a concurrency cap of 5, a batch job that previously failed on roughly one in ten calls now finishes cleanly. The job is a little slower during peak windows but never drops a request.