How to Fix Rate Limit Errors from an AI API
Diagnose a 429 rate-limit error and add backoff, batching, and concurrency control so your calls stop getting throttled.
A 429 Too Many Requests error means you hit the provider's rate limit. It is rarely a sign of a bug; it usually means your code sends requests faster than your tier allows. This guide shows how to read the error correctly and fix it with backoff, concurrency limits, and batching.
What you need
- Code that calls an AI API and sometimes fails with 429
- Access to the provider's rate-limit docs for your tier
- About 10 minutes
Step 1: Read the error and headers
The response usually tells you how long to wait through its Retry-After header, given either as a number of seconds or as an HTTP date. Log the status code and that header before you change any logic.
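As a rough illustration of that logging step, the sketch below reads the header from a fetch-style response and converts it to milliseconds. The names parseRetryAfter and logRateLimit are hypothetical helpers, not part of any provider SDK, and some providers send extra rate-limit headers that this sketch ignores.

```javascript
// Convert a Retry-After header value into milliseconds to wait.
// Returns null when the header is missing or cannot be parsed.
function parseRetryAfter(value, now = Date.now()) {
  if (value == null || value.trim() === "") return null;
  const trimmed = value.trim();
  // Form 1: a non-negative integer number of seconds, e.g. "20".
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  // Form 2: an HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
  const when = Date.parse(trimmed);
  if (Number.isNaN(when)) return null;
  // A date in the past means there is nothing left to wait for.
  return Math.max(0, when - now);
}

// Log what the provider said before changing any retry logic.
// `res` is assumed to be a fetch-style Response (status + headers.get).
function logRateLimit(res) {
  const retryAfter = res.headers.get("retry-after");
  console.log("status:", res.status, "retry-after:", retryAfter);
  return parseRetryAfter(retryAfter);
}
```

If the header is present, its value is generally a better first delay than a guessed one; when it is absent, the function returns null and a computed backoff can take over.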
Step 2: Retry with exponential backoff
On a 429, wait and try again, doubling the delay each time with a little randomness. This clears short bursts without hammering the API.
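// Call fn, retrying up to `max` times when it fails with a 429.
async function withRetry(fn, max = 5) {
  for (let i = 0; i < max; i++) {
    try {
      return await fn();
    } catch (e) {
      // Give up on non-429 errors and on the final attempt.
      if (e.status !== 429 || i === max - 1) throw e;
      // Exponential delay (500 ms, 1 s, 2 s, ...) plus up to 250 ms of jitter.
      const wait = (2 ** i) * 500 + Math.random() * 250;
      await new Promise(r => setTimeout(r, wait));
    }
  }
}
Step 3: Cap concurrency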
Firing a hundred calls at once is a reliable way to trigger a 429. Limit how many run in parallel so bursts stay under the per-minute ceiling.
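One way to cap parallelism without a library is a small pool of runners that pull items from a shared index. This is a minimal sketch; mapWithLimit is a hypothetical helper name, and the right limit depends on your tier, so treat any number you pass as an assumption to tune.

```javascript
// Run `worker` over `items` with at most `limit` calls in flight.
// Results come back in the same order as `items`.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  async function runner() {
    while (next < items.length) {
      const i = next++; // claimed synchronously, so runners never share an index
      results[i] = await worker(items[i], i);
    }
  }

  // Start no more runners than there are items, and at least one.
  const count = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: count }, runner));
  return results;
}
```

Each worker can itself be wrapped in the retry helper from Step 2, for example mapWithLimit(prompts, 4, p => withRetry(() => callApi(p))), where prompts and callApi stand in for your own data and request function. Note that if one worker throws, the returned promise rejects while the other runners finish their current items.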
Step 4: Batch where the API allows it
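Whether an endpoint accepts several inputs in one request, and how many, depends on the provider, so check its docs first. The client-side half is just grouping items, which the sketch below shows; chunk is a hypothetical helper and the batch size is an assumption you would replace with the documented limit.

```javascript
// Split `items` into consecutive groups of at most `size` elements.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

Sending one request per group instead of one per item cuts the request count by roughly the batch size, which helps against per-request limits.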
Result: transient limits are absorbed by backoff, sustained load stays under the ceiling via concurrency caps, and 429 errors stop reaching your users.