Rate Limits
Understand usage limits and how to optimize for high-volume applications.
ModelPilot implements token-based rate limiting to ensure fair usage and system stability. Limits are measured in tokens per time period rather than requests per minute.
Rate Limit Tiers
- 200K tokens/min, 2M tokens/hour, 10M tokens/day
- 5M tokens/min, 50M tokens/hour, 250M tokens/day
- 2M tokens/min, 20M tokens/hour, 100M tokens/day
- 5M+ tokens/min, 50M+ tokens/hour, 200M+ tokens/day
How Rate Limiting Works
Unlike traditional request-per-minute limits, ModelPilot counts tokens (both input and output) to measure usage. This provides:
- Fairer usage - Small requests don't count the same as large ones
- Better resource allocation - Limits match actual compute usage
- Predictable costs - Token limits align with billing
Rate limits use a sliding window algorithm. Usage is calculated over the trailing minute, hour, and day rather than over calendar boundaries, so capacity frees up gradually as older usage ages out of each window instead of resetting all at once.
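To make the sliding window idea concrete, here is a minimal client-side sketch of a token counter over a trailing window. It is an illustration only: the `SlidingWindowCounter` class and its method names are hypothetical, and ModelPilot's server-side implementation may differ.

```javascript
// Hypothetical client-side tracker, not ModelPilot's actual implementation.
class SlidingWindowCounter {
  constructor(limitTokens, windowMs) {
    this.limitTokens = limitTokens;
    this.windowMs = windowMs;
    this.events = []; // { time, tokens } entries, oldest first
  }

  // Drop events that have aged out of the window ending at `now`.
  prune(now) {
    while (this.events.length > 0 && this.events[0].time <= now - this.windowMs) {
      this.events.shift();
    }
  }

  // Tokens consumed within the trailing window.
  used(now = Date.now()) {
    this.prune(now);
    return this.events.reduce((sum, e) => sum + e.tokens, 0);
  }

  // Record the usage if it fits under the limit; otherwise return false.
  tryConsume(tokens, now = Date.now()) {
    if (this.used(now) + tokens > this.limitTokens) return false;
    this.events.push({ time: now, tokens });
    return true;
  }
}

// Explicit timestamps (in ms) make the behavior easy to follow.
const perMinute = new SlidingWindowCounter(200000, 60000);
perMinute.tryConsume(150000, 0);     // accepted
perMinute.tryConsume(100000, 30000); // rejected: 250000 would exceed 200000
perMinute.tryConsume(100000, 60000); // accepted: the first event has aged out
```

Capacity returns as individual events leave the window, which is why there is no single moment at which the whole budget resets.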
Rate Limit Headers
X-RateLimit-Limit-Minute: 200000
X-RateLimit-Remaining-Minute: 198543
X-RateLimit-Reset-Minute: 1672531200
X-RateLimit-Limit-Hour: 2000000
X-RateLimit-Remaining-Hour: 1876234
X-RateLimit-Reset-Hour: 1672534800
X-RateLimit-Limit-Day: 10000000
X-RateLimit-Remaining-Day: 8234567
X-RateLimit-Reset-Day: 1672617600
Handling Rate Limits
When you exceed your rate limit, you'll receive a 429 status code:
{
  "error": {
    "message": "Rate limit exceeded",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded",
    "retry_after": 42
  }
}
The retry_after field indicates how many seconds to wait before retrying.
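// Retries a request when it fails with a 429, waiting for the
// server-suggested delay or falling back to exponential backoff.
// Assumes requestFn throws an error exposing `status` and `retry_after`.
async function makeRequestWithRetry(requestFn, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await requestFn();
    } catch (error) {
      // Only rate limit errors are retried; everything else propagates.
      if (error.status !== 429) throw error;
      // Don't wait after the final attempt; fail right away instead.
      if (attempt === maxRetries - 1) break;
      // Prefer retry_after (seconds); otherwise back off 1s, 2s, 4s, ...
      const retryAfter = error.retry_after || 2 ** attempt;
      await new Promise(resolve => setTimeout(resolve, retryAfter * 1000));
    }
  }
  throw new Error('Max retries exceeded');
}
Best Practices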
- Check rate limit headers on every response
- Set up alerts when approaching limits
- Track usage patterns over time
- Plan capacity for peak usage
- Batch requests when possible
- Use caching for repeated requests
- Reduce token usage with concise prompts
- Implement request queuing
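As a rough sketch of the first two practices, the snippet below reads the rate limit headers from a response and reports which windows are running low. The header names come from the example earlier on this page; the helper names (`parseRateLimitHeaders`, `windowsNearLimit`) and the 10% threshold are assumptions chosen for illustration.

```javascript
// Header names follow the documented example; everything else is hypothetical.
const WINDOWS = ['Minute', 'Hour', 'Day'];

// `headers` is any object with a get(name) method, such as the Headers
// object on a fetch Response.
function parseRateLimitHeaders(headers) {
  const result = {};
  for (const w of WINDOWS) {
    result[w.toLowerCase()] = {
      limit: Number(headers.get(`X-RateLimit-Limit-${w}`)),
      remaining: Number(headers.get(`X-RateLimit-Remaining-${w}`)),
      reset: Number(headers.get(`X-RateLimit-Reset-${w}`)), // kept as the raw numeric value
    };
  }
  return result;
}

// Names of the windows whose remaining share is below `threshold`.
function windowsNearLimit(limits, threshold = 0.1) {
  return Object.entries(limits)
    .filter(([, v]) => v.limit > 0 && v.remaining / v.limit < threshold)
    .map(([name]) => name);
}

// Usage with a Map standing in for response headers.
const sampleHeaders = new Map([
  ['X-RateLimit-Limit-Minute', '200000'],
  ['X-RateLimit-Remaining-Minute', '15000'],
  ['X-RateLimit-Limit-Hour', '2000000'],
  ['X-RateLimit-Remaining-Hour', '1876234'],
  ['X-RateLimit-Limit-Day', '10000000'],
  ['X-RateLimit-Remaining-Day', '8234567'],
]);
const nearLimit = windowsNearLimit(parseRateLimitHeaders(sampleHeaders));
console.log(nearLimit); // only the minute window is below 10% remaining
```

A result like this could feed an alert or tell a request queue to slow down before a 429 is ever returned.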
Need Higher Limits?
Upgrade to a Pro, Business, or Enterprise plan to get higher rate limits.