Rate Limits

Understand usage limits and how to optimize for high-volume applications.

ModelPilot implements token-based rate limiting to ensure fair usage and system stability. Limits are measured in tokens per time period rather than requests per minute.

Rate Limit Tiers

Default Tier (Free)
Free and Starter plans
  • Per minute: 200K tokens
  • Per hour: 2M tokens
  • Per day: 10M tokens

Pro Tier
Professional and growing teams
  • Per minute: 5M tokens
  • Per hour: 50M tokens
  • Per day: 250M tokens

Trusted Tier (Business)
High-volume production applications
  • Per minute: 2M tokens
  • Per hour: 20M tokens
  • Per day: 100M tokens

Enterprise Tier
Custom limits for large-scale deployments
  • Per minute: 5M+ tokens
  • Per hour: 50M+ tokens
  • Per day: 200M+ tokens

How Rate Limiting Works

Token-Based Limiting

Unlike traditional request-per-minute limits, ModelPilot counts tokens (both input and output) to measure usage. This provides:

  • Fairer usage - Small requests don't count the same as large ones
  • Better resource allocation - Limits match actual compute usage
  • Predictable costs - Token limits align with billing
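
For example, a single request with 1,500 input tokens and 500 output tokens consumes 2,000 tokens from each of your windows. A minimal accounting sketch, assuming the response exposes a usage object with prompt_tokens and completion_tokens fields (field names are illustrative, not confirmed by this page):

javascript
// Illustrative accounting: both input and output tokens count against
// every window (minute, hour, day).
// NOTE: the shape of `usage` below is an assumption for this sketch.
function tokensConsumed(usage) {
  return (usage.prompt_tokens || 0) + (usage.completion_tokens || 0);
}

// e.g. 1,500 input + 500 output = 2,000 tokens deducted from the
// 200K tokens/min Default limit for that one request.
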
Sliding Window

Rate limits use a sliding window algorithm. Your usage is calculated based on the last N minutes/hours/days, not calendar boundaries. This provides smoother rate limiting without sudden resets.
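
A minimal sketch of how a sliding window token counter behaves. This is a client-side approximation for illustration only; the server's actual implementation is not documented here:

javascript
// Client-side approximation of a sliding-window token counter.
// Tracks token usage over a trailing window so you can throttle
// before the server returns a 429.
class SlidingWindowCounter {
  constructor(limit, windowMs) {
    this.limit = limit;        // e.g. 200000 tokens for the Default minute window
    this.windowMs = windowMs;  // e.g. 60000 ms
    this.entries = [];         // [{ timestamp, tokens }]
  }

  record(tokens) {
    this.entries.push({ timestamp: Date.now(), tokens });
  }

  used() {
    const cutoff = Date.now() - this.windowMs;
    // Drop entries that have slid out of the trailing window.
    this.entries = this.entries.filter(e => e.timestamp >= cutoff);
    return this.entries.reduce((sum, e) => sum + e.tokens, 0);
  }

  wouldExceed(tokens) {
    return this.used() + tokens > this.limit;
  }
}

// Example: track the Default tier's per-minute budget.
const minuteWindow = new SlidingWindowCounter(200000, 60 * 1000);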

Rate Limit Headers

Response Headers
Every API response includes rate limit information
text
X-RateLimit-Limit-Minute: 200000
X-RateLimit-Remaining-Minute: 198543
X-RateLimit-Reset-Minute: 1672531200

X-RateLimit-Limit-Hour: 2000000
X-RateLimit-Remaining-Hour: 1876234
X-RateLimit-Reset-Hour: 1672534800

X-RateLimit-Limit-Day: 10000000
X-RateLimit-Remaining-Day: 8234567
X-RateLimit-Reset-Day: 1672617600
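
A sketch of reading these headers to track remaining capacity, assuming a standard fetch Response object (the header names are taken from the example above):

javascript
// Inspect the per-minute rate limit headers on an API response.
async function checkRateLimit(response) {
  const limit = Number(response.headers.get('X-RateLimit-Limit-Minute'));
  const remaining = Number(response.headers.get('X-RateLimit-Remaining-Minute'));
  const resetAt = Number(response.headers.get('X-RateLimit-Reset-Minute')); // Unix timestamp

  // Warn when less than 10% of the minute budget is left.
  if (remaining / limit < 0.1) {
    console.warn(
      `Only ${remaining} of ${limit} tokens left this minute; ` +
      `window resets at ${new Date(resetAt * 1000).toISOString()}`
    );
  }
  return { limit, remaining, resetAt };
}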

Handling Rate Limits

429 Too Many Requests

When you exceed your rate limit, you'll receive a 429 status code:

json
{
  "error": {
    "message": "Rate limit exceeded",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded",
    "retry_after": 42
  }
}

The retry_after field indicates how many seconds to wait before retrying.

Exponential Backoff
Recommended retry strategy
javascript
async function makeRequestWithRetry(
  requestFn,
  maxRetries = 3
) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await requestFn();
    } catch (error) {
      if (error.status === 429) {
        // Prefer the server's retry_after hint; otherwise back off
        // exponentially (1s, 2s, 4s, ...).
        const retryAfter = error.retry_after || Math.pow(2, i);
        await new Promise(resolve =>
          setTimeout(resolve, retryAfter * 1000)
        );
        continue;
      }
      // Non-rate-limit errors are not retried.
      throw error;
    }
  }
  throw new Error('Max retries exceeded');
}
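
For example, you could wrap an API call like this. The endpoint path, model name, and error shape below are illustrative placeholders, not values documented on this page:

javascript
// Example usage of makeRequestWithRetry (placeholder endpoint and model).
const result = await makeRequestWithRetry(async () => {
  const response = await fetch('https://api.modelpilot.example/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.MODELPILOT_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'example-model',
      messages: [{ role: 'user', content: 'Hello!' }]
    })
  });

  if (!response.ok) {
    const body = await response.json().catch(() => ({}));
    // Surface status and retry_after so makeRequestWithRetry can back off.
    const err = new Error('Request failed');
    err.status = response.status;
    err.retry_after = body.error?.retry_after;
    throw err;
  }
  return response.json();
});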

Best Practices

Monitor Usage
  • Check rate limit headers on every response
  • Set up alerts when approaching limits
  • Track usage patterns over time
  • Plan capacity for peak usage

Optimize Requests
  • Batch requests when possible
  • Use caching for repeated requests
  • Reduce token usage with concise prompts
  • Implement request queuing (see the sketch below)
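
A minimal request-queuing sketch that defers calls when the estimated token cost would exceed a per-minute budget. It reuses the SlidingWindowCounter sketch from the Sliding Window section above; the class and its token estimates are illustrative, not part of any ModelPilot SDK:

javascript
// Simple FIFO queue that holds requests back when the estimated token
// cost would exceed the per-minute budget tracked by SlidingWindowCounter.
class RequestQueue {
  constructor(counter) {
    this.counter = counter;   // SlidingWindowCounter instance
    this.queue = [];
    this.running = false;
  }

  enqueue(requestFn, estimatedTokens) {
    return new Promise((resolve, reject) => {
      this.queue.push({ requestFn, estimatedTokens, resolve, reject });
      this.drain();
    });
  }

  async drain() {
    if (this.running) return;
    this.running = true;
    while (this.queue.length > 0) {
      const job = this.queue[0];
      if (this.counter.wouldExceed(job.estimatedTokens)) {
        // Budget exhausted for this window; wait briefly and re-check.
        await new Promise(r => setTimeout(r, 1000));
        continue;
      }
      this.queue.shift();
      this.counter.record(job.estimatedTokens);
      try {
        job.resolve(await job.requestFn());
      } catch (err) {
        job.reject(err);
      }
    }
    this.running = false;
  }
}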

Need Higher Limits?

Upgrade Your Plan

Get higher rate limits with Pro, Business, or Enterprise plans
