Rate Limits

LLMGateway enforces rate limits to keep usage fair and performance consistent for all users. The limits that apply depend on your account's credit status and on whether you are calling free or paid models.

Free Models

Free models (models with zero input and output pricing) have rate limits that depend on your account's credit status:

Base Rate Limits

For organizations with zero credits:

5 requests per 10 minutes
Applies to all free model requests
Resets every 10 minutes

Elevated Rate Limits

For organizations that have purchased credits, in any amount:

20 requests per minute
Applies to all free model requests
Resets every minute

Requests to free models never deduct credits, even at the elevated limits. The higher limits are simply a benefit for users who have added credits to their account.

Paid Models

Paid AI models are not currently rate limited. You can make as many requests as needed to paid models, subject only to your account's credit balance and any provider-specific limits.

Rate Limit Headers

All API responses include rate limit information in the headers:

X-RateLimit-Limit: 20
X-RateLimit-Remaining: 19
X-RateLimit-Reset: 1640995200

X-RateLimit-Limit: Maximum number of requests allowed in the current window
X-RateLimit-Remaining: Number of requests remaining in the current window
X-RateLimit-Reset: Unix timestamp when the rate limit window resets
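
As an illustration of how a client might read these headers, here is a minimal Python sketch. The `RateLimitInfo` class and `parse_rate_limit_headers` function are hypothetical helpers written for this example, not part of any LLMGateway SDK. The sketch assumes the header values are plain integers, as in the sample above.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitInfo:
    """Hypothetical container for the three documented headers."""
    limit: int      # X-RateLimit-Limit
    remaining: int  # X-RateLimit-Remaining
    reset_at: int   # X-RateLimit-Reset (Unix timestamp, seconds)

    def seconds_until_reset(self, now: Optional[float] = None) -> float:
        """Seconds left in the current window, never negative."""
        now = time.time() if now is None else now
        return max(0.0, self.reset_at - now)


def parse_rate_limit_headers(headers: Mapping[str, str]) -> Optional[RateLimitInfo]:
    """Return the parsed headers, or None if any is missing or malformed."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    try:
        return RateLimitInfo(
            limit=int(lowered["x-ratelimit-limit"]),
            remaining=int(lowered["x-ratelimit-remaining"]),
            reset_at=int(lowered["x-ratelimit-reset"]),
        )
    except (KeyError, ValueError):
        return None
```

With a parsed result, a client could pause for `seconds_until_reset()` once `remaining` reaches 0 instead of sending a request that is likely to be rejected.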

Rate Limit Exceeded

When you exceed your rate limit, you'll receive a 429 Too Many Requests response:

{
	"error": {
		"message": "Rate limit exceeded. Try again later.",
		"type": "rate_limit_error",
		"code": "rate_limit_exceeded"
	}
}
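
One way to recognize this response in client code is sketched below. The `rate_limit_message` function is a hypothetical helper for this example. It assumes the error body has exactly the shape shown above and treats anything else as "not a rate limit error".

```python
import json
from typing import Optional


def rate_limit_message(status_code: int, body: str) -> Optional[str]:
    """Return the error message for a rate-limit response, else None."""
    if status_code != 429:
        return None
    try:
        error = json.loads(body)["error"]
        if error["code"] == "rate_limit_exceeded":
            return error["message"]
    except (ValueError, KeyError, TypeError):
        # Body was not JSON or did not match the documented shape.
        pass
    return None
```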

Best Practices

Upgrading Your Limits

To unlock elevated rate limits for free models:

Add credits to your account through the dashboard
Your rate limits will automatically increase to 20 requests per minute
Free model usage will still not deduct from your credits

Handling Rate Limits

Implement exponential backoff when you receive 429 responses
Monitor the X-RateLimit-Remaining header to avoid hitting limits
Consider using paid models for high-volume applications
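
The first recommendation above can be sketched roughly as follows. This is an illustrative Python example, not an official client: `call_with_backoff` and the `send` callable are hypothetical, and the retry count, delays, and use of random jitter are assumptions chosen for the sketch. `send` stands in for whatever function performs your request and returns a `(status, headers, body)` tuple.

```python
import random
import time
from typing import Callable, Mapping, Tuple

# (status code, response headers, response body)
Response = Tuple[int, Mapping[str, str], str]


def call_with_backoff(
    send: Callable[[], Response],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send(), retrying on 429 with exponentially growing delays."""
    attempt = 0
    while True:
        response = send()
        # Return on any non-429 status, or once retries are exhausted.
        if response[0] != 429 or attempt >= max_retries:
            return response
        # Upper bound doubles each attempt: base, 2*base, 4*base, ... capped.
        ceiling = min(max_delay, base_delay * 2 ** attempt)
        # Random jitter spreads out retries from concurrent clients.
        sleep(random.uniform(0, ceiling))
        attempt += 1
```

Because the free-model windows are fixed in length, a client could also wait until the time given in X-RateLimit-Reset rather than backing off blindly. The sketch keeps to plain backoff for simplicity.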

Cost Optimization

Use free models for development and testing
Switch to paid models for production workloads requiring higher throughput
Monitor your usage patterns through the dashboard

Adding even a small amount of credits to your account (e.g., $5) will immediately upgrade your free model rate limits from 5 requests per 10 minutes to 20 requests per minute.
