LLM Gateway

Routing

Learn how LLMGateway intelligently routes your requests to the best available models and providers.

LLMGateway provides flexible and intelligent routing options to help you get the best performance and cost efficiency from your AI applications. Whether you want to use specific models, providers, or let our system automatically optimize your requests, we've got you covered.

Model Selection

Any Model Name

You can use any model name from our models page or discover available models programmatically through the /v1/models endpoint.

curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Model ID Routing

Choose a specific model ID to route to the best available provider for that model. LLMGateway's smart routing algorithm considers multiple factors to find the optimal provider across all configured options.

Smart Routing Algorithm

When you use a model ID without a provider prefix, LLMGateway's intelligent routing system analyzes multiple factors to select the best provider:

Weighted Scoring System (based on the last 5 minutes of metrics):

  • Uptime (50%) - Prioritizes providers with high reliability and low error rates
  • Latency (30%) - Favors providers with faster response times
  • Price (20%) - Considers cost efficiency while maintaining quality

The algorithm calculates a weighted score for each available provider and selects the one with the lowest (best) score. All metrics are normalized to ensure fair comparison across providers.
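As a rough illustration of the scoring described above (not the gateway's actual implementation, and the metric definitions are assumptions), each factor can be normalized across providers and combined with the 50/30/20 weights, with the lowest combined score winning:

```python
# Hypothetical sketch of the weighted scoring described above.
# Field names (error_rate, latency_ms, price) are illustrative.

def normalize(values):
    """Scale a list of values to [0, 1] so providers compare fairly."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def pick_provider(providers):
    """Return (best_provider, scores) using weights: uptime 50%,
    latency 30%, price 20%. Lower score is better."""
    # Error rate serves as a proxy for (1 - uptime): lower is better.
    error = normalize([p["error_rate"] for p in providers])
    latency = normalize([p["latency_ms"] for p in providers])
    price = normalize([p["price"] for p in providers])

    scores = [0.5 * e + 0.3 * l + 0.2 * c
              for e, l, c in zip(error, latency, price)]
    best = min(range(len(providers)), key=scores.__getitem__)
    return providers[best], scores

providers = [
    {"name": "a", "error_rate": 0.01, "latency_ms": 400, "price": 2.5},
    {"name": "b", "error_rate": 0.00, "latency_ms": 650, "price": 1.0},
]
best, scores = pick_provider(providers)
```

Note how provider "b" wins here despite higher latency: its perfect reliability and lower price outweigh the latency penalty under the 50/30/20 weighting.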

Epsilon-Greedy Exploration (1% of requests):

To solve the "cold start problem" where new or unused providers never get traffic to build up metrics, the system randomly explores different providers 1% of the time. This ensures:

  • All providers periodically receive traffic
  • New providers can prove their reliability
  • The system adapts to changing provider performance
  • You benefit from improved routing decisions over time
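The exploration strategy above is a standard epsilon-greedy scheme. A minimal sketch (illustrative only; the gateway's selection logic is internal):

```python
import random

EPSILON = 0.01  # explore on ~1% of requests

def select_provider(scored_providers, rng=random):
    """scored_providers: list of (name, score) pairs; lower score is better.

    With probability EPSILON, pick a uniformly random provider so that
    every provider keeps receiving some traffic and can build up metrics;
    otherwise exploit the best-scoring one.
    """
    if rng.random() < EPSILON:
        return rng.choice(scored_providers)[0]  # explore
    return min(scored_providers, key=lambda p: p[1])[0]  # exploit
```

With a 1% exploration rate, a provider with stale or missing metrics still receives roughly 1 in 100 requests (split among all providers), which is enough to keep its uptime and latency figures current without meaningfully degrading overall routing quality.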

Routing Metadata:

Every request includes detailed routing metadata in the logs, showing:

  • Available providers that were considered
  • Selected provider and selection reason
  • Scores for each provider (including uptime, latency, and price)

This transparency allows you to understand and debug routing decisions.
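To make the three bullets above concrete, a log entry might look roughly like the following (the field names and values are purely illustrative, not the gateway's actual schema):

```json
{
  "availableProviders": ["openai", "azure"],
  "selectedProvider": "openai",
  "selectionReason": "lowest weighted score",
  "scores": {
    "openai": { "uptime": 0.999, "latency_ms": 420, "price": 2.5, "score": 0.31 },
    "azure": { "uptime": 0.997, "latency_ms": 510, "price": 2.5, "score": 0.44 }
  }
}
```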

Using model IDs without a provider prefix automatically routes to the optimal provider based on reliability, speed, and cost. The system continuously learns and adapts based on real-time performance metrics.

Smart routing prioritizes reliability over cost, ensuring your requests are routed to providers with proven uptime and performance, while still considering cost efficiency.

Provider-Specific Routing

To use a specific provider without any fallbacks, prefix the model name with the provider name followed by a slash:

# Use OpenAI specifically
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# Use the CloudRift provider specifically
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cloudrift/deepseek-v3",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Note that when specifying a provider explicitly, LLMGateway will not perform any routing and will send the request directly to the chosen provider. If that provider is unavailable or has issues, the request will fail.

Optimized Auto Routing

Auto routing automatically selects the best model for your specific use case without you having to specify a model at all.

Current Implementation

The auto routing system currently:

  • Chooses cost-effective models by default for optimal price-to-performance ratio
  • Automatically scales to more powerful models based on your request's context size
  • Handles large contexts intelligently by selecting models with appropriate context windows

# Let LLMGateway choose the optimal model
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Your request here..."}]
  }'
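The context-size escalation described above can be sketched as follows. The thresholds, model names, and token estimate are all invented for illustration; the gateway's real candidate models and selection logic may differ:

```python
# Illustrative sketch of context-based model escalation under auto routing.
# Thresholds and model names are made up; they are not the gateway's.

def estimate_tokens(messages):
    """Very rough token estimate: ~4 characters per token."""
    return sum(len(m["content"]) for m in messages) // 4

def auto_select_model(messages):
    """Pick a cheap model for small contexts, escalate for large ones."""
    tokens = estimate_tokens(messages)
    if tokens <= 8_000:
        return "small-cheap-model"    # cost-effective default
    if tokens <= 100_000:
        return "mid-tier-model"       # larger context window
    return "large-context-model"      # biggest context window available

msgs = [{"role": "user", "content": "Hello!"}]
```

The key idea is that a short greeting stays on the cheapest tier, while a request carrying hundreds of thousands of characters of context forces a model whose context window can actually hold it.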

Free Models Only

When using auto routing, you can restrict the selection to only free models (models with zero input and output pricing) by setting the free_models_only parameter to true:

# Auto route to free models only
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Hello!"}],
    "free_models_only": true
  }'

Adding even a small amount of credits to your account (e.g., $5) will immediately upgrade your free model rate limits from 5 requests per 10 minutes to 20 requests per minute.

The free_models_only parameter only works with auto routing ("model": "auto"). If no free models are available that meet your request requirements, the API will return an error.
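Conceptually, free_models_only restricts the candidate set to models whose input and output prices are both zero before auto routing runs. A sketch with an assumed catalog shape (the real model catalog's fields may differ):

```python
def free_models(catalog):
    """Keep only models whose input and output prices are both zero."""
    return [m for m in catalog
            if m["input_price"] == 0 and m["output_price"] == 0]

# Hypothetical catalog entries; prices per million tokens, for example.
catalog = [
    {"id": "free-model", "input_price": 0, "output_price": 0},
    {"id": "paid-model", "input_price": 2.5, "output_price": 10.0},
]
```

If this filtered list is empty for a given request, there is nothing left to route to, which is why the API returns an error in that case.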

Reasoning Models Only

When you set the reasoning_effort parameter, only models that support reasoning will be chosen. Note that this parameter is not specific to auto routing; it can be used with any model selection.

# Auto route only to reasoning models
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Hello!"}],
    "reasoning_effort": "medium"
  }'

Exclude Reasoning Models

When using auto routing, you can exclude reasoning models from selection by setting the no_reasoning parameter to true. This is useful when you want faster responses or need to avoid the additional cost and latency of reasoning models:

# Auto route excluding reasoning models
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Hello!"}],
    "no_reasoning": true
  }'

The no_reasoning parameter only works with auto routing ("model": "auto"). If no non-reasoning models are available that meet your request requirements, the API will return an error.

Auto routing analyzes your payload and automatically chooses between cost-effective models for simple requests and more powerful models for complex or large-context requests.

Coming Soon: Advanced Optimization

We're continuously improving our auto routing capabilities. Soon you'll benefit from:

  • Tool call optimization: Automatically select models that excel at function calling and structured outputs
  • Content-aware routing: Analyze message content to determine the best model for specific types of requests (coding, creative writing, analysis, etc.)
  • Performance-based routing: Route based on historical performance data for similar requests
  • Multi-model orchestration: Intelligently combine multiple models for complex workflows

How It Works

  1. Request Analysis: The system analyzes your request including message content, context size, and any special parameters
  2. Model Selection: Based on the analysis, it selects the most appropriate model considering cost, performance, and capabilities
  3. Transparent Routing: Your request is seamlessly routed to the chosen model and provider
  4. Optimized Response: You receive the best possible response while maintaining cost efficiency

Auto routing decisions are transparent in your usage logs, so you can always see which model was selected for each request.

Best Practices

For Development

  • Use specific model names during development and testing
  • Leverage auto routing for production workloads to optimize costs

For Production

  • Use auto routing ("model": "auto") for the best balance of cost and performance
  • Monitor your usage patterns through the dashboard to understand routing decisions
  • Set up provider keys for multiple providers to maximize routing options

For Cost Optimization

  • Let auto routing handle model selection to automatically use the most cost-effective options
  • Use model IDs without provider prefixes to always get the cheapest available provider
  • Monitor your usage analytics to track cost savings from intelligent routing
