# Introduction
URL: /
import { FeatureCards } from "@/components/feature-cards";
import { AIToolingCards } from "@/components/ai-tooling-cards";
LLM Gateway is an open-source API gateway that sits between your applications and LLM providers like OpenAI, Anthropic, Google AI Studio, and more. It provides a unified, OpenAI-compatible API interface with built-in cost tracking, caching, and intelligent routing.
## Features
## AI Tooling
LLM Gateway is built to work seamlessly with AI agents and development tools.
## Next Steps
* [**Quickstart**](/quick-start) — Get up and running in minutes
* [**Overview**](/overview) — Learn more about what LLM Gateway offers
* [**Self-Host**](/self-host) — Deploy on your own infrastructure
# Overview
URL: /overview
# LLM Gateway
LLM Gateway is an open-source API gateway for Large Language Models (LLMs). It acts as a middleware between your applications and various LLM providers, allowing you to:
* Route requests to multiple LLM providers (OpenAI, Anthropic, Google AI Studio, and others)
* Manage API keys for different providers in one place
* Track token usage and costs across all your LLM interactions
* Analyze performance metrics to optimize your LLM usage
## Analyzing Your LLM Requests
LLM Gateway provides detailed insights into your LLM usage:
* **Usage Metrics**: Track the number of requests, tokens used, and response times
* **Cost Analysis**: Monitor spending across different models and providers
* **Performance Tracking**: Identify patterns and optimize your prompts based on actual usage data
* **Breakdown by Model**: Compare different models' performance and cost-effectiveness
All this data is automatically collected and presented in an intuitive dashboard, helping you make informed decisions about your LLM strategy.
## Getting Started
Using LLM Gateway is simple. Just replace your current LLM provider's base URL with the LLM Gateway API endpoint:
```bash
curl -X POST https://api.llmgateway.io/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'
```
LLM Gateway maintains compatibility with the OpenAI API format, making migration seamless.
## Hosted vs. Self-Hosted
You can use LLM Gateway in two ways:
* **Hosted Version**: For immediate use without setup, visit [llmgateway.io](https://llmgateway.io) to create an account and get an API key.
* **Self-Hosted**: Deploy LLM Gateway on your own infrastructure for complete control over your data and configuration.
The self-hosted version offers additional customization options and, if desired, ensures your LLM traffic never leaves your infrastructure.
# Quickstart
URL: /quick-start
import { Accordion, Accordions } from "fumadocs-ui/components/accordion";
import { Tabs, Tab } from "fumadocs-ui/components/tabs";
import { DynamicCodeBlock } from "fumadocs-ui/components/dynamic-codeblock";
# 🚀 Quickstart
Welcome to **LLM Gateway** — a single drop-in endpoint that lets you call today's best large-language models while keeping **your existing code** and development workflow intact.
> **TL;DR** — Point your HTTP requests to `https://api.llmgateway.io/v1/…`, supply your `LLM_GATEWAY_API_KEY`, and you're done.
***
## 1 · Get an API key
1. Sign in to the dashboard.
2. Create a new Project → *Copy the key*.
3. Export it in your shell (or a `.env` file):
```bash
export LLM_GATEWAY_API_KEY="llmgtwy_XXXXXXXXXXXXXXXX"
```
***
## 2 · Pick your language
**JavaScript (React)**

```jsx
import { useState } from "react";

function ChatComponent() {
  const [response, setResponse] = useState("");
  const [loading, setLoading] = useState(false);

  const sendMessage = async () => {
    setLoading(true);
    try {
      const res = await fetch("https://api.llmgateway.io/v1/chat/completions", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          Authorization: `Bearer ${process.env.REACT_APP_LLM_GATEWAY_API_KEY}`,
        },
        body: JSON.stringify({
          model: "gpt-4o",
          messages: [{ role: "user", content: "Hello, how are you?" }],
        }),
      });
      if (!res.ok) {
        throw new Error(`HTTP error! status: ${res.status}`);
      }
      const data = await res.json();
      setResponse(data.choices[0].message.content);
    } catch (error) {
      console.error("Error:", error);
    } finally {
      setLoading(false);
    }
  };

  return (
    <div>
      <button onClick={sendMessage} disabled={loading}>
        {loading ? "Sending…" : "Send"}
      </button>
      {response && <p>{response}</p>}
    </div>
  );
}

export default ChatComponent;
```
**Java**

```java
import java.net.URI;
import java.net.http.*;

HttpRequest request = HttpRequest.newBuilder()
    .uri(URI.create("https://api.llmgateway.io/v1/chat/completions"))
    .header("Content-Type", "application/json")
    .header("Authorization", "Bearer " + System.getenv("LLM_GATEWAY_API_KEY"))
    .POST(HttpRequest.BodyPublishers.ofString("""
        {"model": "gpt-4o",
         "messages": [{"role": "user", "content": "Hello, how are you?"}]}"""))
    .build();

HttpResponse<String> response = HttpClient.newHttpClient()
    .send(request, HttpResponse.BodyHandlers.ofString());
System.out.println(response.body());
```
**Rust**

```rust
use reqwest::Client;
use serde_json::json;
use std::env;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = Client::new();
    let api_key = env::var("LLM_GATEWAY_API_KEY")?;
    let response = client
        .post("https://api.llmgateway.io/v1/chat/completions")
        .header("Content-Type", "application/json")
        .header("Authorization", format!("Bearer {}", api_key))
        .json(&json!({
            "model": "gpt-4o",
            "messages": [
                {"role": "user", "content": "Hello, how are you?"}
            ]
        }))
        .send()
        .await?;
    let result: serde_json::Value = response.json().await?;
    println!("{}", result["choices"][0]["message"]["content"]);
    Ok(())
}
```
**PHP**

```php
<?php
$apiKey = getenv('LLM_GATEWAY_API_KEY');
$data = [
    'model' => 'gpt-4o',
    'messages' => [
        ['role' => 'user', 'content' => 'Hello, how are you?']
    ]
];
$options = [
    'http' => [
        'header' => [
            'Content-Type: application/json',
            'Authorization: Bearer ' . $apiKey
        ],
        'method' => 'POST',
        'content' => json_encode($data)
    ]
];
$context = stream_context_create($options);
$response = file_get_contents(
    'https://api.llmgateway.io/v1/chat/completions',
    false,
    $context
);
if ($response === FALSE) {
    throw new Exception('Request failed');
}
$result = json_decode($response, true);
echo $result['choices'][0]['message']['content'];
```
***
## 3 · SDK integrations
```ts title="ai-sdk.ts"
import { llmgateway } from "@llmgateway/ai-sdk-provider";
import { generateText } from "ai";
const { text } = await generateText({
model: llmgateway("gpt-4o"),
prompt: "Write a vegetarian lasagna recipe for 4 people.",
});
```
```ts title="vercel-ai-sdk.ts"
import { createOpenAI } from "@ai-sdk/openai";
import { generateText } from "ai";

const llmgateway = createOpenAI({
  baseURL: "https://api.llmgateway.io/v1",
  apiKey: process.env.LLM_GATEWAY_API_KEY!,
});

const { text } = await generateText({
  model: llmgateway.chat("gpt-4o"),
  messages: [{ role: "user", content: "Hello, how are you?" }],
});

console.log(text);
```
```ts title="openai-sdk.ts"
import OpenAI from "openai";
const openai = new OpenAI({
baseURL: "https://api.llmgateway.io/v1",
apiKey: process.env.LLM_GATEWAY_API_KEY,
});
const completion = await openai.chat.completions.create({
model: "gpt-4o",
messages: [{ role: "user", content: "Hello, how are you?" }],
});
console.log(completion.choices[0].message.content);
```
***
## 4 · Going further
* **Streaming**: pass `stream: true` to any request — the Gateway will proxy the event stream unchanged.
* **Monitoring**: Every call appears in the dashboard with latency, cost & provider breakdown.
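With `stream: true`, responses arrive as OpenAI-style server-sent events. As a rough sketch (assuming standard `data:`-prefixed JSON chunks terminated by `data: [DONE]`; exact chunk shapes can vary by model), the text deltas can be extracted like this:

```typescript
// Parse one SSE chunk from a streaming chat completion into text deltas.
// Illustrative sketch, not an official client.
function parseSSEChunk(chunk: string): string[] {
  const deltas: string[] = [];
  for (const line of chunk.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue;
    const payload = trimmed.slice(5).trim();
    if (payload === "[DONE]") break; // end-of-stream sentinel
    const parsed = JSON.parse(payload);
    const delta = parsed.choices?.[0]?.delta?.content;
    if (typeof delta === "string") deltas.push(delta);
  }
  return deltas;
}

// Example: two content chunks followed by the DONE sentinel.
const sample = [
  'data: {"choices":[{"delta":{"content":"Hel"}}]}',
  'data: {"choices":[{"delta":{"content":"lo"}}]}',
  "data: [DONE]",
].join("\n");

console.log(parseSSEChunk(sample).join("")); // "Hello"
```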
***
## 5 · FAQ
**Which models are available?**
See the [Models page](https://llmgateway.io/models).

**How is LLM Gateway different from OpenRouter?**
Unlike OpenRouter, we offer:

* Full self-hosting capabilities, giving you complete control over your infrastructure
* Enhanced analytics with deeper insights into your model usage and performance
* No fees when using your own provider keys, maximizing cost efficiency
* Greater flexibility and customization options for enterprise deployments

**How much does it cost?**
Our pricing structure is designed to be flexible and cost-effective: see the [Pricing section](https://llmgateway.io#pricing).
***
## 6 · Next steps
* Read the [self-hosting guide](/self-host).
* Drop into our [GitHub](https://github.com/theopenco/llmgateway) for help or feature requests.
Happy building! ✨
# Self Host LLMGateway
URL: /self-host
# Self Host LLMGateway
LLMGateway is a self-hostable platform that provides a unified API gateway for multiple LLM providers. This guide offers two simple options to get started.
## Prerequisites
* Latest Docker
* API keys for the LLM providers you want to use (OpenAI, Anthropic, etc.)
## Option 1: Unified Docker Image (Simplest)
This option uses a single Docker container that includes all services (UI, API, Gateway, Database, Redis).
```bash
# Run the container
docker run -d \
--name llmgateway \
--restart unless-stopped \
-p 3002:3002 \
-p 3003:3003 \
-p 3005:3005 \
-p 3006:3006 \
-p 4001:4001 \
-p 4002:4002 \
-v ~/llmgateway_data:/var/lib/postgresql/data \
-e AUTH_SECRET=your-secret-key-here \
ghcr.io/theopenco/llmgateway-unified:latest
```
Note: instead of `latest`, it is recommended to pin the image to the most recent release tag from the [releases page](https://github.com/theopenco/llmgateway/releases).
### Using Docker Compose (Alternative for unified image)
```bash
# Download the compose file
curl -O https://raw.githubusercontent.com/theopenco/llmgateway/main/infra/docker-compose.unified.yml
curl -O https://raw.githubusercontent.com/theopenco/llmgateway/main/.env.example
# Configure environment
cp .env.example .env
# Edit .env with your configuration
# Start the service
docker compose -f docker-compose.unified.yml up -d
```
Note: it is recommended to replace the `latest` image tag in the compose file with the most recent release from the [releases page](https://github.com/theopenco/llmgateway/releases).
## Option 2: Separate Services with Docker Compose
This option uses separate containers for each service, offering more flexibility.
```bash
# Clone the repository
git clone https://github.com/theopenco/llmgateway.git
cd llmgateway
# Configure environment
cp .env.example .env
# Edit .env with your configuration
# Start the services
docker compose -f infra/docker-compose.split.yml up -d
```
Note: it is recommended to replace the `latest` tag on all images in the compose file with the most recent release from the [releases page](https://github.com/theopenco/llmgateway/releases).
## Accessing Your LLMGateway
After starting either option, you can access:
* **Web Interface**: [http://localhost:3002](http://localhost:3002)
* **Documentation**: [http://localhost:3005](http://localhost:3005)
* **API Endpoint**: [http://localhost:4002](http://localhost:4002)
* **Gateway Endpoint**: [http://localhost:4001](http://localhost:4001)
## Required Configuration
At minimum, you need to set these environment variables:
```bash
# Database (change the password!)
POSTGRES_PASSWORD=your_secure_password_here
# Authentication
AUTH_SECRET=your-secret-key-here
# LLM Provider API Keys (add the ones you need)
LLM_OPENAI_API_KEY=sk-...
LLM_ANTHROPIC_API_KEY=sk-ant-...
```
## Basic Management Commands
### For Unified Docker (Option 1)
```bash
# View logs
docker logs llmgateway
# Restart container
docker restart llmgateway
# Stop container
docker stop llmgateway
```
### For Docker Compose (Option 2)
```bash
# View logs
docker compose -f infra/docker-compose.split.yml logs -f
# Restart services
docker compose -f infra/docker-compose.split.yml restart
# Stop services
docker compose -f infra/docker-compose.split.yml down
```
## Build locally
To build locally, use the `*.local.yml` compose files in the `infra` directory, which build the images from source.
## All provider API keys
You can set any of the following API keys:
```text
LLM_OPENAI_API_KEY=
LLM_ANTHROPIC_API_KEY=
```
## Multiple API Keys and Load Balancing
LLMGateway supports multiple API keys per provider for load balancing and increased availability. Simply provide comma-separated values for your API keys:
```bash
# Multiple OpenAI keys for load balancing
LLM_OPENAI_API_KEY=sk-key1,sk-key2,sk-key3
# Multiple Anthropic keys
LLM_ANTHROPIC_API_KEY=sk-ant-key1,sk-ant-key2
```
### Health-Aware Routing
The gateway automatically tracks the health of each API key and routes requests to healthy keys. If a key experiences consecutive errors, it will be temporarily skipped. Keys that return authentication errors (401/403) are permanently blacklisted until restart.
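A simplified sketch of that selection logic (the types, names, and error threshold below are illustrative, not the gateway's actual implementation):

```typescript
// Health-aware key selection: skip keys with too many consecutive
// errors, and permanently drop keys that hit auth errors (401/403).
type KeyHealth = { consecutiveErrors: number; blacklisted: boolean };

const MAX_CONSECUTIVE_ERRORS = 3; // illustrative threshold

function pickKey(keys: string[], health: Map<string, KeyHealth>): string | null {
  for (const key of keys) {
    const h = health.get(key) ?? { consecutiveErrors: 0, blacklisted: false };
    if (h.blacklisted) continue; // auth error seen: skip until restart
    if (h.consecutiveErrors >= MAX_CONSECUTIVE_ERRORS) continue; // temporarily unhealthy
    return key;
  }
  return null; // no healthy key available
}

function recordResult(key: string, status: number, health: Map<string, KeyHealth>): void {
  const h = health.get(key) ?? { consecutiveErrors: 0, blacklisted: false };
  if (status === 401 || status === 403) h.blacklisted = true; // permanent
  else if (status >= 400) h.consecutiveErrors += 1; // transient failure
  else h.consecutiveErrors = 0; // success resets the counter
  health.set(key, h);
}
```

After a key is blacklisted, `pickKey` falls through to the next comma-separated key, which is what makes multiple keys per provider useful for availability.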
### Related Configuration Values
For providers that require additional configuration (like Google Vertex), you can specify multiple values that correspond to each API key. The gateway will always use the matching index:
```bash
# Multiple Google Vertex configurations
LLM_GOOGLE_VERTEX_API_KEY=key1,key2,key3
LLM_GOOGLE_CLOUD_PROJECT=project-a,project-b,project-c
LLM_GOOGLE_VERTEX_REGION=us-central1,europe-west1,asia-east1
```
When the gateway selects `key2`, it will automatically use `project-b` and `europe-west1`. If you have fewer configuration values than keys, the last value will be reused for remaining keys.
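The index-matching rule can be sketched as follows (an illustrative helper, not the gateway's code):

```typescript
// Index-matched provider configuration: use the value at the selected
// key's index; if there are fewer values than keys, reuse the last one.
function valueForKeyIndex(csv: string, keyIndex: number): string {
  const values = csv.split(",").map((v) => v.trim());
  return values[Math.min(keyIndex, values.length - 1)];
}

const projects = "project-a,project-b,project-c";
const regions = "us-central1,europe-west1"; // one value short

console.log(valueForKeyIndex(projects, 1)); // "project-b" (matches key2)
console.log(valueForKeyIndex(regions, 2));  // "europe-west1" (last value reused for key3)
```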
## Next Steps
Once your LLMGateway is running:
1. **Open the web interface** at [http://localhost:3002](http://localhost:3002)
2. **Create your first organization** and project
3. **Generate API keys** for your applications
4. **Test the gateway** by making API calls to [http://localhost:4001](http://localhost:4001)
# Health check
URL: /health
{/* This file was generated by Fumadocs. Do not edit this file directly. Any changes should be made by running the generation command again. */}
Health check endpoint.
# Chat Completions
URL: /v1_chat_completions
{/* This file was generated by Fumadocs. Do not edit this file directly. Any changes should be made by running the generation command again. */}
Create a completion for the chat conversation
# Anthropic Messages
URL: /v1_messages
{/* This file was generated by Fumadocs. Do not edit this file directly. Any changes should be made by running the generation command again. */}
Create a message using Anthropic's API format
# Models
URL: /v1_models
{/* This file was generated by Fumadocs. Do not edit this file directly. Any changes should be made by running the generation command again. */}
List all available models
# Agent Skills
URL: /guides/agent-skills
import { Step, Steps } from "fumadocs-ui/components/steps";
import { Callout } from "fumadocs-ui/components/callout";
**Agent Skills** are structured guidelines for AI coding agents, optimized for use with LLM Gateway and the AI SDK. They provide best practices and reusable instructions that help AI agents generate higher-quality code.
## What Are Agent Skills?
Agent Skills are packaged sets of rules and guidelines that teach AI coding agents how to implement specific features correctly. Each skill covers:
* API integration patterns
* Frontend rendering best practices
* Error handling strategies
* Performance optimization techniques
## Available Skills
### Image Generation
The Image Generation skill teaches AI agents how to properly implement image generation features:
* **API Integration** — correctly calling image generation APIs
* **Frontend Rendering** — displaying generated images efficiently
* **Error Handling** — graceful degradation and retry logic
* **Performance** — caching, lazy loading, and optimization
## Installation
### Prerequisites
Ensure you have Node.js 18+ and pnpm 9+ installed:
```bash
node --version # v18.0.0 or higher
pnpm --version # 9.0.0 or higher
```
### Clone the Repository
```bash
git clone https://github.com/theopenco/agent-skills.git
cd agent-skills
```
### Install Dependencies
```bash
pnpm install
```
### Build Skills
Build all skills to generate the documentation:
```bash
pnpm build:all
```
Or build a specific skill:
```bash
pnpm build
```
## Using Skills in Your Project
After building, each skill generates an `AGENTS.md` file that can be used with AI coding agents like Claude, Cursor, or Copilot.
### With Claude Code
Add the generated `AGENTS.md` content to your project's `CLAUDE.md` file:
```bash
cat skills/image-generation/AGENTS.md >> CLAUDE.md
```
### With Cursor
Add the skill content to your `.cursorrules` file:
```bash
cat skills/image-generation/AGENTS.md >> .cursorrules
```
### With Other AI Agents
Most AI coding tools support custom instructions. Copy the skill content into your tool's configuration.
## Project Structure
```
agent-skills/
βββ packages/
β βββ skills-build/ # Build tooling
βββ skills/
β βββ image-generation/ # Individual skill
β βββ rules/ # Rule files
β βββ AGENTS.md # Generated documentation
β βββ metadata.json # Skill metadata
βββ package.json
```
## Contributing
### Adding New Rules
### Fork and Clone
Fork the repository and create a feature branch:
```bash
git checkout -b feat/new-rule
```
### Create a Rule File
Rules follow a standardized template with YAML frontmatter containing `title`, `impact` (high/medium/low), and `tags`. The body includes sections for Context, Incorrect examples, and Correct examples with TypeScript code blocks.
See existing rules in `skills/image-generation/rules/` for reference.
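A hypothetical rule file following that template might look like this (the title, tags, and body below are illustrative examples, not taken from the repository):

```md
---
title: Always handle image generation failures
impact: high
tags: [error-handling, images]
---

## Context

Image generation calls can fail transiently; the UI should degrade gracefully
rather than crash or show a broken image.

## Incorrect

Calling the generation API and rendering the result directly, with no error
path or retry affordance.

## Correct

Wrapping the call in a try/catch, showing a fallback state, and offering a
retry action to the user.
```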
### Validate and Build
```bash
pnpm validate
pnpm build:all
```
### Submit a Pull Request
Push your changes and open a PR.
### Impact Levels
When creating rules, use these impact levels:
* **high** β Critical for correctness or security
* **medium** β Important for quality and maintainability
* **low** β Nice-to-have improvements
## Development Commands
| Command | Description |
| ---------------- | --------------------------- |
| `pnpm install` | Install dependencies |
| `pnpm build:all` | Build all skills |
| `pnpm build` | Build a specific skill |
| `pnpm validate` | Validate rule files |
| `pnpm dev` | Development mode with watch |
## More Resources
* [GitHub Repository](https://github.com/theopenco/agent-skills) — Source code and contributions
* [LLM Gateway CLI](/guides/cli) — Project scaffolding tool
* [Templates](https://llmgateway.io/templates) — Production-ready starter projects
Want to contribute a new skill or rule? Check out the [contribution
guidelines](https://github.com/theopenco/agent-skills#contributing) on GitHub.
# Autohand Integration
URL: /guides/autohand
import { Step, Steps } from "fumadocs-ui/components/steps";
import { Callout } from "fumadocs-ui/components/callout";
Autohand is an autonomous AI coding agent that works in your terminal, IDE, and Slack. With LLM Gateway, you can route all Autohand requests through a single gateway — use any of 180+ models from 60+ providers, with full cost tracking and smart routing.
## Setup
### Sign Up for LLM Gateway
[Sign up free](https://llmgateway.io/signup) — no credit card required. Copy your API key from the dashboard.
### Set Environment Variables
Configure Autohand to use LLM Gateway:
```bash
export OPENAI_BASE_URL=https://api.llmgateway.io/v1
export OPENAI_API_KEY=llmgtwy_your_api_key_here
```
### Run Autohand
```bash
autohand
```
All requests will now be routed through LLM Gateway.
## Why Use LLM Gateway with Autohand
* **180+ models** — GPT-5, Claude Opus, Gemini, Llama, and more from 60+ providers
* **Smart routing** — Automatically selects the best provider based on uptime, throughput, price, and latency
* **Cost tracking** — Monitor exactly how much each autonomous session costs
* **Single bill** — No need to manage multiple API provider accounts
* **Response caching** — Repeated requests hit cache automatically
* **Automatic failover** — If one provider is down, requests route to another
## Configuration File
You can also configure LLM Gateway in Autohand's config file:
```json
{
"provider": {
"llmgateway": {
"baseUrl": "https://api.llmgateway.io/v1",
"apiKey": "llmgtwy_your_api_key_here"
}
},
"model": "gpt-5"
}
```
## Choosing Models
You can use any model from the [models page](https://llmgateway.io/models).
| Model | Best For |
| ------------------- | ------------------------------------------- |
| `gpt-5` | Latest OpenAI flagship, highest quality |
| `claude-opus-4-6` | Anthropic's most capable model |
| `claude-sonnet-4-6` | Fast reasoning with extended thinking |
| `gemini-2.5-pro` | Google's latest flagship, 1M context window |
| `o3` | Advanced reasoning tasks |
| `gpt-5-mini` | Cost-effective, quick responses |
| `gemini-2.5-flash` | Fast responses, good for high-volume |
| `deepseek-v3.1` | Open-source with vision and tools |
## Autohand Features with LLM Gateway
### Terminal (CLI)
Autohand CLI works seamlessly with LLM Gateway. Set the environment variables and use all Autohand commands as normal — multi-file editing, agentic search, and autonomous code generation all work out of the box.
### IDE Integration
Autohand's VS Code and Zed extensions respect the same environment variables. Set them in your shell profile and the IDE integration will automatically route through LLM Gateway.
### Slack Integration
When using Autohand through Slack, configure the LLM Gateway base URL in your Autohand server settings to route all Slack-triggered coding tasks through the gateway.
## Monitoring Usage
Once configured, all Autohand requests appear in your LLM Gateway dashboard:
* **Request logs** — See every prompt and response
* **Cost breakdown** — Track spending by model and time period
* **Usage analytics** — Understand your AI usage patterns
View all available models on the [models page](https://llmgateway.io/models).
Need help? Join our [Discord community](https://llmgateway.io/discord) for
support and troubleshooting assistance.
# Claude Code Integration
URL: /guides/claude-code
import { Step, Steps } from "fumadocs-ui/components/steps";
import { Callout } from "fumadocs-ui/components/callout";
Claude Code is locked to Anthropic's API by default. With LLM Gateway, you can point it at any model — GPT-5, Gemini, Llama, or 180+ others — while keeping the same Anthropic API format Claude Code expects.
Three environment variables. No code changes. Full cost tracking in your dashboard.
## Setup
### Sign Up for LLM Gateway
[Sign up free](https://llmgateway.io/signup) — no credit card required. Copy your API key from the dashboard.
### Set Environment Variables
Configure Claude Code to use LLM Gateway:
```bash
export ANTHROPIC_BASE_URL=https://api.llmgateway.io
export ANTHROPIC_AUTH_TOKEN=llmgtwy_your_api_key_here
# optional: specify a model, otherwise it uses the default Claude model
export ANTHROPIC_MODEL=gpt-5 # or any model from our catalog
```
### Run Claude Code
```bash
claude
```
All requests will now be routed through LLM Gateway.
## Why This Works
LLM Gateway's `/v1/messages` endpoint speaks Anthropic's API format natively. We handle the translation to each provider behind the scenes. This means:
* **Use any model** — GPT-5, Gemini, Llama, or Claude itself
* **Keep your workflow** — Claude Code doesn't know the difference
* **Track costs** — Every request appears in your LLM Gateway dashboard
* **Automatic caching** — Repeated requests hit cache, saving money
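Conceptually, that translation is a field mapping between the two request formats. A toy sketch for the simplest case (the gateway's real mapping also covers tools, content blocks, and streaming; the type and function names here are illustrative):

```typescript
// Toy mapping from an Anthropic-style Messages request to an
// OpenAI-style Chat Completions request.
type AnthropicReq = {
  model: string;
  max_tokens: number;
  system?: string; // Anthropic puts the system prompt at the top level
  messages: { role: "user" | "assistant"; content: string }[];
};

type OpenAIReq = {
  model: string;
  max_tokens: number;
  messages: { role: "system" | "user" | "assistant"; content: string }[];
};

function toOpenAI(req: AnthropicReq): OpenAIReq {
  const messages: OpenAIReq["messages"] = [];
  // The top-level system prompt becomes a leading system message.
  if (req.system) messages.push({ role: "system", content: req.system });
  messages.push(...req.messages);
  return { model: req.model, max_tokens: req.max_tokens, messages };
}
```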
## Choosing Models
You can use any model from the [models page](https://llmgateway.io/models).
### Use OpenAI's Latest Models
```bash
# Use the latest GPT model
export ANTHROPIC_MODEL=gpt-5
# Use a cost-effective alternative
export ANTHROPIC_MODEL=gpt-5-mini
```
### Use Google's Gemini
```bash
export ANTHROPIC_MODEL=gemini-2.5-pro
```
### Use Anthropic's Claude Models
```bash
export ANTHROPIC_MODEL=anthropic/claude-3-5-sonnet-20241022
```
## Environment Variables
### `ANTHROPIC_MODEL`
Specifies the main model to use for primary requests.
```bash
export ANTHROPIC_MODEL=gpt-5
```
### Complete Configuration Example
```bash
export ANTHROPIC_BASE_URL=https://api.llmgateway.io
export ANTHROPIC_AUTH_TOKEN=llmgtwy_your_api_key_here
export ANTHROPIC_MODEL=gpt-5
export ANTHROPIC_SMALL_FAST_MODEL=gpt-5-nano
```
## Making Manual API Requests
If you want to test the endpoint directly, you can make manual requests:
```bash
curl -X POST "https://api.llmgateway.io/v1/messages" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ],
    "max_tokens": 100
  }'
```
### Response Format
The endpoint returns responses in Anthropic's message format:
```json
{
"id": "msg_abc123",
"type": "message",
"role": "assistant",
"model": "gpt-5",
"content": [
{
"type": "text",
"text": "Hello! I'm doing well, thank you for asking. How can I help you today?"
}
],
"stop_reason": "end_turn",
"stop_sequence": null,
"usage": {
"input_tokens": 13,
"output_tokens": 20
}
}
```
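Since `content` is an array of blocks, client code typically concatenates the `text` blocks to get the reply. A small helper (the names here are illustrative, not part of any SDK):

```typescript
// Pull the concatenated text out of an Anthropic-format message response.
type ContentBlock = { type: string; text?: string };
type AnthropicMessage = { content: ContentBlock[] };

function messageText(msg: AnthropicMessage): string {
  return msg.content
    .filter((b) => b.type === "text" && typeof b.text === "string")
    .map((b) => b.text)
    .join("");
}

const response: AnthropicMessage = {
  content: [{ type: "text", text: "Hello! How can I help?" }],
};

console.log(messageText(response)); // "Hello! How can I help?"
```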
## What You Get
* **Any model in Claude Code** — GPT-5 for heavy lifting, GPT-4o Mini for routine tasks
* **Cost visibility** — See exactly what each coding session costs
* **One bill** — Stop managing separate accounts for OpenAI, Anthropic, Google
* **Response caching** — Repeated requests (like linting the same file) hit cache
* **Discounts** — Check [discounted models](https://llmgateway.io/models?discounted=true) for savings up to 90%
View all available models on the [models page](https://llmgateway.io/models).
Need help? Join our [Discord community](https://llmgateway.io/discord) for
support and troubleshooting assistance.
# LLM Gateway CLI
URL: /guides/cli
import { Step, Steps } from "fumadocs-ui/components/steps";
import { Callout } from "fumadocs-ui/components/callout";
import { Tabs, Tab } from "fumadocs-ui/components/tabs";
The **LLM Gateway CLI** (`@llmgateway/cli`) is a command-line utility for scaffolding projects, managing AI applications, and discovering models.
## Installation
Run commands directly without installation:
```bash
npx @llmgateway/cli init
```
Install globally for faster access:
```bash
npm install -g @llmgateway/cli
```
Then run commands directly:
```bash
llmgateway init
```
## Quick Start
### Initialize a Project
Create a new project from a template:
```bash
npx @llmgateway/cli init
```
Or specify the template and name directly:
```bash
npx @llmgateway/cli init --template image-generation --name my-ai-app
```
### Configure Authentication
Login to save your API key locally:
```bash
npx @llmgateway/cli auth login
```
This opens a browser window to authenticate with LLM Gateway. Your credentials are stored in `~/.llmgateway/config.json`.
Alternatively, set the `LLMGATEWAY_API_KEY` environment variable which takes precedence over the config file.
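That resolution order can be sketched as follows (an illustrative sketch, not the CLI's actual code):

```typescript
// API-key resolution for the CLI: the environment variable wins,
// then the saved config file (~/.llmgateway/config.json).
type CliConfig = { apiKey?: string };
type Env = Record<string, string | undefined>;

function resolveApiKey(env: Env, config: CliConfig): string | undefined {
  return env.LLMGATEWAY_API_KEY ?? config.apiKey;
}

console.log(resolveApiKey({ LLMGATEWAY_API_KEY: "llmgtwy_env" }, { apiKey: "llmgtwy_file" })); // "llmgtwy_env"
console.log(resolveApiKey({}, { apiKey: "llmgtwy_file" })); // "llmgtwy_file"
```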
### Start Development
Navigate to your project and start the development server:
```bash
cd my-ai-app
npx @llmgateway/cli dev
```
Or specify a custom port:
```bash
npx @llmgateway/cli dev --port 3000
```
## Commands
### `init`
Initialize a new project from a template.
```bash
npx @llmgateway/cli init [options]
```
**Options:**
* `--template <template>` — Template to use (e.g., `image-generation`, `weather-agent`)
* `--name <name>` — Project name
**Examples:**
```bash
# Interactive mode
npx @llmgateway/cli init
# With options
npx @llmgateway/cli init --template image-generation --name my-app
```
### `list`
Display available project templates.
```bash
npx @llmgateway/cli list
```
**Options:**
* `--json` — Output in JSON format
### `models`
Browse and filter available AI models.
```bash
npx @llmgateway/cli models [options]
```
**Options:**
* `--capability <capability>` — Filter by capability (e.g., `chat`, `image`, `embedding`)
* `--provider <provider>` — Filter by provider (e.g., `openai`, `anthropic`, `google`)
* `--search <query>` — Search models by name
**Examples:**
```bash
# List all models
npx @llmgateway/cli models
# Filter by provider
npx @llmgateway/cli models --provider openai
# Search models
npx @llmgateway/cli models --search gpt
```
### `add`
Add tools or API routes to an existing project.
```bash
npx @llmgateway/cli add
```
**Tools available:**
* `weather` — Weather lookup functionality
* `search` — Web search capability
* `calculator` — Mathematical operations
**API routes available:**
* `generate` — Text generation endpoint
* `chat` — Chat completion endpoint
### `auth`
Manage API authentication.
```bash
# Login via browser
npx @llmgateway/cli auth login
# Check authentication status
npx @llmgateway/cli auth status
# Logout
npx @llmgateway/cli auth logout
```
### `dev`
Start the local development server.
```bash
npx @llmgateway/cli dev [options]
```
**Options:**
* `--port <port>` — Port to run on (default: 3000)
### `upgrade`
Update LLM Gateway dependencies in your project.
```bash
npx @llmgateway/cli upgrade [options]
```
**Options:**
* `--dry-run` — Show what would be updated without making changes
### `docs`
Open the documentation in your browser.
```bash
npx @llmgateway/cli docs
```
## Available Templates
### Image Generation
A full-stack application for AI image generation.
* **Stack:** Next.js 16, React 19, TypeScript
* **Features:** Multi-provider support (DALL-E, Stable Diffusion), unified API
* **Use case:** Image generation apps, creative tools
```bash
npx @llmgateway/cli init --template image-generation
```
### QA Agent
An AI-powered QA testing agent that uses browser automation to test your web app.
* **Stack:** Next.js 16, React 19, TypeScript, Agent Browser
* **Features:** Natural language testing, real-time action timeline, live browser preview
* **Use case:** Automated QA testing, regression testing, user flow validation
```bash
npx @llmgateway/cli init --template qa-agent
```
### Weather Agent
A CLI agent demonstrating tool calling capabilities.
* **Stack:** TypeScript, AI SDK, OpenAI
* **Features:** Tool calling, real-time data, natural language
* **Use case:** Learning tool usage, building CLI agents
```bash
npx @llmgateway/cli init --template weather-agent
```
## Configuration
The CLI stores configuration in `~/.llmgateway/config.json`:
```json
{
"apiKey": "llmgtwy_...",
"defaultTemplate": "image-generation"
}
```
### Environment Variables
The `LLMGATEWAY_API_KEY` environment variable takes precedence over the config file:
```bash
export LLMGATEWAY_API_KEY="llmgtwy_..."
```
## More Resources
* [Agents](https://llmgateway.io/agents) — Pre-built AI agents
* [Templates](https://llmgateway.io/templates) — Production-ready starter projects
* [GitHub Repository](https://github.com/theopenco/llmgateway-templates) — Source code and issues
Need help or want to request a feature? Open an issue on
[GitHub](https://github.com/theopenco/llmgateway-templates/issues).
# Cline Integration
URL: /guides/cline
import { Step, Steps } from "fumadocs-ui/components/steps";
import { Callout } from "fumadocs-ui/components/callout";
[Cline](https://cline.bot) is an autonomous AI coding assistant that lives in your VS Code editor. It can create and edit files, run terminal commands, and help you build complex projects. You can configure Cline to use LLM Gateway for access to multiple AI providers with unified billing and cost tracking.
## Prerequisites
* A VS Code-based IDE installed
* An LLM Gateway API key
## Setup
Cline supports OpenAI-compatible API endpoints, making it straightforward to integrate with LLM Gateway.
### Install Cline Extension
1. Open VS Code
2. Go to the Extensions view (Cmd/Ctrl + Shift + X)
3. Search for "Cline"
4. Click **Install** on the Cline extension
### Open Cline Settings
1. Click on the Cline icon in the VS Code sidebar
2. Click the settings gear icon in the Cline panel
### Configure API Provider
1. In the API Provider dropdown, select **OpenAI Compatible**
2. Enter the following details:
* **Base URL**: `https://api.llmgateway.io/v1`
* **API Key**: Your LLM Gateway API key
* **Model ID**: Choose a model (e.g., `claude-opus-4-5-20251101`, `gpt-5.2`, `gemini-3-pro-preview`, `deepseek-3.2`). See [provider-specific routing](/features/routing#provider-specific-routing) for more options.
### Test the Integration
1. Open a project in VS Code
2. Click on the Cline icon in the sidebar
3. Type a message like "Create a hello world function in Python"
4. Cline should respond and offer to create the file
All requests will now be routed through LLM Gateway.
View all available models on the [models page](https://llmgateway.io/models).
## Features
Once configured, you can use all of Cline's features with LLM Gateway:
### Autonomous Coding
* Create new files and projects from scratch
* Edit existing code based on natural language instructions
* Refactor and improve code quality
### Terminal Commands
* Run build commands, tests, and scripts
* Install dependencies
* Execute any terminal operation
### File Management
* Create, read, and modify files
* Navigate your codebase
* Search for relevant code
## Model Selection Tips
### Using Provider-Specific Models
To use a specific provider's version of a model, prefix the model ID with the provider name. See [provider-specific routing](/features/routing#provider-specific-routing) for more options.
### Using Discounted Models
LLM Gateway offers discounted access to some models. Find them on the [models page](https://llmgateway.io/models?view=grid\&filters=1\&discounted=true) and copy the model ID.
### Using Free Models
Some models are available for free. Browse them on the [models page](https://llmgateway.io/models?view=grid\&filters=1\&free=true).
Need help? Join our [Discord community](https://llmgateway.io/discord) for
support and troubleshooting assistance.
## Benefits of Using LLM Gateway with Cline
* **Multi-Provider Access**: Use models from OpenAI, Anthropic, Google, and more through a single API
* **Cost Control**: Track and limit your AI spending with detailed usage analytics
* **Unified Billing**: One account for all providers instead of managing multiple API keys
* **Caching**: Reduce costs with response caching for repeated requests
* **Analytics**: Monitor usage patterns and costs in the dashboard
# Codex CLI Integration
URL: /guides/codex-cli
import { Step, Steps } from "fumadocs-ui/components/steps";
import { Callout } from "fumadocs-ui/components/callout";
Codex CLI is OpenAI's open-source terminal coding agent. By default it connects to OpenAI's API, but with LLM Gateway you can route it through a single gateway: use GPT-5.3 Codex, Gemini, Claude, or any of 180+ models while keeping full cost visibility.
One config file. No code changes. Full cost tracking in your dashboard.
## Setup
### Sign Up for LLM Gateway
[Sign up free](https://llmgateway.io/signup) (no credit card required). Copy your API key from the dashboard.
### Set Your API Key
Set your LLM Gateway API key as the OpenAI key:
```bash
export OPENAI_API_KEY=llmgtwy_your_api_key_here
```
### Create Config File
Create or edit `~/.codex/config.toml`:
```toml
openai_base_url = "https://api.llmgateway.io/v1"
model = "auto"
model_reasoning_effort = "high"
[tui]
show_tooltips = false
[model_providers.openai]
name = "OpenAI"
base_url = "https://api.llmgateway.io/v1"
```
### Run Codex CLI
```bash
codex
```
All requests will now be routed through LLM Gateway.
## Why This Works
LLM Gateway's `/v1` endpoint is fully OpenAI-compatible. Codex CLI sends requests to our gateway instead of OpenAI directly, and we route them to the right provider behind the scenes. This means:
* **Use any model**: GPT-5.3 Codex, Gemini, Claude, or 180+ others
* **Keep your workflow**: Codex CLI doesn't know the difference
* **Track costs**: Every request appears in your LLM Gateway dashboard
* **Automatic caching**: Repeated requests hit cache, saving money
## Configuration Explained
### Base URL
The `openai_base_url` and `base_url` fields point Codex CLI to LLM Gateway instead of OpenAI:
```toml
openai_base_url = "https://api.llmgateway.io/v1"
```
### Model Selection
Use `auto` to let LLM Gateway pick the best model, or set a specific one from the [models page](https://llmgateway.io/models):
```toml
model = "auto"
# or pick a specific model
model = "gpt-5.3-codex"
```
### Reasoning Effort
Control how much reasoning the model uses. Options are `low`, `medium`, and `high`:
```toml
model_reasoning_effort = "high"
```
## What You Get
* **Any model in Codex CLI**: GPT-5.3 Codex for heavy lifting, lighter models for routine tasks
* **Cost visibility**: See exactly what each coding session costs
* **One bill**: Stop managing separate accounts for OpenAI, Anthropic, and Google
* **Response caching**: Repeated requests hit cache automatically
* **Discounts**: Check [discounted models](https://llmgateway.io/models?discounted=true) for savings of up to 90%
## Troubleshooting
### Authentication errors
Make sure your `OPENAI_API_KEY` environment variable is set to your LLM Gateway API key (starts with `llmgtwy_`).
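A quick sanity check, assuming your key uses the standard `llmgtwy_` prefix:

```shell
# print a confirmation only if the variable is set and has the expected prefix
if printf '%s' "$OPENAI_API_KEY" | grep -q '^llmgtwy_'; then
  echo "key format looks OK"
else
  echo "OPENAI_API_KEY is missing or not an LLM Gateway key"
fi
```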
### Model not found
Verify the model ID matches exactly what's listed on the [models page](https://llmgateway.io/models). Model IDs are case-sensitive.
### Connection issues
Check that `base_url` is set to `https://api.llmgateway.io/v1` (note the `/v1` at the end).
View all available models on the [models page](https://llmgateway.io/models).
Need help? Join our [Discord community](https://llmgateway.io/discord) for
support and troubleshooting assistance.
# Cursor Integration
URL: /guides/cursor
import { Step, Steps } from "fumadocs-ui/components/steps";
import { Callout } from "fumadocs-ui/components/callout";
Cursor is an AI-powered code editor built on VSCode. You can configure Cursor to use LLM Gateway for enhanced AI capabilities, access to multiple models, and better cost control.
## Prerequisites
* An LLM Gateway account with an API key
* Cursor IDE installed
* Basic understanding of Cursor's AI features
## Setup
Cursor supports OpenAI-compatible API endpoints, making it easy to integrate with LLM Gateway.
### Get Your API Key
1. Log in to your [LLM Gateway dashboard](https://llmgateway.io/dashboard)
2. Navigate to **API Keys** section
3. Create a new API key and copy the key
### Configure Cursor Settings
1. Open Cursor and go to **Settings**, then click **Cursor Settings**
2. Click on **Models**
3. Scroll down to the **OpenAI API Key** section
4. Click **Add OpenAI API Key** and enter your LLM Gateway API key
5. In the same Models settings, find the **Override OpenAI Base URL** option
6. Enable the override option
7. Enter the LLM Gateway endpoint: `https://api.llmgateway.io/v1`
### Select Models
1. In the **Models** section, you can now select from available models
2. Choose any [LLM Gateway supported model](https://llmgateway.io/models):
* For chat: Use models like `gpt-5`, `gpt-4o`, `claude-sonnet-4-5`
* For custom models: Add the provider name before the model name (e.g. `custom/my-model`)
* For discounted models: copy the IDs from the [models page](https://llmgateway.io/models?view=grid\&filters=1\&discounted=true)
* For free models: copy the IDs from the [models page](https://llmgateway.io/models?view=grid\&filters=1\&free=true)
* For reasoning models: copy the IDs from the [models page](https://llmgateway.io/models?view=grid\&filters=1\&reasoning=true)
### Test the Integration
1. Open any code file in Cursor
2. Try using the AI chat (Cmd/Ctrl + L)
3. Or test the autocomplete feature while typing
All AI requests will now be routed through LLM Gateway.
## Features
Once configured, you can use all of Cursor's AI features with LLM Gateway:
### AI Chat (Cmd/Ctrl + L)
* Ask questions about your code
* Request code explanations
* Get debugging help
* Generate new code
### Inline Edit (Cmd/Ctrl + K)
* Edit code with natural language instructions
* Refactor functions
* Add features to existing code
### Autocomplete
* Get intelligent code suggestions as you type
* Context-aware completions based on your codebase
## Advanced Configuration
### Using Different Models for Different Features
Cursor allows you to configure different models for different features:
1. **Chat Model**: Use a powerful model like `gpt-5` or `claude-sonnet-4-5`
2. **Autocomplete Model**: Use a faster, cost-effective model like `gpt-4o-mini`
3. **Custom Model**: Use a custom model like `custom/my-model`
4. **Reasoning Model**: Use a reasoning model like `canopywave/kimi-k2-thinking` [with 75% off discount](https://llmgateway.io/changelog/canopywave-kimi-k2-thinking-discount)
This gives you the best balance of performance and cost.
### Model Routing
With LLM Gateway's [routing features](/features/routing), the gateway:
* **Chooses cost-effective models** by default for an optimal price-to-performance ratio
* **Automatically scales to more powerful models** based on your request's context size
* **Handles large contexts intelligently** by selecting models with appropriate context windows
## Troubleshooting
### Authentication Errors
If you see authentication errors:
* Verify your API key is correct
* Check that the base URL is set to `https://api.llmgateway.io/v1`
* Ensure your LLM Gateway account has sufficient credits
### Model Not Found
If you see "model not found" errors:
* Verify the model ID exists in the [models page](https://llmgateway.io/models)
* Check that you're using the correct model name format
* Some models may require specific provider configurations in your LLM Gateway dashboard
### Slow Responses
If responses are slow:
* Check your internet connection
* Monitor your usage in the LLM Gateway dashboard
* Consider using faster models for autocomplete features
Need help? Join our [Discord community](https://llmgateway.io/discord) for
support and troubleshooting assistance.
## Benefits of Using LLM Gateway with Cursor
* **Multi-Provider Access**: Use models from OpenAI, Anthropic, Google, Open-source models and more
* **Cost Control**: Track and limit your AI spending with detailed usage analytics
* **Caching**: Reduce costs with response caching
* **Analytics**: Monitor usage patterns and costs
# Model Context Protocol (MCP)
URL: /guides/mcp
import { Step, Steps } from "fumadocs-ui/components/steps";
import { Callout } from "fumadocs-ui/components/callout";
import { Tabs, Tab } from "fumadocs-ui/components/tabs";
LLM Gateway provides a Model Context Protocol (MCP) server that enables AI assistants like Claude Code to access multiple LLM providers through a unified interface. This allows you to use any model from OpenAI, Anthropic, Google, and more directly from your AI coding assistant.
## What is MCP?
The Model Context Protocol (MCP) is an open standard that allows AI assistants to connect with external tools and data sources. LLM Gateway's MCP server exposes tools for:
* **Chat completions** - Send messages to any supported LLM
* **Image generation** - Generate images using models like Qwen Image
* **Nano Banana image generation** - Generate images with Gemini 3 Pro Image Preview and optionally save to disk
* **Model discovery** - List available models with capabilities and pricing
## Available Tools
### `chat`
Send a message to any LLM and get a response.
**Parameters:**
* `model` (string) - The model to use (e.g., `"gpt-4o"`, `"claude-sonnet-4-20250514"`)
* `messages` (array) - Array of messages with `role` and `content`
* `temperature` (number, optional) - Sampling temperature (0-2)
* `max_tokens` (number, optional) - Maximum tokens to generate
**Example:**
```json
{
"model": "gpt-4o",
"messages": [{ "role": "user", "content": "Explain quantum computing" }],
"temperature": 0.7
}
```
### `generate-image`
Generate images from text prompts using AI image models.
**Parameters:**
* `prompt` (string) - Text description of the image to generate
* `model` (string, optional) - Image model (default: `"qwen-image-plus"`)
* `size` (string, optional) - Image size (default: `"1024x1024"`)
* `n` (number, optional) - Number of images (1-4, default: 1)
**Example:**
```json
{
"prompt": "A serene mountain landscape at sunset",
"model": "qwen-image-max",
"size": "1024x1024"
}
```
### `generate-nano-banana`
Generate an image using Gemini 3 Pro Image Preview ("Nano Banana"). Returns an inline image preview, and optionally saves the image to disk when the server is configured with an upload directory.
**Parameters:**
* `prompt` (string) - Text description of the image to generate
* `filename` (string, optional) - Filename for the saved image, no path separators allowed (default: `nano-banana-{timestamp}.png`)
* `aspect_ratio` (string, optional) - Aspect ratio: `"1:1"`, `"16:9"`, `"4:3"`, or `"5:4"`
**Example:**
```json
{
"prompt": "A pixel-art cat sitting on a rainbow",
"filename": "hero-image.png",
"aspect_ratio": "16:9"
}
```
**Saving images to disk** requires the `UPLOAD_DIR` environment variable to be
set on the MCP server. When set, images are saved to that directory. Without
it, images are returned inline only; no files are written to disk. See
[Enabling local image saving](#enabling-local-image-saving) for setup
instructions.
### `list-models`
List available LLM models with capabilities and pricing.
**Parameters:**
* `include_deactivated` (boolean, optional) - Include deactivated models
* `exclude_deprecated` (boolean, optional) - Exclude deprecated models
* `limit` (number, optional) - Maximum models to return (default: 20)
* `family` (string, optional) - Filter by family (e.g., `"openai"`, `"anthropic"`)
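**Example** (values are illustrative):

```json
{
  "family": "openai",
  "limit": 10,
  "exclude_deprecated": true
}
```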
### `list-image-models`
List all available image generation models.
**Example output:**
```
# Image Generation Models
## Qwen Image Plus
- **Model ID:** `qwen-image-plus`
- **Description:** Text-to-image with excellent text rendering
- **Price:** $0.03 per request
## Qwen Image Max
- **Model ID:** `qwen-image-max`
- **Description:** Highest quality text-to-image
- **Price:** $0.075 per request
```
## Setup
### Get Your API Key
1. Log in to your [LLM Gateway dashboard](https://llmgateway.io/dashboard)
2. Navigate to **API Keys** section
3. Create a new API key and copy it
### Configure Claude Code
Run the following command in your terminal:
```bash
claude mcp add --transport http --scope user llmgateway https://api.llmgateway.io/mcp \
--header "Authorization: Bearer your-api-key-here"
```
**Alternative: Manual configuration**
You can also add the MCP server manually by editing `~/.claude.json` (user scope) or `.mcp.json` in your project root (project scope):
```json
{
"mcpServers": {
"llmgateway": {
"url": "https://api.llmgateway.io/mcp",
"headers": {
"Authorization": "Bearer your-api-key-here"
}
}
}
}
```
Restart Claude Code after manual configuration changes.
### Test the Integration
Try using the tools in Claude Code:
* "Use the chat tool to ask GPT-4o about TypeScript best practices"
* "Generate an image of a futuristic city using the generate-image tool"
* "Use generate-nano-banana to create a hero image for my landing page"
* "List all available models from Anthropic"
### Get Your API Key
1. Log in to your [LLM Gateway dashboard](https://llmgateway.io/dashboard)
2. Navigate to **API Keys** section
3. Create a new API key and copy it
4. Set it as an environment variable: `export LLM_GATEWAY_API_KEY="your-api-key-here"`
### Configure Codex
Run the following command in your terminal:
```bash
codex mcp add llmgateway --url https://api.llmgateway.io/mcp \
--bearer-token-env-var LLM_GATEWAY_API_KEY
```
**Alternative: Manual configuration**
You can also add the MCP server manually by editing `~/.codex/config.toml`:
```toml
[mcp_servers.llmgateway]
url = "https://api.llmgateway.io/mcp"
bearer_token_env_var = "LLM_GATEWAY_API_KEY"
```
### Test the Integration
Run `/mcp` in the Codex TUI to confirm the `llmgateway` server is connected. Try:
* "Use the chat tool to ask GPT-4o about TypeScript best practices"
* "Generate an image of a futuristic city using the generate-image tool"
* "Use generate-nano-banana to create a hero image for my landing page"
* "List all available models from Anthropic"
### Get Your API Key
1. Log in to your [LLM Gateway dashboard](https://llmgateway.io/dashboard)
2. Navigate to **API Keys** section
3. Create a new API key and copy it
### Configure Cursor
Add the following to your Cursor MCP configuration file (`~/.cursor/mcp.json`):
```json
{
"mcpServers": {
"llmgateway": {
"url": "https://api.llmgateway.io/mcp",
"headers": {
"Authorization": "Bearer your-api-key-here"
}
}
}
}
```
Or open the Command Palette (`Cmd/Ctrl + Shift + P`), search for **"Cursor Settings"**, then go to **Tools & Integrations** > **Add Custom MCP** and paste the configuration above.
Cursor v0.48.0+ is required for Streamable HTTP MCP support.
### Test the Integration
Open a chat in **Agent Mode**, click the **Select Tools** icon, and verify the LLM Gateway tools appear. Try:
* "Use the chat tool to ask GPT-4o about TypeScript best practices"
* "Generate an image of a futuristic city using the generate-image tool"
* "Use generate-nano-banana to create a hero image for my landing page"
* "List all available models from Anthropic"
LLM Gateway's MCP server supports the standard Streamable HTTP transport. Configure your client with:
* **Endpoint:** `https://api.llmgateway.io/mcp`
* **Authentication:** Bearer token via `Authorization` header or `x-api-key` header
* **Protocol Version:** 2024-11-05
**Direct HTTP Example:**
```bash
curl -X POST https://api.llmgateway.io/mcp \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-api-key" \
-d '{
"jsonrpc": "2.0",
"id": 1,
"method": "tools/list"
}'
```
**Server-Sent Events (SSE):**
For real-time updates, connect with `Accept: text/event-stream`:
```bash
curl -N https://api.llmgateway.io/mcp \
-H "Accept: text/event-stream" \
-H "Authorization: Bearer your-api-key"
```
## Use Cases
### Multi-Model Access in Claude Code
Use Claude Code to interact with models it doesn't natively support:
```
Use the chat tool with model "gpt-4o" to analyze this code for security issues.
```
### Image Generation
Generate images directly from your AI assistant:
```
Use generate-image to create a logo for my new startup.
It should be minimalist, blue and white, representing AI and cloud computing.
```
### Nano Banana (Gemini Image Generation)
Generate images with Gemini 3 Pro for use in your project:
```
Use generate-nano-banana to create a hero image for my landing page with a 16:9 aspect ratio.
```
### Cost-Effective Model Selection
Query available models to find the best option for your task:
```
List models from OpenAI and Anthropic, then use the cheapest one for this simple task.
```
## Authentication
The MCP server supports two authentication methods:
1. **Bearer Token** - `Authorization: Bearer your-api-key`
2. **API Key Header** - `x-api-key: your-api-key`
Your API key is the same one you use for the REST API and works across all LLM Gateway services.
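For example, the same `tools/list` request shown earlier can be sent with the `x-api-key` header instead of a Bearer token (the key value is a placeholder):

```bash
curl -X POST https://api.llmgateway.io/mcp \
  -H "Content-Type: application/json" \
  -H "x-api-key: your-api-key" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list"
  }'
```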
## OAuth Support
For applications that prefer OAuth authentication, LLM Gateway's MCP server implements OAuth 2.0:
* **Authorization Endpoint:** `/oauth/authorize`
* **Token Endpoint:** `/oauth/token`
* **Registration Endpoint:** `/oauth/register`
* **Supported Flows:** Authorization Code, Client Credentials
## Enabling Local Image Saving
By default, `generate-nano-banana` returns images inline without writing to disk. To enable saving generated images to the server filesystem, the `UPLOAD_DIR` environment variable must be set on the **gateway host** at startup. This is a server-side setting; it cannot be configured from the client.
This is only possible for **self-hosted** MCP deployments. Configure `UPLOAD_DIR` using your deployment method:
* **Docker:** Pass `-e UPLOAD_DIR=/data/images` or add it to your `docker-compose.yml` environment section.
* **systemd:** Add `Environment=UPLOAD_DIR=/data/images` to your service unit file.
* **.env file:** Add `UPLOAD_DIR=/data/images` to the `.env` file loaded by your gateway process.
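As a Docker Compose sketch (the service name, image, and host path are illustrative; adjust them to your deployment):

```yaml
services:
  llmgateway-mcp:
    image: your-llmgateway-image   # illustrative; use your actual image
    environment:
      - UPLOAD_DIR=/data/images
    volumes:
      - ./generated-images:/data/images   # persist saved images on the host
```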
The shared hosted endpoint (`api.llmgateway.io`) does not support configuring
`UPLOAD_DIR`. On the hosted service, images are always returned inline; no
files are written to disk. To enable server-side image saving, you must
self-host the MCP server and set `UPLOAD_DIR` at startup.
## Troubleshooting
### Connection Errors
If you're having trouble connecting:
1. Verify your API key is valid
2. Check the endpoint URL is correct: `https://api.llmgateway.io/mcp`
3. Ensure your firewall allows outbound HTTPS connections
### Tool Not Found
If tools aren't appearing:
1. Restart your MCP client
2. Check the configuration syntax
3. Verify the MCP server is responding: `GET https://api.llmgateway.io/mcp`
### Rate Limiting
The MCP server respects your account's rate limits. If you're hitting limits:
1. Check your usage in the dashboard
2. Consider upgrading your plan
3. Implement request queuing in your application
Need help? Join our [Discord community](https://llmgateway.io/discord) for
support.
## Benefits
* **Unified Access** - Use 200+ models from 20+ providers through one interface
* **Cost Tracking** - Monitor usage and costs in the LLM Gateway dashboard
* **Caching** - Automatic response caching reduces costs and latency
* **Fallback** - Automatic provider failover ensures reliability
* **Image Generation** - Generate images directly from your AI assistant
# N8n Integration
URL: /guides/n8n
import { Step, Steps } from "fumadocs-ui/components/steps";
import { Callout } from "fumadocs-ui/components/callout";
n8n is a powerful workflow automation tool that can be enhanced with AI capabilities through LLM Gateway. This guide shows how to integrate LLM Gateway into your n8n workflows.
## Prerequisites
* An LLM Gateway account with an API key
* n8n instance (self-hosted or cloud)
* Basic understanding of n8n workflows
## Setup
The easiest way to use LLM Gateway with n8n is through the OpenAI node with custom configuration.
### Add OpenAI Credentials
1. In n8n, go to **Settings** → **Credentials**
2. Click **Add Credential** → **OpenAI**
3. Configure as follows:
* **API Key**: Your LLM Gateway API key
* **Base URL**: `https://api.llmgateway.io/v1`
* **Organization ID**: Leave blank
### Configure OpenAI Node
1. Add an **AI Agent** node to your workflow
2. Connect a **Chat Model** node to the AI Agent node
3. Configure the Chat Model node to use the LLM Gateway credentials
Note: You must toggle off the Responses API; LLM Gateway does not support it.
4. Select your desired options
* **Model**: Use any [LLM Gateway model](https://llmgateway.io/models) ID (e.g., `gpt-5`)
* **Options**: Optionally, configure LLM parameters
### Test Workflow
Finally, try running your workflow with a test prompt.
# OpenClaw Integration
URL: /guides/openclaw
import { Step, Steps } from "fumadocs-ui/components/steps";
import { Callout } from "fumadocs-ui/components/callout";
[OpenClaw](https://docs.openclaw.ai/) is a self-hosted gateway that connects your favorite chat apps (WhatsApp, Telegram, Discord, iMessage, and more) to AI coding agents. With LLM Gateway as a custom provider, you can route all your OpenClaw traffic through a single API, use any of 180+ models, and keep full visibility into usage and costs.
## Setup
### Sign Up for LLM Gateway
[Sign up free](https://llmgateway.io/signup) (no credit card required). Copy your API key from the dashboard.
### Set Your API Key
```bash
export LLMGATEWAY_API_KEY=llmgtwy_your_api_key_here
```
### Configure OpenClaw
Add LLM Gateway as a custom provider in your `~/.openclaw/openclaw.json`:
```json
{
"models": {
"mode": "merge",
"providers": {
"llmgateway": {
"baseUrl": "https://api.llmgateway.io/v1",
"apiKey": "${LLMGATEWAY_API_KEY}",
"api": "openai-completions",
"models": [
{
"id": "gpt-5.4",
"name": "GPT-5.4",
"contextWindow": 128000,
"maxTokens": 32000
},
{
"id": "claude-opus-4-6",
"name": "Claude Opus 4.6",
"contextWindow": 200000,
"maxTokens": 8192
},
{
"id": "gemini-3-1-pro-preview",
"name": "Gemini 3.1 Pro",
"contextWindow": 1000000,
"maxTokens": 8192
}
]
}
}
},
"agents": {
"defaults": {
"model": {
"primary": "llmgateway/gpt-5.4"
}
}
}
}
```
### Start Chatting
Launch OpenClaw and start chatting across your connected channels. All requests will be routed through LLM Gateway.
## Why Use LLM Gateway with OpenClaw
* **Model flexibility**: Switch between GPT-5.4, Claude Opus, Gemini, or any of 180+ models
* **Cost tracking**: Monitor exactly how much your chat agents cost to run
* **Single bill**: No need to manage multiple API provider accounts
* **Response caching**: Repeated queries hit cache, reducing costs
* **Rate limit handling**: Automatic fallback between providers
## Switching Models
Change the primary model in your config to switch between any model:
```json
{
"agents": {
"defaults": {
"model": { "primary": "llmgateway/claude-opus-4-6" }
}
}
}
```
## Model Fallback Chain
OpenClaw supports fallback models. If the primary model is unavailable, it automatically falls back:
```json
{
"agents": {
"defaults": {
"model": {
"primary": "llmgateway/gpt-5.4",
"fallbacks": ["llmgateway/claude-opus-4-6"]
}
}
}
}
```
## Available Models
LLM Gateway uses root model IDs with smart routing, automatically selecting the best provider based on uptime, throughput, price, and latency. You can use any model from the [models page](https://llmgateway.io/models). Flagship models include:
| Model | Best For |
| ------------------------ | ------------------------------------------- |
| `gpt-5.4` | Latest OpenAI flagship, highest quality |
| `claude-opus-4-6` | Anthropic's most capable model |
| `claude-sonnet-4-6` | Fast reasoning with extended thinking |
| `gemini-3-1-pro-preview` | Google's latest flagship, 1M context window |
| `o3` | Advanced reasoning tasks |
| `gpt-5.4-pro` | Premium tier with extended reasoning |
| `gemini-2.5-flash` | Fast responses, good for high-volume |
| `claude-haiku-4-5` | Cost-effective, quick responses |
| `grok-3` | xAI flagship |
| `deepseek-v3.1` | Open-source with vision and tools |
For more details on routing behavior, see [routing](/features/routing).
View all available models on the [models page](https://llmgateway.io/models).
## Tips for Chat Agents
### Optimize Costs
1. **Use smaller models for simple tasks**: Claude Haiku or Gemini Flash handle basic Q\&A well
2. **Enable caching**: LLM Gateway caches identical requests automatically
3. **Set token limits**: Configure max tokens to prevent runaway costs
### Improve Response Quality
1. **Choose the right model**: Claude Opus excels at nuanced conversation, GPT-5.4 at general tasks
2. **Use system prompts**: Configure your agent's personality and capabilities
3. **Test multiple models**: LLM Gateway makes it easy to A/B test different providers
Need help? Join our [Discord community](https://llmgateway.io/discord) for
support and troubleshooting assistance.
# OpenCode Integration
URL: /guides/opencode
import { Step, Steps } from "fumadocs-ui/components/steps";
import { Callout } from "fumadocs-ui/components/callout";
[OpenCode](https://opencode.ai) is an open-source AI coding agent for your terminal, IDE, or desktop. This guide shows you how to connect it to LLM Gateway, giving you access to 180+ models from 60+ providers, all tracked in one dashboard.
## Prerequisites
* OpenCode installed (see the [OpenCode download page](https://opencode.ai/download) for your platform)
* An LLM Gateway API key
## Setup
### Create Configuration File
Create `config.json` in your OpenCode configuration directory:
**macOS/Linux:** `~/.config/opencode/config.json`
**Windows:** `C:\Users\YourUsername\.config\opencode\config.json`
```json
{
"provider": {
"llmgateway": {
"npm": "@ai-sdk/openai-compatible",
"name": "LLM Gateway",
"options": {
"baseURL": "https://api.llmgateway.io/v1"
},
"models": {
"gpt-5": {
"name": "GPT-5"
},
"gpt-5-mini": {
"name": "GPT-5 Mini"
},
"gemini-2.5-pro": {
"name": "Gemini 2.5 Pro"
},
"claude-3-5-sonnet-20241022": {
"name": "Claude 3.5 Sonnet"
}
}
}
},
"model": "llmgateway/gpt-5"
}
```
### Launch OpenCode and Connect Provider
Start OpenCode from your terminal:
```bash
opencode
```
**In VS Code/Cursor:**
1. Install the OpenCode extension from the marketplace
2. Open Command Palette (Ctrl+Shift+P or Cmd+Shift+P)
3. Type "OpenCode" and select "Open opencode"
Once OpenCode launches, run the `/connect` command to connect to LLM Gateway.
### Select LLM Gateway Provider
In the provider list, scroll down to find "LLM Gateway" under the "Other" section and select it.
### Enter Your API Key
OpenCode will prompt you for your API key. Enter your LLM Gateway API key and press Enter. OpenCode will automatically save your credentials securely.
[Sign up for LLM Gateway](https://llmgateway.io/signup) and create an API key from your dashboard.
### Start Using OpenCode
You're all set! OpenCode is now connected to LLM Gateway. You can start asking questions and building with AI.
## Why Use LLM Gateway with OpenCode
* **180+ models**: GPT-5, Claude, Gemini, Llama, and more from 60+ providers
* **One API key**: Stop juggling credentials for every provider
* **Cost tracking**: See what each coding session costs in your dashboard
* **Response caching**: Repeated requests hit cache automatically
* **Volume discounts**: The more you use, the more you save
## Adding More Models
You can add any model from the [models page](https://llmgateway.io/models) to your configuration. Simply add more entries to the `models` object in your `config.json`:
```json
{
"provider": {
"llmgateway": {
"models": {
"gpt-5": { "name": "GPT-5" },
"gpt-5-mini": { "name": "GPT-5 Mini" },
"deepseek/deepseek-chat": { "name": "DeepSeek Chat" },
"meta/llama-3.3-70b": { "name": "Llama 3.3 70B" }
}
}
}
}
```
After updating `config.json`, restart OpenCode to see the new models.
## Switching Models
To change your default model, update the `model` field in your configuration:
```json
{
"model": "llmgateway/gpt-5-mini"
}
```
Or select a different model directly in the OpenCode interface.
View all available models on the [models page](https://llmgateway.io/models).
## Troubleshooting
### OpenCode asks for API key every time
Make sure the provider ID in your `config.json` matches exactly: `"llmgateway"` (all lowercase, no spaces).
### 404 Not Found errors
Verify your `baseURL` is set to `https://api.llmgateway.io/v1` (note the `/v1` at the end).
### Models not showing up
After editing `config.json`, restart OpenCode completely for changes to take effect.
### Connection timeout
Check that you have an active internet connection and that your API key is valid from the [dashboard](https://llmgateway.io/dashboard).
## Configuration Tips
* **Global configuration**: Use `~/.config/opencode/config.json` to apply settings across all projects
* **Project-specific**: Place `opencode.json` in your project root to override global settings for that project
* **Model selection**: You can specify different models for different types of tasks using OpenCode's agent configuration
Need help? Join our [Discord community](https://llmgateway.io/discord) for
support and troubleshooting assistance.
# Anthropic API Compatibility
URL: /features/anthropic-endpoint
import { Callout } from "fumadocs-ui/components/callout";
# Anthropic API Compatibility
LLM Gateway provides a native Anthropic-compatible endpoint at `/v1/messages` that allows you to use any model in our catalog while maintaining the familiar Anthropic API format.
This is especially useful for applications designed for Claude that you want to extend to use other models.
Enjoy a 50% discount on our Anthropic models for a limited time.
## Overview
The Anthropic endpoint transforms requests from Anthropic's message format to the OpenAI-compatible format used by LLM Gateway, then transforms the responses back to Anthropic's format. This means you can:
* Use **any model** available in LLM Gateway with Anthropic's API format
* Maintain existing code that uses Anthropic's SDK or API format
* Access models from OpenAI, Google, Cohere, and other providers through the Anthropic interface
* Leverage LLM Gateway's routing, caching, and cost optimization features
## Basic Usage
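As a minimal sketch, you can point the official Anthropic Python SDK at the gateway by overriding the base URL (this assumes the `anthropic` package is installed and a valid `llmgtwy_` key; running it makes a live API call):

```python
import anthropic

# point the official SDK at LLM Gateway instead of api.anthropic.com
client = anthropic.Anthropic(
    base_url="https://api.llmgateway.io",
    api_key="llmgtwy_your_api_key_here",
)

message = client.messages.create(
    model="gpt-5",  # any model from the LLM Gateway catalog
    max_tokens=100,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(message.content[0].text)
```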
## Configuration for Claude Code
This endpoint is perfect for configuring Claude Code to use any model available in LLM Gateway:
```bash
export ANTHROPIC_BASE_URL=https://api.llmgateway.io
export ANTHROPIC_AUTH_TOKEN=llmgtwy_your_api_key_here
# optional: specify a model, otherwise it uses the default Claude model
export ANTHROPIC_MODEL=gpt-5 # or any model from our catalog
# now run claude!
claude
```
### Choosing Models
You can use any model from the [models page](https://llmgateway.io/models). Popular options for Claude Code include:
```bash
# Use OpenAI's latest model
export ANTHROPIC_MODEL=gpt-5
# Use a cost-effective alternative
export ANTHROPIC_MODEL=gpt-5-mini
# Use Google's Gemini
export ANTHROPIC_MODEL=gemini-2.5-pro
# Use Anthropic's actual Claude models
export ANTHROPIC_MODEL=claude-3-5-sonnet-20241022
```
## Environment Variables
When configuring Claude Code or other Anthropic-compatible applications, you can use these environment variables:
### ANTHROPIC\_MODEL
Specifies the main model to use for primary requests.
* **Default**: `claude-sonnet-4-20250514`
* **Example**: `export ANTHROPIC_MODEL=gpt-5`
### ANTHROPIC\_SMALL\_FAST\_MODEL
Specifies a smaller, faster model used for background functionality and internal operations.
* **Default**: `claude-3-5-haiku-20241022`
* **Example**: `export ANTHROPIC_SMALL_FAST_MODEL=gpt-5-nano`
```bash
# Example configuration
export ANTHROPIC_BASE_URL=https://api.llmgateway.io
export ANTHROPIC_AUTH_TOKEN=llmgtwy_your_api_key_here
export ANTHROPIC_MODEL=gpt-5
export ANTHROPIC_SMALL_FAST_MODEL=gpt-5-nano
```
## Advanced Features
### Making a manual request
```bash
curl -X POST "https://api.llmgateway.io/v1/messages" \
-H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5",
"messages": [
{"role": "user", "content": "Hello, how are you?"}
],
"max_tokens": 100
}'
```
### Response Format
The endpoint returns responses in Anthropic's message format:
```json
{
"id": "msg_abc123",
"type": "message",
"role": "assistant",
"model": "gpt-5",
"content": [
{
"type": "text",
"text": "Hello! I'm doing well, thank you for asking. How can I help you today?"
}
],
"stop_reason": "end_turn",
"stop_sequence": null,
"usage": {
"input_tokens": 13,
"output_tokens": 20
}
}
```
# API Keys & IAM Rules
URL: /features/api-keys
import { Tabs, Tab } from "fumadocs-ui/components/tabs";
import { Callout } from "fumadocs-ui/components/callout";
# API Keys & IAM Rules
API keys are the primary method for authenticating with the LLM Gateway. This guide covers creating API keys, managing them, and configuring IAM rules for fine-grained access control.
## Overview
LLM Gateway provides comprehensive API key management with the following features:
* **Basic API Key Management**: Create, list, update, and delete API keys
* **Usage Limits**: Set spending limits on individual API keys
* **IAM Rules**: Fine-grained access control for models, providers, and pricing
* **Usage Tracking**: Monitor API key usage and costs
* **Status Management**: Enable/disable keys without deletion
## Creating API Keys
### Via Dashboard
At this time, API keys can only be created via the dashboard.
1. Navigate to your project in the LLM Gateway dashboard
2. Go to the **API Keys** section
3. Click **Create API Key**
4. Provide a description for your key
5. Optionally set a usage limit
6. Click **Create**
API keys are shown in full only once during creation. Make sure to copy and
store them securely.
## Using API Keys
Once you have an API key, use it in the `Authorization` header of your requests:
```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
-H "Authorization: Bearer llmgtwy_your_api_key_here" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Hello!"}]
}'
```
## API Key Management
## Disabling/Enabling API Keys
You can disable an API key to stop it from being used, but the key is not deleted and can be re-enabled later.
## Usage Limits
Usage is tracked per API key and displayed on the API Keys page. Usage includes both costs from LLM Gateway credits and usage from your own provider keys when applicable, giving you complete visibility into total spending per key.
You can set a maximum usage limit for each API key. When the limit is reached, requests using that key will return an error.
## IAM Rules
IAM (Identity and Access Management) rules provide fine-grained access control over what models, providers, and pricing tiers an API key can access.
### Rule Types
#### Model Access Rules
Control access to specific models:
* **Allow Models**: Only allow access to specific models
* **Deny Models**: Block access to specific models
#### Provider Access Rules
Control access to specific providers:
* **Allow Providers**: Only allow access to specific providers
* **Deny Providers**: Block access to specific providers
#### Pricing Rules
Control access based on model pricing:
* **Allow Pricing**: Set constraints on what pricing tiers are allowed
* **Deny Pricing**: Block specific pricing tiers
* **Free vs Paid**: Allow or deny access to free vs paid models
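How allow and deny lists combine can be sketched as follows. This is a hypothetical evaluation order, assuming deny rules take precedence over allow lists; the gateway's actual logic may differ:

```typescript
// Hypothetical IAM rule check: deny rules win, then allow lists are
// enforced if present. Not the gateway's actual implementation.
interface IamRules {
	allowModels?: string[];
	denyModels?: string[];
	allowProviders?: string[];
	denyProviders?: string[];
}

function isAllowed(rules: IamRules, provider: string, model: string): boolean {
	// Explicit denies are checked first.
	if (rules.denyProviders?.includes(provider)) return false;
	if (rules.denyModels?.includes(model)) return false;
	// If an allow list exists, anything outside it is rejected.
	if (rules.allowProviders && !rules.allowProviders.includes(provider)) return false;
	if (rules.allowModels && !rules.allowModels.includes(model)) return false;
	return true;
}
```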
## Error Handling
When API keys encounter IAM rule violations, the API returns specific error messages:
```json
{
"error": true,
"status": 403,
"message": "Access denied: Model gpt-4 is not in the allowed models list"
}
```
Common error scenarios:
* Model not allowed by IAM rules
* Provider blocked by IAM rules
* Pricing limits exceeded
* API key disabled or deleted
* Usage limit reached
## Migration from Legacy Keys
If you have existing API keys without IAM rules:
1. **Backward Compatibility**: Existing keys continue to work without restrictions
2. **Gradual Migration**: Add IAM rules incrementally
3. **Testing**: Test IAM rules in development before applying to production
4. **Monitoring**: Monitor for access denied errors after implementing rules
API keys without IAM rules have unrestricted access to all models and
providers.
# Audit Logs
URL: /features/audit-logs
import { Callout } from "fumadocs-ui/components/callout";
# Audit Logs
Audit logs provide complete visibility into all actions within your organization. Track who did what, when, and to which resource.
Audit logs are available on the [**Enterprise
plan**](https://llmgateway.io/enterprise) for organization owners and admins.
## What's Tracked
Every significant action is logged with detailed metadata:
| Field | Description |
| ----------------- | -------------------------------------------------------- |
| **Timestamp** | When the action occurred |
| **User** | Who performed the action (name and email) |
| **Action** | What was done (e.g., `api_key.create`, `project.update`) |
| **Resource Type** | Category of the affected resource |
| **Resource ID** | Unique identifier of the affected resource |
| **Details** | Additional context like resource names or changed fields |
## Tracked Actions
### Organization Management
* `organization.update` – Organization settings changed
* `organization.delete` – Organization deleted
### Project Management
* `project.create` – New project created
* `project.update` – Project settings changed
* `project.delete` – Project deleted
### Team Management
* `team_member.add` – New member invited
* `team_member.update` – Member role changed
* `team_member.remove` – Member removed
### API Key Management
* `api_key.create` – New API key created
* `api_key.update_status` – API key enabled/disabled
* `api_key.update_limit` – Usage limit changed
* `api_key.delete` – API key deleted
* `api_key.iam_rule.create` – IAM rule added
* `api_key.iam_rule.update` – IAM rule modified
* `api_key.iam_rule.delete` – IAM rule removed
### Provider Key Management
* `provider_key.create` – Provider key added
* `provider_key.update` – Provider key status changed
* `provider_key.delete` – Provider key removed
### Billing Events
* `subscription.create` – Subscription started
* `subscription.cancel` – Subscription cancelled
* `subscription.resume` – Subscription resumed
* `payment.credit_topup` – Credits purchased
## Filtering and Search
Filter logs by:
* **Action** – Specific action type
* **Resource Type** – Category of resource
* **User** – Who performed the action
* **Date Range** – Time period
## Data Retention
Audit logs are retained for **90 days** on the Enterprise plan.
## Access Control
Only organization **owners** and **admins** can view audit logs. This ensures sensitive activity data is only visible to authorized personnel.
## Get Started
Audit logs are an Enterprise feature. [Contact us](https://llmgateway.io/enterprise) to enable Enterprise for your organization.
# Caching
URL: /features/caching
import { Callout } from "fumadocs-ui/components/callout";
# Caching
LLM Gateway provides intelligent response caching that can significantly reduce your API costs and response latency. When caching is enabled, identical requests are served from cache instead of making redundant calls to LLM providers.
## How It Works
When you make an API request:
1. LLM Gateway generates a cache key based on the request parameters
2. If a matching cached response exists, it's returned immediately
3. If no cache exists, the request is forwarded to the provider
4. The response is cached for future identical requests
This means repeated identical requests are served instantly from cache without incurring additional provider costs.
## Cost Savings
Caching can dramatically reduce costs for applications with repetitive requests:
| Scenario | Without Caching | With Caching | Savings |
| --------------------------- | --------------- | ------------ | ------- |
| 1,000 identical requests | $10.00 | $0.01 | 99.9% |
| 50% duplicate rate | $10.00 | $5.00 | 50% |
| Retry after transient error | $0.02 | $0.01 | 50% |
Cached responses are free from provider costs. You only pay for the initial
request that populates the cache.
## Requirements
Caching requires [Data Retention](/features/data-retention) to be enabled with
"Retain All Data" level. This allows LLM Gateway to store and retrieve
response payloads.
To use caching:
1. Enable **Data Retention** in your organization settings with "Retain All Data" level
2. Enable **Caching** in your project settings under Preferences
3. Configure the cache duration (TTL) as needed
4. Make requests as normal; caching is automatic
## Cache Key Generation
The cache key is generated from these request parameters:
* Model identifier
* Messages array (roles and content)
* Temperature
* Max tokens
* Top P
* Tools/functions
* Tool choice
* Response format
* System prompt
* Other model-specific parameters
Requests with different parameter values, even slight variations, will not
share cache entries.
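Conceptually, the cache key is a hash over the normalized request parameters, along these lines. This is an illustrative sketch; the gateway's real key derivation may include more fields:

```typescript
import { createHash } from "node:crypto";

// Illustrative cache-key derivation: hash the JSON of the parameters
// that affect the response. Any difference in these fields yields a
// different key, so those requests will not share a cache entry.
function cacheKey(params: {
	model: string;
	messages: { role: string; content: string }[];
	temperature?: number;
	max_tokens?: number;
}): string {
	return createHash("sha256").update(JSON.stringify(params)).digest("hex");
}
```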
## Cache Behavior
### Cache Hits
When a cache hit occurs:
* Response is returned immediately (sub-millisecond latency)
* No provider API call is made
* No inference costs are incurred
### Cache Misses
When a cache miss occurs:
* Request is forwarded to the LLM provider
* Response is stored in cache
* Normal inference costs apply
* Future identical requests will hit the cache
## Streaming and Caching
Caching works with both streaming and non-streaming requests:
* **Non-streaming**: Full response is cached and returned
* **Streaming**: The complete response is reconstructed from cache and streamed back
## Cache TTL (Time-to-Live)
Cache duration is configurable per project in your project settings. You can set the cache TTL from 10 seconds up to 1 year (31,536,000 seconds).
The default cache duration is 60 seconds. Adjust this based on your use case: longer durations work well for static content, while shorter durations are better for frequently changing data.
## Identifying Cached Responses
Cached responses show zero or minimal token usage since no inference occurred:
```json
{
"usage": {
"prompt_tokens": 0,
"completion_tokens": 0,
"total_tokens": 0,
"cost_usd_total": 0
}
}
```
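A small helper can flag cache hits in your own code, assuming cached responses always report zero token usage as shown above:

```typescript
interface Usage {
	prompt_tokens: number;
	completion_tokens: number;
	total_tokens: number;
}

// Heuristic: a response served from cache reports zero token usage,
// since no inference occurred.
function isCacheHit(usage: Usage): boolean {
	return usage.total_tokens === 0;
}
```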
## Use Cases
### Development and Testing
During development, you often send the same prompts repeatedly:
```typescript
// This prompt will only incur costs once
const response = await client.chat.completions.create({
model: "gpt-4o",
messages: [{ role: "user", content: "Explain quantum computing" }],
});
```
### Chatbots with Common Questions
FAQ-style interactions often have repeated questions:
```typescript
// Common questions are served from cache
const faqs = [
"What are your business hours?",
"How do I reset my password?",
"What is your return policy?",
];
```
### Batch Processing
Processing large datasets with potentially duplicate items:
```typescript
// Duplicate items in batch are served from cache
for (const item of items) {
const response = await client.chat.completions.create({
model: "gpt-4o",
messages: [{ role: "user", content: `Classify: ${item}` }],
});
}
```
## Best Practices
### Maximize Cache Hits
* Use consistent prompt formatting
* Normalize input data before sending
* Use deterministic parameters (temperature: 0)
* Avoid including timestamps or random values in prompts
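For example, a small normalization step before sending can collapse trivially different inputs into one cache entry. This is a sketch; tailor the rules to your data:

```typescript
// Normalize user input so trivially different strings
// ("  Hello " vs "hello") map to the same prompt and cache entry.
function normalizePrompt(input: string): string {
	return input.trim().replace(/\s+/g, " ").toLowerCase();
}
```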
### Appropriate Use Cases
Caching is most effective for:
* Static knowledge queries
* Classification tasks
* FAQ responses
* Development/testing
* Retry scenarios
### When to Avoid Caching
Caching may not be suitable for:
* Real-time data requirements
* Highly personalized responses
* Time-sensitive information
* Creative tasks requiring variety
## Storage Costs
Since caching requires data retention, storage costs apply:
* **Rate**: $0.01 per 1 million tokens
* **Applies to**: All tokens in cached requests and responses
See [Data Retention](/features/data-retention) for complete pricing details.
The cost savings from caching typically far outweigh the storage costs,
especially for applications with high request duplication.
# Cost Breakdown
URL: /features/cost-breakdown
import { Callout } from "fumadocs-ui/components/callout";
# Cost Breakdown
LLM Gateway provides real-time cost information for each API request directly in the response's `usage` object. This allows you to track costs programmatically without needing to query the dashboard.
Cost breakdown is available for all users on both hosted and self-hosted
deployments.
## Response Format
When cost breakdown is enabled, your API responses will include additional cost fields in the `usage` object:
```json
{
"id": "chatcmpl-123",
"object": "chat.completion",
"created": 1234567890,
"model": "openai/gpt-4o",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! How can I help you today?"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 10,
"completion_tokens": 15,
"total_tokens": 25,
"cost_usd_total": 0.000125,
"cost_usd_input": 0.000025,
"cost_usd_output": 0.0001,
"cost_usd_cached_input": 0,
"cost_usd_request": 0,
"cost_usd_data_storage": 0.00000025
}
}
```
## Cost Fields
| Field | Description |
| ----------------------- | ---------------------------------------------------------------------------------- |
| `cost_usd_total` | Total inference cost for the request in USD (excludes storage) |
| `cost_usd_input` | Cost for input/prompt tokens in USD |
| `cost_usd_output` | Cost for output/completion tokens in USD |
| `cost_usd_cached_input` | Cost for cached input tokens in USD (discounted rate) |
| `cost_usd_request` | Per-request cost in USD (for models with request-based pricing) |
| `cost_usd_data_storage` | LLM Gateway storage cost in USD ($0.01 per 1M tokens, only when retention enabled) |
**Note:** `cost_usd_total` includes only provider/inference costs. Data
storage costs (`cost_usd_data_storage`) are billed separately by LLM Gateway
when data retention is enabled in organization policies.
## Streaming Responses
Cost information is also available in streaming responses. The cost fields are included in the final usage chunk sent before the `[DONE]` message:
```
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[...],"usage":{"prompt_tokens":10,"completion_tokens":15,"total_tokens":25,"cost_usd_total":0.000125,"cost_usd_input":0.000025,"cost_usd_output":0.0001}}
data: [DONE]
```
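Extracting the cost from a stream can be sketched as below, assuming only the final chunk before `[DONE]` carries a `usage` object, as shown above:

```typescript
// Pull the usage object out of an SSE stream's data lines. Only the
// final chunk before [DONE] carries usage, so the last match wins.
function usageFromSSE(lines: string[]): Record<string, number> | null {
	let usage: Record<string, number> | null = null;
	for (const line of lines) {
		if (!line.startsWith("data: ") || line === "data: [DONE]") continue;
		const chunk = JSON.parse(line.slice("data: ".length));
		if (chunk.usage) usage = chunk.usage;
	}
	return usage;
}
```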
## Example: Tracking Costs in Code
Here's an example of how to track costs programmatically using the cost breakdown feature:
```typescript
import OpenAI from "openai";
const client = new OpenAI({
apiKey: process.env.LLM_GATEWAY_API_KEY,
baseURL: "https://api.llmgateway.io/v1",
});
async function trackCosts() {
const response = await client.chat.completions.create({
model: "gpt-4o",
messages: [{ role: "user", content: "Hello!" }],
});
const usage = response.usage as any;
if (usage.cost_usd_total !== undefined) {
console.log(`Request cost: $${usage.cost_usd_total.toFixed(6)}`);
console.log(` Input: $${usage.cost_usd_input.toFixed(6)}`);
console.log(` Output: $${usage.cost_usd_output.toFixed(6)}`);
if (usage.cost_usd_cached_input > 0) {
console.log(` Cached: $${usage.cost_usd_cached_input.toFixed(6)}`);
}
}
return response;
}
```
## Use Cases
### Budget Monitoring
Track costs in real-time and implement budget limits in your application:
```typescript
let totalSpent = 0;
const BUDGET_LIMIT = 10.0; // $10 budget
async function makeRequest(messages: Message[]) {
const response = await client.chat.completions.create({
model: "gpt-4o",
messages,
});
const cost = (response.usage as any).cost_usd_total || 0;
totalSpent += cost;
if (totalSpent > BUDGET_LIMIT) {
throw new Error(`Budget exceeded: $${totalSpent.toFixed(2)}`);
}
return response;
}
```
### Per-User Cost Allocation
Track costs per user for billing or analytics:
```typescript
const userCosts: Map<string, number> = new Map();
async function makeRequestForUser(userId: string, messages: Message[]) {
const response = await client.chat.completions.create({
model: "gpt-4o",
messages,
});
const cost = (response.usage as any).cost_usd_total || 0;
const currentCost = userCosts.get(userId) || 0;
userCosts.set(userId, currentCost + cost);
return response;
}
```
### Cost Analytics
Aggregate costs by model, time period, or any other dimension:
```typescript
interface CostEntry {
timestamp: Date;
model: string;
inputCost: number;
outputCost: number;
totalCost: number;
}
const costLog: CostEntry[] = [];
async function loggedRequest(model: string, messages: Message[]) {
const response = await client.chat.completions.create({
model,
messages,
});
const usage = response.usage as any;
costLog.push({
timestamp: new Date(),
model: response.model,
inputCost: usage.cost_usd_input || 0,
outputCost: usage.cost_usd_output || 0,
totalCost: usage.cost_usd_total || 0,
});
return response;
}
```
## Data Storage Costs
When data retention is enabled in organization policies, LLM Gateway stores full request and response payloads for the configured retention period. This storage incurs a small additional cost:
* **Rate**: $0.01 per 1 million tokens
* **Applies to**: Input, cached, output, and reasoning tokens
* **When charged**: Only when retention level is set to "Retain All Data"
* **Billing mode**: In API keys mode, only storage costs are deducted from credits (inference costs are billed to your provider keys)
Storage costs are displayed separately from inference costs in the dashboard and usage breakdown to maintain transparency between provider costs and LLM Gateway platform costs.
Enable [auto top-up](/dashboard) in billing settings to prevent request
failures when storage costs deplete your credits.
## Self-Hosted Deployments
If you're running a self-hosted LLM Gateway deployment, cost breakdown is always included in API responses regardless of plan. This allows you to track internal costs and allocate them across teams or projects.
# Custom Providers
URL: /features/custom-providers
import { Callout } from "fumadocs-ui/components/callout";
# Custom Providers
LLMGateway supports integrating custom OpenAI-compatible providers, allowing you to use any API that follows the OpenAI chat completions format. This feature is perfect for:
* Private or self-hosted LLM deployments
* Specialized AI providers not natively supported
* Internal AI services within your organization
* Testing against different model endpoints
Custom providers must be OpenAI-compatible, supporting the
`/v1/chat/completions` endpoint format.
## Quick Setup
### 1. Add a Custom Provider Key
Navigate to your organization's provider settings and add a custom provider via the UI.
Provide a lowercase name, an OpenAI-compatible base URL, and an API token for the custom provider.
### 2. Make Requests
Once configured, make requests using the format `{customName}/{modelName}`:
```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
-H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "mycompany/custom-gpt-4",
"messages": [
{
"role": "user",
"content": "Hello from my custom provider!"
}
]
}'
```
## Configuration Requirements
### Custom Provider Name
* **Format**: Lowercase letters only (`a-z`)
* **Examples**: `mycompany`, `internal`, `testing`
* **Invalid**: `MyCompany`, `my-company`, `my_company`, `123test`
The custom provider name must match the regex pattern `/^[a-z]+$/` exactly.
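The constraint can be checked client-side before saving, using the same regex:

```typescript
// Custom provider names must be lowercase letters only: /^[a-z]+$/
function isValidProviderName(name: string): boolean {
	return /^[a-z]+$/.test(name);
}
```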
### Base URL
* Must be a valid HTTPS URL
* Should point to your provider's base endpoint
* LLMGateway will append `/v1/chat/completions` automatically
* **Example**: `https://api.example.com` → `https://api.example.com/v1/chat/completions`
### API Token
* Provider-specific authentication token
* Used in the `Authorization: Bearer {token}` header
Unlike built-in providers, custom provider models are not validated, giving
you complete flexibility.
## Supported Features
Custom providers inherit full LLMGateway functionality.
# Data Retention
URL: /features/data-retention
import { Callout } from "fumadocs-ui/components/callout";
# Data Retention
LLM Gateway offers configurable data retention policies that allow you to store full request and response payloads. This enables powerful debugging capabilities, detailed analytics, and compliance with data governance requirements.
## Retention Levels
LLM Gateway supports two retention levels that can be configured per organization:
| Level | Description | Storage Cost |
| ------------------- | ---------------------------------------------------------------------------------------------- | --------------- |
| **Metadata Only** | Stores request metadata (timestamps, model, tokens, costs) without full payloads. Default. | Free |
| **Retain All Data** | Stores complete request and response payloads including messages, tool calls, and attachments. | $0.01/1M tokens |
Metadata-only retention is enabled by default and provides usage analytics
without additional storage costs.
## Storage Pricing
When full data retention is enabled, storage is billed at **$0.01 per 1 million tokens**. This rate applies to:
* Input tokens (prompt)
* Cached input tokens
* Output tokens (completion)
* Reasoning tokens
Storage costs are calculated per request and displayed in the `cost_usd_data_storage` field of the response. See [Cost Breakdown](/features/cost-breakdown) for details on tracking costs programmatically.
### Example Cost Calculation
For a request with:
* 1,000 input tokens
* 500 output tokens
* 1,500 total tokens
Storage cost = 1,500 / 1,000,000 × $0.01 = **$0.000015**
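The calculation above generalizes to a one-liner, using the documented rate of $0.01 per 1 million tokens:

```typescript
const STORAGE_RATE_PER_TOKEN = 0.01 / 1_000_000; // $0.01 per 1M tokens

// Storage cost covers input, cached input, output, and reasoning tokens.
function storageCostUsd(totalTokens: number): number {
	return totalTokens * STORAGE_RATE_PER_TOKEN;
}
```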
## Configuring Retention
Data retention is configured at the organization level in your dashboard settings:
1. Navigate to **Organization Settings** → **Policies**
2. Select your preferred **Data Retention Level**
3. Save changes
Changing retention settings applies to new requests only. Existing stored data
follows the retention period active when it was created.
## Retention Periods
Data is retained for 30 days for all users. Enterprise plans can have custom retention periods. After the retention period expires, data is automatically deleted.
## Accessing Stored Data
When data retention is enabled, you can access your stored requests through the dashboard:
* View request history with full payload inspection
* Filter by model and date range
* Inspect complete request and response payloads
## Use Cases
### Debugging
Full data retention enables you to:
* Inspect exact prompts sent to models
* Review complete responses including tool calls
* Trace conversation histories
* Identify issues in production
### Analytics
With stored payloads, you can:
* Analyze prompt patterns and effectiveness
* Track response quality over time
* Build custom dashboards and reports
* Measure model performance across use cases
### Compliance
Data retention helps meet compliance requirements by:
* Maintaining audit trails of AI interactions
* Enabling data governance policies
* Supporting incident investigation
* Providing records for regulatory requirements
## Billing Considerations
### Credit Usage
In **API keys mode** (using your own provider keys):
* Only storage costs are deducted from LLM Gateway credits
* Inference costs are billed directly to your provider
In **credits mode**:
* Both inference and storage costs are deducted from credits
### Monitoring Storage Costs
Storage costs appear in:
* The `cost_usd_data_storage` field in API responses
* Usage dashboard under "Storage" category
* Billing invoices as a separate line item
Enable [auto top-up](/dashboard) in billing settings to ensure uninterrupted
service when storage costs accumulate.
## Self-Hosted Deployments
Self-hosted deployments have full control over data retention:
* Configure retention periods in environment variables
* Data is stored in your own PostgreSQL database
* No additional storage costs (you manage your own infrastructure)
## Privacy and Security
* All stored data is encrypted at rest
* Access is restricted to organization members with appropriate permissions
* Data is automatically deleted after the retention period
* You can request immediate deletion of specific records through support
# Guardrails
URL: /features/guardrails
import { Callout } from "fumadocs-ui/components/callout";
# Guardrails
Guardrails protect your organization by automatically detecting and blocking harmful content in LLM requests before they reach the model.
Guardrails are available on the [**Enterprise
plan**](https://llmgateway.io/enterprise).
## Overview
Guardrails run on every API request, scanning message content for:
* Security threats (prompt injection, jailbreak attempts)
* Sensitive data (PII, secrets, credentials)
* Policy violations (blocked terms, restricted topics)
When a violation is detected, you control what happens: block the request, redact the content, or log a warning.
## System Rules
Built-in rules protect against common threats:
### Prompt Injection Detection
Detects attempts to override or manipulate system instructions. Common patterns include:
* "Ignore all previous instructions"
* "You are now a different AI"
* Hidden instructions in encoded text
### Jailbreak Detection
Identifies attempts to bypass safety measures:
* DAN (Do Anything Now) prompts
* Roleplay-based bypasses
* Instruction override attempts
### PII Detection
Identifies personal information:
* Email addresses
* Phone numbers
* Social Security Numbers
* Credit card numbers
* IP addresses
When the action is set to **redact**, PII is replaced with placeholders like `[EMAIL_REDACTED]`.
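Email redaction, for instance, can be pictured like this. The regex is a simplified illustration; the gateway's actual detectors are more thorough:

```typescript
// Replace email addresses with a placeholder, mirroring the
// [EMAIL_REDACTED] behavior described above. The pattern is a
// simplified illustration, not the gateway's actual detector.
function redactEmails(text: string): string {
	return text.replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, "[EMAIL_REDACTED]");
}
```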
### Secrets Detection
Detects credentials and API keys:
* AWS access keys and secrets
* Generic API keys
* Passwords in common formats
* Private keys
### File Type Restrictions
Control which file types can be uploaded:
* Configure allowed MIME types
* Set maximum file size limits
* Block potentially dangerous file types
### Document Leakage Prevention
Detects attempts to extract confidential documents or internal data.
## Configurable Actions
For each rule, choose how to respond:
| Action | Behavior |
| ---------- | --------------------------------------------------- |
| **Block** | Reject the request with a content policy error |
| **Redact** | Remove or mask the sensitive content, then continue |
| **Warn** | Log the violation but allow the request to proceed |
## Custom Rules
Create organization-specific rules for your use case:
### Blocked Terms
Prevent specific words or phrases from being used:
* Match type: exact, contains, or regex
* Case-sensitive matching option
* Multiple terms per rule
### Custom Regex
Match patterns unique to your organization:
* Internal project codenames
* Customer identifiers
* Domain-specific sensitive data
### Topic Restrictions
Block content related to specific topics:
* Define restricted topics
* Keyword-based detection
## Security Events Dashboard
Monitor all guardrail violations with a dedicated dashboard:
* **Total violations** – Overall count and trends
* **By action** – Breakdown of blocked, redacted, and warned
* **By category** – Which rules are being triggered
* **Detailed logs** – Individual violations with timestamps and matched patterns
## How It Works
```
Request → Guardrails Check → Action Based on Rules → Forward to Model (if allowed)
                 ↓
           Log Violation
```
1. **Request received** – API request comes in with messages
2. **Content scanned** – All text content is checked against enabled rules
3. **Violations detected** – Matches are identified and logged
4. **Action taken** – Based on rule configuration (block/redact/warn)
5. **Request proceeds** – If not blocked, the (potentially redacted) request continues
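The steps above can be sketched as a small pipeline. The rule shape here is hypothetical; actual configuration lives in the dashboard:

```typescript
// Hypothetical sketch of the block/redact/warn decision flow.
type Action = "block" | "redact" | "warn";

interface Rule {
	pattern: RegExp;
	action: Action;
}

function applyGuardrails(
	text: string,
	rules: Rule[],
): { allowed: boolean; text: string; violations: Action[] } {
	let out = text;
	const violations: Action[] = [];
	for (const rule of rules) {
		if (!rule.pattern.test(out)) continue;
		violations.push(rule.action);
		if (rule.action === "block") return { allowed: false, text: out, violations };
		if (rule.action === "redact") out = out.replace(rule.pattern, "[REDACTED]");
		// "warn" only records the violation and lets the request proceed
	}
	return { allowed: true, text: out, violations };
}
```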
## Best Practices
1. **Start with warnings** – Enable rules in warn mode first to understand your traffic patterns
2. **Review violations** – Check the Security Events dashboard regularly
3. **Tune custom rules** – Adjust blocked terms and regex patterns based on false positives
4. **Layer defenses** – Use multiple rule types together for comprehensive protection
## Get Started
Guardrails are an Enterprise feature. [Contact us](https://llmgateway.io/enterprise) to enable Enterprise for your organization.
# Image Generation
URL: /features/image-generation
import { Callout } from "fumadocs-ui/components/callout";
# Image Generation
LLMGateway supports image generation through three APIs:
1. **`/v1/images/generations`** – OpenAI-compatible images endpoint (recommended for simple image generation)
2. **`/v1/images/edits`** – OpenAI-compatible image editing endpoint
3. **`/v1/chat/completions`** – Chat completions with image generation models (for conversational image generation and editing)
## Available Models
You can find all available image generation models on our [models page](https://llmgateway.io/models?filters=1\&imageGeneration=true).
## OpenAI Images API
The `/v1/images/generations` endpoint provides a drop-in replacement for OpenAI's image generation API. It works with any OpenAI-compatible client library.
### Parameters
| Parameter | Type | Default | Description |
| ----------------- | ------- | ------------ | ---------------------------------------------------------------------------------------------------------------- |
| `prompt` | string | required | A text description of the desired image(s) |
| `model` | string | `"auto"` | The model to use. `auto` resolves to `gemini-3-pro-image-preview` |
| `n` | integer | `1` | Number of images to generate (1-10) |
| `size`            | string  | –            | Image dimensions. Supported sizes depend on the model/provider – see [Image Configuration](#image-configuration)   |
| `quality`         | string  | –            | Image quality. Supported values depend on the model/provider – see [Image Configuration](#image-configuration)     |
| `response_format` | string  | `"b64_json"` | Only `b64_json` is supported                                                                                       |
| `style`           | string  | –            | Image style: `vivid` or `natural`                                                                                  |
### curl
```bash
curl -X POST "https://api.llmgateway.io/v1/images/generations" \
-H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gemini-3-pro-image-preview",
"prompt": "A cute cat wearing a tiny top hat",
"n": 1,
"size": "1024x1024"
}'
```
### OpenAI SDK
Works with the standard OpenAI client library – just point the base URL to LLMGateway.
```ts
import OpenAI from "openai";
import { writeFileSync } from "fs";
const client = new OpenAI({
baseURL: "https://api.llmgateway.io/v1",
apiKey: process.env.LLM_GATEWAY_API_KEY,
});
const response = await client.images.generate({
model: "gemini-3-pro-image-preview",
prompt: "A futuristic city skyline at sunset with flying cars",
n: 1,
size: "1024x1024",
});
response.data.forEach((image, i) => {
if (image.b64_json) {
const buf = Buffer.from(image.b64_json, "base64");
writeFileSync(`image-${i}.png`, buf);
}
});
```
### Vercel AI SDK
Use the `@llmgateway/ai-sdk-provider` with `generateImage`.
```ts
import { createLLMGateway } from "@llmgateway/ai-sdk-provider";
import { generateImage } from "ai";
import { writeFileSync } from "fs";
const llmgateway = createLLMGateway({
apiKey: process.env.LLM_GATEWAY_API_KEY,
});
const result = await generateImage({
model: llmgateway.image("gemini-3-pro-image-preview"),
prompt:
"A cozy cabin in a snowy mountain landscape at night with aurora borealis",
size: "1024x1024",
n: 1,
});
result.images.forEach((image, i) => {
const buf = Buffer.from(image.base64, "base64");
writeFileSync(`image-${i}.png`, buf);
});
```
## OpenAI Images Edit API
The `/v1/images/edits` endpoint is OpenAI-compatible and supports a focused subset of `images.edit` parameters.
### Parameters
| Parameter | Type | Required | Description |
| -------------------- | ------------------------ | -------- | ------------------------------------------------------------------ |
| `images` | array of `{ image_url }` | yes | Input images. `image_url` supports HTTPS URLs and base64 data URLs |
| `prompt` | string | yes | A text description of the desired image edit |
| `model` | string | no | Image editing model |
| `background` | enum | no | `transparent`, `opaque`, or `auto` |
| `input_fidelity` | enum | no | `high` or `low` |
| `n` | integer | no | Number of edited images to generate |
| `output_format` | enum | no | `png`, `jpeg`, or `webp` |
| `output_compression` | integer | no | Compression level for `jpeg`/`webp` |
| `quality` | enum | no | `low`, `medium`, `high`, or `auto` |
| `size` | enum | no | `auto`, `1024x1024`, `1536x1024`, `1024x1536` |
`mask` is not supported yet on `/v1/images/edits`.
### curl (HTTPS image URL)
```bash
curl -X POST "https://api.llmgateway.io/v1/images/edits" \
-H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"images": [
{
"image_url": "https://example.com/source-image.png"
}
],
"prompt": "Add a watercolor effect to this image",
"model": "gemini-3-pro-image-preview",
"quality": "high",
"size": "1024x1024"
}'
```
### curl (base64 data URL)
```bash
curl -X POST "https://api.llmgateway.io/v1/images/edits" \
-H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"images": [
{
"image_url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA..."
}
],
"prompt": "Turn this into a pixel-art style image"
}'
```
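If your source image is a local file rather than a hosted URL, you can construct the base64 data URL yourself before sending it in the `images` array. A minimal sketch (the helper name is illustrative, not part of any SDK):

```typescript
// Build a data URL for the `images` array from raw image bytes.
// The default media type is an assumption; pass the correct one for your file.
function toImageDataUrl(bytes: Uint8Array, mediaType = "image/png"): string {
  const base64 = Buffer.from(bytes).toString("base64");
  return `data:${mediaType};base64,${base64}`;
}

// Example usage with a file read from disk:
// const bytes = readFileSync("source-image.png");
// const body = { images: [{ image_url: toImageDataUrl(bytes) }], prompt: "..." };
```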
## Chat Completions API
Image generation also works through the `/v1/chat/completions` endpoint, which is useful for conversational image generation, image editing with vision, and multi-turn interactions.
### Making Requests
Simply use an image generation model and provide a text prompt describing the image you want to create.
```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
-H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gemini-3-pro-image-preview",
"messages": [
{
"role": "user",
"content": "Generate an image of a cute golden retriever puppy playing in a sunny meadow"
}
]
}'
```
### Response Format
Image generation models return responses in the standard chat completions format, with generated images included in the `images` array within the assistant message:
```json
{
"id": "chatcmpl-1756234109285",
"object": "chat.completion",
"created": 1756234109,
"model": "gemini-3-pro-image-preview",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Here's an image of a cute dog for you: ",
"images": [
{
"type": "image_url",
"image_url": {
"url": "data:image/png;base64,"
}
}
]
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 8,
"completion_tokens": 1303,
"total_tokens": 1311
}
}
```
### Vision support
You can edit or modify images by combining image generation with [vision models](/features/vision): include the source image in the `messages` array.
### Response Structure
#### Images Array
The `images` array contains one or more generated images with the following structure:
* `type`: Always `"image_url"` for generated images
* `image_url.url`: A data URL containing the base64-encoded image data (format: `data:image/png;base64,<base64-data>`)
#### Content Field
The `content` field may contain descriptive text about the generated image, depending on the model's behavior.
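To consume the response programmatically, decode each entry's data URL back into bytes. A minimal sketch, assuming the response shape shown above:

```typescript
// Decode the base64 data URLs in an assistant message's `images` array.
function decodeImages(message: {
  images?: { type: string; image_url: { url: string } }[];
}): Buffer[] {
  return (message.images ?? []).flatMap((img) => {
    const url = img.image_url.url;
    const comma = url.indexOf(",");
    // Skip entries that are not well-formed data URLs
    if (!url.startsWith("data:") || comma < 0) return [];
    return [Buffer.from(url.slice(comma + 1), "base64")];
  });
}
```

Each returned `Buffer` can then be written to disk with `writeFileSync`, as in the SDK examples above.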
### AI SDK (Chat Completions)
You can use the AI SDK to generate images with your existing `generateText` or `streamText` calls using the LLMGateway provider.
#### Example
```ts title="/api/chat/route.ts"
import { streamText, type UIMessage, convertToModelMessages } from "ai";
import { createLLMGateway } from "@llmgateway/ai-sdk-provider";
interface ChatRequestBody {
messages: UIMessage[];
}
export async function POST(req: Request) {
const body = await req.json();
const { messages }: ChatRequestBody = body;
const llmgateway = createLLMGateway({
    apiKey: process.env.LLM_GATEWAY_API_KEY,
baseUrl: "https://api.llmgateway.io/v1",
});
try {
const result = streamText({
model: llmgateway.chat("gemini-3-pro-image-preview"),
messages: convertToModelMessages(messages),
});
return result.toUIMessageStreamResponse();
} catch {
return new Response(
JSON.stringify({ error: "LLM Gateway Chat request failed" }),
{
status: 500,
},
);
}
}
```
Then you can render the image in your frontend using the `Image` component from the [ai-elements](https://ai-sdk.dev/elements/components/image).
Here is a full example of how to use the AI SDK to generate images in your frontend:
```tsx title="/app/page.tsx"
"use client";
import { useState, useRef } from "react";
import { useChat } from "@ai-sdk/react";
import { parseImagePartToDataUrl } from "@/lib/image-utils";
import {
PromptInput,
PromptInputBody,
PromptInputButton,
PromptInputSubmit,
PromptInputTextarea,
PromptInputToolbar,
} from "@/components/ai-elements/prompt-input";
import {
Conversation,
ConversationContent,
} from "@/components/ai-elements/conversation";
import { Image } from "@/components/ai-elements/image";
import { Loader } from "@/components/ai-elements/loader";
import { Message, MessageContent } from "@/components/ai-elements/message";
import { Response } from "@/components/ai-elements/response";
export const ChatUI = () => {
const textareaRef = useRef(null);
const [text, setText] = useState("");
const { messages, status, stop, regenerate, sendMessage } = useChat();
  return (
    <>
      <Conversation>
        <ConversationContent>
          {messages.map((message) => (
            <Message from={message.role} key={message.id}>
              <MessageContent>
                {message.parts.map((part, i) => {
                  if (part.type === "text") {
                    return <Response key={i}>{part.text}</Response>;
                  }
                  // Generated images arrive as file or image_url parts
                  const { base64Only, mediaType } = parseImagePartToDataUrl(part);
                  return base64Only ? (
                    <Image key={i} base64={base64Only} mediaType={mediaType} alt="Generated image" />
                  ) : null;
                })}
              </MessageContent>
            </Message>
          ))}
          {status === "submitted" && <Loader />}
        </ConversationContent>
      </Conversation>
      <PromptInput onSubmit={() => { sendMessage({ text }); setText(""); }}>
        <PromptInputBody>
          <PromptInputTextarea
            ref={textareaRef}
            value={text}
            onChange={(e) => setText(e.target.value)}
          />
        </PromptInputBody>
        <PromptInputToolbar>
          <PromptInputSubmit status={status} />
        </PromptInputToolbar>
      </PromptInput>
    </>
  );
};
```
```ts title="/lib/image-utils.ts"
/**
* Parses a file object containing image data and returns a properly formatted data URL
* and normalized media type.
*
* Handles:
* - Normalizing mediaType from various property names (mediaType, mime_type)
* - Detecting existing data: URLs
* - Detecting base64-looking content
* - Stripping whitespace from base64 content
* - Building proper data:...;base64,... URLs
*/
export function parseImageFile(file: {
url?: string;
mediaType?: string;
mime_type?: string;
}): { dataUrl: string; mediaType: string } {
const mediaType = file.mediaType || file.mime_type || "image/png";
let url = String(file.url || "");
const isDataUrl = url.startsWith("data:");
const looksLikeBase64 =
!isDataUrl && /^[A-Za-z0-9+/=\s]+$/.test(url.slice(0, 200));
if (looksLikeBase64) {
url = url.replace(/\s+/g, "");
}
const dataUrl = isDataUrl
? url
: looksLikeBase64
? `data:${mediaType};base64,${url}`
: url;
return { dataUrl, mediaType };
}
/**
* Extracts base64-only content from a data URL.
* Returns empty string if the input is not a valid data URL.
*/
export function extractBase64FromDataUrl(dataUrl: string): string {
if (!dataUrl.startsWith("data:")) {
return "";
}
const comma = dataUrl.indexOf(",");
return comma >= 0 ? dataUrl.slice(comma + 1) : "";
}
/**
* Parses an image part (either image_url or file type) and returns
* dataUrl, base64Only, and mediaType ready for rendering.
*
* Handles error cases gracefully by returning empty base64Only string
* when parsing fails, allowing the renderer to skip invalid images.
*/
export function parseImagePartToDataUrl(part: any): {
dataUrl: string;
base64Only: string;
mediaType: string;
} {
try {
// Handle image_url parts
if (part.type === "image_url" && part.image_url?.url) {
const url = part.image_url.url;
const mediaType = "image/png"; // Default for image_url parts
if (url.startsWith("data:")) {
// Extract media type from data URL if present
const match = url.match(/data:([^;]+)/);
const extractedMediaType = match?.[1] || mediaType;
return {
dataUrl: url,
base64Only: extractBase64FromDataUrl(url),
mediaType: extractedMediaType,
};
}
return {
dataUrl: url,
base64Only: "",
mediaType,
};
}
// Handle file parts (AI SDK format)
if (part.type === "file") {
const { dataUrl, mediaType } = parseImageFile(part);
return {
dataUrl,
base64Only: extractBase64FromDataUrl(dataUrl),
mediaType,
};
}
return {
dataUrl: "",
base64Only: "",
mediaType: "image/png",
};
} catch {
return {
dataUrl: "",
base64Only: "",
mediaType: "image/png",
};
}
}
```
## Image Configuration
You can customize the generated image using the optional `image_config` parameter (for chat completions) or `size`/`quality`/`style` parameters (for the images API). The supported parameters vary by provider.
### Google Models
Available Google models:
| Model | Description |
| -------------------------------- | ----------------------------------------------------------------------------------- |
| `gemini-3-pro-image-preview` | Gemini 3 Pro with native image generation. Supports aspect ratios and 1Kβ4K sizes. |
| `gemini-3.1-flash-image-preview` | Gemini 3.1 Flash with native image generation. Supports 0.5Kβ4K sizes (default 1K). |
#### gemini-3-pro-image-preview
```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
-H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gemini-3-pro-image-preview",
"messages": [
{
"role": "user",
"content": "Generate an image of a mountain landscape at sunset"
}
],
"image_config": {
"aspect_ratio": "16:9",
"image_size": "4K"
}
}'
```
| Parameter | Type | Description |
| -------------- | ------ | --------------------------------------------------------------------------------------------------------------------------------------------- |
| `aspect_ratio` | string | The aspect ratio of the generated image. Options: `"1:1"`, `"2:3"`, `"3:2"`, `"3:4"`, `"4:3"`, `"4:5"`, `"5:4"`, `"9:16"`, `"16:9"`, `"21:9"` |
| `image_size` | string | The resolution of the generated image. Options: `"1K"` (1024x1024), `"2K"` (2048x2048), `"4K"` (4096x4096) |
#### gemini-3.1-flash-image-preview
```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
-H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gemini-3.1-flash-image-preview",
"messages": [
{
"role": "user",
"content": "Generate an image of a mountain landscape at sunset"
}
],
"image_config": {
"image_size": "1K"
}
}'
```
| Parameter | Type | Description |
| -------------- | ------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `aspect_ratio` | string | The aspect ratio of the generated image. Options: `"1:1"`, `"1:4"`, `"1:8"`, `"2:3"`, `"3:2"`, `"3:4"`, `"4:1"`, `"4:3"`, `"4:5"`, `"5:4"`, `"8:1"`, `"9:16"`, `"16:9"`, `"21:9"` |
| `image_size` | string | The resolution of the generated image. Options: `"0.5K"` (512x512), `"1K"` (1024x1024, default), `"2K"` (2048x2048), `"4K"` (4096x4096) |
`gemini-3.1-flash-image-preview` uniquely supports `"0.5K"` resolution, which
is not available on other Google image models.
### Alibaba Models
```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
-H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "alibaba/qwen-image-plus",
"messages": [
{
"role": "user",
"content": "Generate an image of a mountain landscape at sunset"
}
],
"image_config": {
"image_size": "1024x1536",
"n": 1,
"seed": 42
}
}'
```
| Parameter | Type | Description |
| ------------ | ------- | ------------------------------------------------------------------------------------------------ |
| `image_size` | string | Image dimensions in `WIDTHxHEIGHT` format. Examples: `"1024x1024"`, `"1024x1536"`, `"1536x1024"` |
| `n` | integer | Number of images to generate (1-4) |
| `seed` | integer | Random seed for reproducible generation |
Available Alibaba models:
| Model | Price | Description |
| ------------------------- | ------------ | --------------------------------- |
| `alibaba/qwen-image` | $0.035/image | Standard quality image generation |
| `alibaba/qwen-image-plus` | $0.03/image | Good balance of quality and cost |
| `alibaba/qwen-image-max` | $0.075/image | Highest quality image generation |
Alibaba models use explicit pixel dimensions (e.g., `"1024x1536"`) instead of
aspect ratios. For portrait orientation use `"1024x1536"`, for landscape use
`"1536x1024"`.
### Z.AI Models
```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
-H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "zai/cogview-4",
"messages": [
{
"role": "user",
"content": "Generate an image of a futuristic city skyline"
}
],
"image_config": {
"image_size": "1024x1024"
}
}'
```
| Parameter | Type | Description |
| ------------ | ------- | ------------------------------------------------------------------------------------------------ |
| `image_size` | string | Image dimensions in `WIDTHxHEIGHT` format. Examples: `"1024x1024"`, `"2048x1024"`, `"1024x2048"` |
| `n` | integer | Number of images to generate |
Available Z.AI models:
| Model | Price | Description |
| --------------- | ------------ | ------------------------------------------------------------------------------------------------------------------- |
| `zai/cogview-4` | $0.01/image | CogView-4 with bilingual support and excellent text rendering |
| `zai/glm-image` | $0.015/image | GLM-Image with hybrid auto-regressive architecture, excellent for text-rendering and knowledge-intensive generation |
CogView-4 supports both Chinese and English prompts and excels at generating
images with embedded text.
### ByteDance Models
```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
-H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "bytedance/seedream-4-5",
"messages": [
{
"role": "user",
"content": "Generate an image of a futuristic cyberpunk city at night"
}
],
"image_config": {
"image_size": "2048x2048"
}
}'
```
| Parameter | Type | Description |
| ------------ | ------ | ------------------------------------------------------------------------------------------------ |
| `image_size` | string | Image dimensions in `WIDTHxHEIGHT` format. Examples: `"1024x1024"`, `"2048x2048"`, `"4096x4096"` |
Available ByteDance models:
| Model | Price | Description |
| ------------------------ | ------------ | --------------------------------------------------------------- |
| `bytedance/seedream-4-0` | $0.035/image | High-quality text-to-image generation with 2K default output |
| `bytedance/seedream-4-5` | $0.045/image | Enhanced quality and consistency with improved prompt adherence |
Seedream models accept 2-10 reference images for multi-image fusion and
generation. The default output resolution is 2048×2048 (2K), with support for
up to 4096×4096 (4K).
## Usage Notes
Image generation models typically have higher token costs than text-only
models due to the computational requirements of image synthesis.
Generated images are returned as base64-encoded data URLs, which can be large.
Consider the payload size when integrating image generation into your
applications.
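Base64 encoding inflates the raw image bytes by roughly a third, so a 3 MB image arrives as about 4 MB of text. You can estimate the decoded size of a payload without actually decoding it:

```typescript
// Estimate the decoded byte size of a base64 payload from its length.
// Every 4 base64 characters encode 3 bytes, minus any trailing padding.
function estimateDecodedBytes(base64: string): number {
  const padding = base64.endsWith("==") ? 2 : base64.endsWith("=") ? 1 : 0;
  return (base64.length * 3) / 4 - padding;
}
```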
# Metadata
URL: /features/metadata
# Metadata
LLM Gateway supports sending additional metadata with your requests using custom headers. This allows you to include information like user sessions, application versions, tenant IDs, or other contextual data that can be useful for analytics and monitoring.
You can then filter requests by specific metadata values, such as a particular user or session. In the future, you will also be able to segment your analytics and monitoring by this metadata: for example, cost and latency breakdowns per user, application, country, feature, or any other dimension you want to track.
## Custom Headers
You can include custom headers with the `X-LLMGateway-` prefix to send metadata alongside your LLM requests:
```bash
curl -X POST https://api.llmgateway.io/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
-H "X-LLMGateway-Country: US" \
-H "X-LLMGateway-User-ID: 9403f741-a524-4b18-b1b2-dbb71cdff2a4" \
-d '{
"model": "gpt-4o",
"messages": [
{
"role": "user",
"content": "Hello, how are you?"
}
]
}'
```
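If you attach the same metadata to every request, a small helper keeps the prefix consistent. The helper below is illustrative, not part of any SDK; with the OpenAI SDK, the resulting object can be passed as `defaultHeaders`:

```typescript
// Build X-LLMGateway-* headers from a plain metadata object.
function metadataHeaders(
  meta: Record<string, string | number>,
): Record<string, string> {
  return Object.fromEntries(
    Object.entries(meta).map(([key, value]) => [
      `X-LLMGateway-${key}`,
      String(value),
    ]),
  );
}

// Example:
// new OpenAI({ baseURL, apiKey, defaultHeaders: metadataHeaders({ "User-ID": "user-12345" }) })
```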
## Best Practices
### Header Naming
* Use the `X-LLMGateway-` prefix for all custom metadata
* Use descriptive, consistent naming conventions
* Avoid special characters; use hyphens to separate words
### Data Privacy
* Be mindful of sensitive data in headers
* Consider hashing or anonymizing user identifiers
* Follow your organization's data privacy policies
### Performance
* Keep header values reasonably short
* Avoid sending unnecessary metadata that won't be used for analytics
* Consider the impact on request size, especially for high-volume applications
## Example: Multi-tenant Application
For a multi-tenant application, you might use metadata headers like this:
```bash
curl -X POST https://api.llmgateway.io/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
-H "X-LLMGateway-Tenant-ID: acme-corp" \
-H "X-LLMGateway-User-ID: user-12345" \
-H "X-LLMGateway-App-Version: 2.1.4" \
-H "X-LLMGateway-Feature: chat-assistant" \
-d '{
"model": "gpt-4o",
"messages": [
{
"role": "user",
"content": "Summarize this document..."
}
]
}'
```
This allows you to track usage and costs per tenant, user, application version, and feature, providing detailed insights into how your LLM integration is being used across your platform.
# Reasoning
URL: /features/reasoning
import { Callout } from "fumadocs-ui/components/callout";
# Reasoning
LLMGateway supports reasoning-capable models that can show their step-by-step thought process before providing a final answer. This feature is particularly useful for complex problem-solving tasks, mathematical calculations, and logical reasoning.
## Reasoning-Enabled Models
You can find all reasoning-enabled models on our [models page with reasoning filter](https://llmgateway.io/models?filters=1\&reasoning=true). These models include:
* OpenAI's GPT-5 series (e.g., `gpt-5`, `gpt-5-mini`)
* Note: GPT-5 models use reasoning but currently do not return the reasoning content in the response.
* Anthropic's Claude 3.7 Sonnet
* Google's Gemini 2.0 Flash Thinking and Gemini 2.5 Pro
* GPT OSS models such as `gpt-oss-120b` and `gpt-oss-20b`
* Z.AI's reasoning models
Some models may reason internally even if the `reasoning_effort` parameter is
not specified.
## Using the Reasoning Parameter
There are two ways to control reasoning effort:
### Option 1: Top-level `reasoning_effort`
Add the `reasoning_effort` parameter directly to your request:
* `minimal` - Fastest reasoning with minimal thought process (only for GPT-5 models)
* `low` - Light reasoning for simpler tasks
* `medium` - Balanced reasoning for most tasks
* `high` - Deep reasoning for complex problems
* `xhigh` - Maximum reasoning depth for the most complex problems
```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
-H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-oss-120b",
"messages": [
{
"role": "user",
"content": "What is 2/3 + 1/4 + 5/6?"
}
],
"reasoning_effort": "medium"
}'
```
### Option 2: Using the `reasoning` object
Use the unified `reasoning` configuration object with an `effort` field:
* `none` - Disable reasoning
* `minimal` - Fastest reasoning with minimal thought process
* `low` - Light reasoning for simpler tasks
* `medium` - Balanced reasoning for most tasks
* `high` - Deep reasoning for complex problems
* `xhigh` - Maximum reasoning depth for the most complex problems
```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
-H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5",
"messages": [
{
"role": "user",
"content": "What is 2/3 + 1/4 + 5/6?"
}
],
"reasoning": {
"effort": "medium"
}
}'
```
You cannot use both `reasoning_effort` and `reasoning.effort` in the same
request. Choose one approach. However, you can combine `reasoning_effort` or
`reasoning.effort` with `reasoning.max_tokens`; when `max_tokens` is
specified, it takes priority over the effort level.
### Example Response
The response will include a `reasoning` field in the message object containing the model's step-by-step thought process:
```json
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1234567890,
"model": "gpt-oss-120b",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The answer is 1.75 or 7/4.",
"reasoning": "First, I need to find a common denominator for 2/3, 1/4, and 5/6. The LCD is 12. Converting: 2/3 = 8/12, 1/4 = 3/12, 5/6 = 10/12. Adding: 8/12 + 3/12 + 10/12 = 21/12 = 1.75 or 7/4."
},
      "finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 20,
"completion_tokens": 45,
"reasoning_tokens": 35,
"total_tokens": 65
}
}
```
## Specifying Reasoning Token Budget
For models that support it, you can specify an exact token budget for reasoning using the `reasoning` object with `max_tokens`. This gives you precise control over how many tokens the model allocates to its thinking process.
When `reasoning.max_tokens` is specified, it overrides `reasoning.effort` and
`reasoning_effort`. Supported by Anthropic Claude and Google Gemini thinking
models.
### Example Request
```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
-H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "anthropic/claude-sonnet-4-20250514",
"messages": [
{
"role": "user",
"content": "Explain the P vs NP problem and why it matters."
}
],
"reasoning": {
"max_tokens": 8000
}
}'
```
### Supported Models
The `reasoning.max_tokens` parameter is supported by:
* **Anthropic Claude**: Claude 3.7 Sonnet, Claude Sonnet 4, Claude Opus 4, Claude Opus 4.5
* **Google Gemini**: Gemini 2.5 Pro, Gemini 2.5 Flash, Gemini 3 Pro Preview
When using auto-routing or root models with `reasoning.max_tokens`, only providers that support this feature will be considered.
### Provider-Specific Constraints
* **Anthropic**: Reasoning budget must be between 1,024 and 128,000 tokens. Values outside this range are automatically clamped.
* **Google**: No specific constraints on the reasoning budget.
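The Anthropic clamping behavior described above can be expressed as a simple bound (a sketch of the documented behavior, not the gateway's implementation):

```typescript
// Clamp a requested reasoning budget to Anthropic's supported range.
// Values below 1,024 or above 128,000 are pulled back into range.
function clampAnthropicBudget(maxTokens: number): number {
  return Math.min(128_000, Math.max(1_024, maxTokens));
}
```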
### Error Handling
If you specify `reasoning.max_tokens` for a model that doesn't support it, you'll receive an error:
```json
{
"error": {
"message": "Model gpt-4o does not support reasoning.max_tokens. Remove the reasoning parameter or use a model that supports explicit reasoning token budgets.",
"type": "invalid_request_error",
"code": "model_not_supported"
}
}
```
## Streaming Reasoning Content
When streaming is enabled, reasoning content will be streamed as part of the response chunks:
```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
-H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-oss-120b",
"messages": [
{
"role": "user",
"content": "Solve this logic puzzle: If all roses are flowers and some flowers fade quickly, can we conclude that some roses fade quickly?"
}
],
"reasoning_effort": "high",
"stream": true
}'
```
The reasoning content will appear in the stream chunks before the final answer, allowing you to display the model's thought process in real-time.
Example:
```
data: {
"id": "chatcmpl-fb266880-1016-4797-9a70-f21a538edaf6",
"object": "chat.completion.chunk",
"created": 1761048126,
"model": "openai/gpt-oss-20b",
"choices": [
{
"index": 0,
"delta": {
"reasoning": "It's ",
"role": "assistant"
},
"finish_reason": null
}
]
}
```
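When consuming the stream manually, accumulate `delta.reasoning` and `delta.content` separately so you can render the thought process apart from the final answer. A minimal reducer over parsed chunks:

```typescript
interface StreamChunk {
  choices: { delta: { reasoning?: string; content?: string } }[];
}

// Fold parsed SSE chunks into separate reasoning and answer strings.
function accumulateStream(chunks: StreamChunk[]) {
  let reasoning = "";
  let content = "";
  for (const chunk of chunks) {
    const delta = chunk.choices[0]?.delta ?? {};
    reasoning += delta.reasoning ?? "";
    content += delta.content ?? "";
  }
  return { reasoning, content };
}
```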
## Usage Tracking
### Response Payload
The `usage` object in the response includes reasoning-specific token counts:
* `reasoning_tokens` - Number of tokens used for the reasoning process
* `completion_tokens` - Number of tokens in the final answer
* `prompt_tokens` - Number of tokens in the input
* `total_tokens` - Sum of all token counts
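These counts make it easy to see how much of your completion spend goes to reasoning. Using the usage object from the earlier example response (35 reasoning tokens out of 45 completion tokens):

```typescript
// Fraction of completion tokens consumed by the reasoning process.
function reasoningShare(usage: {
  completion_tokens: number;
  reasoning_tokens?: number;
}): number {
  return (usage.reasoning_tokens ?? 0) / usage.completion_tokens;
}

// reasoningShare({ completion_tokens: 45, reasoning_tokens: 35 }) is roughly 0.78
```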
### Logs and Analytics
All requests using the `reasoning_effort` parameter are tracked in your dashboard logs with:
* The `reasoningContent` field containing the full reasoning text
* Separate token counts for reasoning vs. completion
* Performance metrics for reasoning-enabled requests
You can view detailed logs for each request in the [dashboard](https://llmgateway.io/dashboard) to analyze how models are reasoning through problems.
## Auto-Routing with Reasoning
When using auto-routing (specifying a model like `gpt-5` without a specific version), LLMGateway will:
1. Automatically set `reasoning_effort` to `minimal` for GPT-5 models
2. Set `reasoning_effort` to `low` for other auto-routed reasoning models
3. Only route to providers that support reasoning when `reasoning_effort` is specified
This ensures optimal performance and cost when using auto-routing with reasoning-capable models.
## Model-Specific Behavior
Not all reasoning models return reasoning content in the same way. Some models (like OpenAI's GPT-5 series) may reason internally but not expose the reasoning content in the response. LLMGateway normalizes the response format across providers, but the depth and format of reasoning content may still vary.
## Best Practices
1. **Choose appropriate reasoning effort**: Use `low` or `minimal` for simple tasks, `medium` for most tasks, and `high` only for complex problems that require deep reasoning
2. **Monitor token usage**: Reasoning can significantly increase token consumption; monitor your `reasoning_tokens` in the usage object
3. **Stream for better UX**: When building user-facing applications, enable streaming to show the reasoning process in real-time
4. **Check logs**: Review the `reasoningContent` in your dashboard logs to understand how models are solving problems
## Error Handling
If you specify `reasoning_effort` for a model that doesn't support reasoning, you'll receive an error:
```json
{
"error": {
"message": "Model gpt-4o does not support reasoning. Remove the reasoning_effort parameter or use a reasoning-capable model.",
"type": "invalid_request_error",
"code": "model_not_supported"
}
}
```
To avoid this error, only use the `reasoning_effort` parameter with [reasoning-enabled models](https://llmgateway.io/models?filters=1\&reasoning=true).
# Response Healing
URL: /features/response-healing
import { Callout } from "fumadocs-ui/components/callout";
# Response Healing
Response Healing is a plugin that automatically validates and repairs malformed JSON responses from AI models. When enabled, LLM Gateway ensures that API responses conform to your specified schemas even when the model's formatting is imperfect.
## Why Response Healing?
Large language models occasionally produce invalid JSON, especially in complex scenarios:
* **Markdown wrapping**: Models often wrap JSON in code blocks like \`\`\`json...\`\`\`
* **Mixed content**: JSON may be preceded or followed by explanatory text
* **Syntax errors**: Trailing commas, unquoted keys, or single quotes instead of double quotes
* **Truncated output**: Token limits may cut off responses mid-JSON
Response Healing automatically detects and fixes these issues, saving you from implementing error handling for every possible malformed response.
## Enabling Response Healing
To enable Response Healing, add `response-healing` to the `plugins` array in your request:
```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
-H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Return a JSON object with name and age"}],
"response_format": {"type": "json_object"},
"plugins": [{"id": "response-healing"}]
}'
```
Response Healing only activates when `response_format` is set to `json_object`
or `json_schema`. For regular text responses, the plugin has no effect.
## How It Works
When Response Healing is enabled, LLM Gateway applies a series of repair strategies to malformed JSON responses:
### 1. Markdown Extraction
Extracts JSON from markdown code blocks:
```text
Here's the data:
\`\`\`json
{"name": "Alice", "age": 30}
\`\`\`
```
Becomes:
```json
{ "name": "Alice", "age": 30 }
```
### 2. Mixed Content Extraction
Separates JSON from surrounding text:
```text
Sure! Here is the JSON you requested: {"name": "Alice", "age": 30} Let me know if you need anything else.
```
Becomes:
```json
{ "name": "Alice", "age": 30 }
```
### 3. Syntax Fixes
Repairs common JSON syntax violations:
| Issue | Before | After |
| --------------- | ------------------- | ------------------- |
| Trailing commas | `{"a": 1,}` | `{"a": 1}` |
| Unquoted keys | `{name: "Alice"}` | `{"name": "Alice"}` |
| Single quotes | `{'name': 'Alice'}` | `{"name": "Alice"}` |
### 4. Truncation Completion
Adds missing closing brackets for truncated responses:
```text
{"name": "Alice", "data": {"nested": true
```
Becomes:
```json
{ "name": "Alice", "data": { "nested": true } }
```
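The truncation strategy can be sketched as a bracket-balancing pass. This is a simplified illustration of the idea, not the gateway's actual implementation (which handles more edge cases):

```typescript
// Append the closing quotes/brackets a truncated JSON string is missing.
// Tracks string literals so brackets inside strings are ignored.
function completeTruncated(json: string): string {
  const closers: string[] = [];
  let inString = false;
  let escaped = false;
  for (const ch of json) {
    if (escaped) {
      escaped = false;
    } else if (ch === "\\") {
      escaped = inString;
    } else if (ch === '"') {
      inString = !inString;
    } else if (!inString) {
      if (ch === "{") closers.push("}");
      else if (ch === "[") closers.push("]");
      else if (ch === "}" || ch === "]") closers.pop();
    }
  }
  // Close an unterminated string, then unwind remaining open brackets
  const repaired = inString ? json + '"' : json;
  return repaired + closers.reverse().join("");
}
```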
## Usage Examples
### With JSON Object Format
Request a structured response with automatic healing:
```typescript
const response = await fetch("https://api.llmgateway.io/v1/chat/completions", {
method: "POST",
headers: {
Authorization: `Bearer ${process.env.LLM_GATEWAY_API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
model: "gpt-4o",
messages: [
{
role: "user",
content:
"Return a JSON object with fields: name (string) and age (number)",
},
],
response_format: { type: "json_object" },
plugins: [{ id: "response-healing" }],
}),
});
const result = await response.json();
// Response is guaranteed to be valid JSON
const data = JSON.parse(result.choices[0].message.content);
```
### With JSON Schema
For stricter validation, combine with `json_schema`:
```typescript
const response = await fetch("https://api.llmgateway.io/v1/chat/completions", {
method: "POST",
headers: {
Authorization: `Bearer ${process.env.LLM_GATEWAY_API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
model: "gpt-4o",
messages: [
{
role: "user",
content: "Generate a user profile",
},
],
response_format: {
type: "json_schema",
json_schema: {
name: "user_profile",
schema: {
type: "object",
required: ["name", "email"],
properties: {
name: { type: "string" },
email: { type: "string" },
age: { type: "number" },
},
},
},
},
plugins: [{ id: "response-healing" }],
}),
});
const result = await response.json();
```
## Healing Metadata
When a response is healed, the healing method is logged for debugging. The following healing methods may be applied:
| Method | Description |
| -------------------------- | ------------------------------------------- |
| `markdown_extraction` | JSON extracted from markdown code blocks |
| `mixed_content_extraction` | JSON extracted from surrounding text |
| `syntax_fix` | Trailing commas, quotes, or keys were fixed |
| `truncation_completion` | Missing closing brackets were added |
| `combined_strategies` | Multiple strategies were applied |
## Limitations
Response Healing is only available for non-streaming requests. Streaming
responses are returned as-is without healing.
Response Healing works best for:
* Simple to moderately complex JSON structures
* Common formatting issues from LLMs
It may not be able to repair:
* Severely corrupted or nonsensical output
* Complex nested structures with multiple issues
* Responses that don't contain any recognizable JSON
## Best Practices
### Use with Structured Prompts
Combine Response Healing with clear instructions for best results:
```typescript
const response = await fetch("https://api.llmgateway.io/v1/chat/completions", {
method: "POST",
headers: {
Authorization: `Bearer ${process.env.LLM_GATEWAY_API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
model: "gpt-4o",
messages: [
{
role: "system",
content: "Always respond with valid JSON. No explanations.",
},
{
role: "user",
content: "List three colors as a JSON array",
},
],
response_format: { type: "json_object" },
plugins: [{ id: "response-healing" }],
}),
});
const result = await response.json();
```
### Validate Critical Data
For critical applications, validate the healed JSON in your code:
```typescript
const result = await response.json();
const content = result.choices[0].message.content;
const data = JSON.parse(content);
// Add your own validation
if (!data.name || typeof data.name !== "string") {
throw new Error("Invalid response: missing name");
}
```
### Monitor Healing Rates
If you notice frequent healing in your logs, consider:
* Improving your prompts to request cleaner JSON
* Using models with better JSON output (e.g., GPT-4o, Claude 3.5)
* Adding explicit JSON examples in your prompts
# Routing
URL: /features/routing
import { Callout } from "fumadocs-ui/components/callout";
# Routing
LLMGateway provides flexible and intelligent routing options to help you get the best performance and cost efficiency from your AI applications. Whether you want to use specific models, providers, or let our system automatically optimize your requests, we've got you covered.
LLMGateway also includes **automatic retry and fallback**: if a provider fails, your request is seamlessly retried on the next best provider, all within the same API call.
## Model Selection
### Any Model Name
You can use any model name from our [models page](https://llmgateway.io/models) or discover available models programmatically through the [/v1/models endpoint](/v1_models).
```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
-H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Hello!"}]
}'
```
### Model ID Routing
Choose a specific model ID to route to the **best available provider** for that model. LLMGateway's smart routing algorithm considers multiple factors to find the optimal provider across all configured options.
#### Smart Routing Algorithm
When you use a model ID without a provider prefix, LLMGateway's intelligent routing system analyzes multiple factors to select the best provider:
**Weighted Scoring System** (based on last 5 minutes of metrics):
* **Uptime (50%)** - Prioritizes providers with high reliability and low error rates
* **Throughput (20%)** - Favors providers with higher tokens per second generation speed
* **Price (20%)** - Considers cost efficiency while maintaining quality
* **Latency (10%)** - Considers time to first token (only applied for streaming requests)
The algorithm calculates a weighted score for each available provider and selects the one with the lowest (best) score. All metrics are normalized to ensure fair comparison across providers.
**Latency Weight for Non-Streaming Requests**:
For non-streaming requests, the latency weight (10%) is redistributed proportionally to the other factors since time-to-first-token is less relevant when waiting for the complete response.
**Exponential Uptime Penalty**:
Providers with uptime below 95% receive an additional exponential penalty that increases rapidly as uptime drops:
* 95-100% uptime: No penalty
* 90% uptime: \~0.07 penalty
* 80% uptime: \~0.62 penalty
* 70% uptime: \~1.73 penalty
* 50% uptime: \~5.61 penalty
This ensures providers experiencing significant issues are strongly deprioritized while minor fluctuations have minimal impact.
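To make the weighting concrete, here is a simplified sketch of such a score in TypeScript. The exact formula and penalty curve are internal to LLMGateway; the constant `27.7` below is an assumption chosen only to reproduce the documented example penalty values:

```typescript
// Simplified sketch of the weighted scoring described above (lower = better).
interface ProviderMetrics {
	uptime: number; // 0..1, fraction of successful requests (last 5 minutes)
	throughput: number; // normalized 0..1, higher = faster token generation
	price: number; // normalized 0..1, higher = more expensive
	latency: number; // normalized 0..1, higher = slower time to first token
}

function uptimePenalty(uptime: number): number {
	// No penalty at 95%+ uptime; grows rapidly below that threshold.
	// The curve constant is an assumption fitted to the documented examples.
	if (uptime >= 0.95) return 0;
	return 27.7 * (0.95 - uptime) ** 2;
}

function score(m: ProviderMetrics, streaming: boolean): number {
	// For non-streaming requests the 10% latency weight is redistributed
	// proportionally to the other factors (divide by the remaining 0.9).
	const scale = streaming ? 1 : 1 / 0.9;
	const latencyTerm = streaming ? 0.1 * m.latency : 0;
	return (
		scale * (0.5 * (1 - m.uptime) + 0.2 * (1 - m.throughput) + 0.2 * m.price) +
		latencyTerm +
		uptimePenalty(m.uptime)
	);
}
```

With these weights, a provider at 99% uptime always scores better (lower) than an otherwise identical provider at 80% uptime, because the penalty term dominates.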
**Epsilon-Greedy Exploration** (1% of requests):
To solve the "cold start problem" where new or unused providers never get traffic to build up metrics, the system randomly explores different providers 1% of the time. This ensures:
* All providers periodically receive traffic
* New providers can prove their reliability
* The system adapts to changing provider performance
* You benefit from improved routing decisions over time
**Routing Metadata**:
Every request includes detailed routing metadata in the logs, showing:
* Available providers that were considered
* Selected provider and selection reason
* Scores for each provider (including uptime, throughput, latency, and price)
This transparency allows you to understand and debug routing decisions.
Using model IDs without a provider prefix automatically routes to the optimal
provider based on reliability, speed, and cost. The system continuously learns
and adapts based on real-time performance metrics.
Smart routing prioritizes reliability over cost, ensuring your requests are
routed to providers with proven uptime and performance, while still
considering cost efficiency.
### Provider-Specific Routing
To use a specific provider without any fallbacks, prefix the model name with the provider name followed by a slash:
```bash
# Use OpenAI specifically
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
-H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-4o",
"messages": [{"role": "user", "content": "Hello!"}]
}'
# Use DeepSeek provider specifically
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
-H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek/deepseek-v3.2",
"messages": [{"role": "user", "content": "Hello!"}]
}'
```
#### Low-Uptime Protection
When you specify a provider explicitly, LLMGateway checks the provider's recent uptime (last 5 minutes). If the uptime falls below 90%, the system automatically routes your request to the best available alternative provider to ensure reliability. This protects your application from providers experiencing temporary issues.
If the requested provider has low uptime but no alternative providers are
available for that model, the request will still be sent to the originally
requested provider.
#### Disabling Fallback with X-No-Fallback Header
If you need to bypass this protection and always use the exact provider you specified regardless of its current uptime, you can use the `X-No-Fallback` header:
```bash
# Force use of a specific provider even if it has low uptime
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
-H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
-H "Content-Type: application/json" \
-H "X-No-Fallback: true" \
-d '{
"model": "openai/gpt-4o",
"messages": [{"role": "user", "content": "Hello!"}]
}'
```
Using `X-No-Fallback: true` disables automatic provider failover. Your
requests will be sent to the specified provider even if it is experiencing
issues, which may result in higher error rates.
When the `X-No-Fallback` header is used, the routing metadata in logs will include `noFallback: true` to indicate that fallback was disabled for that request.
## Automatic Retry & Fallback
When using model ID routing (without a provider prefix), LLMGateway automatically retries failed requests on alternate providers. This happens transparently within the same API call; your application receives the successful response as if nothing went wrong.
### How Retry Works
1. Your request is routed to the best available provider using the smart routing algorithm
2. If that provider returns a server error (5xx), times out, or has a connection failure, the gateway marks the provider as failed
3. The next best available provider is selected and the request is retried
4. Up to **2 retries** are attempted before returning an error to the client
```
Request → Provider A (500 error) → Provider B (200 OK) → Response
```
Both streaming and non-streaming requests support automatic retry.
### What Triggers a Retry
Retries are triggered by **server-side failures** only:
* **5xx errors** (500 Internal Server Error, 502 Bad Gateway, 503 Service Unavailable, etc.)
* **Timeouts** (upstream provider took too long to respond)
* **Connection failures** (network errors, DNS failures, etc.)
Retries are **not** triggered by:
* **4xx client errors** (400 Bad Request, 401 Unauthorized, 403 Forbidden, 422 Unprocessable Entity)
* **Content filter responses** (Azure ResponsibleAI, etc.)
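The retry rules above can be sketched conceptually as follows. The real gateway logic is internal; the function and callback names here are illustrative:

```typescript
// Conceptual sketch of the retry behavior described above.
const MAX_RETRIES = 2;

async function withFallback(
	providers: string[],
	attempt: (provider: string) => Promise<{ status: number }>,
): Promise<{ provider: string; status: number }> {
	// Original attempt plus up to MAX_RETRIES retries on alternate providers.
	for (const provider of providers.slice(0, 1 + MAX_RETRIES)) {
		try {
			const res = await attempt(provider);
			// 4xx client errors are returned as-is; only 5xx triggers a retry.
			if (res.status < 500) return { provider, status: res.status };
		} catch {
			// Timeouts and connection failures also trigger a retry.
		}
	}
	throw new Error("All providers failed after retries");
}
```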
### When Retry Is Disabled
Automatic retry is disabled when:
* The `X-No-Fallback: true` header is set
* A specific provider is requested (e.g., `openai/gpt-4o`)
* No alternative providers are available for the requested model
* The maximum retry count (2) has been exhausted
### Routing Transparency
Every provider attempt, both failed and successful, is recorded in the `routing` array in the response metadata and activity logs:
```json
{
"metadata": {
"routing": [
{
"provider": "openai",
"model": "gpt-4o",
"status_code": 500,
"error_type": "server_error",
"succeeded": false
},
{
"provider": "azure",
"model": "gpt-4o",
"status_code": 200,
"error_type": "none",
"succeeded": true
}
]
}
}
```
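To work with this metadata in client code, a small helper can summarize the attempt chain. The field names follow the example above; the helper itself is illustrative, not part of any SDK:

```typescript
// Summarize the routing attempts from the response metadata shown above.
interface RoutingAttempt {
	provider: string;
	model: string;
	status_code: number;
	error_type: string;
	succeeded: boolean;
}

function summarizeRouting(attempts: RoutingAttempt[]): string {
	return attempts
		.map((a) => `${a.provider} (${a.succeeded ? "ok" : a.error_type})`)
		.join(" -> ");
}
```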
### Retried Log Tracking
Each provider attempt creates its own log entry. Failed attempts that were retried are marked with:
* **`retried: true`**: indicates this failed request was retried on another provider
* **`retriedByLogId`**: the ID of the final successful log entry
This allows you to distinguish between unrecovered failures and failures that were transparently recovered via retry. In the dashboard, retried logs display a "Retried" badge with a link to the successful log.
### Impact on Provider Health
Failed attempts still count against the provider's uptime score, even when the request was successfully retried on another provider. This means:
* A provider that keeps failing will see its uptime score drop
* The exponential uptime penalty kicks in below 95% (see [Smart Routing Algorithm](#smart-routing-algorithm))
* Future requests are automatically routed away from unreliable providers
* Your application stays reliable without any code changes on your side
Automatic retry and fallback works together with smart routing to provide
self-healing behavior. Failing providers are automatically avoided, and your
requests are transparently recovered on reliable alternatives.
## Optimized Auto Routing
Auto routing automatically selects the best model for your specific use case without you having to specify a model at all.
### Current Implementation
The auto routing system currently:
* **Chooses cost-effective models** by default for optimal price-to-performance ratio
* **Automatically scales to more powerful models** based on your request's context size
* **Handles large contexts intelligently** by selecting models with appropriate context windows
```bash
# Let LLMGateway choose the optimal model
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
-H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "auto",
"messages": [{"role": "user", "content": "Your request here..."}]
}'
```
### Free Models Only
When using auto routing, you can restrict the selection to only free models (models with zero input and output pricing) by setting the `free_models_only` parameter to `true`:
```bash
# Auto route to free models only
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
-H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "auto",
"messages": [{"role": "user", "content": "Hello!"}],
"free_models_only": true
}'
```
Adding even a small amount of credits to your account (e.g., $5) will
immediately upgrade your free model rate limits from 5 requests per 10 minutes
to 20 requests per minute.
The `free_models_only` parameter only works with auto routing (`"model":
"auto"`). If no free models are available that meet your request requirements,
the API will return an error.
### Reasoning Models Only
Specify a `reasoning_effort` value and only models that support reasoning will be selected. Note that this parameter is not limited to auto routing; it also works with explicit model names.
```bash
# Auto route only to reasoning models
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
-H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "auto",
"messages": [{"role": "user", "content": "Hello!"}],
"reasoning_effort": "medium"
}'
```
### Exclude Reasoning Models
When using auto routing, you can exclude reasoning models from selection by setting the `no_reasoning` parameter to `true`. This is useful when you want faster responses or need to avoid the additional cost and latency of reasoning models:
```bash
# Auto route excluding reasoning models
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
-H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "auto",
"messages": [{"role": "user", "content": "Hello!"}],
"no_reasoning": true
}'
```
The `no_reasoning` parameter only works with auto routing (`"model": "auto"`).
If no non-reasoning models are available that meet your request requirements,
the API will return an error.
Auto routing analyzes your payload and automatically chooses between
cost-effective models for simple requests and more powerful models for complex
or large-context requests.
### Coming Soon: Advanced Optimization
We're continuously improving our auto routing capabilities. Soon you'll benefit from:
* **Tool call optimization**: Automatically select models that excel at function calling and structured outputs
* **Content-aware routing**: Analyze message content to determine the best model for specific types of requests (coding, creative writing, analysis, etc.)
* **Performance-based routing**: Route based on historical performance data for similar requests
* **Multi-model orchestration**: Intelligently combine multiple models for complex workflows
### How It Works
1. **Request Analysis**: The system analyzes your request including message content, context size, and any special parameters
2. **Model Selection**: Based on the analysis, it selects the most appropriate model considering cost, performance, and capabilities
3. **Transparent Routing**: Your request is seamlessly routed to the chosen model and provider
4. **Optimized Response**: You receive the best possible response while maintaining cost efficiency
Auto routing decisions are transparent in your usage logs, so you can always
see which model was selected for each request.
## Best Practices
### For Development
* Use specific model names during development and testing
* Leverage auto routing for production workloads to optimize costs
### For Production
* Use auto routing (`"model": "auto"`) for the best balance of cost and performance
* Monitor your usage patterns through the dashboard to understand routing decisions
* Set up provider keys for multiple providers to maximize routing options
### For Cost Optimization
* Let auto routing handle model selection to automatically use the most cost-effective options
* Use model IDs without provider prefixes so smart routing can weigh price alongside reliability and speed when selecting a provider
* Monitor your usage analytics to track cost savings from intelligent routing
# Source Attribution
URL: /features/source
# Source Attribution
The `X-Source` header allows you to identify your domain when making requests to LLM Gateway. This information is used to generate public usage statistics showing how LLM Gateway is being used across different websites and applications.
## X-Source Header
Include the `X-Source` header with your domain name in your requests:
```bash
curl -X POST https://api.llmgateway.io/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
-H "X-Source: example.com" \
-d '{
"model": "gpt-4o",
"messages": [
{
"role": "user",
"content": "Hello, how are you?"
}
]
}'
```
## Domain Format
The `X-Source` header accepts domain names in various formats. All of the following are valid and will be normalized to the same domain:
* `example.com`
* `https://example.com`
* `https://www.example.com`
* `www.example.com`
All variations will be stripped down to the base domain (`example.com`) for aggregation purposes.
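The normalization described above can be sketched as follows. This is an illustrative approximation, not the gateway's actual implementation:

```typescript
// Strip protocol, path, and "www." prefix to get the base domain.
function normalizeSource(value: string): string {
	const withoutProtocol = value.replace(/^https?:\/\//, "");
	const host = withoutProtocol.split("/")[0];
	return host.replace(/^www\./, "").toLowerCase();
}
```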
## Public Statistics
Data from the `X-Source` header is used to generate public statistics about LLM Gateway usage, including:
* **Popular Domains**: Which websites and applications are using LLM Gateway most frequently
* **Model Usage**: What models are being used by different domains
* **Geographic Distribution**: Where requests are coming from across different sources
* **Growth Trends**: How usage is growing over time for different domains
These statistics help demonstrate the adoption and impact of LLM Gateway across the ecosystem.
## Privacy Considerations
### What's Public
* Domain names (stripped of protocol and www prefixes)
* Aggregated request counts and model usage
* General geographic regions (country-level data)
### What's Private
* Individual request content or responses
* User identifiers or personal information
* Detailed usage patterns beyond aggregated counts
* API keys or authentication details
## Benefits
Including the `X-Source` header provides several benefits:
### For Your Project
* **Recognition**: Your domain will appear in public usage statistics
* **Credibility**: Demonstrates real-world usage of your application
* **Community**: Contributes to the broader LLM Gateway ecosystem
### For the Community
* **Transparency**: Shows real adoption and usage patterns
* **Inspiration**: Other developers can see successful implementations
* **Growth**: Helps demonstrate the value of open-source LLM infrastructure
## Optional but Recommended
While the `X-Source` header is optional, we strongly encourage its use to:
* Support transparency in the LLM Gateway ecosystem
* Help showcase successful integrations
* Contribute to understanding of LLM usage patterns
* Demonstrate the real-world impact of your application
Your participation helps build a more transparent and collaborative LLM ecosystem.
# Vision Support
URL: /features/vision
import { Callout } from "fumadocs-ui/components/callout";
# Vision Support
LLMGateway supports vision-enabled models that can analyze and describe images. You can provide images via HTTPS URLs or inline base64-encoded data.
## Vision-Enabled Models
You can find all vision-enabled models on our [models page with vision filter](https://llmgateway.io/models?filters=1\&vision=true). These models can process both text and image content in the same request.
## Image Formats
### Using HTTPS URLs
You can provide any publicly accessible HTTPS URL pointing to an image:
```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
-H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "What do you see in this image?"
},
{
"type": "image_url",
"image_url": {
"url": "https://example.com/image.jpg"
}
}
]
}
]
}'
```
### Using Base64 Inline Data
You can also provide images as base64-encoded data URIs:
```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
-H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "Describe this image"
},
{
"type": "image_url",
"image_url": {
"url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQEASABIAAD..."
}
}
]
}
]
}'
```
## Content Array Format
When using vision models, the `content` field should be an array containing both text and image content blocks:
* **Text content**: `{"type": "text", "text": "Your message"}`
* **Image content**: `{"type": "image_url", "image_url": {"url": "image_url_or_data_uri"}}`
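A small helper can build such a content array. `visionMessage` is an illustrative name, not part of any SDK:

```typescript
// Build a user message combining one text block with any number of images.
type ContentBlock =
	| { type: "text"; text: string }
	| { type: "image_url"; image_url: { url: string } };

function visionMessage(text: string, imageUrls: string[]) {
	const content: ContentBlock[] = [{ type: "text", text }];
	for (const url of imageUrls) {
		content.push({ type: "image_url", image_url: { url } });
	}
	return { role: "user" as const, content };
}
```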
## Multiple Images
You can include multiple images in a single request:
```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
-H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "Compare these two images"
},
{
"type": "image_url",
"image_url": {
"url": "https://example.com/image1.jpg"
}
},
{
"type": "image_url",
"image_url": {
"url": "https://example.com/image2.jpg"
}
}
]
}
]
}'
```
## Simple String Content
For vision models, you can still use simple string content for text-only
messages. The array format is only required when including images.
```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
-H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [
{
"role": "user",
"content": "Hello! How can you help me today?"
}
]
}'
```
## Supported Image Types
Vision models typically support common image formats including:
* JPEG (.jpg, .jpeg)
* PNG (.png)
* WebP (.webp)
* GIF (.gif)
The specific formats supported may vary by model provider. Check the individual model documentation for format limitations and file size restrictions.
## Error Handling
If an image URL is inaccessible or the image format is unsupported, the gateway will handle the error gracefully and may substitute a placeholder or error message in the request to the underlying model.
# Native Web Search
URL: /features/web-search
import { Callout } from "fumadocs-ui/components/callout";
# Native Web Search
LLM Gateway supports native web search capabilities that allow models to access real-time information from the internet. This feature is useful for answering questions about current events, recent news, live data, and other time-sensitive information that may not be in the model's training data.
## How It Works
When you include the `web_search` tool in your request, the model can search the web to gather relevant information before generating a response:
1. You send a request with the `web_search` tool enabled
2. The model determines if web search is needed based on the query
3. If needed, the model performs web searches to gather current information
4. The model synthesizes the search results and generates a response
5. Citations are included in the response to show information sources
## Supported Providers
Native web search is available on select models. See all models with native web search support on our [models page](https://llmgateway.io/models?filters=1\&webSearch=true).
## Basic Usage
To enable web search, add the `web_search` tool to your request:
```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
-H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5.2",
"messages": [
{
"role": "user",
"content": "What is the current weather in San Francisco?"
}
],
"tools": [
{
"type": "web_search"
}
]
}'
```
### Example Response
```json
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1234567890,
"model": "openai/gpt-5.2",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The current weather in San Francisco is 57Β°F (14Β°C) with mostly cloudy skies...",
"annotations": [
{
"type": "url_citation",
"url": "https://weather.com/...",
"title": "San Francisco Weather"
}
]
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 15,
"completion_tokens": 150,
"total_tokens": 165,
"cost_usd_total": 0.0315
}
}
```
## Web Search Options
The `web_search` tool accepts optional configuration parameters:
### User Location
Provide location context to get more relevant local search results:
```json
{
"type": "web_search",
"user_location": {
"city": "San Francisco",
"region": "California",
"country": "US",
"timezone": "America/Los_Angeles"
}
}
```
### Search Context Size
Control the amount of web content retrieved (OpenAI only):
```json
{
"type": "web_search",
"search_context_size": "medium"
}
```
Available values:
* `low` - Minimal search context, faster responses
* `medium` - Balanced context (default)
* `high` - Maximum search context, more comprehensive
### Max Uses
Limit the number of searches per request (provider-dependent):
```json
{
"type": "web_search",
"max_uses": 3
}
```
## Using with SDKs
### OpenAI SDK (Python)
```python
from openai import OpenAI
client = OpenAI(
base_url="https://api.llmgateway.io/v1",
api_key="your-api-key"
)
response = client.chat.completions.create(
model="gpt-5.2",
messages=[
{"role": "user", "content": "What are the latest news headlines today?"}
],
tools=[{"type": "web_search"}]
)
print(response.choices[0].message.content)
```
### OpenAI SDK (TypeScript)
```typescript
import OpenAI from "openai";
const client = new OpenAI({
baseURL: "https://api.llmgateway.io/v1",
apiKey: "your-api-key",
});
const response = await client.chat.completions.create({
model: "gpt-5.2",
messages: [{ role: "user", content: "What are the latest tech news?" }],
tools: [{ type: "web_search" }],
});
console.log(response.choices[0].message.content);
```
## Streaming
Web search works with streaming responses. Citations are included in the final chunks:
```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
-H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5.2",
"messages": [
{"role": "user", "content": "What is the current stock price of Apple?"}
],
"tools": [{"type": "web_search"}],
"stream": true
}'
```
## Citations and Sources
Web search responses include citations to show where information was sourced from. These appear in the `annotations` field of the message:
```json
{
"annotations": [
{
"type": "url_citation",
"url": "https://example.com/article",
"title": "Article Title",
"start_index": 0,
"end_index": 50
}
]
}
```
Citation format may vary slightly between providers, but LLM Gateway
normalizes them into a consistent structure.
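A minimal sketch of reading these citations from a parsed response, assuming the normalized shape shown above:

```typescript
// Collect url_citation annotations from a chat completion message.
interface UrlCitation {
	type: "url_citation";
	url: string;
	title: string;
	start_index?: number;
	end_index?: number;
}

function extractSources(annotations: UrlCitation[]): string[] {
	return annotations
		.filter((a) => a.type === "url_citation")
		.map((a) => `${a.title}: ${a.url}`);
}
```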
## Cost Tracking
Web search costs are tracked separately from token costs in the usage object:
```json
{
"usage": {
"prompt_tokens": 15,
"completion_tokens": 150,
"total_tokens": 165,
"cost_usd_total": 0.0125,
"cost_usd_input": 0.0015,
"cost_usd_output": 0.01,
"cost_usd_web_search": 0.01
}
}
```
The `cost_usd_web_search` field shows the cost incurred specifically for web search queries. Web search is billed at $0.01 per search call for reasoning models (GPT-5, o-series) and $0.025 per call for non-reasoning models.
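Given those per-call rates, an estimated search cost can be computed as a quick sketch (assuming you track the number of search calls made):

```typescript
// Estimate web search cost from the per-call rates documented above.
function webSearchCost(calls: number, isReasoningModel: boolean): number {
	const perCall = isReasoningModel ? 0.01 : 0.025;
	return calls * perCall;
}
```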
## Combining with Function Tools
You can use web search alongside regular function tools:
```json
{
"tools": [
{ "type": "web_search" },
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": { "type": "string" }
}
}
}
}
]
}
```
Some dedicated search models only support web search and do not support
additional function tools. Use `gpt-5.2` or other GPT-5 series models if you
need both web search and function tools.
## Use Cases
### Current Events and News
```json
{
"messages": [
{ "role": "user", "content": "What are the major news stories today?" }
],
"tools": [{ "type": "web_search" }]
}
```
### Real-Time Data
```json
{
"messages": [
{ "role": "user", "content": "What is the current price of Bitcoin?" }
],
"tools": [{ "type": "web_search" }]
}
```
### Research and Fact-Checking
```json
{
"messages": [
{
"role": "user",
"content": "What are the latest findings on climate change?"
}
],
"tools": [{ "type": "web_search" }]
}
```
### Local Information
```json
{
"messages": [
{
"role": "user",
"content": "What restaurants are open near me right now?"
}
],
"tools": [
{
"type": "web_search",
"user_location": {
"city": "New York",
"country": "US"
}
}
]
}
```
## Best Practices
1. **Use GPT-5.2**: For the best web search experience with full tool support, use `gpt-5.2`
2. **Provide location context**: When queries are location-dependent, include `user_location` for more relevant results
3. **Monitor costs**: Web search incurs per-query costs in addition to token costs
4. **Check citations**: Always review the citations in responses to verify information sources
5. **Use streaming**: For user-facing applications, enable streaming to show responses as they're generated
## Error Handling
If you try to use web search with a model that doesn't support it:
```json
{
"error": {
"message": "Model gpt-4o does not support native web search. Remove the web_search tool or use a model that supports it. See https://llmgateway.io/models?features=webSearch for supported models.",
"type": "invalid_request_error"
}
}
```
To avoid this error, only use the `web_search` tool with [native web search enabled models](https://llmgateway.io/models?filters=1\&webSearch=true).
# AWS Bedrock Integration
URL: /integrations/aws-bedrock
import { Step, Steps } from "fumadocs-ui/components/steps";
AWS Bedrock is Amazon's fully managed service that provides access to foundation models from leading AI companies. This guide shows how to create AWS Bedrock Long-Term API Keys and integrate them with LLM Gateway.
## Prerequisites
* An AWS account with Bedrock access enabled
* LLM Gateway account or self-hosted instance
## Overview
AWS Bedrock supports **Long-Term API Keys** for simplified authentication. These keys provide direct API access without requiring IAM credentials or complex authentication flows.
## Create AWS Bedrock Long-Term API Key
### Enable Model Access in Bedrock
1. Log into the **AWS Console**
2. Navigate to **AWS Bedrock** service
3. Go to **Model access** in the left sidebar
4. Click **Manage model access**
5. Enable the models you want to use (e.g., Claude 3.5, Llama 3)
6. Wait for access to be granted (usually instant for most models)
### Create Long-Term API Key
1. In AWS Bedrock console, navigate to **API Keys** in the left sidebar
2. Click **Create Long-Term API Key**
3. Set expiry date ("Never expires" is recommended)
4. Click **Generate**
5. **Important**: Copy the API key immediately - it's only shown once!
## Add to LLM Gateway
### Navigate to Provider Keys
1. Log into [LLM Gateway Dashboard](https://llmgateway.io/dashboard)
2. Select your organization and project
3. Go to **Provider Keys** in the sidebar
### Add AWS Bedrock Provider Key
1. Click **Add** for **AWS Bedrock**
2. Paste your Long-Term API Key
3. **Select Region Prefix** based on where you want to use your models:
* **us.** - For US regions (`us-east-1`, `us-west-2`)
* **eu.** - For European regions (`eu-central-1`, `eu-west-1`)
* **global.** - For global/cross-region endpoints
4. Click **Add Key**
The system will validate your key and confirm the connection.
### Test the Integration
Test your integration with a simple API call:
```bash
curl -X POST https://api.llmgateway.io/v1/chat/completions \
-H "Authorization: Bearer YOUR_LLMGATEWAY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "aws-bedrock/claude-3-5-sonnet",
"messages": [
{
"role": "user",
"content": "Hello from AWS Bedrock!"
}
]
}'
```
Replace `YOUR_LLMGATEWAY_API_KEY` with your LLM Gateway API key.
## Available Models
Once configured, you can access all AWS Bedrock models through LLM Gateway:
* **Anthropic Claude**: `aws-bedrock/claude-3-5-sonnet`, `aws-bedrock/claude-3-5-haiku`
* **Meta Llama**: `aws-bedrock/llama-3-2-90b`, `aws-bedrock/llama-3-2-11b`
* **Amazon Titan**: `aws-bedrock/amazon.titan-text-express-v1`
* **And more...**
Browse all available models at [llmgateway.io/models](https://llmgateway.io/models?provider=aws-bedrock)
## Troubleshooting
### "Model not available" error
* Verify you've enabled model access in AWS Bedrock console
* Check that the region where you created your key has access to the model
* Some models are only available in specific regions
### Rate limiting
* AWS Bedrock has request quotas per model and region
* Monitor usage in AWS Bedrock console
* Consider requesting quota increases for high-volume workloads
# Azure Integration
URL: /integrations/azure
import { Step, Steps } from "fumadocs-ui/components/steps";
Azure provides access to OpenAI's powerful language models through Microsoft's enterprise cloud infrastructure. This guide shows how to create an Azure resource, deploy models, and integrate them with LLM Gateway.
Only OpenAI models are supported via Azure at this time. [Open an
issue](https://github.com/theopenco/llmgateway/issues/new) to request support
for other model types.
## Prerequisites
* An Azure account with an active subscription
* LLM Gateway account or self-hosted instance
## Overview
Azure provides enterprise-grade access to OpenAI models with enhanced security, compliance, and regional availability. LLM Gateway integrates seamlessly with Azure deployments.
## Create Azure Resource
### Create an Azure OpenAI Resource
1. Log into the **Azure Portal** ([https://portal.azure.com](https://portal.azure.com))
2. Click **Create a resource**
3. Search for **Azure OpenAI** and select it
4. Click **Create**
5. Configure the resource:
* **Subscription**: Select your Azure subscription
* **Resource group**: Create new or select existing
* **Region**: Choose a region (e.g., East US, West Europe)
* **Name**: Enter a unique resource name (it becomes part of your endpoint URL)
* **Pricing tier**: Select Standard S0
6. Click **Review + create**, then **Create**
7. Wait for deployment to complete
**Important**: Note your resource name – it is used in the base URL: `https://<resource-name>.openai.azure.com`
### Deploy Models
1. Navigate to your Azure resource in the Azure Portal
2. Click **Go to Azure OpenAI Studio** or visit [https://oai.azure.com](https://oai.azure.com)
3. In Azure Studio, select **Deployments** from the left sidebar
4. Click **Create new deployment**
5. Configure your deployment:
* **Model**: Select a model (e.g., gpt-4o, gpt-4o-mini, gpt-4-turbo)
* **Deployment name**: Enter a name (this must match the model identifier you'll use – use the pre-filled name)
* **Model version**: Select the latest version
* **Deployment type**: Global Standard
6. Click **Create**
7. Repeat for additional models you want to use
**Note**: The deployment name must match the model name you will request through the gateway, for example:
* For `gpt-4o-mini`, the deployment name should be `gpt-4o-mini`
* For `gpt-35-turbo`, the deployment name should be `gpt-35-turbo`
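As a sanity check, the mapping from LLM Gateway model IDs to Azure deployment names can be sketched as a small helper (the function is illustrative; only the `gpt-3.5-turbo` / `gpt-35-turbo` rename is special-cased here):

```typescript
// Derive the expected Azure deployment name from an LLM Gateway model ID.
// Illustrative helper: Azure deployment names omit the dot in "3.5".
function azureDeploymentName(model: string): string {
  const name = model.replace(/^azure\//, ""); // strip the gateway prefix
  return name.replace("gpt-3.5", "gpt-35"); // Azure uses gpt-35-turbo
}
```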
### Get API Key
1. In the Azure Portal, go to your Azure resource
2. Click **Keys and Endpoint** in the left sidebar
3. Copy **Key 1** or **Key 2**
4. Note your **Endpoint** URL (it should look like `https://<resource-name>.openai.azure.com`)
**Important**: Keep your API key secure – it grants access to your Azure deployments.
## Add to LLM Gateway
### Navigate to Provider Keys
1. Log into [LLM Gateway Dashboard](https://llmgateway.io/dashboard)
2. Select your organization and project
3. Go to **Provider Keys** in the sidebar
### Add Azure Provider Key
1. Click **Add** for **Azure**
2. Enter your **API Key** from Azure Portal
3. Enter your **Resource Name** (the name from your Azure endpoint URL)
* Example: If your endpoint is `https://my-openai-resource.openai.azure.com`, enter `my-openai-resource`
4. Select your preferred **type** (Azure OpenAI or AI Foundry)
5. Set the **Validation Model** to a model you have already deployed and that is available
This is a one-time check to ensure the API key is valid and the model can be accessed.
6. Click **Add Key**
The system will validate your key and confirm the connection.
### Test the Integration
Test your integration with a simple API call:
```bash
curl -X POST https://api.llmgateway.io/v1/chat/completions \
-H "Authorization: Bearer YOUR_LLMGATEWAY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "azure/gpt-4o-mini",
"messages": [
{
"role": "user",
"content": "Hello from Azure!"
}
]
}'
```
Replace `YOUR_LLMGATEWAY_API_KEY` with your LLM Gateway API key.
## Available Models
Once configured, you can access your Azure deployments through LLM Gateway:
* **GPT-4o**: `azure/gpt-4o`
* **GPT-4o Mini**: `azure/gpt-4o-mini`
* **GPT-3.5 Turbo**: `azure/gpt-3.5-turbo` (note: use `gpt-3.5-turbo` as the LLM Gateway model name, not Azure's `gpt-35-turbo`)
**Note**: Only models you have deployed in Azure Studio will be available. Ensure your deployment names match the expected model identifiers.
Browse all available models at [llmgateway.io/models](https://llmgateway.io/models?provider=azure)
## Troubleshooting
### "Deployment not found" error
* Verify you've created a deployment in Azure Studio
* Ensure the deployment name exactly matches the model name you're requesting
* Check that the deployment is in the same resource as your API key
### "Resource not found" error
* Verify the resource name is correct (check your Azure Portal endpoint URL)
* Ensure your API key belongs to the correct Azure resource
* Confirm the resource is in an active state in Azure Portal
### Rate limiting
* Azure has Tokens Per Minute (TPM) quotas per deployment
* Monitor usage in Azure Studio under **Quotas**
* Request quota increases through Azure Portal if needed for high-volume workloads
### Region availability
* Not all models are available in all Azure regions
* Check [Azure model availability](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models#model-summary-table-and-region-availability) for your region
* Consider creating resources in multiple regions for better availability
# Activity
URL: /learn/activity
import { ThemedImage } from "@/components/themed-image";
The Activity page shows a real-time log of every API request routed through LLM Gateway. Use it to debug requests, monitor performance, and track costs per call.
## Filters
Filter the activity log using the controls at the top:
| Filter | Description |
| --------------------------- | ------------------------------------------------------- |
| **Time range** | Filter by a specific time period |
| **Unified reasons** | Filter by completion reason (e.g., stop, length, error) |
| **Providers** | Show requests for specific providers only |
| **Models** | Show requests for specific models only |
| **Custom header key/value** | Filter by custom metadata headers attached to requests |
## Activity List
Each activity entry shows:
* **Status icon** – Green checkmark for completed, red circle for errors
* **Response preview** – First line of the model's response (when available)
* **Model** – The provider and model used (e.g., `google-vertex/gemini-3-pro-image-preview`)
* **Cache status** – Whether the response was served from cache
* **Tokens** – Total tokens consumed (input + output)
* **Duration** – How long the request took
* **Cost** – Inference cost for the request
* **Source** – Where the request originated from
* **Discount** – Any discount applied (e.g., "20% off")
* **Status badge** – `completed`, `upstream_error`, `gateway_error`, etc.
* **Timestamp** – Relative time (e.g., "about 4 hours ago")
### Actions per Entry
* **Open in new tab** – View the full request detail in a new browser tab
* **Expand** – Expand inline to see more details
## Activity Detail
Click on any activity entry to view its full detail page.
### Summary Cards
Five cards at the top provide a quick overview:
| Card | Description |
| ------------------ | ------------------------------- |
| **Duration** | Total request time in seconds |
| **Tokens** | Total tokens consumed |
| **Throughput** | Tokens per second |
| **Inference Cost** | Cost charged for this request |
| **Cache** | Whether the response was cached |
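The Throughput card is derived from the other two values – total tokens divided by request duration. A minimal sketch of that arithmetic (illustrative, not the dashboard's actual code):

```typescript
// Tokens per second, as shown on the Throughput summary card.
// Guards against a zero duration to avoid dividing by zero.
function throughput(totalTokens: number, durationSeconds: number): number {
  if (durationSeconds <= 0) return 0;
  return totalTokens / durationSeconds;
}
```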
### Request Section
Details about the original request:
* **Requested Model** – The model ID sent in the API call
* **Used Model** – The actual model that served the request
* **Model Mapping** – The underlying model identifier
* **Provider** – The provider that handled the request
* **Requested Provider** – The provider specified in the request
* **Streamed** – Whether the response was streamed
* **Canceled** – Whether the request was canceled
* **Source** – The application or service that made the request
### Tokens Section
A detailed token breakdown:
* Prompt Tokens, Completion Tokens, Total Tokens
* Reasoning Tokens (for reasoning models)
* Image Input/Output Tokens (for vision/image models)
* Response Size
### Routing Section
How LLM Gateway routed the request:
* **Selection** – The routing strategy used (e.g., `direct-provider-specified`)
* **Available** – Providers that were available for this model
* **Provider Scores** – Scoring breakdown showing availability, uptime, and latency for each provider
### Parameters Section
The model parameters sent with the request:
* Temperature, Max Tokens, Top P
* Frequency Penalty, Reasoning Effort
* Response Format
# API Keys
URL: /learn/api-keys
import { ThemedImage } from "@/components/themed-image";
The API Keys page lets you create, view, and manage the API keys used to authenticate requests to LLM Gateway.
## Creating an API Key
Click the **Create API Key** button to generate a new key. The number of keys you can create depends on your plan:
* **Free** – Limited number of keys
* **Pro** – Higher key limit
* **Enterprise** – Custom limits
When creating a key, you can assign it a name to help identify its purpose (e.g., "Production", "Development", "CI/CD").
## API Keys List
Each key in the list shows:
| Field | Description |
| ------------- | -------------------------------------------------------------- |
| **Name** | The label you assigned to the key |
| **Key** | A masked preview of the key (only last few characters visible) |
| **Created** | When the key was created |
| **Last used** | When the key was last used in a request |
## Actions
For each API key you can:
* **View** – See the full key (only available once after creation)
* **Edit** – Update the key name
* **Rotate** – Generate a new key value while keeping the same configuration
* **Delete** – Permanently remove the key
## Plan Limits
The page shows your current key count vs. the maximum allowed by your plan. If you've reached your limit, the Create button will be disabled and you'll need to upgrade your plan or delete unused keys.
# Audit Logs
URL: /learn/audit-logs
import { Callout } from "fumadocs-ui/components/callout";
import { ThemedImage } from "@/components/themed-image";
The Audit Logs page provides a complete history of all actions performed within your organization, essential for compliance and security monitoring.
Audit Logs are available on the [**Enterprise
plan**](https://llmgateway.io/enterprise). Owner or Admin role is required.
## Filters
Narrow down the log entries:
* **Action** – Filter by action type (create, delete, update, etc.)
* **Resource type** – Filter by resource (API, IAM, API Keys, etc.)
Both filters are populated dynamically based on the actions recorded in your organization.
## Audit Log Entries
Each log entry shows:
| Field | Description |
| ----------------- | ------------------------------------------------------------ |
| **Timestamp** | Exact time of the action (formatted as MMM d, yyyy HH:mm:ss) |
| **User** | Name and email of the person who performed the action |
| **Action**        | What was done (e.g., "API Keys → create")                    |
| **Resource type** | The type of resource affected (shown as a badge) |
| **Resource ID** | Identifier of the affected resource (with copy button) |
| **Details** | Additional metadata about the action |
## Pagination
The log supports infinite scrolling with a **Load More** button to view older entries. Entries are sorted newest first.
# Billing
URL: /learn/billing
import { ThemedImage } from "@/components/themed-image";
The Billing page is your central hub for managing credits, plans, and payment methods.
## Credits
Displays your current credit balance. Credits are consumed as you make API requests through the gateway. Click **Top Up Credits** to add more credits to your account.
## Plan Management
View and manage your subscription:
* See your current plan (Free, Pro, or Enterprise)
* Billing cycle information
* Click **Manage Subscription** to upgrade, downgrade, or cancel
## Payment Methods
Manage your saved payment methods:
* Add a new credit card or payment method
* View existing payment methods
* Update billing information
## Auto Top-up Settings
Configure automatic credit top-ups so you never run out:
* **Enable/disable** auto top-up
* **Threshold** – The credit balance that triggers a top-up
* **Amount** – How many credits to add when the threshold is reached
This ensures uninterrupted service by automatically replenishing your credits when they run low.
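Conceptually, the two settings interact like this (a hypothetical sketch, not the billing system's actual code):

```typescript
// Decide whether an auto top-up should fire and for how many credits.
// Hypothetical sketch of the Threshold/Amount settings described above.
function autoTopUp(balance: number, threshold: number, amount: number): number {
  return balance < threshold ? amount : 0; // credits to add (0 = no top-up)
}
```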
# Dashboard
URL: /learn/dashboard
import { ThemedImage } from "@/components/themed-image";
The Dashboard is the first page you see after logging in. It provides a high-level overview of your project's LLM usage, costs, and performance at a glance.
## Date Range
At the top of the page, you can toggle the date range for all dashboard metrics:
* **7 days** – Last 7 days of data (default)
* **30 days** – Last 30 days of data
* **Custom** – Pick a custom start and end date
## Stat Cards
The dashboard displays eight metric cards in two rows:
### Top Row
| Card | Description |
| ------------------------ | ------------------------------------------------------------------------ |
| **Organization Credits** | Your current available credit balance |
| **Total Requests** | Number of API requests in the selected period, with cache hit percentage |
| **Total Cost** | Total inference cost for the period, including storage costs |
| **Total Savings** | Savings from discounts during the selected period |
### Bottom Row
| Card | Description |
| ------------------------ | ------------------------------------------------------------------- |
| **Input Tokens & Cost** | Total prompt tokens sent and their associated cost |
| **Output Tokens & Cost** | Total completion tokens received and their associated cost |
| **Cached Tokens & Cost** | Tokens served from cache (if caching is enabled) and the cost saved |
| **Most Used Model** | The model with the highest request count, along with its provider |
## Usage Overview Chart
Below the stat cards, a chart visualizes your usage over time. You can toggle between two views using the dropdown:
* **Costs** – Shows input, output, and cached input costs as a stacked area chart
* **Requests** – Shows request volume over time
The chart is filtered by the currently selected project.
## Quick Actions
A sidebar panel provides shortcuts to common tasks:
* **Manage API Keys** – Go to the API Keys page
* **Provider Keys** – Configure your own provider keys
* **View Activity** – See detailed request logs
* **Usage & Metrics** – Dive into usage analytics
* **Model Usage** – View per-model usage breakdown
## Header Actions
Two buttons in the top-right corner:
* **Create API Key** – Quickly create a new API key for your project
* **Top Up Credits** – Add credits to your organization balance
# Guardrails
URL: /learn/guardrails
import { Callout } from "fumadocs-ui/components/callout";
import { ThemedImage } from "@/components/themed-image";
The Guardrails page lets you configure content safety rules that automatically scan and filter API requests before they reach the LLM provider.
Guardrails are available on the [**Enterprise
plan**](https://llmgateway.io/enterprise). Owner or Admin role is required.
## Main Toggle
A global toggle at the top enables or disables all guardrails for your organization. Click **Save Changes** to apply.
## System Rules
Six built-in rules with individual enable/disable toggles:
| Rule | Description |
| ------------------------------- | -------------------------------------------------------------------- |
| **Prompt Injection Detection** | Detects attempts to override or manipulate system instructions |
| **Jailbreak Prevention** | Identifies attempts to bypass safety measures |
| **PII Detection** | Identifies personal information like emails, phone numbers, and SSNs |
| **Secrets Detection** | Detects API keys, passwords, and credentials |
| **File Type Restrictions** | Controls which file types can be uploaded |
| **Document Leakage Prevention** | Detects attempts to extract confidential documents |
Each rule has an action dropdown to configure the response:
* **Block** – Reject the request entirely
* **Redact** – Remove or mask sensitive content, then continue
* **Warn** – Log the violation but allow the request
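A simplified sketch of how a single rule's action plays out, using an email pattern as a stand-in for real PII detection (the rule, types, and names here are illustrative, not the gateway's implementation):

```typescript
type GuardrailAction = "block" | "redact" | "warn";

// Apply one rule to request content. Email addresses stand in for PII.
function applyRule(content: string, action: GuardrailAction): { allowed: boolean; content: string } {
  const redacted = content.replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, "[REDACTED]");
  const violated = redacted !== content; // the pattern matched somewhere
  if (!violated) return { allowed: true, content };
  if (action === "block") return { allowed: false, content }; // reject entirely
  if (action === "redact") return { allowed: true, content: redacted }; // mask, then continue
  return { allowed: true, content }; // "warn": log only, request proceeds
}
```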
## File Restrictions
Configure file upload limits:
* **Max file size** – Set the maximum file size in MB
* **Allowed file types** – Add or remove permitted MIME types
## Custom Rules
Create organization-specific rules by clicking **Add Rule**:
* **Blocked Terms** – Block specific words or phrases
* **Custom Regex** – Match patterns with regular expressions
* **Topic Restriction** – Restrict content related to specific topics
Each custom rule can be individually enabled/disabled or deleted.
Learn more about guardrails in the [Guardrails feature docs](/features/guardrails).
# Introduction
URL: /learn
The LLM Gateway dashboard gives you full control over your LLM API usage, costs, and configuration. This section walks you through every page in the dashboard so you can get the most out of the platform.
## Project Pages
These pages are scoped to a specific project within your organization:
* [**Dashboard**](/learn/dashboard) – Overview of your usage, costs, and quick actions
* [**Activity**](/learn/activity) – Detailed logs of every API request
* [**Model Usage**](/learn/model-usage) – Usage breakdown by model
* [**Usage & Metrics**](/learn/usage-metrics) – Requests, errors, cache rates, and cost trends
* [**API Keys**](/learn/api-keys) – Create and manage your API keys
* [**Preferences**](/learn/preferences) – Project-level settings like caching and mode
## Organization Pages
These pages apply to your entire organization:
* [**Provider Keys**](/learn/provider-keys) – Bring your own provider API keys
* [**Guardrails**](/learn/guardrails) – Content safety rules and filters
* [**Security Events**](/learn/security-events) – Monitor guardrail violations
* [**Billing**](/learn/billing) – Credits, plans, and payment methods
* [**Transactions**](/learn/transactions) – Payment and credit history
* [**Referrals**](/learn/referrals) – Earn credits by referring others
* [**Policies**](/learn/policies) – Data retention configuration
* [**Org Preferences**](/learn/org-preferences) – Organization name and billing email
* [**Team**](/learn/team) – Manage team members and roles
* [**Audit Logs**](/learn/audit-logs) – Complete history of organization actions
## Playground
Interactive tools for testing and experimenting with LLM models:
* [**Chat Playground**](/learn/playground) – Test models with an interactive chat interface
* [**Group Chat**](/learn/playground-group) – Compare responses from multiple models side by side
* [**Image Studio**](/learn/playground-image) – Generate and edit images using AI models
# Model Usage
URL: /learn/model-usage
import { ThemedImage } from "@/components/themed-image";
The Model Usage page shows how your API requests are distributed across different LLM models over time.
## Filters
Two filters let you narrow down the data:
* **API Key** – Select a specific API key or view usage across all keys
* **Date range** – Choose a time period to analyze
## Usage Chart
The main chart displays a time-series breakdown of requests per model. Each model is represented by a different color, making it easy to see:
* Which models are used most frequently
* How usage patterns change over time
* Whether usage is concentrated on a single model or spread across many
This page is useful for understanding your model distribution and identifying opportunities to optimize costs by switching to more cost-effective models for certain workloads.
# Org Preferences
URL: /learn/org-preferences
import { ThemedImage } from "@/components/themed-image";
The Org Preferences page contains basic settings for your organization.
## Organization Name
Update your organization's display name. This name appears throughout the dashboard and in billing communications.
## Billing Email
Set or update the email address used for billing-related communications, including receipts, invoices, and payment notifications.
# Group Chat
URL: /learn/playground-group
import { ThemedImage } from "@/components/themed-image";
The Group Chat page lets you send a single prompt to multiple models simultaneously and compare their responses side by side. This is useful for evaluating model quality, speed, and cost.
## How It Works
1. Select two or more models from the model picker
2. Type your prompt in the input field
3. All selected models receive the same prompt at once
4. Responses stream in parallel, displayed in separate columns
## Use Cases
* **Model evaluation** – Compare output quality across providers
* **Cost optimization** – See which models give the best results for the price
* **Speed comparison** – Observe latency differences between models
* **Migration testing** – Verify that a new model produces equivalent results
# Image Studio
URL: /learn/playground-image
import { ThemedImage } from "@/components/themed-image";
The Image Studio lets you generate images using AI models through an intuitive interface. Select a model, describe what you want, and get results instantly.
## Model Selection
Choose from supported image generation models in the dropdown. Each model has different capabilities, resolutions, and pricing.
## Generating Images
1. Select an image generation model
2. Type a description of the image you want
3. Click send to generate
4. Generated images appear in the conversation
## Image Count
You can generate 1, 2, or 4 images at once. Multiple images are displayed in a grid layout.
## Resolution Options
Available resolutions depend on the selected model. Common options include 1K, 2K, and 4K.
# Chat Playground
URL: /learn/playground
import { ThemedImage } from "@/components/themed-image";
The Chat Playground is a standalone app for testing LLM models through a conversational interface. You can select any supported model, adjust parameters, and see responses in real time.
## Model Selection
Use the dropdown at the top to pick a model and provider. The **Auto Route** option automatically selects the best provider based on availability and cost.
## Chat Interface
* Type your message in the input field at the bottom
* Click the send button or press Enter to submit
* Responses stream in real time
* Previous conversations appear in the sidebar
## Prompt Suggestions
When starting a new chat, category tabs help you pick a prompt:
* **Create** – Content generation prompts
* **Explore** – Research and analysis prompts
* **Code** – Programming and development prompts
* **Image gen** – Image generation prompts
## Sidebar
The left sidebar shows your chat history. Click **+ New Chat** to start a fresh conversation, or select a previous chat to continue it.
## Comparison Mode
Toggle **Comparison mode** in the top-right to send the same prompt to multiple models side by side. See the [Group Chat](/learn/playground-group) page for details.
## Image Studio
Click **Image Studio** in the sidebar to switch to the image generation interface. See the [Image Studio](/learn/playground-image) page for details.
# Policies
URL: /learn/policies
import { ThemedImage } from "@/components/themed-image";
The Policies page lets you configure organization-wide policies that govern how your data is handled.
## Data Retention
Control how long your request logs and activity data are stored. The retention period depends on your plan:
| Plan | Retention Period |
| -------------- | ---------------- |
| **Free** | 3 days |
| **Pro** | 7 days |
| **Enterprise** | 90 days |
After the retention period expires, request logs and associated data are automatically deleted.
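In code form, the retention rule amounts to the following (day counts come from the table above; the helper itself is illustrative):

```typescript
// Retention periods per plan, in days (from the table above).
const RETENTION_DAYS: Record<string, number> = { free: 3, pro: 7, enterprise: 90 };

// True if a log created at `createdAt` has passed its retention window.
function isExpired(plan: string, createdAt: Date, now: Date): boolean {
  const days = RETENTION_DAYS[plan] ?? 3; // default to the shortest window
  const ageMs = now.getTime() - createdAt.getTime();
  return ageMs > days * 24 * 60 * 60 * 1000;
}
```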
Learn more about data retention in the [Data Retention feature docs](/features/data-retention).
# Preferences
URL: /learn/preferences
import { ThemedImage } from "@/components/themed-image";
The Preferences page contains project-level settings that control how your project behaves.
## Project Name
Update the display name for your project. This name appears in the sidebar and throughout the dashboard.
## Project Mode
Configure how your organization handles projects. This setting determines the routing and isolation behavior for API requests within the project.
## Caching
Enable or configure response caching for API requests. When enabled, identical requests will return cached responses instead of making new calls to the provider, saving both time and cost.
Learn more about caching in the [Caching feature docs](/features/caching).
## Danger Zone
The Danger Zone section contains irreversible actions:
* **Archive Project** – Permanently archive the project. This action cannot be undone. Archived projects stop processing requests and their API keys become inactive.
# Provider Keys
URL: /learn/provider-keys
import { Callout } from "fumadocs-ui/components/callout";
import { ThemedImage } from "@/components/themed-image";
The Provider Keys page lets you add your own API keys from LLM providers (OpenAI, Anthropic, Google, etc.) to route requests directly through your accounts without additional gateway fees.
## Adding a Provider Key
Click **Add Provider Key** to configure a new key:
* **Provider** – Select which provider this key belongs to
* **Custom name** – An optional label to identify the key
* **API key** – Your provider's API key
* **Base URL** – Optional custom endpoint (useful for Azure OpenAI or custom deployments)
## Provider Keys List
Each configured key shows:
| Field | Description |
| --------------- | -------------------------------------------------- |
| **Provider** | The LLM provider (e.g., OpenAI, Anthropic) |
| **Custom name** | Your label for the key |
| **Status** | Active, inactive, or deleted |
| **Base URL** | Custom endpoint if configured |
| **Token** | Masked key with only the last 4 characters visible |
## Actions
For each provider key:
* **Edit** – Update the key name, value, or base URL
* **Deactivate** – Temporarily disable the key without deleting it
* **Delete** – Permanently remove the key
When you use your own provider keys, requests are routed directly to the
provider. You are only charged the provider's standard rates with no
additional gateway markup.
# Referrals
URL: /learn/referrals
import { ThemedImage } from "@/components/themed-image";
The Referrals page lets you earn credits by inviting others to use LLM Gateway.
## Eligibility
To unlock the referral program, your organization must have at least **$100 in total credit top-ups**. Before reaching this threshold, the page shows:
* A progress bar showing your progress toward $100
* The remaining amount needed to unlock
* An explanation of the 1% earnings model
## Referral Dashboard
Once eligible, the page shows:
### Your Referral Link
A unique shareable link tied to your organization. Click the copy button to copy it to your clipboard and share it with others.
### Your Stats
| Stat | Description |
| ------------------ | ----------------------------------------------------- |
| **Users Referred** | Total number of users who signed up through your link |
| **Total Earnings** | Total credit amount earned from referrals |
### How It Works
1. **Share Your Link** – Send your referral link to others
2. **They Sign Up** – They create an LLM Gateway account using your link
3. **Earn Credits** – You earn 1% of their spending as credits
Credits are automatically added to your organization balance.
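The 1% model is simple arithmetic (illustrative helper; the cent rounding is an assumption, not documented behavior):

```typescript
// Referral credits earned: 1% of referred users' spending.
const REFERRAL_RATE = 0.01;

// Rounds to cents for display purposes (assumption, not documented behavior).
function referralEarnings(referredSpendUsd: number): number {
  return Math.round(referredSpendUsd * REFERRAL_RATE * 100) / 100;
}
```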
# Security Events
URL: /learn/security-events
import { Callout } from "fumadocs-ui/components/callout";
import { ThemedImage } from "@/components/themed-image";
The Security Events page shows all guardrail violations detected across your organization, helping you monitor content safety and policy enforcement.
Security Events are available on the [**Enterprise
plan**](https://llmgateway.io/enterprise). Owner or Admin role is required.
## Stats Cards
Four summary cards at the top:
| Card | Description |
| -------------------- | --------------------------------------------- |
| **Total Violations** | All-time violation count |
| **Last 24 Hours** | Violations in the past day |
| **Blocked** | Number of requests that were blocked |
| **Redacted** | Number of requests where content was redacted |
## Filters
Narrow down the events list:
* **Action** – Filter by Blocked, Redacted, Warned, or All actions
* **Category** – Filter by Prompt Injection, Jailbreak, PII Detection, Secrets, Blocked Terms, Custom Regex, or Topic Restriction
## Violations List
Each violation entry shows:
| Field | Description |
| ------------------- | ---------------------------------------------------- |
| **Timestamp** | When the violation occurred |
| **Rule name** | Which guardrail rule was triggered |
| **Category** | The type of violation (shown as a badge) |
| **Action** | What action was taken (Blocked, Redacted, or Warned) |
| **Matched pattern** | The content that triggered the rule |
The list supports pagination with a **Load More** button for viewing older events.
# Team
URL: /learn/team
import { ThemedImage } from "@/components/themed-image";
The Team page lets you invite team members, assign roles, and control access to your organization.
## Adding Members
Click **Add Member** to invite someone by email. You'll need to:
1. Enter their email address
2. Select a role (Developer, Admin, or Owner)
Your plan includes up to **5 team seats**. The current count is displayed, and the Add button is disabled when all seats are used. Contact sales for additional seats.
## Team Members List
Each member shows:
| Field | Description |
| --------- | ------------------------------------------------ |
| **Name** | The member's display name |
| **Email** | Their email address |
| **Role** | Their current role (can be changed via dropdown) |
## Actions
* **Update role** – Change a member's role using the dropdown
* **Remove** – Remove a member from the organization (requires confirmation)
## Role Permissions
| Role | Permissions |
| ------------- | ----------------------------------------------------------------------------------------------------- |
| **Owner** | Full access to all settings, billing, team management, and all projects |
| **Admin** | Can manage team members, projects, and API keys, but cannot access billing or delete the organization |
| **Developer** | View and use resources only. Cannot modify settings or manage team |
Developers can also be given **restricted access** at the API key level, limiting which keys they can view and use.
# Transactions
URL: /learn/transactions
import { ThemedImage } from "@/components/themed-image";
The Transactions page shows a complete history of all financial transactions in your organization.
## Transaction History
Each transaction entry includes:
| Field | Description |
| --------------- | ---------------------------------------- |
| **Date** | When the transaction occurred |
| **Type** | The transaction type (see below) |
| **Credits** | Number of credits added or deducted |
| **Total Paid** | The dollar amount charged |
| **Status** | Current state of the transaction |
| **Description** | Additional details about the transaction |
## Transaction Types
| Type | Description |
| ----------------------- | ----------------------------------- |
| **Credit Top-up** | Manual or automatic credit purchase |
| **Credit Refund** | Credits refunded to your account |
| **Subscription Start** | New plan subscription started |
| **Subscription Cancel** | Plan subscription canceled |
| **Subscription End** | Plan subscription period ended |
## Status Badges
* **Completed** – Transaction processed successfully
* **Pending** – Transaction is being processed
* **Failed** – Transaction could not be completed
# Usage & Metrics
URL: /learn/usage-metrics
import { ThemedImage } from "@/components/themed-image";
The Usage & Metrics page provides comprehensive analytics through five tabs, giving you deep insight into your LLM API usage patterns.
## Filters
* **API Key** – Filter metrics by a specific API key or view all
* **Date range** – Select the time period (defaults to last 7 days)
## Tabs
### Requests
A time-series chart showing request volume over the selected period. Use this to identify traffic patterns, peak usage times, and growth trends.
### Models
A table showing your top-used models ranked by request count. For each model you can see:
* Total requests
* Token consumption
* Associated costs
This helps you understand which models drive the most usage and cost.
### Errors
A chart showing error rates over time. Track:
* Error frequency and trends
* Spikes that may indicate provider issues
* Overall reliability of your API calls
### Cache
A chart showing your cache hit rate over time. Monitor:
* How effectively caching is reducing redundant requests
* Cache hit vs. miss ratios
* The cost savings from cached responses
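As a back-of-the-envelope illustration of the last point, the savings implied by a given hit rate can be estimated in a few lines. All figures below are made up for illustration, not real pricing:

```python
# Hypothetical figures: 10,000 requests at an average cost of $0.002 each,
# with a 30% cache hit rate (cached responses are assumed to cost nothing).
total_requests = 10_000
avg_cost_per_request = 0.002
cache_hit_rate = 0.30

cache_hits = round(total_requests * cache_hit_rate)
estimated_savings = cache_hits * avg_cost_per_request
effective_spend = (total_requests - cache_hits) * avg_cost_per_request
```

With these numbers, 3,000 requests are served from cache, saving roughly $6 of a $20 nominal spend.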
### Costs
A cost breakdown chart showing spending patterns. Analyze:
* Cost trends over time
* Cost distribution by provider or model
* Opportunities to reduce spending
# Migrate from LiteLLM
URL: /migrations/litellm
import { Step, Steps } from "fumadocs-ui/components/steps";
import { Callout } from "fumadocs-ui/components/callout";
Running your own LiteLLM proxy works until it doesn't: scaling, monitoring, and keeping it running become another job. LLM Gateway gives you the same unified API with built-in analytics, caching, and a dashboard, without the infrastructure overhead.
## Quick Migration
Both services use OpenAI-compatible endpoints, so migration is a two-line change:
```diff
- const baseURL = "http://localhost:4000/v1"; // LiteLLM proxy
+ const baseURL = "https://api.llmgateway.io/v1";
- const apiKey = process.env.LITELLM_API_KEY;
+ const apiKey = process.env.LLM_GATEWAY_API_KEY;
```
## Why Teams Switch to LLM Gateway
| What You Get | LiteLLM (Self-Hosted) | LLM Gateway |
| ------------------------ | --------------------- | -------------------- |
| OpenAI-compatible API | Yes | Yes |
| Infrastructure to manage | Yes (you run it) | No (we run it) |
| Managed cloud option | No | Yes |
| Analytics dashboard | Basic | Per-request detail |
| Response caching | Manual setup | Built-in, automatic |
| Cost tracking | Via callbacks | Native, real-time |
| Provider key management | Config file | Web UI with rotation |
| Uptime & scaling | You handle it | 99.9% SLA (Pro/Ent) |
Still want to self-host? LLM Gateway is [open source under AGPLv3](https://llmgateway.io/blog/how-to-self-host-llm-gateway): same features, your infrastructure.
For a detailed breakdown, see [LLM Gateway vs LiteLLM](https://llmgateway.io/compare/litellm).
## Migration Steps
### Get Your LLM Gateway API Key
Sign up at [llmgateway.io/signup](https://llmgateway.io/signup) and create an API key from your dashboard.
### Map Your Models
LLM Gateway supports two model ID formats:
**Root Model IDs** (without provider prefix) – use smart routing to automatically select the best provider based on uptime, throughput, price, and latency:
```
gpt-5.2
claude-opus-4-5-20251101
gemini-3-flash-preview
```
**Provider-Prefixed Model IDs** – route to a specific provider, with automatic failover if uptime drops below 90%:
```
openai/gpt-5.2
anthropic/claude-opus-4-5-20251101
google-ai-studio/gemini-3-flash-preview
```
This means many LiteLLM model names work directly with LLM Gateway:
| LiteLLM Model | LLM Gateway Model |
| -------------------------------- | ----------------------------------------------------------------- |
| gpt-5.2 | gpt-5.2 or openai/gpt-5.2 |
| claude-opus-4-5-20251101 | claude-opus-4-5-20251101 or anthropic/claude-opus-4-5-20251101 |
| gemini/gemini-3-flash-preview | gemini-3-flash-preview or google-ai-studio/gemini-3-flash-preview |
| bedrock/claude-opus-4-5-20251101 | claude-opus-4-5-20251101 or aws-bedrock/claude-opus-4-5-20251101 |
For more details on routing behavior, see the [routing documentation](/features/routing).
### Update Your Code
#### Python with OpenAI SDK
```python
import os

from openai import OpenAI
# Before (LiteLLM proxy)
client = OpenAI(
base_url="http://localhost:4000/v1",
api_key=os.environ["LITELLM_API_KEY"]
)
response = client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": "Hello!"}]
)
# After (LLM Gateway) - model name can stay the same!
client = OpenAI(
base_url="https://api.llmgateway.io/v1",
api_key=os.environ["LLM_GATEWAY_API_KEY"]
)
response = client.chat.completions.create(
model="gpt-4", # or "openai/gpt-4" to target a specific provider
messages=[{"role": "user", "content": "Hello!"}]
)
```
#### Python with LiteLLM Library
If you're using the LiteLLM library directly, you can point it to LLM Gateway:
```python
import os

import litellm
# Before (direct LiteLLM)
response = litellm.completion(
model="gpt-4",
messages=[{"role": "user", "content": "Hello!"}]
)
# After (via LLM Gateway) - same model name works
response = litellm.completion(
model="gpt-4", # or "openai/gpt-4" to target a specific provider
messages=[{"role": "user", "content": "Hello!"}],
api_base="https://api.llmgateway.io/v1",
api_key=os.environ["LLM_GATEWAY_API_KEY"]
)
```
#### TypeScript/JavaScript
```typescript
import OpenAI from "openai";
// Before (LiteLLM proxy)
const client = new OpenAI({
baseURL: "http://localhost:4000/v1",
apiKey: process.env.LITELLM_API_KEY,
});
// After (LLM Gateway) - same model name works
const client = new OpenAI({
baseURL: "https://api.llmgateway.io/v1",
apiKey: process.env.LLM_GATEWAY_API_KEY,
});
const completion = await client.chat.completions.create({
model: "gpt-4", // or "openai/gpt-4" to target a specific provider
messages: [{ role: "user", content: "Hello!" }],
});
```
#### cURL
```bash
# Before (LiteLLM proxy)
curl http://localhost:4000/v1/chat/completions \
-H "Authorization: Bearer $LITELLM_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4",
"messages": [{"role": "user", "content": "Hello!"}]
}'
# After (LLM Gateway) - same model name works
curl https://api.llmgateway.io/v1/chat/completions \
-H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4",
"messages": [{"role": "user", "content": "Hello!"}]
}'
# Use "openai/gpt-4" to target a specific provider
```
### Migrate Configuration
#### LiteLLM Config (Before)
```yaml
# litellm_config.yaml
model_list:
- model_name: gpt-4
litellm_params:
model: gpt-4
api_key: sk-...
- model_name: claude-3
litellm_params:
model: claude-3-sonnet-20240229
api_key: sk-ant-...
```
#### LLM Gateway (After)
With LLM Gateway, you don't need a config file. Provider keys are managed in the web dashboard, or you can use the default LLM Gateway keys.
If you want to use your own provider keys, configure them in the dashboard under Settings > Provider Keys.
## Streaming Support
LLM Gateway supports streaming identically to LiteLLM:
```python
import os

from openai import OpenAI
client = OpenAI(
base_url="https://api.llmgateway.io/v1",
api_key=os.environ["LLM_GATEWAY_API_KEY"]
)
stream = client.chat.completions.create(
model="openai/gpt-4",
messages=[{"role": "user", "content": "Write a story"}],
stream=True
)
for chunk in stream:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="")
```
## Function/Tool Calling
LLM Gateway supports function calling:
```python
import os

from openai import OpenAI
client = OpenAI(
base_url="https://api.llmgateway.io/v1",
api_key=os.environ["LLM_GATEWAY_API_KEY"]
)
tools = [{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string"}
},
"required": ["location"]
}
}
}]
response = client.chat.completions.create(
model="openai/gpt-4",
messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
tools=tools
)
```
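The example above ends at the model's tool call; to complete the loop you execute the requested function and send its result back as a `tool` message. A minimal sketch, operating on plain dicts shaped like the SDK's `response.choices[0].message.tool_calls` entries (the `get_weather` implementation here is invented for illustration):

```python
import json

# Hypothetical local implementation of the "get_weather" tool declared above.
def get_weather(location: str) -> dict:
    return {"location": location, "temperature_c": 21, "condition": "sunny"}

TOOLS = {"get_weather": get_weather}

def dispatch_tool_call(tool_call: dict) -> str:
    """Run the function the model asked for and return its result as JSON."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    return json.dumps(TOOLS[name](**args))

# Shape mirrors response.choices[0].message.tool_calls[0] from the OpenAI SDK.
example_call = {
    "id": "call_123",
    "function": {"name": "get_weather", "arguments": '{"location": "Tokyo"}'},
}
tool_result = dispatch_tool_call(example_call)

# Append this as a "tool" role message and call the API again for the final answer.
followup_message = {
    "role": "tool",
    "tool_call_id": example_call["id"],
    "content": tool_result,
}
```

Appending `followup_message` to the conversation and calling `chat.completions.create` again lets the model produce its final, tool-informed reply.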
## Removing LiteLLM Infrastructure
After verifying LLM Gateway works for your use case, you can decommission your LiteLLM proxy:
1. Update all clients to use LLM Gateway endpoints
2. Monitor the LLM Gateway dashboard for successful requests
3. Shut down your LiteLLM proxy server
4. Remove LiteLLM configuration files
## What Changes After Migration
* **No servers to babysit** – We handle scaling, uptime, and updates
* **Real-time cost visibility** – See what every request costs, broken down by model
* **Automatic caching** – Repeated requests hit the cache, reducing your spend
* **Web-based management** – No more editing YAML files for config changes
* **New models immediately** – Access new releases within 48 hours, no deployment needed
## Self-Hosting LLM Gateway
If you prefer self-hosting like LiteLLM, LLM Gateway is available under AGPLv3:
```bash
git clone https://github.com/llmgateway/llmgateway
cd llmgateway
pnpm install
pnpm setup
pnpm dev
```
This gives you the same self-hosted deployment model as LiteLLM's proxy, plus LLM Gateway's analytics and caching features.
## Full Comparison
Want to see a detailed breakdown of all features? Check out our [LLM Gateway vs LiteLLM comparison page](https://llmgateway.io/compare/litellm).
## Need Help?
* Browse available models at [llmgateway.io/models](https://llmgateway.io/models)
* Read the [API documentation](https://docs.llmgateway.io)
* Contact support at [contact@llmgateway.io](mailto:contact@llmgateway.io)
# Migrate from OpenRouter
URL: /migrations/openrouter
import { Step, Steps } from "fumadocs-ui/components/steps";
import { Callout } from "fumadocs-ui/components/callout";
LLM Gateway works just like OpenRouter (same API format, same model names) but adds built-in analytics and the option to self-host. Migration takes two lines of code.
## Quick Migration
Change your base URL and API key:
```diff
- const baseURL = "https://openrouter.ai/api/v1";
- const apiKey = process.env.OPENROUTER_API_KEY;
+ const baseURL = "https://api.llmgateway.io/v1";
+ const apiKey = process.env.LLM_GATEWAY_API_KEY;
```
## Migration Steps
### Get Your LLM Gateway API Key
Sign up at [llmgateway.io/signup](https://llmgateway.io/signup) and create an API key from your dashboard.
### Update Environment Variables
```bash
# Remove OpenRouter credentials
# OPENROUTER_API_KEY=sk-or-...
# Add LLM Gateway credentials
LLM_GATEWAY_API_KEY=llmgtwy_your_key_here
```
### Update Your Code
#### Using fetch/axios
```typescript
// Before (OpenRouter)
const response = await fetch("https://openrouter.ai/api/v1/chat/completions", {
method: "POST",
headers: {
Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
model: "openai/gpt-5.2",
messages: [{ role: "user", content: "Hello!" }],
}),
});
// After (LLM Gateway)
const response = await fetch("https://api.llmgateway.io/v1/chat/completions", {
method: "POST",
headers: {
Authorization: `Bearer ${process.env.LLM_GATEWAY_API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
model: "gpt-5.2",
messages: [{ role: "user", content: "Hello!" }],
}),
});
```
#### Using OpenAI SDK
```typescript
import OpenAI from "openai";
// Before (OpenRouter)
const client = new OpenAI({
baseURL: "https://openrouter.ai/api/v1",
apiKey: process.env.OPENROUTER_API_KEY,
});
// After (LLM Gateway)
const client = new OpenAI({
baseURL: "https://api.llmgateway.io/v1",
apiKey: process.env.LLM_GATEWAY_API_KEY,
});
// Usage remains the same
const completion = await client.chat.completions.create({
model: "anthropic/claude-3-5-sonnet-20241022",
messages: [{ role: "user", content: "Hello!" }],
});
```
#### Using Vercel AI SDK
Both OpenRouter and LLM Gateway have native AI SDK providers, making migration straightforward:
```typescript
import { generateText } from "ai";
// Before (OpenRouter AI SDK Provider)
import { createOpenRouter } from "@openrouter/ai-sdk-provider";
const openrouter = createOpenRouter({
apiKey: process.env.OPENROUTER_API_KEY,
});
const { text } = await generateText({
model: openrouter("gpt-5.2"),
prompt: "Hello!",
});
// After (LLM Gateway AI SDK Provider)
import { createLLMGateway } from "@llmgateway/ai-sdk-provider";
const llmgateway = createLLMGateway({
apiKey: process.env.LLM_GATEWAY_API_KEY,
});
const { text } = await generateText({
model: llmgateway("gpt-5.2"),
prompt: "Hello!",
});
```
## Model Name Mapping
Most model names are compatible, but here are some common mappings:
| OpenRouter Model | LLM Gateway Model |
| -------------------------------- | ----------------------------------------------------------------- |
| openai/gpt-5.2 | gpt-5.2 or openai/gpt-5.2 |
| gemini/gemini-3-flash-preview | gemini-3-flash-preview or google-ai-studio/gemini-3-flash-preview |
| bedrock/claude-opus-4-5-20251101 | claude-opus-4-5-20251101 or aws-bedrock/claude-opus-4-5-20251101 |
Check the [models page](https://llmgateway.io/models) for the full list of available models.
## Streaming Support
LLM Gateway supports streaming responses identically to OpenRouter:
```typescript
const stream = await client.chat.completions.create({
model: "anthropic/claude-3-5-sonnet-20241022",
messages: [{ role: "user", content: "Write a story" }],
stream: true,
});
for await (const chunk of stream) {
process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
```
## Full Comparison
Want to see a detailed breakdown of all features? Check out our [LLM Gateway vs OpenRouter comparison page](https://llmgateway.io/compare/open-router).
## Need Help?
* Browse available models at [llmgateway.io/models](https://llmgateway.io/models)
* Read the [API documentation](https://docs.llmgateway.io)
* Contact support at [contact@llmgateway.io](mailto:contact@llmgateway.io)
# Migrate from Vercel AI Gateway
URL: /migrations/vercel-ai-gateway
import { Step, Steps } from "fumadocs-ui/components/steps";
import { Callout } from "fumadocs-ui/components/callout";
## Quick Migration
Swap your provider imports; your AI SDK code stays the same:
```diff
- import { openai } from "@ai-sdk/openai";
- import { anthropic } from "@ai-sdk/anthropic";
+ import { generateText } from "ai";
+ import { createLLMGateway } from "@llmgateway/ai-sdk-provider";
+ const llmgateway = createLLMGateway({
+ apiKey: process.env.LLM_GATEWAY_API_KEY
+ });
const { text } = await generateText({
- model: openai("gpt-5.2"),
+ model: llmgateway("gpt-5.2"),
prompt: "Hello!"
});
```
The key difference: one provider, one API key, all models, with caching and analytics built in.
## Migration Steps
### Get Your LLM Gateway API Key
Sign up at [llmgateway.io/signup](https://llmgateway.io/signup) and create an API key from your dashboard.
### Install the LLM Gateway AI SDK Provider
Install the native LLM Gateway provider for the Vercel AI SDK:
```bash
pnpm add @llmgateway/ai-sdk-provider
```
This package provides full compatibility with the Vercel AI SDK and supports all LLM Gateway features.
### Update Your Code
#### Basic Text Generation
```typescript
// Before (Vercel AI Gateway with native providers)
import { openai } from "@ai-sdk/openai";
import { anthropic } from "@ai-sdk/anthropic";
import { generateText } from "ai";
const { text: openaiText } = await generateText({
model: openai("gpt-4o"),
prompt: "Hello!",
});
const { text: claudeText } = await generateText({
model: anthropic("claude-3-5-sonnet-20241022"),
prompt: "Hello!",
});
// After (LLM Gateway - single provider for all models)
import { createLLMGateway } from "@llmgateway/ai-sdk-provider";
import { generateText } from "ai";
const llmgateway = createLLMGateway({
apiKey: process.env.LLM_GATEWAY_API_KEY,
});
const { text: openaiText } = await generateText({
model: llmgateway("openai/gpt-4o"),
prompt: "Hello!",
});
const { text: claudeText } = await generateText({
model: llmgateway("anthropic/claude-3-5-sonnet-20241022"),
prompt: "Hello!",
});
```
#### Streaming Responses
```typescript
import { createLLMGateway } from "@llmgateway/ai-sdk-provider";
import { streamText } from "ai";
const llmgateway = createLLMGateway({
apiKey: process.env.LLM_GATEWAY_API_KEY,
});
const { textStream } = await streamText({
model: llmgateway("anthropic/claude-3-5-sonnet-20241022"),
prompt: "Write a poem about coding",
});
for await (const text of textStream) {
process.stdout.write(text);
}
```
#### Using in Next.js API Routes
```typescript
// app/api/chat/route.ts
import { createLLMGateway } from "@llmgateway/ai-sdk-provider";
import { streamText } from "ai";
const llmgateway = createLLMGateway({
apiKey: process.env.LLM_GATEWAY_API_KEY,
});
export async function POST(req: Request) {
const { messages } = await req.json();
const result = await streamText({
model: llmgateway("openai/gpt-4o"),
messages,
});
return result.toDataStreamResponse();
}
```
#### Alternative: Using OpenAI SDK Adapter
If you prefer not to install a new package, you can use `@ai-sdk/openai` with a custom base URL:
```typescript
import { createOpenAI } from "@ai-sdk/openai";
import { generateText } from "ai";
const llmgateway = createOpenAI({
baseURL: "https://api.llmgateway.io/v1",
apiKey: process.env.LLM_GATEWAY_API_KEY,
});
const { text } = await generateText({
model: llmgateway("openai/gpt-4o"),
prompt: "Hello!",
});
```
### Update Environment Variables
```bash
# Remove individual provider keys (optional - can keep as backup)
# OPENAI_API_KEY=sk-...
# ANTHROPIC_API_KEY=sk-ant-...
# Add LLM Gateway key
export LLM_GATEWAY_API_KEY=llmgtwy_your_key_here
```
## Model Name Format
LLM Gateway supports two model ID formats:
**Root Model IDs** (without provider prefix) – use smart routing to automatically select the best provider based on uptime, throughput, price, and latency:
```
gpt-4o
claude-3-5-sonnet-20241022
gemini-1.5-pro
```
**Provider-Prefixed Model IDs** – route to a specific provider, with automatic failover if uptime drops below 90%:
```
openai/gpt-4o
anthropic/claude-3-5-sonnet-20241022
google-ai-studio/gemini-1.5-pro
```
For more details on routing behavior, see the [routing documentation](/features/routing).
### Model Mapping Examples
| Vercel AI SDK | LLM Gateway |
| ----------------------------------------- | -------------------------------------------------------------------------------------------------- |
| `openai("gpt-4o")` | `llmgateway("gpt-4o")` or `llmgateway("openai/gpt-4o")` |
| `anthropic("claude-3-5-sonnet-20241022")` | `llmgateway("claude-3-5-sonnet-20241022")` or `llmgateway("anthropic/claude-3-5-sonnet-20241022")` |
| `google("gemini-1.5-pro")` | `llmgateway("gemini-1.5-pro")` or `llmgateway("google-ai-studio/gemini-1.5-pro")` |
Check the [models page](https://llmgateway.io/models) for the full list of available models.
## Tool Calling
LLM Gateway supports tool calling through the AI SDK:
```typescript
import { createLLMGateway } from "@llmgateway/ai-sdk-provider";
import { generateText, tool } from "ai";
import { z } from "zod";
const llmgateway = createLLMGateway({
apiKey: process.env.LLM_GATEWAY_API_KEY,
});
const { text, toolResults } = await generateText({
model: llmgateway("openai/gpt-4o"),
tools: {
weather: tool({
description: "Get the weather for a location",
parameters: z.object({
location: z.string(),
}),
execute: async ({ location }) => {
return { temperature: 72, condition: "sunny" };
},
}),
},
prompt: "What's the weather in San Francisco?",
});
```
## Self-Hosting LLM Gateway
If you prefer self-hosting, LLM Gateway is available under AGPLv3:
```bash
git clone https://github.com/llmgateway/llmgateway
cd llmgateway
pnpm install
pnpm setup
pnpm dev
```
This gives you the same feature set with full control over your infrastructure.
## Need Help?
* Browse available models at [llmgateway.io/models](https://llmgateway.io/models)
* Read the [API documentation](https://docs.llmgateway.io)
* Contact support at [contact@llmgateway.io](mailto:contact@llmgateway.io)
# Rate Limits
URL: /resources/rate-limits
import { Callout } from "fumadocs-ui/components/callout";
LLM Gateway implements rate limits to ensure fair usage and optimal performance for all users. The limits differ based on your account status and the type of models you're using.
## Free Models
Free models (models with zero input and output pricing) have rate limits that depend on your account's credit status:
### Base Rate Limits
For organizations with **zero credits**:
* **5 requests per 10 minutes**
* Applies to all free model requests
* Resets every 10 minutes
### Elevated Rate Limits
For organizations that have **purchased credits** (any amount):
* **20 requests per minute**
* Applies to all free model requests
* Resets every minute
<Callout>
  When using free models with elevated limits, your credits will **not** be deducted. The elevated rate limits are simply a benefit for users who have added credits to their account.
</Callout>
## Paid Models
**Paid AI models are not currently rate limited.** You can make as many requests as needed to paid models, subject only to your account's credit balance and any provider-specific limits.
## Rate Limit Headers
All API responses include rate limit information in the headers:
```http
X-RateLimit-Limit: 20
X-RateLimit-Remaining: 19
X-RateLimit-Reset: 1640995200
```
* `X-RateLimit-Limit`: Maximum number of requests allowed in the current window
* `X-RateLimit-Remaining`: Number of requests remaining in the current window
* `X-RateLimit-Reset`: Unix timestamp when the rate limit window resets
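A client-side sketch of acting on these headers. The header values below are illustrative, not taken from a live response:

```python
import time

# Illustrative header values, shaped like the response headers documented above.
headers = {
    "X-RateLimit-Limit": "20",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": str(int(time.time()) + 42),
}

remaining = int(headers["X-RateLimit-Remaining"])
reset_at = int(headers["X-RateLimit-Reset"])

# If the window is exhausted, wait until it resets before retrying.
wait_seconds = max(0, reset_at - int(time.time())) if remaining == 0 else 0
```

Sleeping for `wait_seconds` before the next request avoids a guaranteed `429`.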
## Rate Limit Exceeded
When you exceed your rate limit, you'll receive a `429 Too Many Requests` response:
```json
{
"error": {
"message": "Rate limit exceeded. Try again later.",
"type": "rate_limit_error",
"code": "rate_limit_exceeded"
}
}
```
## Best Practices
### Upgrading Your Limits
To unlock elevated rate limits for free models:
1. Add credits to your account through the dashboard
2. Your rate limits will automatically increase to 20 requests per minute
3. Free model usage will still not deduct from your credits
### Handling Rate Limits
* Implement exponential backoff when you receive 429 responses
* Monitor the `X-RateLimit-Remaining` header to avoid hitting limits
* Consider using paid models for high-volume applications
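The first two points can be sketched as follows; `send_request` is a placeholder for your own HTTP call, assumed to return an object with a `status_code` attribute (as a `requests.Response` does):

```python
import random
import time

def backoff_delays(max_retries: int = 5, base: float = 1.0, cap: float = 30.0):
    """Yield exponentially growing delays (with jitter) for retrying 429 responses."""
    for attempt in range(max_retries):
        delay = min(cap, base * 2 ** attempt)
        # Jitter spreads retries out so many clients don't retry in lockstep.
        yield delay * random.uniform(0.5, 1.0)

def with_backoff(send_request):
    """Call `send_request` (a zero-argument callable) until it stops returning 429."""
    for delay in backoff_delays():
        response = send_request()
        if response.status_code != 429:
            return response
        time.sleep(delay)
    raise RuntimeError("rate limited after all retries")
```

In practice you would also read `X-RateLimit-Reset` from the response headers and wait at least until the window resets.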
### Cost Optimization
* Use free models for development and testing
* Switch to paid models for production workloads requiring higher throughput
* Monitor your usage patterns through the dashboard
<Callout>
  Adding even a small amount of credits to your account (e.g., $5) will immediately upgrade your free model rate limits from 5 requests per 10 minutes to 20 requests per minute.
</Callout>