# Introduction
URL: /
import { FeatureCards } from "@/components/feature-cards";
import { AIToolingCards } from "@/components/ai-tooling-cards";

LLM Gateway is an open-source API gateway that sits between your applications and LLM providers like OpenAI, Anthropic, Google AI Studio, and more. It provides a unified, OpenAI-compatible API interface with built-in cost tracking, caching, and intelligent routing.

## Features

<FeatureCards />

## AI Tooling

LLM Gateway is built to work seamlessly with AI agents and development tools.

<AIToolingCards />

## Next Steps

* [**Quickstart**](/quick-start) — Get up and running in minutes
* [**Overview**](/overview) — Learn more about what LLM Gateway offers
* [**Self-Host**](/self-host) — Deploy on your own infrastructure


# Overview
URL: /overview
# LLM Gateway

LLM Gateway is an open-source API gateway for Large Language Models (LLMs). It acts as middleware between your applications and various LLM providers, allowing you to:

* Route requests to multiple LLM providers (OpenAI, Anthropic, Google AI Studio, and others)
* Manage API keys for different providers in one place
* Track token usage and costs across all your LLM interactions
* Analyze performance metrics to optimize your LLM usage

## Analyzing Your LLM Requests

LLM Gateway provides detailed insights into your LLM usage:

* **Usage Metrics**: Track the number of requests, tokens used, and response times
* **Cost Analysis**: Monitor spending across different models and providers
* **Performance Tracking**: Identify patterns and optimize your prompts based on actual usage data
* **Breakdown by Model**: Compare different models' performance and cost-effectiveness

All this data is automatically collected and presented in an intuitive dashboard, helping you make informed decisions about your LLM strategy.
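
The dashboard collects these numbers for you. To make the per-model breakdown concrete, here is a minimal client-side sketch that tallies the `usage` object found in OpenAI-format chat completion responses. The `tally_usage` helper and the sample data are illustrative assumptions, not part of LLM Gateway.

```python
from collections import defaultdict


def tally_usage(responses):
    """Sum request counts and token usage per model.

    Each item is a parsed OpenAI-format chat completion response (a dict).
    """
    totals = defaultdict(
        lambda: {"requests": 0, "prompt_tokens": 0, "completion_tokens": 0}
    )
    for response in responses:
        usage = response.get("usage") or {}
        entry = totals[response.get("model", "unknown")]
        entry["requests"] += 1
        entry["prompt_tokens"] += usage.get("prompt_tokens", 0)
        entry["completion_tokens"] += usage.get("completion_tokens", 0)
    return dict(totals)


# Hypothetical sample responses, trimmed to the fields used above.
sample = [
    {"model": "gpt-4o", "usage": {"prompt_tokens": 12, "completion_tokens": 30}},
    {"model": "gpt-4o", "usage": {"prompt_tokens": 8, "completion_tokens": 20}},
    {"model": "claude-3-5-sonnet-20241022", "usage": {"prompt_tokens": 5, "completion_tokens": 9}},
]
summary = tally_usage(sample)
print(summary["gpt-4o"])
```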

## Getting Started

Using LLM Gateway is simple. Replace your current LLM provider URL with the LLM Gateway API endpoint:

```bash
curl -X POST https://api.llmgateway.io/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -d '{
  "model": "gpt-4o",
  "messages": [
    {"role": "user", "content": "Hello, how are you?"}
  ]
}'
```

LLM Gateway maintains compatibility with the OpenAI API format, making migration seamless.

## Hosted vs. Self-Hosted

You can use LLM Gateway in two ways:

* **Hosted Version**: For immediate use without setup, visit [llmgateway.io](https://llmgateway.io) to create an account and get an API key.
* **Self-Hosted**: Deploy LLM Gateway on your own infrastructure for complete control over your data and configuration.

The self-hosted version offers additional customization options and ensures your LLM traffic never leaves your infrastructure if desired.


# Quickstart
URL: /quick-start
import { Accordion, Accordions } from "fumadocs-ui/components/accordion";
import { Tabs, Tab } from "fumadocs-ui/components/tabs";
import { DynamicCodeBlock } from "fumadocs-ui/components/dynamic-codeblock";

# 🚀 Quickstart

Welcome to **LLM Gateway**, a single drop-in endpoint that lets you call today's best large language models while keeping **your existing code** and development workflow intact.

> **TL;DR** — Point your HTTP requests to `https://api.llmgateway.io/v1/…`, supply your `LLM_GATEWAY_API_KEY`, and you’re done.

***

## 1 · Get an API key

1. Sign in to the dashboard.
2. Create a new Project → *Copy the key*.
3. Export it in your shell (or a `.env` file):

```bash
export LLM_GATEWAY_API_KEY="llmgtwy_XXXXXXXXXXXXXXXX"
```

***

## 2 · Pick your language

<Tabs
  groupId="language"
  items={[
'cURL',
'TypeScript',
'React',
'Next.js',
'Python',
'Java',
'Rust',
'Go',
'PHP',
'Ruby']}
  persist
>
  <Tab value="cURL">
    <DynamicCodeBlock
      lang="bash"
      code={`curl -X POST https://api.llmgateway.io/v1/chat/completions \\
-H "Content-Type: application/json" \\
-H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \\
-d '{
"model": "gpt-4o",
"messages": [
  {"role": "user", "content": "Hello, how are you?"}
]
}'`}
    />
  </Tab>

  <Tab value="TypeScript">
    <DynamicCodeBlock
      lang="typescript"
      code={`const response = await fetch('https://api.llmgateway.io/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': \`Bearer \${process.env.LLM_GATEWAY_API_KEY}\`
  },
  body: JSON.stringify({
    model: 'gpt-4o',
    messages: [
      { role: 'user', content: 'Hello, how are you?' }
    ]
  })
});

if (!response.ok) {
  throw new Error(\`HTTP error! status: \${response.status}\`);
}

const data = await response.json();
console.log(data.choices[0].message.content);`}
    />
  </Tab>

  <Tab value="React">
    <DynamicCodeBlock
      lang="tsx"
      code={`import { useState } from 'react'

// Note: environment variables bundled into client-side code are visible to anyone
// who opens the page. In production, call LLM Gateway from a server route instead.
function ChatComponent() {
  const [response, setResponse] = useState('');
  const [loading, setLoading] = useState(false);

  const sendMessage = async () => {
    setLoading(true);
    try {
      const res = await fetch('https://api.llmgateway.io/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'Authorization': \`Bearer \${process.env.REACT_APP_LLM_GATEWAY_API_KEY}\`
        },
        body: JSON.stringify({
          model: 'gpt-4o',
          messages: [
            { role: 'user', content: 'Hello, how are you?' }
          ]
        })
      });

      if (!res.ok) {
        throw new Error(\`HTTP error! status: \${res.status}\`);
      }

      const data = await res.json();
      setResponse(data.choices[0].message.content);
    } catch (error) {
      console.error('Error:', error);
    } finally {
      setLoading(false);
    }
  };

  return (
    <div>
      <button onClick={sendMessage} disabled={loading}>
        {loading ? "Sending..." : "Send Message"}
      </button>
      {response && <p>{response}</p>}
    </div>
  );
}

export default ChatComponent;
`}
    />
  </Tab>

  <Tab value="Next.js">
    <DynamicCodeBlock
      lang="typescript"
      code={`// app/api/chat/route.ts
import { NextRequest, NextResponse } from "next/server";

export async function POST(request: NextRequest) {
  const { message } = await request.json();

  const response = await fetch('https://api.llmgateway.io/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': \`Bearer \${process.env.LLM_GATEWAY_API_KEY}\`
    },
    body: JSON.stringify({
      model: 'gpt-4o',
      messages: [
        { role: 'user', content: message }
      ]
    })
  });

  if (!response.ok) {
    return NextResponse.json({ error: 'Failed to get response' }, { status: response.status });
  }

  const data = await response.json();

  return NextResponse.json({
    message: data.choices[0].message.content
  });
}

// Usage in component:
// const response = await fetch('/api/chat', {
//   method: 'POST',
//   headers: { 'Content-Type': 'application/json' },
//   body: JSON.stringify({ message: 'Hello, how are you?' })
// });`}
    />
  </Tab>

  <Tab value="Python">
    <DynamicCodeBlock
      lang="python"
      code={`import requests
import os

response = requests.post(
    'https://api.llmgateway.io/v1/chat/completions',
    headers={
        'Content-Type': 'application/json',
        'Authorization': f'Bearer {os.getenv("LLM_GATEWAY_API_KEY")}'
    },
    json={
        'model': 'gpt-4o',
        'messages': [
            {'role': 'user', 'content': 'Hello, how are you?'}
        ]
    }
)

response.raise_for_status()
print(response.json()['choices'][0]['message']['content'])`}
    />
  </Tab>

  <Tab value="Java">
    <DynamicCodeBlock
      lang="java"
      code={`import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.URI;

String apiKey = System.getenv("LLM_GATEWAY_API_KEY");
String requestBody = """
{
\"model\": \"gpt-4o\",
\"messages\": [
{\"role\": \"user\", \"content\": \"Hello, how are you?\"}
]
}
""";

HttpRequest request = HttpRequest.newBuilder()
.uri(URI.create("https://api.llmgateway.io/v1/chat/completions"))
.header("Content-Type", "application/json")
.header("Authorization", "Bearer " + apiKey)
.POST(HttpRequest.BodyPublishers.ofString(requestBody))
.build();

HttpResponse<String> response = HttpClient.newHttpClient()
.send(request, HttpResponse.BodyHandlers.ofString());

System.out.println(response.body());`}
    />
  </Tab>

  <Tab value="Rust">
    <DynamicCodeBlock
      lang="rust"
      code={`use reqwest::Client;
use serde_json::json;
use std::env;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
  let client = Client::new();
  let api_key = env::var("LLM_GATEWAY_API_KEY")?;

  let response = client
      .post("https://api.llmgateway.io/v1/chat/completions")
      .header("Content-Type", "application/json")
      .header("Authorization", format!("Bearer {}", api_key))
      .json(&json!({
          "model": "gpt-4o",
          "messages": [
              {"role": "user", "content": "Hello, how are you?"}
          ]
      }))
      .send()
      .await?;

  let result: serde_json::Value = response.json().await?;
  println!("{}", result["choices"][0]["message"]["content"]);
  Ok(())

}`}
    />
  </Tab>

  <Tab value="Go">
    <DynamicCodeBlock
      lang="go"
      code={`package main

import (
  "bytes"
  "encoding/json"
  "fmt"
  "net/http"
  "os"
)

type ChatRequest struct {
Model string ` + "`json:\"model\"`" + `
Messages []Message ` + "`json:\"messages\"`" + `
}

type Message struct {
Role string ` + "`json:\"role\"`" + `
Content string ` + "`json:\"content\"`" + `
}

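func main() {
  apiKey := os.Getenv("LLM_GATEWAY_API_KEY")

  requestBody := ChatRequest{
      Model: "gpt-4o",
      Messages: []Message{{Role: "user", Content: "Hello, how are you?"}},
  }

  jsonData, err := json.Marshal(requestBody)
  if err != nil {
      fmt.Println("failed to encode request:", err)
      os.Exit(1)
  }

  req, err := http.NewRequest("POST", "https://api.llmgateway.io/v1/chat/completions", bytes.NewBuffer(jsonData))
  if err != nil {
      fmt.Println("failed to build request:", err)
      os.Exit(1)
  }
  req.Header.Set("Content-Type", "application/json")
  req.Header.Set("Authorization", "Bearer "+apiKey)

  client := &http.Client{}
  resp, err := client.Do(req)
  if err != nil {
      fmt.Println("request failed:", err)
      os.Exit(1)
  }
  defer resp.Body.Close()

  if resp.StatusCode != http.StatusOK {
      fmt.Println("HTTP error:", resp.Status)
      return
  }

  // Decode only the fields needed; encoding/json matches keys case-insensitively.
  var result struct {
      Choices []struct {
          Message Message
      }
  }
  if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
      fmt.Println("failed to decode response:", err)
      return
  }
  if len(result.Choices) > 0 {
      fmt.Println(result.Choices[0].Message.Content)
  }
}`}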
    />
  </Tab>

  <Tab value="PHP">
    <DynamicCodeBlock
      lang="php"
      code={`<?php
$apiKey = $_ENV['LLM_GATEWAY_API_KEY'];

$data = [
  'model' => 'gpt-4o',
  'messages' => [
      ['role' => 'user', 'content' => 'Hello, how are you?']
  ]
];

$options = [
  'http' => [
      'header' => [
          'Content-Type: application/json',
          'Authorization: Bearer ' . $apiKey
      ],
      'method' => 'POST',
      'content' => json_encode($data)
  ]
];

$context = stream_context_create($options);
$response = file_get_contents(
  'https://api.llmgateway.io/v1/chat/completions',
  false,
  $context
);

if ($response === FALSE) {
  throw new Exception('Request failed');
}

$result = json_decode($response, true);
echo $result['choices'][0]['message']['content'];
?>`}
    />
  </Tab>

  <Tab value="Ruby">
    <DynamicCodeBlock
      lang="ruby"
      code={`require 'net/http'
require 'json'
require 'uri'

uri = URI('https://api.llmgateway.io/v1/chat/completions')
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true

request = Net::HTTP::Post.new(uri)
request['Content-Type'] = 'application/json'
request['Authorization'] = "Bearer #{ENV['LLM_GATEWAY_API_KEY']}"

request.body = {
  model: 'gpt-4o',
  messages: [
    { role: 'user', content: 'Hello, how are you?' }
  ]
}.to_json

response = http.request(request)

if response.code != '200'
  raise "HTTP Error: #{response.code}"
end

result = JSON.parse(response.body)
puts result['choices'][0]['message']['content']`}
    />
  </Tab>
</Tabs>

***

## 3 · SDK integrations

```ts title="ai-sdk.ts"
import { llmgateway } from "@llmgateway/ai-sdk-provider";
import { generateText } from "ai";

const { text } = await generateText({
	model: llmgateway("gpt-4o"),
	prompt: "Write a vegetarian lasagna recipe for 4 people.",
});
```

```ts title="vercel-ai-sdk.ts"
import { createOpenAI } from "@ai-sdk/openai";
import { generateText } from "ai";

// createOpenAI returns a provider; pass one of its models to generateText.
const llmgateway = createOpenAI({
	baseURL: "https://api.llmgateway.io/v1",
	apiKey: process.env.LLM_GATEWAY_API_KEY!,
});

const { text } = await generateText({
	model: llmgateway.chat("gpt-4o"),
	prompt: "Hello, how are you?",
});

console.log(text);
```

```ts title="openai-sdk.ts"
import OpenAI from "openai";

const openai = new OpenAI({
	baseURL: "https://api.llmgateway.io/v1",
	apiKey: process.env.LLM_GATEWAY_API_KEY,
});

const completion = await openai.chat.completions.create({
	model: "gpt-4o",
	messages: [{ role: "user", content: "Hello, how are you?" }],
});

console.log(completion.choices[0].message.content);
```

***

## 4 · Going further

* **Streaming**: pass `stream: true` to any request—Gateway will proxy the event stream unchanged.
* **Monitoring**: Every call appears in the dashboard with latency, cost & provider breakdown.
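
As a rough illustration of consuming a streamed response, the sketch below parses server-sent event lines and yields the text deltas. It assumes the stream follows the OpenAI chat completion chunk format (`data: {...}` lines ending with `data: [DONE]`), which is what an OpenAI-compatible endpoint is expected to emit. The `iter_stream_text` helper and the sample lines are hypothetical.

```python
import json


def iter_stream_text(lines):
    """Yield text deltas from OpenAI-format server-sent event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hypothetical captured stream, for illustration only.
sample_stream = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": ", world"}}]}',
    "data: [DONE]",
]
streamed = "".join(iter_stream_text(sample_stream))
print(streamed)
```

With the `requests` library, the same function could be fed from `response.iter_lines(decode_unicode=True)` after calling `requests.post(..., stream=True)`.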

***

## 5 · FAQ

<Accordions type="single">
  <Accordion title="Which models are supported?">
    See the [Models page](https://llmgateway.io/models).
  </Accordion>

  <Accordion title="What makes your service different from OpenRouter?">
    <p>
      Unlike OpenRouter, we offer:
    </p>

    <ul className="list-disc pl-6 mt-2 space-y-1">
      <li>
        Full self-hosting capabilities, giving you complete control over your
        infrastructure
      </li>

      <li>
        Enhanced analytics with deeper insights into your model usage and
        performance
      </li>

      <li>
        No fees when using your own provider keys, maximizing cost efficiency
      </li>

      <li>
        Greater flexibility and customization options for enterprise deployments
      </li>
    </ul>
  </Accordion>

  <Accordion title="How much do you charge for your services?">
    Our pricing is designed to be flexible and cost-effective. See the
    [Pricing section](https://llmgateway.io#pricing) for details.
  </Accordion>
</Accordions>

***

## 6 · Next steps

* Read the [self-hosting guide](/self-host).
* Drop into our [GitHub](https://github.com/theopenco/llmgateway) for help or feature requests.

Happy building! ✨


# Self Host LLMGateway
URL: /self-host
# Self Host LLMGateway

LLMGateway is a self-hostable platform that provides a unified API gateway for multiple LLM providers. This guide offers two simple options to get started.

## Prerequisites

* A recent version of Docker, including Docker Compose for the Compose-based options
* API keys for the LLM providers you want to use (OpenAI, Anthropic, etc.)

## Option 1: Unified Docker Image (Simplest)

This option uses a single Docker container that includes all services (UI, API, Gateway, Database, Redis).

```bash
# Run the container
docker run -d \
  --name llmgateway \
  --restart unless-stopped \
  -p 3002:3002 \
  -p 3003:3003 \
  -p 3005:3005 \
  -p 3006:3006 \
  -p 4001:4001 \
  -p 4002:4002 \
  -v ~/llmgateway_data:/var/lib/postgresql/data \
  -e AUTH_SECRET=your-secret-key-here \
  ghcr.io/theopenco/llmgateway-unified:latest
```

Note: pin the image to a specific release tag from the [releases page](https://github.com/theopenco/llmgateway/releases) instead of using `latest`.

### Using Docker Compose (Alternative for unified image)

```bash
# Download the compose file
curl -O https://raw.githubusercontent.com/theopenco/llmgateway/main/infra/docker-compose.unified.yml
curl -O https://raw.githubusercontent.com/theopenco/llmgateway/main/.env.example

# Configure environment
cp .env.example .env
# Edit .env with your configuration

# Start the service
docker compose -f docker-compose.unified.yml up -d
```

Note: replace the `latest` tag on the image in the compose file with a specific release tag from the [releases page](https://github.com/theopenco/llmgateway/releases).

## Option 2: Separate Services with Docker Compose

This option uses separate containers for each service, offering more flexibility.

```bash
# Clone the repository
git clone https://github.com/theopenco/llmgateway.git
cd llmgateway

# Configure environment
cp .env.example .env
# Edit .env with your configuration

# Start the services
docker compose -f infra/docker-compose.split.yml up -d
```

Note: replace the `latest` tag on every image in the compose file with a specific release tag from the [releases page](https://github.com/theopenco/llmgateway/releases).

## Accessing Your LLMGateway

After starting either option, you can access:

* **Web Interface**: [http://localhost:3002](http://localhost:3002)
* **Documentation**: [http://localhost:3005](http://localhost:3005)
* **API Endpoint**: [http://localhost:4002](http://localhost:4002)
* **Gateway Endpoint**: [http://localhost:4001](http://localhost:4001)

## Required Configuration

At minimum, you need to set these environment variables:

```bash
# Database (change the password!)
POSTGRES_PASSWORD=your_secure_password_here

# Authentication
AUTH_SECRET=your-secret-key-here

# LLM Provider API Keys (add the ones you need)
LLM_OPENAI_API_KEY=sk-...
LLM_ANTHROPIC_API_KEY=sk-ant-...
```
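
Before starting the containers, it can help to check that the example placeholders were actually replaced. The following is a small sketch of such a check. The `check_env` helper is hypothetical, and treating any `LLM_*_API_KEY` variable as a provider key is a heuristic assumed here, not a rule taken from LLM Gateway.

```python
REQUIRED = ("POSTGRES_PASSWORD", "AUTH_SECRET")
# Placeholder values copied from the example above; they should be replaced.
PLACEHOLDERS = {"your_secure_password_here", "your-secret-key-here"}


def check_env(env):
    """Return a list of human-readable problems found in an environment mapping."""
    problems = []
    for name in REQUIRED:
        value = env.get(name, "")
        if not value:
            problems.append(f"{name} is not set")
        elif value in PLACEHOLDERS:
            problems.append(f"{name} still uses the example placeholder")
    has_provider_key = any(
        key.startswith("LLM_") and key.endswith("_API_KEY") and value
        for key, value in env.items()
    )
    if not has_provider_key:
        problems.append("no LLM_*_API_KEY provider key is set")
    return problems


# Hypothetical configuration that still contains one placeholder.
problems = check_env({
    "POSTGRES_PASSWORD": "s3cure-value",
    "AUTH_SECRET": "your-secret-key-here",
    "LLM_OPENAI_API_KEY": "sk-example",
})
print(problems)
```

Pass `os.environ` or the parsed contents of your `.env` file as the mapping.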

## Basic Management Commands

### For Unified Docker (Option 1)

```bash
# View logs
docker logs llmgateway

# Restart container
docker restart llmgateway

# Stop container
docker stop llmgateway
```

### For Docker Compose (Option 2)

```bash
# View logs
docker compose -f infra/docker-compose.split.yml logs -f

# Restart services
docker compose -f infra/docker-compose.split.yml restart

# Stop services
docker compose -f infra/docker-compose.split.yml down
```

## Build locally

To build locally, use the `*.local.yml` compose file in the `infra` directory, which builds the images from the source code.

## All provider API keys

You can set any of the following API keys:

```text
LLM_OPENAI_API_KEY=
LLM_ANTHROPIC_API_KEY=
```

## Multiple API Keys and Load Balancing

LLMGateway supports multiple API keys per provider for load balancing and increased availability. Simply provide comma-separated values for your API keys:

```bash
# Multiple OpenAI keys for load balancing
LLM_OPENAI_API_KEY=sk-key1,sk-key2,sk-key3

# Multiple Anthropic keys
LLM_ANTHROPIC_API_KEY=sk-ant-key1,sk-ant-key2
```

### Health-Aware Routing

The gateway automatically tracks the health of each API key and routes requests to healthy keys. If a key experiences consecutive errors, it is temporarily skipped. Keys that return authentication errors (401/403) are blacklisted until the gateway restarts.
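
The behavior described above can be pictured with a small tracker like the one below. This is only a sketch of the idea, not the gateway's implementation: the error threshold, the cooldown length, and the choice of which status codes count as errors are all assumptions made for illustration.

```python
import time


class KeyHealth:
    """Tracks consecutive failures per API key (illustrative only)."""

    def __init__(self, keys, max_errors=3, cooldown_seconds=60.0, clock=time.monotonic):
        self.keys = list(keys)
        self.max_errors = max_errors              # assumed threshold
        self.cooldown_seconds = cooldown_seconds  # assumed skip duration
        self.clock = clock
        self.errors = {key: 0 for key in self.keys}
        self.skip_until = {key: 0.0 for key in self.keys}
        self.blacklisted = set()

    def record(self, key, status):
        """Update a key's health from the HTTP status of its last request."""
        if status in (401, 403):
            self.blacklisted.add(key)  # excluded until the process restarts
        elif status == 429 or status >= 500:
            self.errors[key] += 1
            if self.errors[key] >= self.max_errors:
                self.skip_until[key] = self.clock() + self.cooldown_seconds
                self.errors[key] = 0
        else:
            self.errors[key] = 0  # a success resets the consecutive-error count

    def healthy_keys(self):
        """Return keys that are neither blacklisted nor cooling down."""
        now = self.clock()
        return [
            key for key in self.keys
            if key not in self.blacklisted and self.skip_until[key] <= now
        ]


# Demo with a fake clock so the cooldown can be advanced by hand.
fake_now = [0.0]
health = KeyHealth(["sk-key1", "sk-key2", "sk-key3"], max_errors=2,
                   cooldown_seconds=10, clock=lambda: fake_now[0])
health.record("sk-key1", 401)
health.record("sk-key2", 500)
health.record("sk-key2", 500)
print(health.healthy_keys())
fake_now[0] = 11.0
print(health.healthy_keys())
```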

### Related Configuration Values

For providers that require additional configuration (like Google Vertex), you can specify multiple values that correspond to each API key. The gateway will always use the matching index:

```bash
# Multiple Google Vertex configurations
LLM_GOOGLE_VERTEX_API_KEY=key1,key2,key3
LLM_GOOGLE_CLOUD_PROJECT=project-a,project-b,project-c
LLM_GOOGLE_VERTEX_REGION=us-central1,europe-west1,asia-east1
```

When the gateway selects `key2`, it will automatically use `project-b` and `europe-west1`. If you have fewer configuration values than keys, the last value will be reused for remaining keys.
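
The index-matching rule can be sketched in a few lines. The helper names are hypothetical, and trimming whitespace around each value is an assumption of this sketch rather than documented gateway behavior.

```python
def split_values(raw):
    """Split a comma-separated setting into a list of trimmed, non-empty values."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def value_for_index(values, index):
    """Return the value at a key's index, reusing the last value if the list is shorter."""
    if not values:
        return None
    return values[min(index, len(values) - 1)]


keys = split_values("key1,key2,key3")
projects = split_values("project-a,project-b,project-c")
regions = split_values("us-central1,europe-west1")  # one value fewer than keys

selected = keys.index("key2")
print(value_for_index(projects, selected), value_for_index(regions, selected))

# key3 has no region of its own, so the last region is reused.
print(value_for_index(regions, keys.index("key3")))
```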

## Next Steps

Once your LLMGateway is running:

1. **Open the web interface** at [http://localhost:3002](http://localhost:3002)
2. **Create your first organization** and project
3. **Generate API keys** for your applications
4. **Test the gateway** by making API calls to [http://localhost:4001](http://localhost:4001)
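
For the last step, a request against the local gateway might look like the sketch below. It assumes the self-hosted gateway serves the same `/v1/chat/completions` path as the hosted API and that the key is one generated in your local web interface. The `build_chat_request` helper is hypothetical.

```python
import json
import urllib.request

# Assumption: the local gateway uses the same path as the hosted API.
GATEWAY_URL = "http://localhost:4001/v1/chat/completions"


def build_chat_request(api_key, prompt, model="gpt-4o"):
    """Build (but do not send) an OpenAI-format chat completion request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    return urllib.request.Request(
        GATEWAY_URL,
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_chat_request("llmgtwy_your_api_key_here", "Hello, how are you?")
print(request.get_method(), request.full_url)

# To send it once the gateway is running:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["choices"][0]["message"]["content"])
```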


# Health check
URL: /health

Health check endpoint.

<APIPage document={"./openapi.json"} operations={[{"path":"/","method":"get"}]} webhooks={[]} hasHead={false} />


# Chat Completions
URL: /v1_chat_completions

Create a completion for the chat conversation

<APIPage document={"./openapi.json"} operations={[{"path":"/v1/chat/completions","method":"post"}]} webhooks={[]} hasHead={false} />


# Anthropic Messages
URL: /v1_messages

Create a message using Anthropic's API format

<APIPage document={"./openapi.json"} operations={[{"path":"/v1/messages","method":"post"}]} webhooks={[]} hasHead={false} />


# Models
URL: /v1_models

List all available models

<APIPage document={"./openapi.json"} operations={[{"path":"/v1/models","method":"get"}]} webhooks={[]} hasHead={false} />


# Agent Skills
URL: /guides/agent-skills
import { Step, Steps } from "fumadocs-ui/components/steps";
import { Callout } from "fumadocs-ui/components/callout";

**Agent Skills** are structured guidelines for AI coding agents, optimized for use with LLM Gateway and the AI SDK. They provide best practices and reusable instructions that help AI agents generate higher-quality code.

## What Are Agent Skills?

Agent Skills are packaged sets of rules and guidelines that teach AI coding agents how to implement specific features correctly. Each skill covers:

* API integration patterns
* Frontend rendering best practices
* Error handling strategies
* Performance optimization techniques

## Available Skills

### Image Generation

The Image Generation skill teaches AI agents how to properly implement image generation features:

* **API Integration** — correctly calling image generation APIs
* **Frontend Rendering** — displaying generated images efficiently
* **Error Handling** — graceful degradation and retry logic
* **Performance** — caching, lazy loading, and optimization

## Installation

<Steps>
  <Step>
    ### Prerequisites

    Ensure you have Node.js 18+ and pnpm 9+ installed:

    ```bash
    node --version  # v18.0.0 or higher
    pnpm --version  # 9.0.0 or higher
    ```
  </Step>

  <Step>
    ### Clone the Repository

    ```bash
    git clone https://github.com/theopenco/agent-skills.git
    cd agent-skills
    ```
  </Step>

  <Step>
    ### Install Dependencies

    ```bash
    pnpm install
    ```
  </Step>

  <Step>
    ### Build Skills

    Build all skills to generate the documentation:

    ```bash
    pnpm build:all
    ```

    Or build a specific skill:

    ```bash
    pnpm build
    ```
  </Step>
</Steps>

## Using Skills in Your Project

After building, each skill generates an `AGENTS.md` file that can be used with AI coding agents like Claude, Cursor, or Copilot.

### With Claude Code

Add the generated `AGENTS.md` content to your project's `CLAUDE.md` file:

```bash
cat skills/image-generation/AGENTS.md >> CLAUDE.md
```

### With Cursor

Add the skill content to your `.cursorrules` file:

```bash
cat skills/image-generation/AGENTS.md >> .cursorrules
```

### With Other AI Agents

Most AI coding tools support custom instructions. Copy the skill content into your tool's configuration.

## Project Structure

```text
agent-skills/
├── packages/
│   └── skills-build/          # Build tooling
├── skills/
│   └── image-generation/      # Individual skill
│       ├── rules/             # Rule files
│       ├── AGENTS.md          # Generated documentation
│       └── metadata.json      # Skill metadata
└── package.json
```

## Contributing

### Adding New Rules

<Steps>
  <Step>
    ### Fork and Clone

    Fork the repository and create a feature branch:

    ```bash
    git checkout -b feat/new-rule
    ```
  </Step>

  <Step>
    ### Create a Rule File

    Rules follow a standardized template with YAML frontmatter containing `title`, `impact` (high/medium/low), and `tags`. The body includes sections for Context, Incorrect examples, and Correct examples with TypeScript code blocks.

    See existing rules in `skills/image-generation/rules/` for reference.
  </Step>

  <Step>
    ### Validate and Build

    ```bash
    pnpm validate
    pnpm build:all
    ```
  </Step>

  <Step>
    ### Submit a Pull Request

    Push your changes and open a PR.
  </Step>
</Steps>

### Impact Levels

When creating rules, use these impact levels:

* **high** — Critical for correctness or security
* **medium** — Important for quality and maintainability
* **low** — Nice-to-have improvements
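
To illustrate the frontmatter fields and impact levels, here is a simplified checker. It is not the repository's `pnpm validate` command, and it parses only plain `key: value` lines instead of full YAML. The field rules come from the description above, while the helper names and the sample rule are hypothetical.

```python
ALLOWED_IMPACT = {"high", "medium", "low"}
REQUIRED_FIELDS = ("title", "impact", "tags")


def read_frontmatter(text):
    """Parse simple `key: value` pairs between the leading `---` lines (not full YAML)."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the frontmatter block
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def validate_rule(text):
    """Return a list of problems with a rule file's frontmatter."""
    fields = read_frontmatter(text)
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not fields.get(name)]
    impact = fields.get("impact")
    if impact and impact not in ALLOWED_IMPACT:
        problems.append(f"invalid impact: {impact}")
    return problems


# Hypothetical rule file with an impact level outside the allowed set.
example_rule = """---
title: Handle image generation errors
impact: critical
tags: errors, retries
---
## Context
"""
print(validate_rule(example_rule))
```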

## Development Commands

| Command          | Description                 |
| ---------------- | --------------------------- |
| `pnpm install`   | Install dependencies        |
| `pnpm build:all` | Build all skills            |
| `pnpm build`     | Build a specific skill      |
| `pnpm validate`  | Validate rule files         |
| `pnpm dev`       | Development mode with watch |

## More Resources

* [GitHub Repository](https://github.com/theopenco/agent-skills) — Source code and contributions
* [LLM Gateway CLI](/guides/cli) — Project scaffolding tool
* [Templates](https://llmgateway.io/templates) — Production-ready starter projects

<Callout type="info">
  Want to contribute a new skill or rule? Check out the [contribution
  guidelines](https://github.com/theopenco/agent-skills#contributing) on GitHub.
</Callout>


# Autohand Integration
URL: /guides/autohand
import { Step, Steps } from "fumadocs-ui/components/steps";
import { Callout } from "fumadocs-ui/components/callout";

Autohand is an autonomous AI coding agent that works in your terminal, IDE, and Slack. With LLM Gateway, you can route all Autohand requests through a single gateway—use any of 180+ models from 60+ providers, with full cost tracking and smart routing.

## Setup

<Steps>
  <Step>
    ### Sign Up for LLM Gateway

    [Sign up free](https://llmgateway.io/signup) — no credit card required. Copy your API key from the dashboard.
  </Step>

  <Step>
    ### Set Environment Variables

    Configure Autohand to use LLM Gateway:

    ```bash
    export OPENAI_BASE_URL=https://api.llmgateway.io/v1
    export OPENAI_API_KEY=llmgtwy_your_api_key_here
    ```
  </Step>

  <Step>
    ### Run Autohand

    ```bash
    autohand
    ```

    All requests will now be routed through LLM Gateway.
  </Step>
</Steps>

## Why Use LLM Gateway with Autohand

* **180+ models** — GPT-5, Claude Opus, Gemini, Llama, and more from 60+ providers
* **Smart routing** — Automatically selects the best provider based on uptime, throughput, price, and latency
* **Cost tracking** — Monitor exactly how much each autonomous session costs
* **Single bill** — No need to manage multiple API provider accounts
* **Response caching** — Repeated requests hit cache automatically
* **Automatic failover** — If one provider is down, requests route to another

## Configuration File

You can also configure LLM Gateway in Autohand's config file:

```json
{
	"provider": {
		"llmgateway": {
			"baseUrl": "https://api.llmgateway.io/v1",
			"apiKey": "llmgtwy_your_api_key_here"
		}
	},
	"model": "gpt-5"
}
```

## Choosing Models

You can use any model from the [models page](https://llmgateway.io/models).

| Model               | Best For                                    |
| ------------------- | ------------------------------------------- |
| `gpt-5`             | Latest OpenAI flagship, highest quality     |
| `claude-opus-4-6`   | Anthropic's most capable model              |
| `claude-sonnet-4-6` | Fast reasoning with extended thinking       |
| `gemini-2.5-pro`    | Google's latest flagship, 1M context window |
| `o3`                | Advanced reasoning tasks                    |
| `gpt-5-mini`        | Cost-effective, quick responses             |
| `gemini-2.5-flash`  | Fast responses, good for high-volume        |
| `deepseek-v3.1`     | Open-source with vision and tools           |

## Autohand Features with LLM Gateway

### Terminal (CLI)

Autohand CLI works seamlessly with LLM Gateway. Set the environment variables and use all Autohand commands as normal—multi-file editing, agentic search, and autonomous code generation all work out of the box.

### IDE Integration

Autohand's VS Code and Zed extensions respect the same environment variables. Set them in your shell profile and the IDE integration will automatically route through LLM Gateway.

### Slack Integration

When using Autohand through Slack, configure the LLM Gateway base URL in your Autohand server settings to route all Slack-triggered coding tasks through the gateway.

## Monitoring Usage

Once configured, all Autohand requests appear in your LLM Gateway dashboard:

* **Request logs** — See every prompt and response
* **Cost breakdown** — Track spending by model and time period
* **Usage analytics** — Understand your AI usage patterns

<Callout type="info">
  View all available models on the [models page](https://llmgateway.io/models).
</Callout>

<Callout type="info">
  Need help? Join our [Discord community](https://llmgateway.io/discord) for
  support and troubleshooting assistance.
</Callout>


# Claude Code Integration
URL: /guides/claude-code
import { Step, Steps } from "fumadocs-ui/components/steps";
import { Callout } from "fumadocs-ui/components/callout";

Claude Code is locked to Anthropic's API by default. With LLM Gateway, you can point it at any model—GPT-5, Gemini, Llama, or 180+ others—while keeping the same Anthropic API format Claude Code expects.

Three environment variables. No code changes. Full cost tracking in your dashboard.

## Setup

<Steps>
  <Step>
    ### Sign Up for LLM Gateway

    [Sign up free](https://llmgateway.io/signup) — no credit card required. Copy your API key from the dashboard.
  </Step>

  <Step>
    ### Set Environment Variables

    Configure Claude Code to use LLM Gateway:

    ```bash
    export ANTHROPIC_BASE_URL=https://api.llmgateway.io
    export ANTHROPIC_AUTH_TOKEN=llmgtwy_your_api_key_here
    # optional: specify a model, otherwise it uses the default Claude model
    export ANTHROPIC_MODEL=gpt-5  # or any model from our catalog
    ```
  </Step>

  <Step>
    ### Run Claude Code

    ```bash
    claude
    ```

    All requests will now be routed through LLM Gateway.
  </Step>
</Steps>

## Why This Works

LLM Gateway's `/v1/messages` endpoint speaks Anthropic's API format natively. We handle the translation to each provider behind the scenes. This means:

* **Use any model** — GPT-5, Gemini, Llama, or Claude itself
* **Keep your workflow** — Claude Code doesn't know the difference
* **Track costs** — Every request appears in your LLM Gateway dashboard
* **Automatic caching** — Repeated requests hit cache, saving money
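
To give a feel for what such a translation involves, the sketch below converts a text-only Anthropic Messages request into the OpenAI chat format. It is an illustration under simplifying assumptions, not the gateway's actual code: it handles only a string `system` prompt and text content blocks, and ignores tools, images, and streaming.

```python
def anthropic_to_openai(request):
    """Convert a text-only Anthropic Messages request into OpenAI chat format."""
    messages = []
    if request.get("system"):
        # Anthropic carries the system prompt at the top level; OpenAI uses a message.
        messages.append({"role": "system", "content": request["system"]})
    for message in request["messages"]:
        content = message["content"]
        if isinstance(content, list):
            # Flatten Anthropic content blocks, keeping only the text blocks.
            content = "".join(
                block.get("text", "") for block in content if block.get("type") == "text"
            )
        messages.append({"role": message["role"], "content": content})
    converted = {"model": request["model"], "messages": messages}
    if "max_tokens" in request:
        converted["max_tokens"] = request["max_tokens"]
    return converted


converted = anthropic_to_openai({
    "model": "gpt-5",
    "system": "Be brief.",
    "max_tokens": 100,
    "messages": [
        {"role": "user", "content": [{"type": "text", "text": "Hello, how are you?"}]}
    ],
})
print(converted["messages"])
```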

## Choosing Models

You can use any model from the [models page](https://llmgateway.io/models).

### Use OpenAI's Latest Models

```bash
# Use the latest GPT model
export ANTHROPIC_MODEL=gpt-5

# Use a cost-effective alternative
export ANTHROPIC_MODEL=gpt-5-mini
```

### Use Google's Gemini

```bash
export ANTHROPIC_MODEL=gemini-2.5-pro
```

### Use Anthropic's Claude Models

```bash
export ANTHROPIC_MODEL=anthropic/claude-3-5-sonnet-20241022
```

## Environment Variables

### ANTHROPIC\_MODEL

Specifies the main model to use for primary requests.

```bash
export ANTHROPIC_MODEL=gpt-5
```

### Complete Configuration Example

```bash
export ANTHROPIC_BASE_URL=https://api.llmgateway.io
export ANTHROPIC_AUTH_TOKEN=llmgtwy_your_api_key_here
export ANTHROPIC_MODEL=gpt-5
export ANTHROPIC_SMALL_FAST_MODEL=gpt-5-nano
```

## Making Manual API Requests

If you want to test the endpoint directly, you can make manual requests:

```bash
curl -X POST "https://api.llmgateway.io/v1/messages" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ],
    "max_tokens": 100
  }'
```

### Response Format

The endpoint returns responses in Anthropic's message format:

```json
{
	"id": "msg_abc123",
	"type": "message",
	"role": "assistant",
	"model": "gpt-5",
	"content": [
		{
			"type": "text",
			"text": "Hello! I'm doing well, thank you for asking. How can I help you today?"
		}
	],
	"stop_reason": "end_turn",
	"stop_sequence": null,
	"usage": {
		"input_tokens": 13,
		"output_tokens": 20
	}
}
```
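
To pull just the generated text or the token counts out of a response like this in a script, a JSON processor such as `jq` is enough. This is a minimal sketch that assumes `jq` is installed; `extract_text` and `total_tokens` are hypothetical helper names.

```bash
# Hypothetical helper: print the text of every "text" content block
# from an Anthropic-format response read on stdin. Requires jq.
extract_text() {
  jq -r '.content[] | select(.type == "text") | .text'
}

# Hypothetical helper: sum the input and output token counts
# reported in the "usage" object of the same response.
total_tokens() {
  jq -r '.usage.input_tokens + .usage.output_tokens'
}
```

Piping the output of a `curl` request to `/v1/messages` into `extract_text` prints only the assistant's reply, which is convenient for quick shell experiments.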

## What You Get

* **Any model in Claude Code** — GPT-5 for heavy lifting, GPT-4o Mini for routine tasks
* **Cost visibility** — See exactly what each coding session costs
* **One bill** — Stop managing separate accounts for OpenAI, Anthropic, Google
* **Response caching** — Repeated requests (like linting the same file) hit cache
* **Discounts** — Check [discounted models](https://llmgateway.io/models?discounted=true) for savings up to 90%

<Callout type="info">
  View all available models on the [models page](https://llmgateway.io/models).
</Callout>

<Callout type="info">
  Need help? Join our [Discord community](https://llmgateway.io/discord) for
  support and troubleshooting assistance.
</Callout>


# LLM Gateway CLI
URL: /guides/cli
import { Step, Steps } from "fumadocs-ui/components/steps";
import { Callout } from "fumadocs-ui/components/callout";
import { Tabs, Tab } from "fumadocs-ui/components/tabs";

The **LLM Gateway CLI** (`@llmgateway/cli`) is a command-line utility for scaffolding projects, managing AI applications, and discovering models.

## Installation

<Tabs items={["npx (Recommended)", "Global Install"]}>
  <Tab value="npx (Recommended)">
    Run commands directly without installation:

    ```bash
    npx @llmgateway/cli init
    ```
  </Tab>

  <Tab value="Global Install">
    Install globally for faster access:

    ```bash
    npm install -g @llmgateway/cli
    ```

    Then run commands directly:

    ```bash
    llmgateway init
    ```
  </Tab>
</Tabs>

## Quick Start

<Steps>
  <Step>
    ### Initialize a Project

    Create a new project from a template:

    ```bash
    npx @llmgateway/cli init
    ```

    Or specify the template and name directly:

    ```bash
    npx @llmgateway/cli init --template image-generation --name my-ai-app
    ```
  </Step>

  <Step>
    ### Configure Authentication

    Log in to save your API key locally:

    ```bash
    npx @llmgateway/cli auth login
    ```

    This opens a browser window to authenticate with LLM Gateway. Your credentials are stored in `~/.llmgateway/config.json`.

    Alternatively, set the `LLMGATEWAY_API_KEY` environment variable, which takes precedence over the config file.
  </Step>

  <Step>
    ### Start Development

    Navigate to your project and start the development server:

    ```bash
    cd my-ai-app
    npx @llmgateway/cli dev
    ```

    Or specify a custom port:

    ```bash
    npx @llmgateway/cli dev --port 3000
    ```
  </Step>
</Steps>

## Commands

### `init`

Initialize a new project from a template.

```bash
npx @llmgateway/cli init [options]
```

**Options:**

* `--template <name>` — Template to use (e.g., `image-generation`, `weather-agent`)
* `--name <name>` — Project name

**Examples:**

```bash
# Interactive mode
npx @llmgateway/cli init

# With options
npx @llmgateway/cli init --template image-generation --name my-app
```

### `list`

Display available project templates.

```bash
npx @llmgateway/cli list
```

**Options:**

* `--json` — Output in JSON format

### `models`

Browse and filter available AI models.

```bash
npx @llmgateway/cli models [options]
```

**Options:**

* `--capability <type>` — Filter by capability (e.g., `chat`, `image`, `embedding`)
* `--provider <name>` — Filter by provider (e.g., `openai`, `anthropic`, `google`)
* `--search <term>` — Search models by name

**Examples:**

```bash
# List all models
npx @llmgateway/cli models

# Filter by provider
npx @llmgateway/cli models --provider openai

# Search models
npx @llmgateway/cli models --search gpt
```

### `add`

Add tools or API routes to an existing project.

```bash
npx @llmgateway/cli add
```

**Tools available:**

* `weather` — Weather lookup functionality
* `search` — Web search capability
* `calculator` — Mathematical operations

**API routes available:**

* `generate` — Text generation endpoint
* `chat` — Chat completion endpoint

### `auth`

Manage API authentication.

```bash
# Login via browser
npx @llmgateway/cli auth login

# Check authentication status
npx @llmgateway/cli auth status

# Logout
npx @llmgateway/cli auth logout
```

### `dev`

Start the local development server.

```bash
npx @llmgateway/cli dev [options]
```

**Options:**

* `--port <number>` — Port to run on (default: 3000)

### `upgrade`

Update LLM Gateway dependencies in your project.

```bash
npx @llmgateway/cli upgrade [options]
```

**Options:**

* `--dry-run` — Show what would be updated without making changes

### `docs`

Open the documentation in your browser.

```bash
npx @llmgateway/cli docs
```

## Available Templates

### Image Generation

A full-stack application for AI image generation.

* **Stack:** Next.js 16, React 19, TypeScript
* **Features:** Multi-provider support (DALL-E, Stable Diffusion), unified API
* **Use case:** Image generation apps, creative tools

```bash
npx @llmgateway/cli init --template image-generation
```

### QA Agent

An AI-powered QA testing agent that uses browser automation to test your web app.

* **Stack:** Next.js 16, React 19, TypeScript, Agent Browser
* **Features:** Natural language testing, real-time action timeline, live browser preview
* **Use case:** Automated QA testing, regression testing, user flow validation

```bash
npx @llmgateway/cli init --template qa-agent
```

### Weather Agent

A CLI agent demonstrating tool calling capabilities.

* **Stack:** TypeScript, AI SDK, OpenAI
* **Features:** Tool calling, real-time data, natural language
* **Use case:** Learning tool usage, building CLI agents

```bash
npx @llmgateway/cli init --template weather-agent
```

## Configuration

The CLI stores configuration in `~/.llmgateway/config.json`:

```json
{
	"apiKey": "llmgtwy_...",
	"defaultTemplate": "image-generation"
}
```

### Environment Variables

The `LLMGATEWAY_API_KEY` environment variable takes precedence over the config file:

```bash
export LLMGATEWAY_API_KEY="llmgtwy_..."
```
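
The precedence rule can be pictured as a small lookup: use the environment variable if it is set, otherwise fall back to the `apiKey` field in the config file. The sketch below only illustrates that order. `resolve_api_key` is a hypothetical name, and this is not the CLI's actual implementation. It reads the file with `sed` to avoid extra dependencies, which assumes the simple one-key-per-line layout shown above.

```bash
# Hypothetical illustration of the lookup order described above.
# Usage: resolve_api_key [path-to-config.json]
resolve_api_key() {
  config_file="${1:-$HOME/.llmgateway/config.json}"

  # 1. The environment variable wins when it is set and non-empty.
  if [ -n "${LLMGATEWAY_API_KEY:-}" ]; then
    printf '%s\n' "$LLMGATEWAY_API_KEY"
    return 0
  fi

  # 2. Otherwise read the "apiKey" value from the config file, if present.
  if [ -f "$config_file" ]; then
    key=$(sed -n 's/.*"apiKey"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$config_file" | head -n 1)
    if [ -n "$key" ]; then
      printf '%s\n' "$key"
      return 0
    fi
  fi

  echo "no API key found" >&2
  return 1
}
```

Running `resolve_api_key` with and without `LLMGATEWAY_API_KEY` exported is a quick way to see which source a script would pick up.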

## More Resources

* [Agents](https://llmgateway.io/agents) — Pre-built AI agents
* [Templates](https://llmgateway.io/templates) — Production-ready starter projects
* [GitHub Repository](https://github.com/theopenco/llmgateway-templates) — Source code and issues

<Callout type="info">
  Need help or want to request a feature? Open an issue on
  [GitHub](https://github.com/theopenco/llmgateway-templates/issues).
</Callout>


# Cline Integration
URL: /guides/cline


import { Step, Steps } from "fumadocs-ui/components/steps";
import { Callout } from "fumadocs-ui/components/callout";

[Cline](https://cline.bot) is an autonomous AI coding assistant that lives in your VS Code editor. It can create and edit files, run terminal commands, and help you build complex projects. You can configure Cline to use LLM Gateway for access to multiple AI providers with unified billing and cost tracking.

## Prerequisites

* A VS Code-based IDE installed
* An LLM Gateway API key

## Setup

Cline supports OpenAI-compatible API endpoints, making it straightforward to integrate with LLM Gateway.

<Steps>
  <Step>
    ### Install Cline Extension

    1. Open VS Code
    2. Go to the Extensions view (Cmd/Ctrl + Shift + X)
    3. Search for "Cline"
    4. Click **Install** on the Cline extension

        <img alt="Install Cline Extension" src={__img0} placeholder="blur" />
  </Step>

  <Step>
    ### Open Cline Settings

    1. Click on the Cline icon in the VS Code sidebar
    2. Click the settings gear icon in the Cline panel

        <img alt="Cline Settings" src={__img1} placeholder="blur" />
  </Step>

  <Step>
    ### Configure API Provider

    1. In the API Provider dropdown, select **OpenAI Compatible**
    2. Enter the following details:
       * **Base URL**: `https://api.llmgateway.io/v1`
       * **API Key**: Your LLM Gateway API key
       * **Model ID**: Choose a model (e.g., `claude-opus-4-5-20251101`, `gpt-5.2`, `gemini-3-pro-preview`, `deepseek-3.2`). See [provider-specific routing](/features/routing#provider-specific-routing) for more options.

        <img alt="Configure API Provider" src={__img2} placeholder="blur" />
  </Step>

  <Step>
    ### Test the Integration

    1. Open a project in VS Code
    2. Click on the Cline icon in the sidebar
    3. Type a message like "Create a hello world function in Python"
    4. Cline should respond and offer to create the file

        <img alt="Test Cline" src={__img3} placeholder="blur" />

    All requests will now be routed through LLM Gateway.
  </Step>
</Steps>

<Callout type="info">
  View all available models on the [models page](https://llmgateway.io/models).
</Callout>

## Features

Once configured, you can use all of Cline's features with LLM Gateway:

### Autonomous Coding

* Create new files and projects from scratch
* Edit existing code based on natural language instructions
* Refactor and improve code quality

### Terminal Commands

* Run build commands, tests, and scripts
* Install dependencies
* Execute any terminal operation

### File Management

* Create, read, and modify files
* Navigate your codebase
* Search for relevant code

## Model Selection Tips

### Using Provider-Specific Models

To use a specific provider's version of a model, prefix the model ID with the provider name. See [provider-specific routing](/features/routing#provider-specific-routing) for more options.

### Using Discounted Models

LLM Gateway offers discounted access to some models. Find them on the [models page](https://llmgateway.io/models?view=grid\&filters=1\&discounted=true) and copy the model ID.

### Using Free Models

Some models are available for free. Browse them on the [models page](https://llmgateway.io/models?view=grid\&filters=1\&free=true).

<Callout type="info">
  Need help? Join our [Discord community](https://llmgateway.io/discord) for
  support and troubleshooting assistance.
</Callout>

## Benefits of Using LLM Gateway with Cline

* **Multi-Provider Access**: Use models from OpenAI, Anthropic, Google, and more through a single API
* **Cost Control**: Track and limit your AI spending with detailed usage analytics
* **Unified Billing**: One account for all providers instead of managing multiple API keys
* **Caching**: Reduce costs with response caching for repeated requests
* **Analytics**: Monitor usage patterns and costs in the dashboard


# Codex CLI Integration
URL: /guides/codex-cli
import { Step, Steps } from "fumadocs-ui/components/steps";
import { Callout } from "fumadocs-ui/components/callout";

Codex CLI is OpenAI's open-source terminal coding agent. By default it connects to OpenAI's API, but with LLM Gateway you can route it through a single gateway—use GPT-5.3 Codex, Gemini, Claude, or any of 180+ models while keeping full cost visibility.

One config file. No code changes. Full cost tracking in your dashboard.

## Setup

<Steps>
  <Step>
    ### Sign Up for LLM Gateway

    [Sign up free](https://llmgateway.io/signup) — no credit card required. Copy your API key from the dashboard.
  </Step>

  <Step>
    ### Set Your API Key

    Set your LLM Gateway API key as the OpenAI key:

    ```bash
    export OPENAI_API_KEY=llmgtwy_your_api_key_here
    ```
  </Step>

  <Step>
    ### Create Config File

    Create or edit `~/.codex/config.toml`:

    ```toml
    openai_base_url = "https://api.llmgateway.io/v1"
    model = "auto"
    model_reasoning_effort = "high"

    [tui]
    show_tooltips = false

    [model_providers.openai]
    name = "OpenAI"
    base_url = "https://api.llmgateway.io/v1"
    ```
  </Step>

  <Step>
    ### Run Codex CLI

    ```bash
    codex
    ```

    All requests will now be routed through LLM Gateway.
  </Step>
</Steps>

## Why This Works

LLM Gateway's `/v1` endpoint is fully OpenAI-compatible. Codex CLI sends requests to our gateway instead of OpenAI directly, and we route them to the right provider behind the scenes. This means:

* **Use any model** — GPT-5.3 Codex, Gemini, Claude, or 180+ others
* **Keep your workflow** — Codex CLI doesn't know the difference
* **Track costs** — Every request appears in your LLM Gateway dashboard
* **Automatic caching** — Repeated requests hit cache, saving money

## Configuration Explained

### Base URL

The `openai_base_url` and `base_url` fields point Codex CLI to LLM Gateway instead of OpenAI:

```toml
openai_base_url = "https://api.llmgateway.io/v1"
```

### Model Selection

Use `auto` to let LLM Gateway pick the best model, or set a specific one from the [models page](https://llmgateway.io/models):

```toml
model = "auto"
# or pick a specific model instead (keep only one `model` line)
# model = "gpt-5.3-codex"
```

### Reasoning Effort

Control how much reasoning the model uses. Options are `low`, `medium`, and `high`:

```toml
model_reasoning_effort = "high"
```
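
The settings explained in this section can be written to disk in one step. The following is a sketch rather than an official installer: `write_codex_config` is a hypothetical helper, it uses only keys shown in the Setup section, and it keeps a `.bak` copy of any existing file before overwriting it.

```bash
# Hypothetical helper: write a Codex CLI config that points at LLM Gateway.
# Usage: write_codex_config [model] [config-path]
write_codex_config() {
  model="${1:-auto}"
  config_path="${2:-$HOME/.codex/config.toml}"

  mkdir -p "$(dirname "$config_path")"

  # Keep a backup so an existing configuration is not lost.
  if [ -f "$config_path" ]; then
    cp "$config_path" "$config_path.bak"
  fi

  cat > "$config_path" <<EOF
openai_base_url = "https://api.llmgateway.io/v1"
model = "$model"
model_reasoning_effort = "high"

[model_providers.openai]
name = "OpenAI"
base_url = "https://api.llmgateway.io/v1"
EOF
}
```

Calling `write_codex_config gpt-5.3-codex` pins a specific model, while calling it with no arguments falls back to `auto`.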

## Choosing Models

Use `auto` to let LLM Gateway pick the best model automatically, or choose a specific one from the [models page](https://llmgateway.io/models):

```toml
# let LLM Gateway pick the best model
model = "auto"

# or pick a specific model instead (keep only one `model` line)
# model = "gpt-5.3-codex"
```

## What You Get

* **Any model in Codex CLI** — GPT-5.3 Codex for heavy lifting, lighter models for routine tasks
* **Cost visibility** — See exactly what each coding session costs
* **One bill** — Stop managing separate accounts for OpenAI, Anthropic, Google
* **Response caching** — Repeated requests hit cache automatically
* **Discounts** — Check [discounted models](https://llmgateway.io/models?discounted=true) for savings up to 90%

## Troubleshooting

### Authentication errors

Make sure your `OPENAI_API_KEY` environment variable is set to your LLM Gateway API key (starts with `llmgtwy_`).

### Model not found

Verify the model ID matches exactly what's listed on the [models page](https://llmgateway.io/models). Model IDs are case-sensitive.

### Connection issues

Check that `base_url` is set to `https://api.llmgateway.io/v1` (note the `/v1` at the end).
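
Two of the checks above (the key prefix and the `/v1` suffix) are easy to automate. The sketch below is one possible preflight script. `codex_preflight` is a hypothetical name, and the pattern it looks for assumes the config file uses the exact gateway URL shown in this guide.

```bash
# Hypothetical preflight check for the authentication and connection issues above.
# Usage: codex_preflight [config-path]
codex_preflight() {
  config_path="${1:-$HOME/.codex/config.toml}"
  problems=0

  # Authentication: the key should be an LLM Gateway key.
  case "${OPENAI_API_KEY:-}" in
    llmgtwy_*) ;;
    *) echo "OPENAI_API_KEY is unset or does not start with llmgtwy_" >&2
       problems=1 ;;
  esac

  # Connection: a base URL line should point at the gateway and end in /v1.
  if ! grep -Eq '^(openai_)?base_url *= *"https://api\.llmgateway\.io/v1"' "$config_path" 2>/dev/null; then
    echo "no gateway base URL ending in /v1 found in $config_path" >&2
    problems=1
  fi

  return "$problems"
}
```

A typical use is `codex_preflight && codex`, so the CLI only starts when both checks pass.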

<Callout type="info">
  View all available models on the [models page](https://llmgateway.io/models).
</Callout>

<Callout type="info">
  Need help? Join our [Discord community](https://llmgateway.io/discord) for
  support and troubleshooting assistance.
</Callout>


# Cursor Integration
URL: /guides/cursor


import { Step, Steps } from "fumadocs-ui/components/steps";
import { Callout } from "fumadocs-ui/components/callout";

Cursor is an AI-powered code editor built on VS Code. You can configure Cursor to use LLM Gateway for enhanced AI capabilities, access to multiple models, and better cost control.

<img alt="Cursor with LLM Gateway" src={__img0} placeholder="blur" />

## Prerequisites

* An LLM Gateway account with an API key
* Cursor IDE installed
* Basic understanding of Cursor's AI features

## Setup

Cursor supports OpenAI-compatible API endpoints, making it easy to integrate with LLM Gateway.

<Steps>
  <Step>
    ### Get Your API Key

    1. Log in to your [LLM Gateway dashboard](https://llmgateway.io/dashboard)
    2. Navigate to the **API Keys** section
    3. Create a new API key and copy the key

        <img alt="LLM Gateway API Keys" src={__img1} placeholder="blur" />
  </Step>

  <Step>
    ### Configure Cursor Settings

    1. Open Cursor, go to **Settings**, then click **Cursor Settings**
    2. Click **Models**

        <img alt="Cursor Settings" src={__img2} placeholder="blur" />

    3. Scroll down to the **OpenAI API Key** section
    4. Click **Add OpenAI API Key**

        <img alt="Cursor API Key Input" src={__img3} placeholder="blur" />

    5. Enter your LLM Gateway API key

    6. In the same Models settings, find the **Override OpenAI Base URL** option

    7. Enable the override option

    8. Enter the LLM Gateway endpoint: `https://api.llmgateway.io/v1`
  </Step>

  <Step>
    ### Select Models

    1. In the **Models** section, you can now select from available models
    2. Choose any [LLM Gateway supported model](https://llmgateway.io/models):

        <img alt="Cursor Model Selection" src={__img4} placeholder="blur" />

    * For chat: Use models like `gpt-5`, `gpt-4o`, `claude-sonnet-4-5`
    * For custom models: Add the provider name before the model name (e.g. `custom/my-model`)
    * For discounted models: copy the IDs from the [models page](https://llmgateway.io/models?view=grid\&filters=1\&discounted=true)
    * For free models: copy the IDs from the [models page](https://llmgateway.io/models?view=grid\&filters=1\&free=true)
    * For reasoning models: copy the IDs from the [models page](https://llmgateway.io/models?view=grid\&filters=1\&reasoning=true)
  </Step>

  <Step>
    ### Test the Integration

    1. Open any code file in Cursor
    2. Try using the AI chat (Cmd/Ctrl + L)
    3. Or test the autocomplete feature while typing

        <img alt="Cursor AI Chat" src={__img5} placeholder="blur" />
        <img alt="Cursor AI Chat 2" src={__img6} placeholder="blur" />

    All AI requests will now be routed through LLM Gateway.
  </Step>
</Steps>

## Features

Once configured, you can use all of Cursor's AI features with LLM Gateway:

### AI Chat (Cmd/Ctrl + L)

* Ask questions about your code
* Request code explanations
* Get debugging help
* Generate new code

### Inline Edit (Cmd/Ctrl + K)

* Edit code with natural language instructions
* Refactor functions
* Add features to existing code

### Autocomplete

* Get intelligent code suggestions as you type
* Context-aware completions based on your codebase

## Advanced Configuration

### Using Different Models for Different Features

Cursor allows you to configure different models for different features:

1. **Chat Model**: Use a powerful model like `gpt-5` or `claude-sonnet-4-5`
2. **Autocomplete Model**: Use a faster, cost-effective model like `gpt-4o-mini`
3. **Custom Model**: Use a custom model like `custom/my-model`
4. **Reasoning Model**: Use a reasoning model like `canopywave/kimi-k2-thinking` [with 75% off discount](https://llmgateway.io/changelog/canopywave-kimi-k2-thinking-discount)

This gives you the best balance of performance and cost.

### Model Routing

With LLM Gateway's [routing features](/features/routing), the gateway:

* **Chooses cost-effective models** by default for optimal price-to-performance ratio
* **Automatically scales to more powerful models** based on your request's context size
* **Handles large contexts intelligently** by selecting models with appropriate context windows

## Troubleshooting

### Authentication Errors

If you see authentication errors:

* Verify your API key is correct
* Check that the base URL is set to `https://api.llmgateway.io/v1`
* Ensure your LLM Gateway account has sufficient credits
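
To separate a bad key from a Cursor settings problem, it can help to call the gateway directly from a terminal. The sketch below sends one tiny chat completion and prints only the HTTP status code. `gateway_smoke_test` is a hypothetical helper name, and the default model is just one of the IDs mentioned in this guide.

```bash
# Hypothetical helper: send one small chat completion through the gateway
# and print only the HTTP status code of the response.
# Usage: gateway_smoke_test <api-key> [model-id]
gateway_smoke_test() {
  api_key="$1"
  model="${2:-gpt-4o-mini}"

  curl -s -o /dev/null -w '%{http_code}\n' \
    -X POST "https://api.llmgateway.io/v1/chat/completions" \
    -H "Authorization: Bearer $api_key" \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"$model\", \"messages\": [{\"role\": \"user\", \"content\": \"ping\"}]}"
}
```

A `200` suggests the key and endpoint work and the issue lies in Cursor's settings, while a `401` typically points at the key itself.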

### Model Not Found

If you see "model not found" errors:

* Verify the model ID exists in the [models page](https://llmgateway.io/models)
* Check that you're using the correct model name format
* Some models may require specific provider configurations in your LLM Gateway dashboard

### Slow Responses

If responses are slow:

* Check your internet connection
* Monitor your usage in the LLM Gateway dashboard
* Consider using faster models for autocomplete features

<Callout type="info">
  Need help? Join our [Discord community](https://llmgateway.io/discord) for
  support and troubleshooting assistance.
</Callout>

## Benefits of Using LLM Gateway with Cursor

* **Multi-Provider Access**: Use models from OpenAI, Anthropic, Google, open-source providers, and more
* **Cost Control**: Track and limit your AI spending with detailed usage analytics
* **Caching**: Reduce costs with response caching
* **Analytics**: Monitor usage patterns and costs


# Model Context Protocol (MCP)
URL: /guides/mcp
import { Step, Steps } from "fumadocs-ui/components/steps";
import { Callout } from "fumadocs-ui/components/callout";
import { Tabs, Tab } from "fumadocs-ui/components/tabs";

LLM Gateway provides a Model Context Protocol (MCP) server that enables AI assistants like Claude Code to access multiple LLM providers through a unified interface. This allows you to use any model from OpenAI, Anthropic, Google, and more directly from your AI coding assistant.

## What is MCP?

The Model Context Protocol (MCP) is an open standard that allows AI assistants to connect with external tools and data sources. LLM Gateway's MCP server exposes tools for:

* **Chat completions** - Send messages to any supported LLM
* **Image generation** - Generate images using models like Qwen Image
* **Nano Banana image generation** - Generate images with Gemini 3 Pro Image Preview and optionally save to disk
* **Model discovery** - List available models with capabilities and pricing

## Available Tools

### `chat`

Send a message to any LLM and get a response.

**Parameters:**

* `model` (string) - The model to use (e.g., `"gpt-4o"`, `"claude-sonnet-4-20250514"`)
* `messages` (array) - Array of messages with `role` and `content`
* `temperature` (number, optional) - Sampling temperature (0-2)
* `max_tokens` (number, optional) - Maximum tokens to generate

**Example:**

```json
{
	"model": "gpt-4o",
	"messages": [{ "role": "user", "content": "Explain quantum computing" }],
	"temperature": 0.7
}
```

### `generate-image`

Generate images from text prompts using AI image models.

**Parameters:**

* `prompt` (string) - Text description of the image to generate
* `model` (string, optional) - Image model (default: `"qwen-image-plus"`)
* `size` (string, optional) - Image size (default: `"1024x1024"`)
* `n` (number, optional) - Number of images (1-4, default: 1)

**Example:**

```json
{
	"prompt": "A serene mountain landscape at sunset",
	"model": "qwen-image-max",
	"size": "1024x1024"
}
```

### `generate-nano-banana`

Generate an image using Gemini 3 Pro Image Preview ("Nano Banana"). Returns an inline image preview, and optionally saves the image to disk when the server is configured with an upload directory.

**Parameters:**

* `prompt` (string) - Text description of the image to generate
* `filename` (string, optional) - Filename for the saved image, no path separators allowed (default: `nano-banana-{timestamp}.png`)
* `aspect_ratio` (string, optional) - Aspect ratio: `"1:1"`, `"16:9"`, `"4:3"`, or `"5:4"`

**Example:**

```json
{
	"prompt": "A pixel-art cat sitting on a rainbow",
	"filename": "hero-image.png",
	"aspect_ratio": "16:9"
}
```

<Callout type="info">
  **Saving images to disk** requires the `UPLOAD_DIR` environment variable to be
  set on the MCP server. When set, images are saved to that directory. Without
  it, images are returned inline only — no files are written to disk. See
  [Enabling local image saving](#enabling-local-image-saving) for setup
  instructions.
</Callout>

### `list-models`

List available LLM models with capabilities and pricing.

**Parameters:**

* `include_deactivated` (boolean, optional) - Include deactivated models
* `exclude_deprecated` (boolean, optional) - Exclude deprecated models
* `limit` (number, optional) - Maximum models to return (default: 20)
* `family` (string, optional) - Filter by family (e.g., `"openai"`, `"anthropic"`)

### `list-image-models`

List all available image generation models.

**Example output:**

```
# Image Generation Models

## Qwen Image Plus
- **Model ID:** `qwen-image-plus`
- **Description:** Text-to-image with excellent text rendering
- **Price:** $0.03 per request

## Qwen Image Max
- **Model ID:** `qwen-image-max`
- **Description:** Highest quality text-to-image
- **Price:** $0.075 per request
```

## Setup

<Tabs items={["Claude Code", "Codex", "Cursor", "Other MCP Clients"]}>
  <Tab value="Claude Code">
    <Steps>
      <Step>
        ### Get Your API Key

        1. Log in to your [LLM Gateway dashboard](https://llmgateway.io/dashboard)
        2. Navigate to the **API Keys** section
        3. Create a new API key and copy it
      </Step>

      <Step>
        ### Configure Claude Code

        Run the following command in your terminal:

        ```bash
        claude mcp add --transport http --scope user llmgateway https://api.llmgateway.io/mcp \
          --header "Authorization: Bearer your-api-key-here"
        ```

        <Callout type="info">
          **Alternative: Manual configuration**

          You can also add the MCP server manually by editing `~/.claude.json` (user scope) or `.mcp.json` in your project root (project scope):

          ```json
          {
            "mcpServers": {
              "llmgateway": {
                "url": "https://api.llmgateway.io/mcp",
                "headers": {
                  "Authorization": "Bearer your-api-key-here"
                }
              }
            }
          }
          ```

          Restart Claude Code after manual configuration changes.
        </Callout>
      </Step>

      <Step>
        ### Test the Integration

        Try using the tools in Claude Code:

        * "Use the chat tool to ask GPT-4o about TypeScript best practices"
        * "Generate an image of a futuristic city using the generate-image tool"
        * "Use generate-nano-banana to create a hero image for my landing page"
        * "List all available models from Anthropic"
      </Step>
    </Steps>
  </Tab>

  <Tab value="Codex">
    <Steps>
      <Step>
        ### Get Your API Key

        1. Log in to your [LLM Gateway dashboard](https://llmgateway.io/dashboard)
        2. Navigate to the **API Keys** section
        3. Create a new API key and copy it
        4. Set it as an environment variable: `export LLM_GATEWAY_API_KEY="your-api-key-here"`
      </Step>

      <Step>
        ### Configure Codex

        Run the following command in your terminal:

        ```bash
        codex mcp add llmgateway --url https://api.llmgateway.io/mcp \
          --bearer-token-env-var LLM_GATEWAY_API_KEY
        ```

        <Callout type="info">
          **Alternative: Manual configuration**

          You can also add the MCP server manually by editing `~/.codex/config.toml`:

          ```toml
          [mcp_servers.llmgateway]
          url = "https://api.llmgateway.io/mcp"
          bearer_token_env_var = "LLM_GATEWAY_API_KEY"
          ```
        </Callout>
      </Step>

      <Step>
        ### Test the Integration

        Run `/mcp` in the Codex TUI to confirm the `llmgateway` server is connected. Try:

        * "Use the chat tool to ask GPT-4o about TypeScript best practices"
        * "Generate an image of a futuristic city using the generate-image tool"
        * "Use generate-nano-banana to create a hero image for my landing page"
        * "List all available models from Anthropic"
      </Step>
    </Steps>
  </Tab>

  <Tab value="Cursor">
    <Steps>
      <Step>
        ### Get Your API Key

        1. Log in to your [LLM Gateway dashboard](https://llmgateway.io/dashboard)
        2. Navigate to the **API Keys** section
        3. Create a new API key and copy it
      </Step>

      <Step>
        ### Configure Cursor

        Add the following to your Cursor MCP configuration file (`~/.cursor/mcp.json`):

        ```json
        {
          "mcpServers": {
            "llmgateway": {
              "url": "https://api.llmgateway.io/mcp",
              "headers": {
                "Authorization": "Bearer your-api-key-here"
              }
            }
          }
        }
        ```

        Or open the Command Palette (`Cmd/Ctrl + Shift + P`), search for **"Cursor Settings"**, then go to **Tools & Integrations** > **Add Custom MCP** and paste the configuration above.

        <Callout type="info">
          Cursor v0.48.0+ is required for Streamable HTTP MCP support.
        </Callout>
      </Step>

      <Step>
        ### Test the Integration

        Open a chat in **Agent Mode**, click the **Select Tools** icon, and verify the LLM Gateway tools appear. Try:

        * "Use the chat tool to ask GPT-4o about TypeScript best practices"
        * "Generate an image of a futuristic city using the generate-image tool"
        * "Use generate-nano-banana to create a hero image for my landing page"
        * "List all available models from Anthropic"
      </Step>
    </Steps>
  </Tab>

  <Tab value="Other MCP Clients">
    LLM Gateway's MCP server supports the standard Streamable HTTP transport. Configure your client with:

    * **Endpoint:** `https://api.llmgateway.io/mcp`
    * **Authentication:** Bearer token via `Authorization` header or `x-api-key` header
    * **Protocol Version:** 2024-11-05

    **Direct HTTP Example:**

    ```bash
    curl -X POST https://api.llmgateway.io/mcp \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer your-api-key" \
      -d '{
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/list"
      }'
    ```

    **Server-Sent Events (SSE):**

    For real-time updates, connect with `Accept: text/event-stream`:

    ```bash
    curl -N https://api.llmgateway.io/mcp \
      -H "Accept: text/event-stream" \
      -H "Authorization: Bearer your-api-key"
    ```
  </Tab>
</Tabs>

## Use Cases

### Multi-Model Access in Claude Code

Use Claude Code to interact with models it doesn't natively support:

```
Use the chat tool with model "gpt-4o" to analyze this code for security issues.
```

### Image Generation

Generate images directly from your AI assistant:

```
Use generate-image to create a logo for my new startup.
It should be minimalist, blue and white, representing AI and cloud computing.
```

### Nano Banana (Gemini Image Generation)

Generate images with Gemini 3 Pro for use in your project:

```
Use generate-nano-banana to create a hero image for my landing page with a 16:9 aspect ratio.
```

### Cost-Effective Model Selection

Query available models to find the best option for your task:

```
List models from OpenAI and Anthropic, then use the cheapest one for this simple task.
```

## Authentication

The MCP server supports two authentication methods:

1. **Bearer Token** - `Authorization: Bearer your-api-key`
2. **API Key Header** - `x-api-key: your-api-key`

Your API key is the same one you use for the REST API and works across all LLM Gateway services.
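
Either header can be exercised with a JSON-RPC `tools/call` request against the `chat` tool. The sketch below follows the general MCP request shape (`params.name` plus `params.arguments`). `mcp_chat_payload` and `mcp_chat` are hypothetical helper names, and whether the server expects an `initialize` handshake or additional headers first is an assumption left out here.

```bash
# Hypothetical helper: build a JSON-RPC "tools/call" payload for the chat tool.
# The prompt is inserted as-is, so it must not contain double quotes or backslashes.
mcp_chat_payload() {
  model="$1"
  prompt="$2"
  printf '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"chat","arguments":{"model":"%s","messages":[{"role":"user","content":"%s"}]}}}\n' \
    "$model" "$prompt"
}

# Hypothetical helper: send the payload using either supported header.
# Usage: mcp_chat bearer|x-api-key <api-key> <model> <prompt>
mcp_chat() {
  case "$1" in
    bearer)    auth_header="Authorization: Bearer $2" ;;
    x-api-key) auth_header="x-api-key: $2" ;;
    *) echo "unknown auth style: $1" >&2; return 2 ;;
  esac

  curl -s -X POST "https://api.llmgateway.io/mcp" \
    -H "Content-Type: application/json" \
    -H "$auth_header" \
    -d "$(mcp_chat_payload "$3" "$4")"
}
```

For example, `mcp_chat x-api-key "$LLM_GATEWAY_API_KEY" gpt-4o "Explain quantum computing"` sends the same request as the `bearer` variant, differing only in the header used.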

## OAuth Support

For applications that prefer OAuth authentication, LLM Gateway's MCP server implements OAuth 2.0:

* **Authorization Endpoint:** `/oauth/authorize`
* **Token Endpoint:** `/oauth/token`
* **Registration Endpoint:** `/oauth/register`
* **Supported Flows:** Authorization Code, Client Credentials

## Enabling Local Image Saving

By default, `generate-nano-banana` returns images inline without writing to disk. To enable saving generated images to the server filesystem, the `UPLOAD_DIR` environment variable must be set on the **gateway host** at startup. This is a server-side setting — it cannot be configured from the client.

This is only possible for **self-hosted** MCP deployments. Configure `UPLOAD_DIR` using your deployment method:

* **Docker:** Pass `-e UPLOAD_DIR=/data/images` or add it to your `docker-compose.yml` environment section.
* **systemd:** Add `Environment=UPLOAD_DIR=/data/images` to your service unit file.
* **.env file:** Add `UPLOAD_DIR=/data/images` to the `.env` file loaded by your gateway process.

<Callout type="warn">
  The shared hosted endpoint (`api.llmgateway.io`) does not support configuring
  `UPLOAD_DIR`. On the hosted service, images are always returned inline — no
  files are written to disk. To enable server-side image saving, you must
  self-host the MCP server and set `UPLOAD_DIR` at startup.
</Callout>

## Troubleshooting

### Connection Errors

If you're having trouble connecting:

1. Verify your API key is valid
2. Check the endpoint URL is correct: `https://api.llmgateway.io/mcp`
3. Ensure your firewall allows outbound HTTPS connections

### Tool Not Found

If tools aren't appearing:

1. Restart your MCP client
2. Check the configuration syntax
3. Verify the MCP server is responding: `GET https://api.llmgateway.io/mcp`

### Rate Limiting

The MCP server respects your account's rate limits. If you're hitting limits:

1. Check your usage in the dashboard
2. Consider upgrading your plan
3. Implement request queuing in your application
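
Request queuing can be as simple as capping how many calls are in flight at once. The class below is a minimal sketch under that assumption; `RequestQueue` and its concurrency limit are illustrative names and values, and the right limit depends on your plan's actual rate limits.

```typescript
// Minimal in-memory queue that caps the number of concurrent requests.
// Illustrative only: no retries, priorities, or persistence.
class RequestQueue {
	private active = 0;
	private readonly waiting: Array<() => void> = [];
	private readonly maxConcurrent: number;

	constructor(maxConcurrent: number) {
		this.maxConcurrent = maxConcurrent;
	}

	get activeCount(): number {
		return this.active;
	}

	get pendingCount(): number {
		return this.waiting.length;
	}

	enqueue<T>(task: () => Promise<T>): Promise<T> {
		return new Promise<T>((resolve, reject) => {
			const run = (): void => {
				this.active++;
				Promise.resolve()
					.then(task)
					.then(resolve, reject)
					.then(() => {
						// Free the slot and start the next waiting task, if any.
						this.active--;
						this.waiting.shift()?.();
					});
			};
			if (this.active < this.maxConcurrent) {
				run();
			} else {
				this.waiting.push(run);
			}
		});
	}
}
```

Wrap each MCP or API call in `queue.enqueue(() => ...)` so that bursts are spread out instead of hitting the limit all at once.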

<Callout type="info">
  Need help? Join our [Discord community](https://llmgateway.io/discord) for
  support.
</Callout>

## Benefits

* **Unified Access** - Use 200+ models from 20+ providers through one interface
* **Cost Tracking** - Monitor usage and costs in the LLM Gateway dashboard
* **Caching** - Automatic response caching reduces costs and latency
* **Fallback** - Automatic provider failover ensures reliability
* **Image Generation** - Generate images directly from your AI assistant


# n8n Integration
URL: /guides/n8n


import { Step, Steps } from "fumadocs-ui/components/steps";
import { Callout } from "fumadocs-ui/components/callout";

n8n is a powerful workflow automation tool that can be enhanced with AI capabilities through LLM Gateway. This guide shows how to integrate LLM Gateway into your n8n workflows.

<img alt="n8n workflow with LLM Gateway" src={__img0} placeholder="blur" />

## Prerequisites

* An LLM Gateway account with an API key
* n8n instance (self-hosted or cloud)
* Basic understanding of n8n workflows

## Setup

The easiest way to use LLM Gateway with n8n is through the OpenAI node with custom configuration.

<Steps>
  <Step>
    ### Add OpenAI Credentials

    1. In n8n, go to **Settings** → **Credentials**

        <img alt="n8n credentials" src={__img1} placeholder="blur" />

    2. Click **Add Credential** → **OpenAI**

        <img alt="n8n credentials" src={__img2} placeholder="blur" />

    3. Configure as follows:
       * **API Key**: Your LLM Gateway API key
       * **Base URL**: `https://api.llmgateway.io/v1`
       * **Organization ID**: Leave blank

        <img alt="n8n credentials" src={__img3} placeholder="blur" />
  </Step>

  <Step>
    ### Configure OpenAI Node

    1. Add an **AI Agent** node to your workflow
    2. Add a **Chat Model** edge to the node

        <img alt="n8n credentials" src={__img4} placeholder="blur" />

    3. Configure the node to use the LLM Gateway provider

        <img alt="n8n credentials" src={__img5} placeholder="blur" />

    <Callout type="warning">
      Toggle off the Responses API option. LLM Gateway does not support the
      Responses API.
    </Callout>

        <img alt="responses api" src={__img6} placeholder="blur" />

    4. Select your desired options

    * **Model**: Use any [LLM Gateway model](https://llmgateway.io/models) ID (e.g., `gpt-5`)
    * **Options**: Optionally, configure LLM parameters

        <img alt="n8n credentials" src={__img7} placeholder="blur" />
  </Step>

  <Step>
    ### Test Workflow

    Finally, try running your workflow with a test prompt.

        <img alt="n8n credentials" src={__img8} placeholder="blur" />
  </Step>
</Steps>


# OpenClaw Integration
URL: /guides/openclaw
import { Step, Steps } from "fumadocs-ui/components/steps";
import { Callout } from "fumadocs-ui/components/callout";

[OpenClaw](https://docs.openclaw.ai/) is a self-hosted gateway that connects your favorite chat apps—WhatsApp, Telegram, Discord, iMessage, and more—to AI coding agents. With LLM Gateway as a custom provider, you can route all your OpenClaw traffic through a single API, use any of 180+ models, and keep full visibility into usage and costs.

## Setup

<Steps>
  <Step>
    ### Sign Up for LLM Gateway

    [Sign up free](https://llmgateway.io/signup) — no credit card required. Copy your API key from the dashboard.
  </Step>

  <Step>
    ### Set Your API Key

    ```bash
    export LLMGATEWAY_API_KEY=llmgtwy_your_api_key_here
    ```
  </Step>

  <Step>
    ### Configure OpenClaw

    Add LLM Gateway as a custom provider in your `~/.openclaw/openclaw.json`:

    ```json
    {
    	"models": {
    		"mode": "merge",
    		"providers": {
    			"llmgateway": {
    				"baseUrl": "https://api.llmgateway.io/v1",
    				"apiKey": "${LLMGATEWAY_API_KEY}",
    				"api": "openai-completions",
    				"models": [
    					{
    						"id": "gpt-5.4",
    						"name": "GPT-5.4",
    						"contextWindow": 128000,
    						"maxTokens": 32000
    					},
    					{
    						"id": "claude-opus-4-6",
    						"name": "Claude Opus 4.6",
    						"contextWindow": 200000,
    						"maxTokens": 8192
    					},
    					{
    						"id": "gemini-3-1-pro-preview",
    						"name": "Gemini 3.1 Pro",
    						"contextWindow": 1000000,
    						"maxTokens": 8192
    					}
    				]
    			}
    		}
    	},
    	"agents": {
    		"defaults": {
    			"model": {
    				"primary": "llmgateway/gpt-5.4"
    			}
    		}
    	}
    }
    ```
  </Step>

  <Step>
    ### Start Chatting

    Launch OpenClaw and start chatting across your connected channels. All requests will be routed through LLM Gateway.
  </Step>
</Steps>

## Why Use LLM Gateway with OpenClaw

* **Model flexibility** — Switch between GPT-5.4, Claude Opus, Gemini, or any of 180+ models
* **Cost tracking** — Monitor exactly how much your chat agents cost to run
* **Single bill** — No need to manage multiple API provider accounts
* **Response caching** — Repeated queries hit cache, reducing costs
* **Rate limit handling** — Automatic fallback between providers

## Switching Models

Change the primary model in your config to switch between any model:

```json
{
	"agents": {
		"defaults": {
			"model": { "primary": "llmgateway/claude-opus-4-6" }
		}
	}
}
```

## Model Fallback Chain

OpenClaw supports fallback models. If the primary model is unavailable, it automatically falls back:

```json
{
	"agents": {
		"defaults": {
			"model": {
				"primary": "llmgateway/gpt-5.4",
				"fallbacks": ["llmgateway/claude-opus-4-6"]
			}
		}
	}
}
```
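
Conceptually, a fallback chain tries each model in order until one succeeds. The sketch below illustrates that idea only; it is not OpenClaw's implementation, and `modelOrder`, `completeWithFallbacks`, and the `call` callback are hypothetical names. How OpenClaw decides that a model is "unavailable" is not covered here.

```typescript
// Shape of the "model" object from the config above.
interface ModelConfig {
	primary: string;
	fallbacks?: string[];
}

// The primary model comes first, followed by fallbacks in listed order.
function modelOrder(config: ModelConfig): string[] {
	return [config.primary, ...(config.fallbacks ?? [])];
}

// Try each model in turn; rethrow the last error if all of them fail.
async function completeWithFallbacks<T>(
	config: ModelConfig,
	call: (model: string) => Promise<T>,
): Promise<T> {
	let lastError: unknown = new Error("no models configured");
	for (const model of modelOrder(config)) {
		try {
			return await call(model);
		} catch (error) {
			lastError = error;
		}
	}
	throw lastError;
}
```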

## Available Models

LLM Gateway uses root model IDs with smart routing—automatically selecting the best provider based on uptime, throughput, price, and latency. You can use any model from the [models page](https://llmgateway.io/models). Flagship models include:

| Model                    | Best For                                    |
| ------------------------ | ------------------------------------------- |
| `gpt-5.4`                | Latest OpenAI flagship, highest quality     |
| `claude-opus-4-6`        | Anthropic's most capable model              |
| `claude-sonnet-4-6`      | Fast reasoning with extended thinking       |
| `gemini-3-1-pro-preview` | Google's latest flagship, 1M context window |
| `o3`                     | Advanced reasoning tasks                    |
| `gpt-5.4-pro`            | Premium tier with extended reasoning        |
| `gemini-2.5-flash`       | Fast responses, good for high-volume        |
| `claude-haiku-4-5`       | Cost-effective, quick responses             |
| `grok-3`                 | xAI flagship                                |
| `deepseek-v3.1`          | Open-source with vision and tools           |

For more details on routing behavior, see [routing](/features/routing).

<Callout type="info">
  View all available models on the [models page](https://llmgateway.io/models).
</Callout>

## Tips for Chat Agents

### Optimize Costs

1. **Use smaller models for simple tasks** — Claude Haiku or Gemini Flash handle basic Q\&A well
2. **Enable caching** — LLM Gateway caches identical requests automatically
3. **Set token limits** — Configure max tokens to prevent runaway costs

### Improve Response Quality

1. **Choose the right model** — Claude Opus excels at nuanced conversation, GPT-5.4 at general tasks
2. **Use system prompts** — Configure your agent's personality and capabilities
3. **Test multiple models** — LLM Gateway makes it easy to A/B test different providers

<Callout type="info">
  Need help? Join our [Discord community](https://llmgateway.io/discord) for
  support and troubleshooting assistance.
</Callout>


# OpenCode Integration
URL: /guides/opencode
import { Step, Steps } from "fumadocs-ui/components/steps";
import { Callout } from "fumadocs-ui/components/callout";

[OpenCode](https://opencode.ai) is an open-source AI coding agent for your terminal, IDE, or desktop. This guide shows you how to connect it to LLM Gateway—giving you access to 180+ models from 60+ providers, all tracked in one dashboard.

## Prerequisites

* OpenCode installed — visit the [OpenCode download page](https://opencode.ai/download) for your platform
* An LLM Gateway API key

## Setup

<Steps>
  <Step>
    ### Create Configuration File

    Create `config.json` in your OpenCode configuration directory:

    **macOS/Linux:** `~/.config/opencode/config.json`

    **Windows:** `C:\Users\YourUsername\.config\opencode\config.json`

    ```json
    {
    	"provider": {
    		"llmgateway": {
    			"npm": "@ai-sdk/openai-compatible",
    			"name": "LLM Gateway",
    			"options": {
    				"baseURL": "https://api.llmgateway.io/v1"
    			},
    			"models": {
    				"gpt-5": {
    					"name": "GPT-5"
    				},
    				"gpt-5-mini": {
    					"name": "GPT-5 Mini"
    				},
    				"gemini-2.5-pro": {
    					"name": "Gemini 2.5 Pro"
    				},
    				"claude-3-5-sonnet-20241022": {
    					"name": "Claude 3.5 Sonnet"
    				}
    			}
    		}
    	},
    	"model": "llmgateway/gpt-5"
    }
    ```
  </Step>

  <Step>
    ### Launch OpenCode and Connect Provider

    Start OpenCode from your terminal:

    ```bash
    opencode
    ```

    **In VS Code/Cursor:**

    1. Install the OpenCode extension from the marketplace
    2. Open Command Palette (Ctrl+Shift+P or Cmd+Shift+P)
    3. Type "OpenCode" and select "Open opencode"

    Once OpenCode launches, run the `/connect` command to connect to LLM Gateway.
  </Step>

  <Step>
    ### Select LLM Gateway Provider

    In the provider list, scroll down to find "LLM Gateway" under the "Other" section and select it.
  </Step>

  <Step>
    ### Enter Your API Key

    OpenCode will prompt you for your API key. Enter your LLM Gateway API key and press Enter. OpenCode will automatically save your credentials securely.

    [Sign up for LLM Gateway](https://llmgateway.io/signup) and create an API key from your dashboard.
  </Step>

  <Step>
    ### Start Using OpenCode

    You're all set! OpenCode is now connected to LLM Gateway. You can start asking questions and building with AI.
  </Step>
</Steps>

## Why Use LLM Gateway with OpenCode

* **180+ models** — GPT-5, Claude, Gemini, Llama, and more from 60+ providers
* **One API key** — Stop juggling credentials for every provider
* **Cost tracking** — See what each coding session costs in your dashboard
* **Response caching** — Repeated requests hit cache automatically
* **Volume discounts** — The more you use, the more you save

## Adding More Models

You can add any model from the [models page](https://llmgateway.io/models) to your configuration. Simply add more entries to the `models` object in your `config.json`:

```json
{
	"provider": {
		"llmgateway": {
			"models": {
				"gpt-5": { "name": "GPT-5" },
				"gpt-5-mini": { "name": "GPT-5 Mini" },
				"deepseek/deepseek-chat": { "name": "DeepSeek Chat" },
				"meta/llama-3.3-70b": { "name": "Llama 3.3 70B" }
			}
		}
	}
}
```

After updating `config.json`, restart OpenCode to see the new models.

## Switching Models

To change your default model, update the `model` field in your configuration:

```json
{
	"model": "llmgateway/gpt-5-mini"
}
```

Or select a different model directly in the OpenCode interface.

<Callout type="info">
  View all available models on the [models page](https://llmgateway.io/models).
</Callout>

## Troubleshooting

### OpenCode asks for API key every time

Make sure the provider ID in your `config.json` matches exactly: `"llmgateway"` (all lowercase, no spaces).

### 404 Not Found errors

Verify your `baseURL` is set to `https://api.llmgateway.io/v1` (note the `/v1` at the end).

### Models not showing up

After editing `config.json`, restart OpenCode completely for changes to take effect.

### Connection timeout

Check that you have an active internet connection, and verify in the [dashboard](https://llmgateway.io/dashboard) that your API key is valid.
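
Several of the issues above come down to small mistakes in `config.json`. The helper below is a hypothetical sketch (it is not part of OpenCode) that checks a parsed config for the provider ID, the `/v1` suffix, and an empty model list.

```typescript
// Only the fields this check needs; the real config may contain more.
interface OpenCodeConfig {
	provider?: Record<
		string,
		{ options?: { baseURL?: string }; models?: Record<string, unknown> }
	>;
	model?: string;
}

// Returns a list of human-readable problems; an empty list means no issues found.
function checkOpenCodeConfig(config: OpenCodeConfig): string[] {
	const problems: string[] = [];
	const provider = config.provider?.["llmgateway"];
	if (!provider) {
		problems.push('provider ID must be exactly "llmgateway"');
		return problems;
	}
	const baseURL = provider.options?.baseURL ?? "";
	if (!baseURL.replace(/\/+$/, "").endsWith("/v1")) {
		problems.push("baseURL should end with /v1");
	}
	if (Object.keys(provider.models ?? {}).length === 0) {
		problems.push("no models are listed for the provider");
	}
	return problems;
}
```

To use it, read and `JSON.parse` your `config.json`, pass the result to `checkOpenCodeConfig`, and print whatever it returns.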

## Configuration Tips

* **Global configuration**: Use `~/.config/opencode/config.json` to apply settings across all projects
* **Project-specific**: Place `opencode.json` in your project root to override global settings for that project
* **Model selection**: You can specify different models for different types of tasks using OpenCode's agent configuration

<Callout type="info">
  Need help? Join our [Discord community](https://llmgateway.io/discord) for
  support and troubleshooting assistance.
</Callout>


# Anthropic API Compatibility
URL: /features/anthropic-endpoint
import { Callout } from "fumadocs-ui/components/callout";

# Anthropic API Compatibility

LLM Gateway provides a native Anthropic-compatible endpoint at `/v1/messages` that lets you use any model in our catalog while keeping the familiar Anthropic API format.
This is especially useful for applications designed for Claude that you want to extend to other models.

<Callout type="info">
  Enjoy a 50% discount on our Anthropic models for a limited time.
</Callout>

## Overview

The Anthropic endpoint transforms requests from Anthropic's message format to the OpenAI-compatible format used by LLM Gateway, then transforms the responses back to Anthropic's format. This means you can:

* Use **any model** available in LLM Gateway with Anthropic's API format
* Maintain existing code that uses Anthropic's SDK or API format
* Access models from OpenAI, Google, Cohere, and other providers through the Anthropic interface
* Leverage LLM Gateway's routing, caching, and cost optimization features
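
To make the request transformation concrete, here is a heavily simplified sketch of mapping an Anthropic-style request to an OpenAI-style one. It is not the gateway's actual code: the type and function names are hypothetical, and it handles only text content and the top-level `system` field, ignoring tools, images, and streaming.

```typescript
// Anthropic content is either a plain string or a list of typed blocks.
type AnthropicContent = string | Array<{ type: string; text?: string }>;

interface AnthropicRequest {
	model: string;
	max_tokens: number;
	system?: string;
	messages: Array<{ role: "user" | "assistant"; content: AnthropicContent }>;
}

interface OpenAIRequest {
	model: string;
	max_tokens: number;
	messages: Array<{ role: "system" | "user" | "assistant"; content: string }>;
}

// Join the text blocks; non-text blocks are dropped in this sketch.
function flattenContent(content: AnthropicContent): string {
	if (typeof content === "string") return content;
	return content
		.filter((block) => block.type === "text")
		.map((block) => block.text ?? "")
		.join("");
}

function toOpenAIRequest(request: AnthropicRequest): OpenAIRequest {
	const messages: OpenAIRequest["messages"] = [];
	// Anthropic's top-level system prompt becomes a leading system message.
	if (request.system) {
		messages.push({ role: "system", content: request.system });
	}
	for (const message of request.messages) {
		messages.push({ role: message.role, content: flattenContent(message.content) });
	}
	return { model: request.model, max_tokens: request.max_tokens, messages };
}
```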

## Configuration for Claude Code

This endpoint lets you configure Claude Code to use any model available in LLM Gateway:

```bash
export ANTHROPIC_BASE_URL=https://api.llmgateway.io
export ANTHROPIC_AUTH_TOKEN=llmgtwy_your_api_key_here
# optional: specify a model, otherwise it uses the default Claude model
export ANTHROPIC_MODEL=gpt-5  # or any model from our catalog

# now run claude!
claude
```

### Choosing Models

You can use any model from the [models page](https://llmgateway.io/models). Popular options for Claude Code include:

```bash
# Use OpenAI's latest model
export ANTHROPIC_MODEL=gpt-5

# Use a cost-effective alternative
export ANTHROPIC_MODEL=gpt-5-mini

# Use Google's Gemini
export ANTHROPIC_MODEL=gemini-2.5-pro

# Use Anthropic's actual Claude models
export ANTHROPIC_MODEL=claude-3-5-sonnet-20241022
```

## Environment Variables

When configuring Claude Code or other Anthropic-compatible applications, you can use these environment variables:

### ANTHROPIC\_MODEL

Specifies the main model to use for primary requests.

* **Default**: `claude-sonnet-4-20250514`
* **Example**: `export ANTHROPIC_MODEL=gpt-5`

### ANTHROPIC\_SMALL\_FAST\_MODEL

Specifies a smaller, faster model used for background functionality and internal operations.

* **Default**: `claude-3-5-haiku-20241022`
* **Example**: `export ANTHROPIC_SMALL_FAST_MODEL=gpt-5-nano`

```bash
# Example configuration
export ANTHROPIC_BASE_URL=https://api.llmgateway.io
export ANTHROPIC_AUTH_TOKEN=llmgtwy_your_api_key_here
export ANTHROPIC_MODEL=gpt-5
export ANTHROPIC_SMALL_FAST_MODEL=gpt-5-nano
```

## Advanced Features

### Making a manual request

```bash
curl -X POST "https://api.llmgateway.io/v1/messages" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ],
    "max_tokens": 100
  }'
```

### Response Format

The endpoint returns responses in Anthropic's message format:

```json
{
	"id": "msg_abc123",
	"type": "message",
	"role": "assistant",
	"model": "gpt-5",
	"content": [
		{
			"type": "text",
			"text": "Hello! I'm doing well, thank you for asking. How can I help you today?"
		}
	],
	"stop_reason": "end_turn",
	"stop_sequence": null,
	"usage": {
		"input_tokens": 13,
		"output_tokens": 20
	}
}
```
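
A response in this shape can be reduced to its text and token totals with a few lines of code. The sketch below assumes only the fields shown in the example above; `summarizeMessage` is an illustrative helper name.

```typescript
// Subset of the response fields shown in the example above.
interface AnthropicMessage {
	content: Array<{ type: string; text?: string }>;
	stop_reason: string | null;
	usage: { input_tokens: number; output_tokens: number };
}

// Concatenate all text blocks and add up input and output tokens.
function summarizeMessage(message: AnthropicMessage): { text: string; totalTokens: number } {
	const text = message.content
		.filter((block) => block.type === "text")
		.map((block) => block.text ?? "")
		.join("");
	const totalTokens = message.usage.input_tokens + message.usage.output_tokens;
	return { text, totalTokens };
}
```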


# API Keys & IAM Rules
URL: /features/api-keys
import { Tabs, Tab } from "fumadocs-ui/components/tabs";
import { Callout } from "fumadocs-ui/components/callout";

# API Keys & IAM Rules

API keys are the primary method for authenticating with LLM Gateway. This guide covers creating and managing API keys and configuring IAM rules for fine-grained access control.

## Overview

LLM Gateway provides comprehensive API key management with the following features:

* **Basic API Key Management**: Create, list, update, and delete API keys
* **Usage Limits**: Set spending limits on individual API keys
* **IAM Rules**: Fine-grained access control for models, providers, and pricing
* **Usage Tracking**: Monitor API key usage and costs
* **Status Management**: Enable/disable keys without deletion

## Creating API Keys

### Via Dashboard

At this time, API keys can only be created via the dashboard.

1. Navigate to your project in the LLM Gateway dashboard
2. Go to the **API Keys** section
3. Click **Create API Key**
4. Provide a description for your key
5. Optionally set a usage limit
6. Click **Create**

<Callout type="warning">
  API keys are shown in full only once during creation. Make sure to copy and
  store them securely.
</Callout>

## Using API Keys

Once you have an API key, use it in the `Authorization` header of your requests:

```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer llmgtwy_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

## API Key Management

### Disabling/Enabling API Keys

You can disable an API key to stop it from being used, but the key is not deleted and can be re-enabled later.

### Usage Limits

Usage is tracked per API key and shown on the API Keys page. It includes both costs from LLM Gateway credits and usage from your own provider keys when applicable, giving you complete visibility into total spending per key.

You can set a maximum usage limit for each API key. When the limit is reached, requests using that key will return an error.

## IAM Rules

IAM (Identity and Access Management) rules provide fine-grained access control over what models, providers, and pricing tiers an API key can access.

### Rule Types

#### Model Access Rules

Control access to specific models:

* **Allow Models**: Only allow access to specific models
* **Deny Models**: Block access to specific models

#### Provider Access Rules

Control access to specific providers:

* **Allow Providers**: Only allow access to specific providers
* **Deny Providers**: Block access to specific providers

#### Pricing Rules

Control access based on model pricing:

* **Allow Pricing**: Set constraints on what pricing tiers are allowed
* **Deny Pricing**: Block specific pricing tiers
* **Free vs Paid**: Allow or deny access to free vs paid models

## Error Handling

When a request violates an API key's IAM rules, the API returns a specific error message:

```json
{
	"error": true,
	"status": 403,
	"message": "Access denied: Model gpt-4 is not in the allowed models list"
}
```

Common error scenarios:

* Model not allowed by IAM rules
* Provider blocked by IAM rules
* Pricing limits exceeded
* API key disabled or deleted
* Usage limit reached
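
Client code can detect these errors by checking the response body. The sketch below assumes the error shape from the example above (`error`, `status`, `message`); status codes for the other scenarios are not listed here, so only the documented 403 case is handled specially. Both helper names are hypothetical.

```typescript
// Error body shape taken from the example above.
interface GatewayError {
	error: true;
	status: number;
	message: string;
}

// Type guard: true only when the body matches the error shape.
function isGatewayError(body: unknown): body is GatewayError {
	if (typeof body !== "object" || body === null) return false;
	const candidate = body as Record<string, unknown>;
	return (
		candidate["error"] === true &&
		typeof candidate["status"] === "number" &&
		typeof candidate["message"] === "string"
	);
}

// A 403 signals an access rule violation, so retrying unchanged will not help.
function isAccessDenied(body: unknown): boolean {
	return isGatewayError(body) && body.status === 403;
}
```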

## Migration from Legacy Keys

If you have existing API keys without IAM rules:

1. **Backward Compatibility**: Existing keys continue to work without restrictions
2. **Gradual Migration**: Add IAM rules incrementally
3. **Testing**: Test IAM rules in development before applying to production
4. **Monitoring**: Monitor for access denied errors after implementing rules

<Callout type="info">
  API keys without IAM rules have unrestricted access to all models and
  providers.
</Callout>


# Audit Logs
URL: /features/audit-logs
import { Callout } from "fumadocs-ui/components/callout";

# Audit Logs

Audit logs provide complete visibility into all actions within your organization. Track who did what, when, and to which resource.

<Callout type="info">
  Audit logs are available on the [**Enterprise
  plan**](https://llmgateway.io/enterprise) for organization owners and admins.
</Callout>

## What's Tracked

Every significant action is logged with detailed metadata:

| Field             | Description                                              |
| ----------------- | -------------------------------------------------------- |
| **Timestamp**     | When the action occurred                                 |
| **User**          | Who performed the action (name and email)                |
| **Action**        | What was done (e.g., `api_key.create`, `project.update`) |
| **Resource Type** | Category of the affected resource                        |
| **Resource ID**   | Unique identifier of the affected resource               |
| **Details**       | Additional context like resource names or changed fields |

## Tracked Actions

### Organization Management

* `organization.update` — Organization settings changed
* `organization.delete` — Organization deleted

### Project Management

* `project.create` — New project created
* `project.update` — Project settings changed
* `project.delete` — Project deleted

### Team Management

* `team_member.add` — New member invited
* `team_member.update` — Member role changed
* `team_member.remove` — Member removed

### API Key Management

* `api_key.create` — New API key created
* `api_key.update_status` — API key enabled/disabled
* `api_key.update_limit` — Usage limit changed
* `api_key.delete` — API key deleted
* `api_key.iam_rule.create` — IAM rule added
* `api_key.iam_rule.update` — IAM rule modified
* `api_key.iam_rule.delete` — IAM rule removed

### Provider Key Management

* `provider_key.create` — Provider key added
* `provider_key.update` — Provider key status changed
* `provider_key.delete` — Provider key removed

### Billing Events

* `subscription.create` — Subscription started
* `subscription.cancel` — Subscription cancelled
* `subscription.resume` — Subscription resumed
* `payment.credit_topup` — Credits purchased

## Filtering and Search

Filter logs by:

* **Action** — Specific action type
* **Resource Type** — Category of resource
* **User** — Who performed the action
* **Date Range** — Time period
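
If you work with audit log data in your own tooling, the same filters are easy to express in code. The sketch below is illustrative only: no export API is described here, and the field names are hypothetical camelCase versions of the columns in the table above.

```typescript
// Hypothetical record shape based on the "What's Tracked" table.
interface AuditLogEntry {
	timestamp: Date;
	user: { name: string; email: string };
	action: string;
	resourceType: string;
	resourceId: string;
	details?: Record<string, unknown>;
}

// Every field is optional; omitted fields do not restrict the result.
interface AuditLogFilter {
	action?: string;
	resourceType?: string;
	userEmail?: string;
	from?: Date;
	to?: Date;
}

function filterAuditLogs(entries: AuditLogEntry[], filter: AuditLogFilter): AuditLogEntry[] {
	return entries.filter((entry) => {
		if (filter.action !== undefined && entry.action !== filter.action) return false;
		if (filter.resourceType !== undefined && entry.resourceType !== filter.resourceType) return false;
		if (filter.userEmail !== undefined && entry.user.email !== filter.userEmail) return false;
		if (filter.from !== undefined && entry.timestamp.getTime() < filter.from.getTime()) return false;
		if (filter.to !== undefined && entry.timestamp.getTime() > filter.to.getTime()) return false;
		return true;
	});
}
```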

## Data Retention

Audit logs are retained for **90 days** on the Enterprise plan.

## Access Control

Only organization **owners** and **admins** can view audit logs. This ensures sensitive activity data is only visible to authorized personnel.

## Get Started

Audit logs are an Enterprise feature. [Contact us](https://llmgateway.io/enterprise) to enable Enterprise for your organization.


# Caching
URL: /features/caching
import { Callout } from "fumadocs-ui/components/callout";

# Caching

LLM Gateway provides intelligent response caching that can significantly reduce your API costs and response latency. When caching is enabled, identical requests are served from cache instead of making redundant calls to LLM providers.

## How It Works

When you make an API request:

1. LLM Gateway generates a cache key based on the request parameters
2. If a matching cached response exists, it's returned immediately
3. If no cache exists, the request is forwarded to the provider
4. The response is cached for future identical requests

This means repeated identical requests are served instantly from cache without incurring additional provider costs.
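
The four steps can be sketched as a small in-memory cache with a time-to-live. This is an illustration of the general technique, not the gateway's implementation; `ResponseCache`, `cachedCall`, and the injected `now` timestamp are assumptions made to keep the example self-contained.

```typescript
interface CacheEntry<T> {
	value: T;
	expiresAt: number; // epoch milliseconds
}

class ResponseCache<T> {
	private readonly entries = new Map<string, CacheEntry<T>>();
	private readonly ttlMs: number;

	constructor(ttlMs: number) {
		this.ttlMs = ttlMs;
	}

	// Step 2: return a cached value if one exists and has not expired.
	get(key: string, now: number): T | undefined {
		const entry = this.entries.get(key);
		if (!entry) return undefined;
		if (entry.expiresAt <= now) {
			this.entries.delete(key);
			return undefined;
		}
		return entry.value;
	}

	// Step 4: store the response for future identical requests.
	set(key: string, value: T, now: number): void {
		this.entries.set(key, { value, expiresAt: now + this.ttlMs });
	}
}

async function cachedCall<T>(
	cache: ResponseCache<T>,
	key: string, // Step 1: key derived from the request parameters
	callProvider: () => Promise<T>,
	now: number = Date.now(),
): Promise<T> {
	const hit = cache.get(key, now);
	if (hit !== undefined) return hit;
	const fresh = await callProvider(); // Step 3: forward to the provider
	cache.set(key, fresh, now);
	return fresh;
}
```

With a 60-second TTL (`new ResponseCache(60000)`), a second identical request within a minute would be answered without calling the provider.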

## Cost Savings

Caching can dramatically reduce costs for applications with repetitive requests:

| Scenario                    | Without Caching | With Caching | Savings |
| --------------------------- | --------------- | ------------ | ------- |
| 1,000 identical requests    | $10.00          | $0.01        | 99.9%   |
| 50% duplicate rate          | $10.00          | $5.00        | 50%     |
| Retry after transient error | $0.02           | $0.01        | 50%     |

<Callout type="info">
  Cached responses are free from provider costs. You only pay for the initial
  request that populates the cache.
</Callout>
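
The table rows follow from simple arithmetic: you pay for each unique request once. The sketch below reproduces that calculation under simplifying assumptions: every request costs the same, every duplicate arrives while its cache entry is still valid, and storage costs are ignored. The $0.01 per-request price is an assumed value chosen to match the first row.

```typescript
interface CachingEstimate {
	withoutCaching: number;
	withCaching: number;
	savingsPercent: number;
}

// Only unique requests reach the provider; duplicates are served from cache.
function costWithCaching(
	totalRequests: number,
	uniqueRequests: number,
	costPerRequest: number,
): CachingEstimate {
	const withoutCaching = totalRequests * costPerRequest;
	const withCaching = uniqueRequests * costPerRequest;
	const savingsPercent =
		withoutCaching === 0 ? 0 : (1 - withCaching / withoutCaching) * 100;
	return { withoutCaching, withCaching, savingsPercent };
}

// 1,000 identical requests: one paid call, the rest served from cache.
const identicalRequests = costWithCaching(1000, 1, 0.01);
// 50% duplicate rate: half of the requests are unique.
const halfDuplicates = costWithCaching(1000, 500, 0.01);
```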

## Requirements

<Callout type="warning">
  Caching requires [Data Retention](/features/data-retention) to be enabled with
  "Retain All Data" level. This allows LLM Gateway to store and retrieve
  response payloads.
</Callout>

To use caching:

1. Enable **Data Retention** in your organization settings with "Retain All Data" level
2. Enable **Caching** in your project settings under Preferences
3. Configure the cache duration (TTL) as needed
4. Make requests as normal—caching is automatic

## Cache Key Generation

The cache key is generated from these request parameters:

* Model identifier
* Messages array (roles and content)
* Temperature
* Max tokens
* Top P
* Tools/functions
* Tool choice
* Response format
* System prompt
* Other model-specific parameters

<Callout type="info">
  Requests with different parameter values, even slight variations, will not
  share cache entries.
</Callout>
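
To see why even small parameter differences produce separate cache entries, consider a key built from a canonical serialization of the request. The sketch below is an assumption about the general approach, not the gateway's actual key derivation (which likely also hashes the result); `stableStringify` is a hypothetical helper.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Serialize with object keys sorted, so property order does not affect the key.
function stableStringify(value: Json): string {
	if (value === null || typeof value !== "object") {
		return JSON.stringify(value);
	}
	if (Array.isArray(value)) {
		return `[${value.map((item) => stableStringify(item)).join(",")}]`;
	}
	const keys = Object.keys(value).sort();
	const parts = keys.map(
		(key) => `${JSON.stringify(key)}:${stableStringify(value[key] as Json)}`,
	);
	return `{${parts.join(",")}}`;
}

const keyA = stableStringify({ model: "gpt-4o", temperature: 0 });
const keyB = stableStringify({ temperature: 0, model: "gpt-4o" }); // same as keyA
const keyC = stableStringify({ model: "gpt-4o", temperature: 0.1 }); // differs from keyA
```

Array order is preserved on purpose: reordering the messages changes the conversation, so it should change the key.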

## Cache Behavior

### Cache Hits

When a cache hit occurs:

* Response is returned immediately (sub-millisecond latency)
* No provider API call is made
* No inference costs are incurred

### Cache Misses

When a cache miss occurs:

* Request is forwarded to the LLM provider
* Response is stored in cache
* Normal inference costs apply
* Future identical requests will hit the cache

## Streaming and Caching

Caching works with both streaming and non-streaming requests:

* **Non-streaming**: Full response is cached and returned
* **Streaming**: The complete response is reconstructed from cache and streamed back

## Cache TTL (Time-to-Live)

Cache duration is configurable per project in your project settings. You can set the cache TTL from 10 seconds up to 1 year (31,536,000 seconds).

The default cache duration is 60 seconds. Adjust this based on your use case—longer durations work well for static content, while shorter durations are better for frequently changing data.

## Identifying Cached Responses

Cached responses show zero or minimal token usage since no inference occurred:

```json
{
	"usage": {
		"prompt_tokens": 0,
		"completion_tokens": 0,
		"total_tokens": 0,
		"cost_usd_total": 0
	}
}
```

## Use Cases

### Development and Testing

During development, you often send the same prompts repeatedly:

```typescript
// This prompt will only incur costs once
const response = await client.chat.completions.create({
	model: "gpt-4o",
	messages: [{ role: "user", content: "Explain quantum computing" }],
});
```

### Chatbots with Common Questions

FAQ-style interactions often have repeated questions:

```typescript
// Common questions are served from cache
const faqs = [
	"What are your business hours?",
	"How do I reset my password?",
	"What is your return policy?",
];
```

### Batch Processing

Processing large datasets with potentially duplicate items:

```typescript
// Duplicate items in batch are served from cache
for (const item of items) {
	const response = await client.chat.completions.create({
		model: "gpt-4o",
		messages: [{ role: "user", content: `Classify: ${item}` }],
	});
}
```

## Best Practices

### Maximize Cache Hits

* Use consistent prompt formatting
* Normalize input data before sending
* Use deterministic parameters (temperature: 0)
* Avoid including timestamps or random values in prompts
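
One way to apply the first two points is to normalize user input before building the request. The helper below is a minimal sketch: `normalizePrompt` is a hypothetical name, and collapsing whitespace is only appropriate when whitespace carries no meaning in your prompts (it would not suit code or formatted text).

```typescript
// Normalize Unicode, collapse runs of whitespace, and trim the ends so that
// superficially different inputs map to the same prompt string.
function normalizePrompt(input: string): string {
	return input.normalize("NFC").replace(/\s+/g, " ").trim();
}

const normalized = normalizePrompt("  What are   your business\nhours? ");
```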

### Appropriate Use Cases

Caching is most effective for:

* Static knowledge queries
* Classification tasks
* FAQ responses
* Development/testing
* Retry scenarios

### When to Avoid Caching

Caching may not be suitable for:

* Real-time data requirements
* Highly personalized responses
* Time-sensitive information
* Creative tasks requiring variety

## Storage Costs

Since caching requires data retention, storage costs apply:

* **Rate**: $0.01 per 1 million tokens
* **Applies to**: All tokens in cached requests and responses

See [Data Retention](/features/data-retention) for complete pricing details.

<Callout type="success">
  The cost savings from caching typically far outweigh the storage costs,
  especially for applications with high request duplication.
</Callout>


# Cost Breakdown
URL: /features/cost-breakdown
import { Callout } from "fumadocs-ui/components/callout";

# Cost Breakdown

LLM Gateway provides real-time cost information for each API request directly in the response's `usage` object. This allows you to track costs programmatically without needing to query the dashboard.

<Callout type="info">
  Cost breakdown is available for all users on both hosted and self-hosted
  deployments.
</Callout>

## Response Format

When cost breakdown is enabled, your API responses will include additional cost fields in the `usage` object:

```json
{
	"id": "chatcmpl-123",
	"object": "chat.completion",
	"created": 1234567890,
	"model": "openai/gpt-4o",
	"choices": [
		{
			"index": 0,
			"message": {
				"role": "assistant",
				"content": "Hello! How can I help you today?"
			},
			"finish_reason": "stop"
		}
	],
	"usage": {
		"prompt_tokens": 10,
		"completion_tokens": 15,
		"total_tokens": 25,
		"cost_usd_total": 0.000125,
		"cost_usd_input": 0.000025,
		"cost_usd_output": 0.0001,
		"cost_usd_cached_input": 0,
		"cost_usd_request": 0,
		"cost_usd_data_storage": 0.00000025
	}
}
```

## Cost Fields

| Field                   | Description                                                                        |
| ----------------------- | ---------------------------------------------------------------------------------- |
| `cost_usd_total`        | Total inference cost for the request in USD (excludes storage)                     |
| `cost_usd_input`        | Cost for input/prompt tokens in USD                                                |
| `cost_usd_output`       | Cost for output/completion tokens in USD                                           |
| `cost_usd_cached_input` | Cost for cached input tokens in USD (discounted rate)                              |
| `cost_usd_request`      | Per-request cost in USD (for models with request-based pricing)                    |
| `cost_usd_data_storage` | LLM Gateway storage cost in USD ($0.01 per 1M tokens, only when retention enabled) |

<Callout type="info">
  **Note:** `cost_usd_total` includes only provider/inference costs. Data
  storage costs (`cost_usd_data_storage`) are billed separately by LLM Gateway
  when data retention is enabled in organization policies.
</Callout>

## Streaming Responses

Cost information is also available in streaming responses. The cost fields are included in the final usage chunk sent before the `[DONE]` message:

```
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[...],"usage":{"prompt_tokens":10,"completion_tokens":15,"total_tokens":25,"cost_usd_total":0.000125,"cost_usd_input":0.000025,"cost_usd_output":0.0001}}

data: [DONE]
```
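
If you consume the stream without an SDK, the cost fields can be read from the last chunk that carries a `usage` object. The sketch below assumes the full event stream has already been collected into one string; a real client would need to buffer partial lines as data arrives. `lastUsageFromSse` is a hypothetical helper name.

```typescript
interface UsageWithCost {
	prompt_tokens: number;
	completion_tokens: number;
	total_tokens: number;
	cost_usd_total?: number;
	cost_usd_input?: number;
	cost_usd_output?: number;
}

// Scan "data:" lines and keep the most recent usage object seen.
function lastUsageFromSse(body: string): UsageWithCost | undefined {
	let usage: UsageWithCost | undefined;
	for (const line of body.split("\n")) {
		if (!line.startsWith("data:")) continue;
		const payload = line.slice("data:".length).trim();
		if (payload === "" || payload === "[DONE]") continue;
		try {
			const chunk = JSON.parse(payload) as { usage?: UsageWithCost | null };
			if (chunk.usage) usage = chunk.usage;
		} catch {
			// Ignore lines that are not valid JSON.
		}
	}
	return usage;
}
```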

## Example: Tracking Costs in Code

Here's an example of how to track costs programmatically using the cost breakdown feature:

```typescript
import OpenAI from "openai";

const client = new OpenAI({
	apiKey: process.env.LLM_GATEWAY_API_KEY,
	baseURL: "https://api.llmgateway.io/v1",
});

async function trackCosts() {
	const response = await client.chat.completions.create({
		model: "gpt-4o",
		messages: [{ role: "user", content: "Hello!" }],
	});

	const usage = response.usage as any;

	if (usage.cost_usd_total !== undefined) {
		console.log(`Request cost: $${usage.cost_usd_total.toFixed(6)}`);
		console.log(`  Input: $${usage.cost_usd_input.toFixed(6)}`);
		console.log(`  Output: $${usage.cost_usd_output.toFixed(6)}`);

		if (usage.cost_usd_cached_input > 0) {
			console.log(`  Cached: $${usage.cost_usd_cached_input.toFixed(6)}`);
		}
	}

	return response;
}
```

## Use Cases

### Budget Monitoring

Track costs in real-time and implement budget limits in your application:

```typescript
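// `client` is the OpenAI client configured in the previous example.
// Message aliases the SDK's chat message type; the snippets below reuse it.
type Message = OpenAI.Chat.ChatCompletionMessageParam;

let totalSpent = 0;
const BUDGET_LIMIT = 10.0; // $10 budget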

async function makeRequest(messages: Message[]) {
	const response = await client.chat.completions.create({
		model: "gpt-4o",
		messages,
	});

	const cost = (response.usage as any).cost_usd_total || 0;
	totalSpent += cost;

	if (totalSpent > BUDGET_LIMIT) {
		throw new Error(`Budget exceeded: $${totalSpent.toFixed(2)}`);
	}

	return response;
}
```

### Per-User Cost Allocation

Track costs per user for billing or analytics:

```typescript
const userCosts: Map<string, number> = new Map();

async function makeRequestForUser(userId: string, messages: Message[]) {
	const response = await client.chat.completions.create({
		model: "gpt-4o",
		messages,
	});

	const cost = (response.usage as any).cost_usd_total || 0;
	const currentCost = userCosts.get(userId) || 0;
	userCosts.set(userId, currentCost + cost);

	return response;
}
```

### Cost Analytics

Aggregate costs by model, time period, or any other dimension:

```typescript
interface CostEntry {
	timestamp: Date;
	model: string;
	inputCost: number;
	outputCost: number;
	totalCost: number;
}

const costLog: CostEntry[] = [];

async function loggedRequest(model: string, messages: Message[]) {
	const response = await client.chat.completions.create({
		model,
		messages,
	});

	const usage = response.usage as any;

	costLog.push({
		timestamp: new Date(),
		model: response.model,
		inputCost: usage.cost_usd_input || 0,
		outputCost: usage.cost_usd_output || 0,
		totalCost: usage.cost_usd_total || 0,
	});

	return response;
}
```
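
Once entries are logged, aggregation is a plain reduction over the log. The following sketch groups total cost by model. It repeats the `CostEntry` shape from above so it stands alone, and `totalCostByModel` is a hypothetical helper name.

```typescript
interface CostEntry {
	timestamp: Date;
	model: string;
	inputCost: number;
	outputCost: number;
	totalCost: number;
}

// Hypothetical helper: sums totalCost per model name
function totalCostByModel(entries: CostEntry[]): Map<string, number> {
	const totals = new Map<string, number>();

	for (const entry of entries) {
		totals.set(entry.model, (totals.get(entry.model) ?? 0) + entry.totalCost);
	}

	return totals;
}
```

The same pattern works for other dimensions, for example by keying on a date string derived from `timestamp` instead of `model`.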

## Data Storage Costs

When data retention is enabled in organization policies, LLM Gateway stores full request and response payloads for the configured retention period. This storage incurs a small additional cost:

* **Rate**: $0.01 per 1 million tokens
* **Applies to**: Input, cached, output, and reasoning tokens
* **When charged**: Only when retention level is set to "Retain All Data"
* **Billing mode**: In API keys mode, only storage costs are deducted from credits (inference costs are billed to your provider keys)

Storage costs are displayed separately from inference costs in the dashboard and usage breakdown to maintain transparency between provider costs and LLM Gateway platform costs.

<Callout type="success">
  Enable [auto top-up](/dashboard) in billing settings to prevent request
  failures when storage costs deplete your credits.
</Callout>

## Self-Hosted Deployments

If you're running a self-hosted LLM Gateway deployment, cost breakdown is always included in API responses regardless of plan. This allows you to track internal costs and allocate them across teams or projects.


# Custom Providers
URL: /features/custom-providers
import { Callout } from "fumadocs-ui/components/callout";

# Custom Providers

LLM Gateway supports integrating custom OpenAI-compatible providers, allowing you to use any API that follows the OpenAI chat completions format. This feature is useful for:

* Private or self-hosted LLM deployments
* Specialized AI providers not natively supported
* Internal AI services within your organization
* Testing against different model endpoints

<Callout type="info">
  Custom providers must be OpenAI-compatible, supporting the
  `/v1/chat/completions` endpoint format.
</Callout>

## Quick Setup

### 1. Add a Custom Provider Key

Navigate to your organization's provider settings and add a custom provider via the UI.
Provide a lowercase name, OpenAI-compatible base URL, and API token for the custom provider.

### 2. Make Requests

Once configured, make requests using the format `{customName}/{modelName}`:

```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mycompany/custom-gpt-4",
    "messages": [
      {
        "role": "user",
        "content": "Hello from my custom provider!"
      }
    ]
  }'
```

## Configuration Requirements

### Custom Provider Name

* **Format**: Lowercase letters only (`a-z`)
* **Examples**: `mycompany`, `internal`, `testing`
* **Invalid**: `MyCompany`, `my-company`, `my_company`, `123test`

<Callout type="warn">
  The custom provider name must match the regex pattern `/^[a-z]+$/` exactly.
</Callout>
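
To catch invalid names before they reach the settings UI, you can apply the same pattern on the client side. This is a minimal sketch: `isValidCustomProviderName` and `customModelId` are hypothetical helpers, and only the regex and the `{customName}/{modelName}` format come from this page.

```typescript
// Pattern documented above for custom provider names
const CUSTOM_PROVIDER_NAME = /^[a-z]+$/;

function isValidCustomProviderName(name: string): boolean {
	return CUSTOM_PROVIDER_NAME.test(name);
}

// Builds the `{customName}/{modelName}` identifier used in the `model` field
function customModelId(providerName: string, modelName: string): string {
	if (!isValidCustomProviderName(providerName)) {
		throw new Error(`Invalid custom provider name: ${providerName}`);
	}
	return `${providerName}/${modelName}`;
}
```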

### Base URL

* Must be a valid HTTPS URL
* Should point to your provider's base endpoint
* LLM Gateway will append `/v1/chat/completions` automatically
* **Example**: `https://api.example.com` → `https://api.example.com/v1/chat/completions`
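
The rules above can be sketched as a small check that previews the URL requests will be sent to. `resolveCompletionsUrl` is a hypothetical helper, and stripping a trailing slash is an assumption of this sketch, not documented gateway behavior.

```typescript
// Hypothetical helper: validates a base URL and previews the resolved endpoint
function resolveCompletionsUrl(baseUrl: string): string {
	const parsed = new URL(baseUrl); // Throws on malformed URLs

	if (parsed.protocol !== "https:") {
		throw new Error("Base URL must use HTTPS");
	}

	// Assumption: a trailing slash is removed before the path is appended
	return `${baseUrl.replace(/\/+$/, "")}/v1/chat/completions`;
}
```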

### API Token

* Provider-specific authentication token
* Used in the `Authorization: Bearer {token}` header

<Callout type="info">
  Unlike built-in providers, custom provider models are not validated, giving
  you complete flexibility.
</Callout>

## Supported Features

Custom providers inherit full LLM Gateway functionality.


# Data Retention
URL: /features/data-retention
import { Callout } from "fumadocs-ui/components/callout";

# Data Retention

LLM Gateway offers configurable data retention policies that allow you to store full request and response payloads. This enables powerful debugging capabilities, detailed analytics, and compliance with data governance requirements.

## Retention Levels

LLM Gateway supports two retention levels that can be configured per organization:

| Level               | Description                                                                                    | Storage Cost    |
| ------------------- | ---------------------------------------------------------------------------------------------- | --------------- |
| **Metadata Only**   | Stores request metadata (timestamps, model, tokens, costs) without full payloads. Default.     | Free            |
| **Retain All Data** | Stores complete request and response payloads including messages, tool calls, and attachments. | $0.01/1M tokens |

<Callout type="info">
  Metadata-only retention is enabled by default and provides usage analytics
  without additional storage costs.
</Callout>

## Storage Pricing

When full data retention is enabled, storage is billed at **$0.01 per 1 million tokens**. This rate applies to:

* Input tokens (prompt)
* Cached input tokens
* Output tokens (completion)
* Reasoning tokens

Storage costs are calculated per request and displayed in the `cost_usd_data_storage` field of the response. See [Cost Breakdown](/features/cost-breakdown) for details on tracking costs programmatically.

### Example Cost Calculation

For a request with:

* 1,000 input tokens
* 500 output tokens
* 1,500 total tokens

Storage cost = 1,500 / 1,000,000 × $0.01 = **$0.000015**
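
The same arithmetic as a small helper, useful for estimating storage cost before enabling retention. This is a sketch: `estimateStorageCostUsd` and `TokenCounts` are hypothetical names, and it assumes the four token categories are counted separately and simply summed.

```typescript
// Documented rate: $0.01 per 1 million tokens
const STORAGE_RATE_USD_PER_MILLION_TOKENS = 0.01;

interface TokenCounts {
	input: number;
	cachedInput?: number;
	output: number;
	reasoning?: number;
}

// Assumption: the token categories do not overlap, so they can be added up
function estimateStorageCostUsd(tokens: TokenCounts): number {
	const totalTokens =
		tokens.input +
		(tokens.cachedInput ?? 0) +
		tokens.output +
		(tokens.reasoning ?? 0);

	return (totalTokens / 1_000_000) * STORAGE_RATE_USD_PER_MILLION_TOKENS;
}

// The request from the example: 1,000 input + 500 output tokens
const exampleStorageCost = estimateStorageCostUsd({ input: 1000, output: 500 });
console.log(exampleStorageCost.toFixed(6));
```

The authoritative value for a real request is the `cost_usd_data_storage` field in the response; treat this helper as an estimate only.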

## Configuring Retention

Data retention is configured at the organization level in your dashboard settings:

1. Navigate to **Organization Settings** → **Policies**
2. Select your preferred **Data Retention Level**
3. Save changes

<Callout type="warning">
  Changing retention settings applies to new requests only. Existing stored data
  follows the retention period active when it was created.
</Callout>

## Retention Periods

Data is retained for 30 days by default. Enterprise plans can have custom retention periods. After the retention period expires, data is automatically deleted.

## Accessing Stored Data

When data retention is enabled, you can access your stored requests through the dashboard:

* View request history
* Filter by model and date range
* Inspect complete request and response payloads

## Use Cases

### Debugging

Full data retention enables you to:

* Inspect exact prompts sent to models
* Review complete responses including tool calls
* Trace conversation histories
* Identify issues in production

### Analytics

With stored payloads, you can:

* Analyze prompt patterns and effectiveness
* Track response quality over time
* Build custom dashboards and reports
* Measure model performance across use cases

### Compliance

Data retention helps meet compliance requirements by:

* Maintaining audit trails of AI interactions
* Enabling data governance policies
* Supporting incident investigation
* Providing records for regulatory requirements

## Billing Considerations

### Credit Usage

In **API keys mode** (using your own provider keys):

* Only storage costs are deducted from LLM Gateway credits
* Inference costs are billed directly to your provider

In **credits mode**:

* Both inference and storage costs are deducted from credits
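
The two modes can be expressed as a small function over the cost fields of a response. This is a sketch for your own bookkeeping: `creditsDeductedUsd`, `BillingMode`, and the mode labels are hypothetical names, and only the `cost_usd_total` and `cost_usd_data_storage` fields come from the API.

```typescript
type BillingMode = "api-keys" | "credits";

interface RequestCosts {
	cost_usd_total: number; // Inference cost only
	cost_usd_data_storage?: number; // Present when retention is enabled
}

// Hypothetical helper: how much of a request is deducted from gateway credits
function creditsDeductedUsd(mode: BillingMode, costs: RequestCosts): number {
	const storage = costs.cost_usd_data_storage ?? 0;

	// In API keys mode, inference is billed by the provider, not from credits
	return mode === "api-keys" ? storage : costs.cost_usd_total + storage;
}
```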

### Monitoring Storage Costs

Storage costs appear in:

* The `cost_usd_data_storage` field in API responses
* Usage dashboard under "Storage" category
* Billing invoices as a separate line item

<Callout type="success">
  Enable [auto top-up](/dashboard) in billing settings to ensure uninterrupted
  service when storage costs accumulate.
</Callout>

## Self-Hosted Deployments

Self-hosted deployments have full control over data retention:

* Configure retention periods in environment variables
* Data is stored in your own PostgreSQL database
* No additional storage costs (you manage your own infrastructure)

## Privacy and Security

* All stored data is encrypted at rest
* Access is restricted to organization members with appropriate permissions
* Data is automatically deleted after the retention period
* You can request immediate deletion of specific records through support


# Guardrails
URL: /features/guardrails
import { Callout } from "fumadocs-ui/components/callout";

# Guardrails

Guardrails protect your organization by automatically detecting and blocking harmful content in LLM requests before they reach the model.

<Callout type="info">
  Guardrails are available on the [**Enterprise
  plan**](https://llmgateway.io/enterprise).
</Callout>

## Overview

Guardrails run on every API request, scanning message content for:

* Security threats (prompt injection, jailbreak attempts)
* Sensitive data (PII, secrets, credentials)
* Policy violations (blocked terms, restricted topics)

When a violation is detected, you control what happens: block the request, redact the content, or log a warning.

## System Rules

Built-in rules protect against common threats:

### Prompt Injection Detection

Detects attempts to override or manipulate system instructions. Common patterns include:

* "Ignore all previous instructions"
* "You are now a different AI"
* Hidden instructions in encoded text

### Jailbreak Detection

Identifies attempts to bypass safety measures:

* DAN (Do Anything Now) prompts
* Roleplay-based bypasses
* Instruction override attempts

### PII Detection

Identifies personal information:

* Email addresses
* Phone numbers
* Social Security Numbers
* Credit card numbers
* IP addresses

When the action is set to **redact**, PII is replaced with placeholders like `[EMAIL_REDACTED]`.
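
To illustrate what redaction does to a message, here is a minimal sketch for email addresses only. The regex and `redactEmails` are illustrative assumptions; the patterns the gateway actually uses are not documented here and are likely more thorough.

```typescript
// Simplified email pattern for illustration; not the gateway's detector
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

function redactEmails(text: string): string {
	return text.replace(EMAIL_PATTERN, "[EMAIL_REDACTED]");
}

console.log(redactEmails("Contact jane.doe@example.com today"));
// prints "Contact [EMAIL_REDACTED] today"
```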

### Secrets Detection

Detects credentials and API keys:

* AWS access keys and secrets
* Generic API keys
* Passwords in common formats
* Private keys

### File Type Restrictions

Control which file types can be uploaded:

* Configure allowed MIME types
* Set maximum file size limits
* Block potentially dangerous file types

### Document Leakage Prevention

Detects attempts to extract confidential documents or internal data.

## Configurable Actions

For each rule, choose how to respond:

| Action     | Behavior                                            |
| ---------- | --------------------------------------------------- |
| **Block**  | Reject the request with a content policy error      |
| **Redact** | Remove or mask the sensitive content, then continue |
| **Warn**   | Log the violation but allow the request to proceed  |

## Custom Rules

Create organization-specific rules for your use case:

### Blocked Terms

Prevent specific words or phrases from being used:

* Match type: exact, contains, or regex
* Case-sensitive matching option
* Multiple terms per rule
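
The options above map naturally onto a small matching function. The sketch below is an assumption about semantics, not the gateway's implementation: it treats `exact` as "the whole message equals the term" and `contains` as a substring match. `BlockedTermsRule` and `matchesBlockedTerm` are hypothetical names.

```typescript
type MatchType = "exact" | "contains" | "regex";

interface BlockedTermsRule {
	terms: string[];
	matchType: MatchType;
	caseSensitive: boolean;
}

// Returns true when any term in the rule matches the content
function matchesBlockedTerm(content: string, rule: BlockedTermsRule): boolean {
	const normalize = (value: string) =>
		rule.caseSensitive ? value : value.toLowerCase();

	return rule.terms.some((term) => {
		switch (rule.matchType) {
			case "exact":
				// Assumption: the whole message must equal the term
				return normalize(content) === normalize(term);
			case "contains":
				return normalize(content).includes(normalize(term));
			case "regex":
				// Throws if the term is not a valid regular expression
				return new RegExp(term, rule.caseSensitive ? "" : "i").test(content);
		}
	});
}
```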

### Custom Regex

Match patterns unique to your organization:

* Internal project codenames
* Customer identifiers
* Domain-specific sensitive data

### Topic Restrictions

Block content related to specific topics:

* Define restricted topics
* Keyword-based detection

## Security Events Dashboard

Monitor all guardrail violations with a dedicated dashboard:

* **Total violations** — Overall count and trends
* **By action** — Breakdown of blocked, redacted, and warned
* **By category** — Which rules are being triggered
* **Detailed logs** — Individual violations with timestamps and matched patterns

## How It Works

```
Request → Guardrails Check → Action Based on Rules → Forward to Model (if allowed)
                ↓
           Log Violation
```

1. **Request received** — API request comes in with messages
2. **Content scanned** — All text content is checked against enabled rules
3. **Violations detected** — Matches are identified and logged
4. **Action taken** — Based on rule configuration (block/redact/warn)
5. **Request proceeds** — If not blocked, the (potentially redacted) request continues
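
The steps above can be sketched as a single function that scans content against a list of rules and applies each rule's action. This is an illustration of the flow under simplifying assumptions (regex-only rules, one text string per request), not the gateway's implementation. `Rule`, `GuardrailResult`, and `applyGuardrails` are hypothetical names.

```typescript
type Action = "block" | "redact" | "warn";

interface Rule {
	name: string;
	pattern: RegExp;
	action: Action;
	placeholder?: string; // Used only by redact rules
}

interface GuardrailResult {
	allowed: boolean;
	content: string;
	violations: { rule: string; action: Action }[];
}

function applyGuardrails(content: string, rules: Rule[]): GuardrailResult {
	const violations: GuardrailResult["violations"] = [];
	let current = content;
	let allowed = true;

	for (const rule of rules) {
		// Copy the pattern with the global flag so redaction replaces every match
		const flags = rule.pattern.flags.includes("g")
			? rule.pattern.flags
			: `${rule.pattern.flags}g`;
		const pattern = new RegExp(rule.pattern.source, flags);

		if (!pattern.test(current)) continue;
		pattern.lastIndex = 0; // Reset state left behind by test()

		// Every match is logged, whatever the action
		violations.push({ rule: rule.name, action: rule.action });

		if (rule.action === "block") {
			allowed = false;
		} else if (rule.action === "redact") {
			current = current.replace(pattern, rule.placeholder ?? "[REDACTED]");
		}
		// "warn" only logs; the content is left untouched
	}

	return { allowed, content: current, violations };
}
```

A caller would forward `content` to the model only when `allowed` is `true`, and would persist `violations` in either case.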

## Best Practices

1. **Start with warnings** — Enable rules in warn mode first to understand your traffic patterns
2. **Review violations** — Check the Security Events dashboard regularly
3. **Tune custom rules** — Adjust blocked terms and regex patterns based on false positives
4. **Layer defenses** — Use multiple rule types together for comprehensive protection

## Get Started

Guardrails are an Enterprise feature. [Contact us](https://llmgateway.io/enterprise) to enable Enterprise for your organization.


# Image Generation
URL: /features/image-generation
import { Callout } from "fumadocs-ui/components/callout";

# Image Generation

LLM Gateway supports image generation through three APIs:

1. **`/v1/images/generations`** — OpenAI-compatible images endpoint (recommended for simple image generation)
2. **`/v1/images/edits`** — OpenAI-compatible image editing endpoint
3. **`/v1/chat/completions`** — Chat completions with image generation models (for conversational image generation and editing)

## Available Models

You can find all available image generation models on our [models page](https://llmgateway.io/models?filters=1\&imageGeneration=true).

## OpenAI Images API

The `/v1/images/generations` endpoint provides a drop-in replacement for OpenAI's image generation API. It works with any OpenAI-compatible client library.

### Parameters

| Parameter         | Type    | Default      | Description                                                                                                      |
| ----------------- | ------- | ------------ | ---------------------------------------------------------------------------------------------------------------- |
| `prompt`          | string  | required     | A text description of the desired image(s)                                                                       |
| `model`           | string  | `"auto"`     | The model to use. `auto` resolves to `gemini-3-pro-image-preview`                                                |
| `n`               | integer | `1`          | Number of images to generate (1-10)                                                                              |
| `size`            | string  | —            | Image dimensions. Supported sizes depend on the model/provider — see [Image Configuration](#image-configuration) |
| `quality`         | string  | —            | Image quality. Supported values depend on the model/provider — see [Image Configuration](#image-configuration)   |
| `response_format` | string  | `"b64_json"` | Only `b64_json` is supported                                                                                     |
| `style`           | string  | —            | Image style: `vivid` or `natural`                                                                                |

### curl

```bash
curl -X POST "https://api.llmgateway.io/v1/images/generations" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-3-pro-image-preview",
    "prompt": "A cute cat wearing a tiny top hat",
    "n": 1,
    "size": "1024x1024"
  }'
```

### OpenAI SDK

Works with the standard OpenAI client library: just point the base URL to LLM Gateway.

```ts
import OpenAI from "openai";
import { writeFileSync } from "fs";

const client = new OpenAI({
	baseURL: "https://api.llmgateway.io/v1",
	apiKey: process.env.LLM_GATEWAY_API_KEY,
});

const response = await client.images.generate({
	model: "gemini-3-pro-image-preview",
	prompt: "A futuristic city skyline at sunset with flying cars",
	n: 1,
	size: "1024x1024",
});

response.data.forEach((image, i) => {
	if (image.b64_json) {
		const buf = Buffer.from(image.b64_json, "base64");
		writeFileSync(`image-${i}.png`, buf);
	}
});
```

### Vercel AI SDK

Use the `@llmgateway/ai-sdk-provider` with `generateImage`.

```ts
import { createLLMGateway } from "@llmgateway/ai-sdk-provider";
import { generateImage } from "ai";
import { writeFileSync } from "fs";

const llmgateway = createLLMGateway({
	apiKey: process.env.LLM_GATEWAY_API_KEY,
});

const result = await generateImage({
	model: llmgateway.image("gemini-3-pro-image-preview"),
	prompt:
		"A cozy cabin in a snowy mountain landscape at night with aurora borealis",
	size: "1024x1024",
	n: 1,
});

result.images.forEach((image, i) => {
	const buf = Buffer.from(image.base64, "base64");
	writeFileSync(`image-${i}.png`, buf);
});
```

## OpenAI Images Edit API

The `/v1/images/edits` endpoint is OpenAI-compatible and supports a focused subset of `images.edit` parameters.

### Parameters

| Parameter            | Type                     | Required | Description                                                        |
| -------------------- | ------------------------ | -------- | ------------------------------------------------------------------ |
| `images`             | array of `{ image_url }` | yes      | Input images. `image_url` supports HTTPS URLs and base64 data URLs |
| `prompt`             | string                   | yes      | A text description of the desired image edit                       |
| `model`              | string                   | no       | Image editing model                                                |
| `background`         | enum                     | no       | `transparent`, `opaque`, or `auto`                                 |
| `input_fidelity`     | enum                     | no       | `high` or `low`                                                    |
| `n`                  | integer                  | no       | Number of edited images to generate                                |
| `output_format`      | enum                     | no       | `png`, `jpeg`, or `webp`                                           |
| `output_compression` | integer                  | no       | Compression level for `jpeg`/`webp`                                |
| `quality`            | enum                     | no       | `low`, `medium`, `high`, or `auto`                                 |
| `size`               | enum                     | no       | `auto`, `1024x1024`, `1536x1024`, `1024x1536`                      |

<Callout type="warning">
  `mask` is not supported yet on `/v1/images/edits`.
</Callout>

### curl (HTTPS image URL)

```bash
curl -X POST "https://api.llmgateway.io/v1/images/edits" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "images": [
      {
        "image_url": "https://example.com/source-image.png"
      }
    ],
    "prompt": "Add a watercolor effect to this image",
    "model": "gemini-3-pro-image-preview",
    "quality": "high",
    "size": "1024x1024"
  }'
```

### curl (base64 data URL)

```bash
curl -X POST "https://api.llmgateway.io/v1/images/edits" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "images": [
      {
        "image_url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA..."
      }
    ],
    "prompt": "Turn this into a pixel-art style image"
  }'
```
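
### TypeScript (building the request body)

When the source image is a local file, it has to be converted to a base64 data URL first. The sketch below shows one way to do that in Node.js and to assemble the JSON body for `/v1/images/edits`. `toImageDataUrl` and `buildEditRequestBody` are hypothetical helpers; only the `images`, `image_url`, and `prompt` fields come from the parameter table above.

```typescript
// Hypothetical helper: encodes raw image bytes as a base64 data URL
function toImageDataUrl(bytes: Uint8Array, mediaType = "image/png"): string {
	return `data:${mediaType};base64,${Buffer.from(bytes).toString("base64")}`;
}

// Hypothetical helper: assembles the JSON body for /v1/images/edits
function buildEditRequestBody(prompt: string, imageUrls: string[]) {
	return {
		prompt,
		images: imageUrls.map((url) => ({ image_url: url })),
	};
}
```

A typical call would read the file with `readFileSync`, pass the bytes to `toImageDataUrl`, and send `JSON.stringify(buildEditRequestBody(...))` with the same headers as the curl examples.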

## Chat Completions API

Image generation also works through the `/v1/chat/completions` endpoint, which is useful for conversational image generation, image editing with vision, and multi-turn interactions.

### Making Requests

Simply use an image generation model and provide a text prompt describing the image you want to create.

```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-3-pro-image-preview",
    "messages": [
      {
        "role": "user",
        "content": "Generate an image of a cute golden retriever puppy playing in a sunny meadow"
      }
    ]
  }'
```

### Response Format

Image generation models return responses in the standard chat completions format, with generated images included in the `images` array within the assistant message:

```json
{
	"id": "chatcmpl-1756234109285",
	"object": "chat.completion",
	"created": 1756234109,
	"model": "gemini-3-pro-image-preview",
	"choices": [
		{
			"index": 0,
			"message": {
				"role": "assistant",
				"content": "Here's an image of a cute dog for you: ",
				"images": [
					{
						"type": "image_url",
						"image_url": {
							"url": "data:image/png;base64,<base64_encoded_image_data>"
						}
					}
				]
			},
			"finish_reason": "stop"
		}
	],
	"usage": {
		"prompt_tokens": 8,
		"completion_tokens": 1303,
		"total_tokens": 1311
	}
}
```

### Vision support

To edit or modify an existing image, include it in the `messages` array alongside your instructions. This combines image generation with [vision](/features/vision) input.
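
As a rough sketch, a request body for such an edit could be built as follows. It assumes the gateway accepts the OpenAI-style `image_url` content part described on the vision page; `buildImageEditMessages` is a hypothetical helper.

```typescript
// Content parts in the OpenAI chat format (assumed to be accepted here)
type ContentPart =
	| { type: "text"; text: string }
	| { type: "image_url"; image_url: { url: string } };

interface UserMessage {
	role: "user";
	content: ContentPart[];
}

// Hypothetical helper: one user message with the instruction and the source image
function buildImageEditMessages(
	instruction: string,
	imageUrl: string,
): UserMessage[] {
	return [
		{
			role: "user",
			content: [
				{ type: "text", text: instruction },
				{ type: "image_url", image_url: { url: imageUrl } },
			],
		},
	];
}

const editRequestBody = {
	model: "gemini-3-pro-image-preview",
	messages: buildImageEditMessages(
		"Add a watercolor effect to this image",
		"https://example.com/source-image.png",
	),
};
```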

### Response Structure

#### Images Array

The `images` array contains one or more generated images with the following structure:

* `type`: Always `"image_url"` for generated images
* `image_url.url`: A data URL containing the base64-encoded image data (format: `data:image/png;base64,<data>`)
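
To save or process a generated image, the data URL has to be split into its media type and raw bytes. A minimal Node.js sketch, with `decodeImageDataUrl` as a hypothetical helper name:

```typescript
// Hypothetical helper: splits a base64 data URL into media type and bytes
function decodeImageDataUrl(dataUrl: string): {
	mediaType: string;
	bytes: Buffer;
} {
	const match = /^data:([^;,]+);base64,([\s\S]*)$/.exec(dataUrl);
	if (!match) {
		throw new Error("Expected a base64 data URL");
	}

	return {
		mediaType: match[1],
		bytes: Buffer.from(match[2], "base64"),
	};
}
```

The resulting `bytes` can be passed to `writeFileSync`, with the file extension chosen from `mediaType`.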

#### Content Field

The `content` field may contain descriptive text about the generated image, depending on the model's behavior.

### AI SDK (Chat Completions)

You can use the AI SDK to generate images with your existing `generateText` or `streamText` calls using the LLM Gateway provider.

#### Example

```ts title="/api/chat/route.ts"
import { streamText, type UIMessage, convertToModelMessages } from "ai";
import { createLLMGateway } from "@llmgateway/ai-sdk-provider";

interface ChatRequestBody {
	messages: UIMessage[];
}

export async function POST(req: Request) {
	const body = await req.json();

	const { messages }: ChatRequestBody = body;

	const llmgateway = createLLMGateway({
		apiKey: process.env.LLM_GATEWAY_API_KEY,
		baseUrl: "https://api.llmgateway.io/v1",
	});

	try {
		const result = streamText({
			model: llmgateway.chat("gemini-3-pro-image-preview"),
			messages: convertToModelMessages(messages),
		});

		return result.toUIMessageStreamResponse();
	} catch {
		return new Response(
			JSON.stringify({ error: "LLM Gateway Chat request failed" }),
			{
				status: 500,
			},
		);
	}
}
```

You can then render the image in your frontend using the `Image` component from [AI Elements](https://ai-sdk.dev/elements/components/image).

Here is a full example of how to use the AI SDK to generate images in your frontend:

```tsx title="/app/page.tsx"
"use client";

import { useState, useRef } from "react";
import { useChat } from "@ai-sdk/react";
import { parseImagePartToDataUrl } from "@/lib/image-utils";
import {
	PromptInput,
	PromptInputBody,
	PromptInputButton,
	PromptInputSubmit,
	PromptInputTextarea,
	PromptInputToolbar,
} from "@/components/ai-elements/prompt-input";
import {
	Conversation,
	ConversationContent,
} from "@/components/ai-elements/conversation";
import { Image } from "@/components/ai-elements/image";
import { Loader } from "@/components/ai-elements/loader";
import { Message, MessageContent } from "@/components/ai-elements/message";
import { Response } from "@/components/ai-elements/response";

export const ChatUI = () => {
	const textareaRef = useRef<HTMLTextAreaElement | null>(null);
	const [text, setText] = useState("");
	const { messages, status, stop, regenerate, sendMessage } = useChat();

	return (
		<>
			<div className="flex-1 overflow-y-auto px-4 pb-24">
				<Conversation>
					<ConversationContent>
						{messages.length === 0 ? (
							<div className="mb-6 text-center">
								<h2 className="text-3xl font-semibold tracking-tight">
									How can I help you?
								</h2>
							</div>
						) : (
							messages.map((m, messageIndex) => {
								const isLastMessage = messageIndex === messages.length - 1;

								if (m.role === "assistant") {
									const textContent = m.parts
										.filter((p) => p.type === "text")
										.map((p) => p.text)
										.join("");
									// Collect image parts (file parts with an image media type)
									const imageParts = m.parts.filter(
										(p) =>
											p.type === "file" && p.mediaType?.startsWith("image/"),
									);

									return (
										<div key={m.id}>
											{textContent ? <Response>{textContent}</Response> : null}
											{imageParts.length > 0 ? (
												<div className="mt-3 grid grid-cols-1 gap-3 sm:grid-cols-2">
													{imageParts.map((part, idx: number) => {
														const { base64Only, mediaType } =
															parseImagePartToDataUrl(part);

														if (!base64Only) {
															return null;
														}

														return (
															<Image
																key={idx}
																base64={base64Only}
																mediaType={mediaType}
																alt={part.name || "Generated image"}
															/>
														);
													})}
												</div>
											) : null}
											{isLastMessage &&
												(status === "submitted" || status === "streaming") && (
													<Loader />
												)}
										</div>
									);
								} else {
									return (
										<Message key={m.id} from={m.role}>
											<MessageContent variant="flat">
												{m.parts.map((p, i) => {
													if (p.type === "text") {
														return <div key={i}>{p.text}</div>;
													}
													return null;
												})}
											</MessageContent>
											{isLastMessage &&
												(status === "submitted" || status === "streaming") && (
													<Loader />
												)}
										</Message>
									);
								}
							})
						)}
					</ConversationContent>
				</Conversation>
			</div>
			<div className="sticky bottom-0 left-0 right-0 px-4 pb-[max(env(safe-area-inset-bottom),1rem)] pt-2 bg-gradient-to-t from-background via-background/95 to-transparent backdrop-blur supports-[backdrop-filter]:bg-background/60">
				<PromptInput
					aria-disabled={status === "streaming"}
					onSubmit={async (message) => {
						if (status === "streaming") {
							return;
						}

						try {
							const textContent = message.text ?? "";
							if (!textContent.trim()) {
								return;
							}

							setText(""); // Clear input immediately

							const parts = [{ type: "text" as const, text: textContent }];

							// Call sendMessage which will handle adding the user message and API request
							sendMessage({
								role: "user",
								parts,
							});
						} catch (error) {
							// Surface the failure instead of silently swallowing it
							console.error("Failed to send message:", error);
						}
					}}
				>
					<PromptInputBody>
						<PromptInputTextarea
							ref={textareaRef}
							value={text}
							onChange={(e) => setText(e.currentTarget.value)}
							placeholder="Message"
						/>
					</PromptInputBody>
					<PromptInputToolbar>
						<div className="flex items-center gap-2">
							{status === "streaming" ? (
								<PromptInputButton onClick={() => stop()} variant="ghost">
									Stop
								</PromptInputButton>
							) : null}
							<PromptInputSubmit
								status={status === "streaming" ? "streaming" : "ready"}
							/>
						</div>
					</PromptInputToolbar>
				</PromptInput>
			</div>
		</>
	);
};
```

```ts title="/lib/image-utils.ts"
/**
 * Parses a file object containing image data and returns a properly formatted data URL
 * and normalized media type.
 *
 * Handles:
 * - Normalizing mediaType from various property names (mediaType, mime_type)
 * - Detecting existing data: URLs
 * - Detecting base64-looking content
 * - Stripping whitespace from base64 content
 * - Building proper data:...;base64,... URLs
 */
export function parseImageFile(file: {
	url?: string;
	mediaType?: string;
	mime_type?: string;
}): { dataUrl: string; mediaType: string } {
	const mediaType = file.mediaType || file.mime_type || "image/png";
	let url = String(file.url || "");

	const isDataUrl = url.startsWith("data:");
	const looksLikeBase64 =
		!isDataUrl && /^[A-Za-z0-9+/=\s]+$/.test(url.slice(0, 200));

	if (looksLikeBase64) {
		url = url.replace(/\s+/g, "");
	}

	const dataUrl = isDataUrl
		? url
		: looksLikeBase64
			? `data:${mediaType};base64,${url}`
			: url;

	return { dataUrl, mediaType };
}

/**
 * Extracts base64-only content from a data URL.
 * Returns empty string if the input is not a valid data URL.
 */
export function extractBase64FromDataUrl(dataUrl: string): string {
	if (!dataUrl.startsWith("data:")) {
		return "";
	}

	const comma = dataUrl.indexOf(",");
	return comma >= 0 ? dataUrl.slice(comma + 1) : "";
}

/**
 * Parses an image part (either image_url or file type) and returns
 * dataUrl, base64Only, and mediaType ready for rendering.
 *
 * Handles error cases gracefully by returning empty base64Only string
 * when parsing fails, allowing the renderer to skip invalid images.
 */
export function parseImagePartToDataUrl(part: any): {
	dataUrl: string;
	base64Only: string;
	mediaType: string;
} {
	try {
		// Handle image_url parts
		if (part.type === "image_url" && part.image_url?.url) {
			const url = part.image_url.url;
			const mediaType = "image/png"; // Default for image_url parts

			if (url.startsWith("data:")) {
				// Extract media type from data URL if present
				const match = url.match(/data:([^;]+)/);
				const extractedMediaType = match?.[1] || mediaType;
				return {
					dataUrl: url,
					base64Only: extractBase64FromDataUrl(url),
					mediaType: extractedMediaType,
				};
			}

			return {
				dataUrl: url,
				base64Only: "",
				mediaType,
			};
		}

		// Handle file parts (AI SDK format)
		if (part.type === "file") {
			const { dataUrl, mediaType } = parseImageFile(part);
			return {
				dataUrl,
				base64Only: extractBase64FromDataUrl(dataUrl),
				mediaType,
			};
		}

		return {
			dataUrl: "",
			base64Only: "",
			mediaType: "image/png",
		};
	} catch {
		return {
			dataUrl: "",
			base64Only: "",
			mediaType: "image/png",
		};
	}
}
```

## Image Configuration

You can customize the generated image using the optional `image_config` parameter (for chat completions) or `size`/`quality`/`style` parameters (for the images API). The supported parameters vary by provider.

### Google Models

Available Google models:

| Model                            | Description                                                                         |
| -------------------------------- | ----------------------------------------------------------------------------------- |
| `gemini-3-pro-image-preview`     | Gemini 3 Pro with native image generation. Supports aspect ratios and 1K–4K sizes.  |
| `gemini-3.1-flash-image-preview` | Gemini 3.1 Flash with native image generation. Supports 0.5K–4K sizes (default 1K). |

#### gemini-3-pro-image-preview

```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-3-pro-image-preview",
    "messages": [
      {
        "role": "user",
        "content": "Generate an image of a mountain landscape at sunset"
      }
    ],
    "image_config": {
      "aspect_ratio": "16:9",
      "image_size": "4K"
    }
  }'
```

| Parameter      | Type   | Description                                                                                                                                   |
| -------------- | ------ | --------------------------------------------------------------------------------------------------------------------------------------------- |
| `aspect_ratio` | string | The aspect ratio of the generated image. Options: `"1:1"`, `"2:3"`, `"3:2"`, `"3:4"`, `"4:3"`, `"4:5"`, `"5:4"`, `"9:16"`, `"16:9"`, `"21:9"` |
| `image_size`   | string | The resolution of the generated image. Options: `"1K"` (1024x1024), `"2K"` (2048x2048), `"4K"` (4096x4096)                                    |

#### gemini-3.1-flash-image-preview

```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-3.1-flash-image-preview",
    "messages": [
      {
        "role": "user",
        "content": "Generate an image of a mountain landscape at sunset"
      }
    ],
    "image_config": {
      "image_size": "1K"
    }
  }'
```

| Parameter      | Type   | Description                                                                                                                                                                       |
| -------------- | ------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `aspect_ratio` | string | The aspect ratio of the generated image. Options: `"1:1"`, `"1:4"`, `"1:8"`, `"2:3"`, `"3:2"`, `"3:4"`, `"4:1"`, `"4:3"`, `"4:5"`, `"5:4"`, `"8:1"`, `"9:16"`, `"16:9"`, `"21:9"` |
| `image_size`   | string | The resolution of the generated image. Options: `"0.5K"` (512x512), `"1K"` (1024x1024, default), `"2K"` (2048x2048), `"4K"` (4096x4096)                                           |

<Callout type="info">
  `gemini-3.1-flash-image-preview` uniquely supports `"0.5K"` resolution, which
  is not available on other Google image models.
</Callout>
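
As a rough illustration, the option tables above can be turned into a client-side check that runs before a request is sent. This is only a sketch: `checkGoogleImageConfig` and `GOOGLE_IMAGE_OPTIONS` are hypothetical names, not part of any LLM Gateway SDK, and the allowed values are copied from the tables in this section.

```typescript
// Hypothetical client-side check; the allowed values mirror the tables above.
type ImageConfig = { aspect_ratio?: string; image_size?: string };

const GOOGLE_IMAGE_OPTIONS: Record<
	string,
	{ aspectRatios: string[]; imageSizes: string[] }
> = {
	"gemini-3-pro-image-preview": {
		aspectRatios: ["1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"],
		imageSizes: ["1K", "2K", "4K"],
	},
	"gemini-3.1-flash-image-preview": {
		aspectRatios: [
			"1:1", "1:4", "1:8", "2:3", "3:2", "3:4", "4:1",
			"4:3", "4:5", "5:4", "8:1", "9:16", "16:9", "21:9",
		],
		imageSizes: ["0.5K", "1K", "2K", "4K"],
	},
};

// Returns a list of problems; an empty list means the config matches the tables.
function checkGoogleImageConfig(model: string, config: ImageConfig): string[] {
	const options = GOOGLE_IMAGE_OPTIONS[model];
	if (!options) {
		return [`unknown model: ${model}`];
	}
	const problems: string[] = [];
	if (
		config.aspect_ratio !== undefined &&
		options.aspectRatios.indexOf(config.aspect_ratio) === -1
	) {
		problems.push(`aspect_ratio ${config.aspect_ratio} is not listed for ${model}`);
	}
	if (
		config.image_size !== undefined &&
		options.imageSizes.indexOf(config.image_size) === -1
	) {
		problems.push(`image_size ${config.image_size} is not listed for ${model}`);
	}
	return problems;
}
```

For example, passing `{ image_size: "0.5K" }` with `gemini-3-pro-image-preview` reports one problem, while the same config with `gemini-3.1-flash-image-preview` reports none.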

### Alibaba Models

```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "alibaba/qwen-image-plus",
    "messages": [
      {
        "role": "user",
        "content": "Generate an image of a mountain landscape at sunset"
      }
    ],
    "image_config": {
      "image_size": "1024x1536",
      "n": 1,
      "seed": 42
    }
  }'
```

| Parameter    | Type    | Description                                                                                      |
| ------------ | ------- | ------------------------------------------------------------------------------------------------ |
| `image_size` | string  | Image dimensions in `WIDTHxHEIGHT` format. Examples: `"1024x1024"`, `"1024x1536"`, `"1536x1024"` |
| `n`          | integer | Number of images to generate (1-4)                                                               |
| `seed`       | integer | Random seed for reproducible generation                                                          |

Available Alibaba models:

| Model                     | Price        | Description                       |
| ------------------------- | ------------ | --------------------------------- |
| `alibaba/qwen-image`      | $0.035/image | Standard quality image generation |
| `alibaba/qwen-image-plus` | $0.03/image  | Good balance of quality and cost  |
| `alibaba/qwen-image-max`  | $0.075/image | Highest quality image generation  |

<Callout type="info">
  Alibaba models use explicit pixel dimensions (e.g., `"1024x1536"`) instead of
  aspect ratios. For portrait orientation use `"1024x1536"`, for landscape use
  `"1536x1024"`.
</Callout>

### Z.AI Models

```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zai/cogview-4",
    "messages": [
      {
        "role": "user",
        "content": "Generate an image of a futuristic city skyline"
      }
    ],
    "image_config": {
      "image_size": "1024x1024"
    }
  }'
```

| Parameter    | Type    | Description                                                                                      |
| ------------ | ------- | ------------------------------------------------------------------------------------------------ |
| `image_size` | string  | Image dimensions in `WIDTHxHEIGHT` format. Examples: `"1024x1024"`, `"2048x1024"`, `"1024x2048"` |
| `n`          | integer | Number of images to generate                                                                     |

Available Z.AI models:

| Model           | Price        | Description                                                                                                         |
| --------------- | ------------ | ------------------------------------------------------------------------------------------------------------------- |
| `zai/cogview-4` | $0.01/image  | CogView-4 with bilingual support and excellent text rendering                                                       |
| `zai/glm-image` | $0.015/image | GLM-Image with hybrid auto-regressive architecture, excellent for text-rendering and knowledge-intensive generation |

<Callout type="info">
  CogView-4 supports both Chinese and English prompts and excels at generating
  images with embedded text.
</Callout>

### ByteDance Models

```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bytedance/seedream-4-5",
    "messages": [
      {
        "role": "user",
        "content": "Generate an image of a futuristic cyberpunk city at night"
      }
    ],
    "image_config": {
      "image_size": "2048x2048"
    }
  }'
```

| Parameter    | Type   | Description                                                                                      |
| ------------ | ------ | ------------------------------------------------------------------------------------------------ |
| `image_size` | string | Image dimensions in `WIDTHxHEIGHT` format. Examples: `"1024x1024"`, `"2048x2048"`, `"4096x4096"` |

Available ByteDance models:

| Model                    | Price        | Description                                                     |
| ------------------------ | ------------ | --------------------------------------------------------------- |
| `bytedance/seedream-4-0` | $0.035/image | High-quality text-to-image generation with 2K default output    |
| `bytedance/seedream-4-5` | $0.045/image | Enhanced quality and consistency with improved prompt adherence |

<Callout type="info">
  Seedream models accept 2-10 reference images for multi-image fusion and
  generation. The default output resolution is 2048×2048 (2K), with support up
  to 4096×4096 (4K).
</Callout>

## Usage Notes

<Callout type="info">
  Image generation models typically have higher token costs compared to
  text-only models due to the computational requirements of image synthesis.
</Callout>

<Callout type="warning">
  Generated images are returned as base64-encoded data URLs, which can be large.
  Consider the payload size when integrating image generation into your
  applications.
</Callout>
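
To get a feel for payload size before storing or forwarding an image, the decoded size can be estimated from the data URL itself. The sketch below assumes the usual `data:<media type>;base64,<payload>` shape with standard `=` padding; `estimateDataUrlBytes` is a hypothetical helper name.

```typescript
// Estimates the decoded byte size of a base64 data URL without decoding it.
// Assumes the form "data:<media type>;base64,<payload>" with "=" padding.
function estimateDataUrlBytes(dataUrl: string): number {
	const commaIndex = dataUrl.indexOf(",");
	if (dataUrl.indexOf("data:") !== 0 || commaIndex === -1) {
		throw new Error("not a data URL");
	}
	const base64 = dataUrl.slice(commaIndex + 1);
	// Every 4 base64 characters encode 3 bytes; each trailing "=" removes one byte.
	let padding = 0;
	if (base64.slice(-2) === "==") {
		padding = 2;
	} else if (base64.slice(-1) === "=") {
		padding = 1;
	}
	return Math.floor((base64.length * 3) / 4) - padding;
}
```

Because base64 encodes 3 bytes in 4 characters, the string in the response is roughly a third larger than the image it represents.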


# Metadata
URL: /features/metadata
# Metadata

LLM Gateway supports sending additional metadata with your requests using custom headers. This allows you to include information like user sessions, application versions, tenant IDs, or other contextual data that can be useful for analytics and monitoring.

You can later filter requests by these values, for example to see only the requests made by a specific user or session. In the future, you will also be able to segment your analytics and monitoring by this metadata, for example to show cost and latency breakdowns per user, application, country, feature, or any other dimension you track.

## Custom Headers

You can include custom headers with the `X-LLMGateway-` prefix to send metadata alongside your LLM requests:

```bash
curl -X POST https://api.llmgateway.io/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "X-LLMGateway-Country: US" \
  -H "X-LLMGateway-User-ID: 9403f741-a524-4b18-b1b2-dbb71cdff2a4" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": "Hello, how are you?"
      }
    ]
  }'
```

## Best Practices

### Header Naming

* Use the `X-LLMGateway-` prefix for all custom metadata
* Use descriptive, consistent naming conventions
* Avoid special characters; use hyphens to separate words

### Data Privacy

* Be mindful of sensitive data in headers
* Consider hashing or anonymizing user identifiers
* Follow your organization's data privacy policies

### Performance

* Keep header values reasonably short
* Avoid sending unnecessary metadata that won't be used for analytics
* Consider the impact on request size, especially for high-volume applications
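
The naming and size guidelines above could be enforced with a small helper that builds the headers from a plain object. This is a sketch under assumptions: `buildMetadataHeaders` is a hypothetical name, and the 256-character limit is an illustrative choice, not a documented gateway limit.

```typescript
const METADATA_PREFIX = "X-LLMGateway-";
// Illustrative limit only; the gateway does not document a maximum length.
const MAX_VALUE_LENGTH = 256;

// Converts keys such as "tenant_id" or "app version" into hyphen-separated
// header names like "X-LLMGateway-Tenant-Id".
function toHeaderName(key: string): string {
	const words = key.split(/[^A-Za-z0-9]+/).filter((word) => word.length > 0);
	if (words.length === 0) {
		throw new Error(`metadata key "${key}" has no usable characters`);
	}
	const name = words
		.map((word) => word.charAt(0).toUpperCase() + word.slice(1))
		.join("-");
	return METADATA_PREFIX + name;
}

function buildMetadataHeaders(
	metadata: Record<string, string>,
): Record<string, string> {
	const headers: Record<string, string> = {};
	for (const key of Object.keys(metadata)) {
		const value = metadata[key];
		if (value === undefined) continue;
		if (value.length > MAX_VALUE_LENGTH) {
			throw new Error(`metadata value for "${key}" is too long`);
		}
		headers[toHeaderName(key)] = value;
	}
	return headers;
}
```

The result can be spread into the `headers` object of a `fetch` call next to `Authorization` and `Content-Type`. HTTP header names are case-insensitive, so `Tenant-Id` and `Tenant-ID` refer to the same header.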

## Example: Multi-tenant Application

For a multi-tenant application, you might use metadata headers like this:

```bash
curl -X POST https://api.llmgateway.io/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "X-LLMGateway-Tenant-ID: acme-corp" \
  -H "X-LLMGateway-User-ID: user-12345" \
  -H "X-LLMGateway-App-Version: 2.1.4" \
  -H "X-LLMGateway-Feature: chat-assistant" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": "Summarize this document..."
      }
    ]
  }'
```

This allows you to track usage and costs per tenant, user, application version, and feature, providing detailed insights into how your LLM integration is being used across your platform.


# Reasoning
URL: /features/reasoning
import { Callout } from "fumadocs-ui/components/callout";

# Reasoning

LLMGateway supports reasoning-capable models that can show their step-by-step thought process before providing a final answer. This feature is particularly useful for complex problem-solving tasks, mathematical calculations, and logical reasoning.

## Reasoning-Enabled Models

You can find all reasoning-enabled models on our [models page with reasoning filter](https://llmgateway.io/models?filters=1\&reasoning=true). These models include:

* OpenAI's GPT-5 series (e.g., `gpt-5`, `gpt-5-mini`)
  * Note: GPT-5 models use reasoning but currently do not return the reasoning content in the response.
* Anthropic's Claude 3.7 Sonnet
* Google's Gemini 2.0 Flash Thinking and Gemini 2.5 Pro
* GPT OSS models such as `gpt-oss-120b` and `gpt-oss-20b`
* Z.AI's reasoning models

<Callout type="info">
  Some models may reason internally even if the `reasoning_effort` parameter is
  not specified.
</Callout>

## Using the Reasoning Parameter

There are two ways to control reasoning effort:

### Option 1: Top-level `reasoning_effort`

Add the `reasoning_effort` parameter directly to your request:

* `minimal` - Fastest reasoning with minimal thought process (only for GPT-5 models)
* `low` - Light reasoning for simpler tasks
* `medium` - Balanced reasoning for most tasks
* `high` - Deep reasoning for complex problems
* `xhigh` - Maximum reasoning depth for the most complex problems

```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-120b",
    "messages": [
      {
        "role": "user",
        "content": "What is 2/3 + 1/4 + 5/6?"
      }
    ],
    "reasoning_effort": "medium"
  }'
```

### Option 2: Using the `reasoning` object

Use the unified `reasoning` configuration object with an `effort` field:

* `none` - Disable reasoning
* `minimal` - Fastest reasoning with minimal thought process
* `low` - Light reasoning for simpler tasks
* `medium` - Balanced reasoning for most tasks
* `high` - Deep reasoning for complex problems
* `xhigh` - Maximum reasoning depth for the most complex problems

```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5",
    "messages": [
      {
        "role": "user",
        "content": "What is 2/3 + 1/4 + 5/6?"
      }
    ],
    "reasoning": {
      "effort": "medium"
    }
  }'
```

<Callout type="warning">
  You cannot use both `reasoning_effort` and `reasoning.effort` in the same
  request. Choose one approach. However, you can combine `reasoning_effort` or
  `reasoning.effort` with `reasoning.max_tokens` — when `max_tokens` is
  specified, it takes priority over the effort level.
</Callout>
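
The rule in the callout above can be checked on the client before a request is sent. The following is a minimal sketch; `checkReasoningParams` and `effectiveReasoningControl` are hypothetical helper names, and the gateway performs its own validation regardless.

```typescript
type ReasoningRequest = {
	reasoning_effort?: string;
	reasoning?: { effort?: string; max_tokens?: number };
};

// Returns an error message when both effort styles are present, otherwise null.
function checkReasoningParams(body: ReasoningRequest): string | null {
	const nestedEffort = body.reasoning ? body.reasoning.effort : undefined;
	if (body.reasoning_effort !== undefined && nestedEffort !== undefined) {
		return "Use either reasoning_effort or reasoning.effort, not both";
	}
	return null;
}

// Reports which setting controls reasoning: max_tokens takes priority over effort.
function effectiveReasoningControl(
	body: ReasoningRequest,
): "max_tokens" | "effort" | "default" {
	if (body.reasoning && body.reasoning.max_tokens !== undefined) {
		return "max_tokens";
	}
	const nestedEffort = body.reasoning ? body.reasoning.effort : undefined;
	if (body.reasoning_effort !== undefined || nestedEffort !== undefined) {
		return "effort";
	}
	return "default";
}
```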

### Example Response

The response will include a `reasoning` field in the message object containing the model's step-by-step thought process:

```json
{
	"id": "chatcmpl-abc123",
	"object": "chat.completion",
	"created": 1234567890,
	"model": "gpt-oss-120b",
	"choices": [
		{
			"index": 0,
			"message": {
				"role": "assistant",
				"content": "The answer is 1.75 or 7/4.",
				"reasoning": "First, I need to find a common denominator for 2/3, 1/4, and 5/6. The LCD is 12. Converting: 2/3 = 8/12, 1/4 = 3/12, 5/6 = 10/12. Adding: 8/12 + 3/12 + 10/12 = 21/12 = 1.75 or 7/4."
			},
			"finish_reason": "completed"
		}
	],
	"usage": {
		"prompt_tokens": 20,
		"completion_tokens": 45,
		"reasoning_tokens": 35,
		"total_tokens": 65
	}
}
```

## Specifying Reasoning Token Budget

For models that support it, you can specify an exact token budget for reasoning using the `reasoning` object with `max_tokens`. This gives you precise control over how many tokens the model allocates to its thinking process.

<Callout type="info">
  When `reasoning.max_tokens` is specified, it overrides `reasoning.effort` and
  `reasoning_effort`. Supported by Anthropic Claude and Google Gemini thinking
  models.
</Callout>

### Example Request

```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-20250514",
    "messages": [
      {
        "role": "user",
        "content": "Explain the P vs NP problem and why it matters."
      }
    ],
    "reasoning": {
      "max_tokens": 8000
    }
  }'
```

### Supported Models

The `reasoning.max_tokens` parameter is supported by:

* **Anthropic Claude**: Claude 3.7 Sonnet, Claude Sonnet 4, Claude Opus 4, Claude Opus 4.5
* **Google Gemini**: Gemini 2.5 Pro, Gemini 2.5 Flash, Gemini 3 Pro Preview

When using auto-routing or root models with `reasoning.max_tokens`, only providers that support this feature will be considered.

### Provider-Specific Constraints

* **Anthropic**: Reasoning budget must be between 1,024 and 128,000 tokens. Values outside this range are automatically clamped.
* **Google**: No specific constraints on the reasoning budget.
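
The Anthropic clamping described above amounts to the following, shown here as a sketch with a hypothetical function name. The gateway applies the clamp itself, so this is only useful for predicting the budget a request will actually get.

```typescript
const ANTHROPIC_MIN_BUDGET = 1024;
const ANTHROPIC_MAX_BUDGET = 128000;

// Mirrors the documented range: values outside 1,024-128,000 are clamped.
function clampAnthropicReasoningBudget(maxTokens: number): number {
	return Math.min(ANTHROPIC_MAX_BUDGET, Math.max(ANTHROPIC_MIN_BUDGET, maxTokens));
}
```

A requested budget of `500` becomes `1024`, `8000` is passed through unchanged, and `200000` becomes `128000`.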

### Error Handling

If you specify `reasoning.max_tokens` for a model that doesn't support it, you'll receive an error:

```json
{
	"error": {
		"message": "Model gpt-4o does not support reasoning.max_tokens. Remove the reasoning parameter or use a model that supports explicit reasoning token budgets.",
		"type": "invalid_request_error",
		"code": "model_not_supported"
	}
}
```

## Streaming Reasoning Content

When streaming is enabled, reasoning content will be streamed as part of the response chunks:

```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-120b",
    "messages": [
      {
        "role": "user",
        "content": "Solve this logic puzzle: If all roses are flowers and some flowers fade quickly, can we conclude that some roses fade quickly?"
      }
    ],
    "reasoning_effort": "high",
    "stream": true
  }'
```

The reasoning content will appear in the stream chunks before the final answer, allowing you to display the model's thought process in real time.

Example:

```
data: {
	"id": "chatcmpl-fb266880-1016-4797-9a70-f21a538edaf6",
	"object": "chat.completion.chunk",
	"created": 1761048126,
	"model": "openai/gpt-oss-20b",
	"choices": [
		{
			"index": 0,
			"delta": {
				"reasoning": "It's ",
				"role": "assistant"
			},
			"finish_reason": null
		}
	]
}
```
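
On the client, reasoning and answer text can be accumulated separately from the chunks. The sketch below assumes the OpenAI-style server-sent-event framing, where each event is a single `data: <json>` line and the stream ends with `data: [DONE]`, and that answer text arrives in `delta.content`. Only the `delta.reasoning` field is taken from the example above; `collectStream` is a hypothetical name.

```typescript
type StreamChunk = {
	choices: { delta: { reasoning?: string; content?: string } }[];
};

// Accumulates reasoning and answer text from already-split SSE lines.
function collectStream(lines: string[]): { reasoning: string; content: string } {
	const prefix = "data: ";
	let reasoning = "";
	let content = "";
	for (const line of lines) {
		if (line.indexOf(prefix) !== 0) continue; // skip blank lines and comments
		const payload = line.slice(prefix.length).trim();
		if (payload === "[DONE]") break; // assumed end-of-stream marker
		const chunk = JSON.parse(payload) as StreamChunk;
		for (const choice of chunk.choices) {
			if (choice.delta.reasoning !== undefined) reasoning += choice.delta.reasoning;
			if (choice.delta.content !== undefined) content += choice.delta.content;
		}
	}
	return { reasoning, content };
}
```

In a UI, the two strings can be rendered in separate regions so the thought process appears first and the answer fills in afterwards.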

## Usage Tracking

### Response Payload

The `usage` object in the response includes reasoning-specific token counts:

* `reasoning_tokens` - Number of tokens used for the reasoning process
* `completion_tokens` - Number of tokens in the final answer
* `prompt_tokens` - Number of tokens in the input
* `total_tokens` - Sum of all token counts

### Logs and Analytics

All requests using the `reasoning_effort` parameter are tracked in your dashboard logs with:

* The `reasoningContent` field containing the full reasoning text
* Separate token counts for reasoning vs. completion
* Performance metrics for reasoning-enabled requests

You can view detailed logs for each request in the [dashboard](https://llmgateway.io/dashboard) to analyze how models are reasoning through problems.

## Auto-Routing with Reasoning

When using auto-routing (specifying a model like `gpt-5` without a specific version), LLMGateway will:

1. Automatically set `reasoning_effort` to `minimal` for GPT-5 models
2. Set `reasoning_effort` to `low` for other auto-routed reasoning models
3. Only route to providers that support reasoning when `reasoning_effort` is specified

This ensures optimal performance and cost when using auto-routing with reasoning-capable models.

## Model-Specific Behavior

Not all reasoning models return reasoning content in the same way. Some models (like OpenAI models) may reason internally but not expose the reasoning content in the response. LLMGateway makes sure the response is unified across different providers, but the depth and format of reasoning may vary.

## Best Practices

1. **Choose appropriate reasoning effort**: Use `low` or `minimal` for simple tasks, `medium` for most tasks, and `high` only for complex problems that require deep reasoning
2. **Monitor token usage**: Reasoning can significantly increase token consumption - monitor your `reasoning_tokens` in the usage object
3. **Stream for better UX**: When building user-facing applications, enable streaming to show the reasoning process in real time
4. **Check logs**: Review the `reasoningContent` in your dashboard logs to understand how models are solving problems

## Error Handling

If you specify `reasoning_effort` for a model that doesn't support reasoning, you'll receive an error:

```json
{
	"error": {
		"message": "Model gpt-4o does not support reasoning. Remove the reasoning_effort parameter or use a reasoning-capable model.",
		"type": "invalid_request_error",
		"code": "model_not_supported"
	}
}
```

To avoid this error, only use the `reasoning_effort` parameter with [reasoning-enabled models](https://llmgateway.io/models?filters=1\&reasoning=true).


# Response Healing
URL: /features/response-healing
import { Callout } from "fumadocs-ui/components/callout";

# Response Healing

Response Healing is a plugin that automatically validates and repairs malformed JSON responses from AI models. When enabled, LLM Gateway ensures that API responses conform to your specified schemas even when the model's formatting is imperfect.

## Why Response Healing?

Large language models occasionally produce invalid JSON, especially in complex scenarios:

* **Markdown wrapping**: Models often wrap JSON in code blocks like \`\`\`json...\`\`\`
* **Mixed content**: JSON may be preceded or followed by explanatory text
* **Syntax errors**: Trailing commas, unquoted keys, or single quotes instead of double quotes
* **Truncated output**: Token limits may cut off responses mid-JSON

Response Healing automatically detects and fixes these issues, saving you from implementing error handling for every possible malformed response.

## Enabling Response Healing

To enable Response Healing, add `response-healing` to the `plugins` array in your request:

```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Return a JSON object with name and age"}],
    "response_format": {"type": "json_object"},
    "plugins": [{"id": "response-healing"}]
  }'
```

<Callout type="info">
  Response Healing only activates when `response_format` is set to `json_object`
  or `json_schema`. For regular text responses, the plugin has no effect.
</Callout>

## How It Works

When Response Healing is enabled, LLM Gateway applies a series of repair strategies to malformed JSON responses:

### 1. Markdown Extraction

Extracts JSON from markdown code blocks:

```text
Here's the data:
\`\`\`json
{"name": "Alice", "age": 30}
\`\`\`
```

Becomes:

```json
{ "name": "Alice", "age": 30 }
```

### 2. Mixed Content Extraction

Separates JSON from surrounding text:

```text
Sure! Here is the JSON you requested: {"name": "Alice", "age": 30} Let me know if you need anything else.
```

Becomes:

```json
{ "name": "Alice", "age": 30 }
```

### 3. Syntax Fixes

Repairs common JSON syntax violations:

| Issue           | Before              | After               |
| --------------- | ------------------- | ------------------- |
| Trailing commas | `{"a": 1,}`         | `{"a": 1}`          |
| Unquoted keys   | `{name: "Alice"}`   | `{"name": "Alice"}` |
| Single quotes   | `{'name': 'Alice'}` | `{"name": "Alice"}` |

### 4. Truncation Completion

Adds missing closing brackets for truncated responses:

```text
{"name": "Alice", "data": {"nested": true
```

Becomes:

```json
{ "name": "Alice", "data": { "nested": true } }
```
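
To make the four strategies concrete, here is a deliberately naive sketch that applies them in order and stops as soon as the text parses. It is not the gateway's implementation: every function name is hypothetical, the regular expressions will mangle strings that themselves contain quotes, commas, or braces, and `"none"` is an invented label for input that was already valid. The other method labels are borrowed from the Healing Metadata table.

```typescript
const FENCE = "`" + "`" + "`"; // built piecewise so this example nests cleanly

function parseOrUndefined(text: string): unknown {
	try {
		return JSON.parse(text);
	} catch {
		return undefined; // JSON.parse never yields undefined for valid JSON
	}
}

// 1. Pull the body out of a fenced markdown code block, if there is one.
function extractFromMarkdown(text: string): string {
	const match = text.match(new RegExp(FENCE + "(?:json)?\\s*([\\s\\S]*?)" + FENCE));
	return match && match[1] !== undefined ? match[1].trim() : text;
}

// 2. Keep only the span from the first opening bracket to the last closing one.
function extractJsonSpan(text: string): string {
	const start = text.search(/[{\[]/);
	if (start === -1) return text;
	const end = Math.max(text.lastIndexOf("}"), text.lastIndexOf("]"));
	return end > start ? text.slice(start, end + 1) : text.slice(start);
}

// 3. Naive fixes for single quotes, unquoted keys, and trailing commas.
function fixSyntax(text: string): string {
	return text
		.replace(/'/g, '"')
		.replace(/([{,]\s*)([A-Za-z_][A-Za-z0-9_]*)(\s*:)/g, '$1"$2"$3')
		.replace(/,(\s*[}\]])/g, "$1");
}

// 4. Close any string, object, or array left open by truncation.
function closeOpenBrackets(text: string): string {
	const closers: string[] = [];
	let inString = false;
	let escaped = false;
	for (let i = 0; i < text.length; i++) {
		const ch = text.charAt(i);
		if (inString) {
			if (escaped) escaped = false;
			else if (ch === "\\") escaped = true;
			else if (ch === '"') inString = false;
		} else if (ch === '"') inString = true;
		else if (ch === "{") closers.push("}");
		else if (ch === "[") closers.push("]");
		else if (ch === "}" || ch === "]") closers.pop();
	}
	return text + (inString ? '"' : "") + closers.reverse().join("");
}

function healJson(raw: string): { value: unknown; method: string } | undefined {
	const direct = parseOrUndefined(raw);
	if (direct !== undefined) return { value: direct, method: "none" };
	const steps: [string, (text: string) => string][] = [
		["markdown_extraction", extractFromMarkdown],
		["mixed_content_extraction", extractJsonSpan],
		["syntax_fix", fixSyntax],
		["truncation_completion", closeOpenBrackets],
	];
	let text = raw;
	const applied: string[] = [];
	for (const [name, step] of steps) {
		const next = step(text);
		if (next === text) continue; // strategy did not change anything
		text = next;
		applied.push(name);
		const value = parseOrUndefined(text);
		if (value !== undefined) {
			return { value, method: applied.length === 1 ? name : "combined_strategies" };
		}
	}
	return undefined; // nothing recognizable as JSON
}
```

A production repair step would need a tolerant tokenizer rather than regular expressions, which is one reason to rely on the plugin instead of reimplementing it.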

## Usage Examples

### With JSON Object Format

Request a structured response with automatic healing:

```typescript
const response = await fetch("https://api.llmgateway.io/v1/chat/completions", {
	method: "POST",
	headers: {
		Authorization: `Bearer ${process.env.LLM_GATEWAY_API_KEY}`,
		"Content-Type": "application/json",
	},
	body: JSON.stringify({
		model: "gpt-4o",
		messages: [
			{
				role: "user",
				content:
					"Return a JSON object with fields: name (string) and age (number)",
			},
		],
		response_format: { type: "json_object" },
		plugins: [{ id: "response-healing" }],
	}),
});

const result = await response.json();
// With healing enabled, the content should parse as JSON (see Limitations for exceptions)
const data = JSON.parse(result.choices[0].message.content);
```

### With JSON Schema

For stricter validation, combine with `json_schema`:

```typescript
const response = await fetch("https://api.llmgateway.io/v1/chat/completions", {
	method: "POST",
	headers: {
		Authorization: `Bearer ${process.env.LLM_GATEWAY_API_KEY}`,
		"Content-Type": "application/json",
	},
	body: JSON.stringify({
		model: "gpt-4o",
		messages: [
			{
				role: "user",
				content: "Generate a user profile",
			},
		],
		response_format: {
			type: "json_schema",
			json_schema: {
				name: "user_profile",
				schema: {
					type: "object",
					required: ["name", "email"],
					properties: {
						name: { type: "string" },
						email: { type: "string" },
						age: { type: "number" },
					},
				},
			},
		},
		plugins: [{ id: "response-healing" }],
	}),
});

const result = await response.json();
```

## Healing Metadata

When a response is healed, the healing method is logged for debugging. The following healing methods may be applied:

| Method                     | Description                                 |
| -------------------------- | ------------------------------------------- |
| `markdown_extraction`      | JSON extracted from markdown code blocks    |
| `mixed_content_extraction` | JSON extracted from surrounding text        |
| `syntax_fix`               | Trailing commas, quotes, or keys were fixed |
| `truncation_completion`    | Missing closing brackets were added         |
| `combined_strategies`      | Multiple strategies were applied            |

## Limitations

<Callout type="warning">
  Response Healing is only available for non-streaming requests. Streaming
  responses are returned as-is without healing.
</Callout>

Response Healing works best for:

* Simple to moderately complex JSON structures
* Common formatting issues from LLMs

It may not be able to repair:

* Severely corrupted or nonsensical output
* Complex nested structures with multiple issues
* Responses that don't contain any recognizable JSON

## Best Practices

### Use with Structured Prompts

Combine Response Healing with clear instructions for best results:

```typescript
const response = await fetch("https://api.llmgateway.io/v1/chat/completions", {
	method: "POST",
	headers: {
		Authorization: `Bearer ${process.env.LLM_GATEWAY_API_KEY}`,
		"Content-Type": "application/json",
	},
	body: JSON.stringify({
		model: "gpt-4o",
		messages: [
			{
				role: "system",
				content: "Always respond with valid JSON. No explanations.",
			},
			{
				role: "user",
				content: "Return a JSON object with a colors array containing three colors",
			},
		],
		response_format: { type: "json_object" },
		plugins: [{ id: "response-healing" }],
	}),
});

const result = await response.json();
```

### Validate Critical Data

For critical applications, validate the healed JSON in your code:

```typescript
const result = await response.json();
const content = result.choices[0].message.content;
const data = JSON.parse(content);

// Add your own validation
if (!data.name || typeof data.name !== "string") {
	throw new Error("Invalid response: missing name");
}
```

### Monitor Healing Rates

If you notice frequent healing in your logs, consider:

* Improving your prompts to request cleaner JSON
* Using models with better JSON output (e.g., GPT-4o, Claude 3.5)
* Adding explicit JSON examples in your prompts


# Routing
URL: /features/routing
import { Callout } from "fumadocs-ui/components/callout";

# Routing

LLMGateway provides flexible and intelligent routing options to help you get the best performance and cost efficiency from your AI applications. Whether you want to use specific models, providers, or let our system automatically optimize your requests, we've got you covered.

LLMGateway also includes **automatic retry and fallback** — if a provider fails, your request is seamlessly retried on the next best provider, all within the same API call.

## Model Selection

### Any Model Name

You can use any model name from our [models page](https://llmgateway.io/models) or discover available models programmatically through the [/v1/models endpoint](/v1_models).

```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

### Model ID Routing

Choose a specific model ID to route to the **best available provider** for that model. LLMGateway's smart routing algorithm considers multiple factors to find the optimal provider across all configured options.

#### Smart Routing Algorithm

When you use a model ID without a provider prefix, LLMGateway's intelligent routing system analyzes multiple factors to select the best provider:

**Weighted Scoring System** (based on last 5 minutes of metrics):

* **Uptime (50%)** - Prioritizes providers with high reliability and low error rates
* **Throughput (20%)** - Favors providers with higher tokens per second generation speed
* **Price (20%)** - Considers cost efficiency while maintaining quality
* **Latency (10%)** - Considers time to first token (only applied for streaming requests)

The algorithm calculates a weighted score for each available provider and selects the one with the lowest (best) score. All metrics are normalized to ensure fair comparison across providers.

**Latency Weight for Non-Streaming Requests**:

For non-streaming requests, the latency weight (10%) is redistributed proportionally to the other factors since time-to-first-token is less relevant when waiting for the complete response.
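
The proportional redistribution can be sketched as follows. The weights come from the list above; the helper names are hypothetical. The assumption that each metric is first normalized to a 0-1 "badness" value (0 is best) is only a guess at how a lowest-is-best score might be formed, not a description of the real scoring code.

```typescript
type Weights = { uptime: number; throughput: number; price: number; latency: number };

const STREAMING_WEIGHTS: Weights = { uptime: 0.5, throughput: 0.2, price: 0.2, latency: 0.1 };

// For non-streaming requests the latency weight is spread over the other
// factors in proportion to their existing weights, so they still sum to 1.
function routingWeights(streaming: boolean): Weights {
	if (streaming) return STREAMING_WEIGHTS;
	const remaining = 1 - STREAMING_WEIGHTS.latency;
	return {
		uptime: STREAMING_WEIGHTS.uptime / remaining,
		throughput: STREAMING_WEIGHTS.throughput / remaining,
		price: STREAMING_WEIGHTS.price / remaining,
		latency: 0,
	};
}

// Assumption: each input is a normalized 0-1 "badness" value, lower is better.
function weightedScore(badness: Weights, weights: Weights): number {
	return (
		badness.uptime * weights.uptime +
		badness.throughput * weights.throughput +
		badness.price * weights.price +
		badness.latency * weights.latency
	);
}
```

Under this reading, non-streaming requests weigh uptime at about 55.6% and throughput and price at about 22.2% each.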

**Exponential Uptime Penalty**:

Providers with uptime below 95% receive an additional exponential penalty that increases rapidly as uptime drops:

* 95-100% uptime: No penalty
* 90% uptime: \~0.07 penalty
* 80% uptime: \~0.62 penalty
* 70% uptime: \~1.73 penalty
* 50% uptime: \~5.61 penalty

This ensures providers experiencing significant issues are strongly deprioritized while minor fluctuations have minimal impact.
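
The exact penalty formula is not given here. One curve that happens to reproduce the listed values is shown below: it squares the relative shortfall below 95% after scaling it by 5. Treat it as a fitted illustration under that assumption, not as the gateway's actual code; the function name and the scaling constant are inferred, not documented. Note that this particular curve grows quadratically.

```typescript
const UPTIME_THRESHOLD = 95; // percent, from the list above

// Hypothetical formula fitted to the documented sample values.
function uptimePenalty(uptimePercent: number): number {
	if (uptimePercent >= UPTIME_THRESHOLD) return 0;
	const shortfall = (UPTIME_THRESHOLD - uptimePercent) / UPTIME_THRESHOLD;
	return Math.pow(shortfall * 5, 2);
}

// Rounded to two decimals: 90 -> 0.07, 80 -> 0.62, 70 -> 1.73, 50 -> 5.61
const samplePenalties = [90, 80, 70, 50].map((uptime) => uptimePenalty(uptime).toFixed(2));
```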

**Epsilon-Greedy Exploration** (1% of requests):

To solve the "cold start problem" where new or unused providers never get traffic to build up metrics, the system randomly explores different providers 1% of the time. This ensures:

* All providers periodically receive traffic
* New providers can prove their reliability
* The system adapts to changing provider performance
* You benefit from improved routing decisions over time
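
Epsilon-greedy selection in general looks like the sketch below: with probability epsilon a provider is picked at random, otherwise the best-scoring one is used. The names are hypothetical and the random source is injectable so the behavior can be tested; how the gateway picks the exploration target is not specified here, so uniform random choice is an assumption.

```typescript
type ScoredProvider = { id: string; score: number };

// Lower score is better. `random` must return a number in [0, 1).
function pickProvider(
	providers: ScoredProvider[],
	random: () => number = Math.random,
	epsilon: number = 0.01,
): ScoredProvider {
	if (providers.length === 0) {
		throw new Error("no providers available");
	}
	let best = providers[0] as ScoredProvider;
	if (random() < epsilon) {
		// Explore: assumed to be a uniform random choice across all providers.
		const index = Math.floor(random() * providers.length);
		return providers[index] ?? best;
	}
	// Exploit: choose the provider with the lowest score.
	for (const provider of providers) {
		if (provider.score < best.score) best = provider;
	}
	return best;
}
```

With `epsilon` at 0.01, about one request in a hundred goes to a randomly chosen provider, which is how a provider with no recent metrics still receives occasional traffic.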

**Routing Metadata**:

Every request includes detailed routing metadata in the logs, showing:

* Available providers that were considered
* Selected provider and selection reason
* Scores for each provider (including uptime, throughput, latency, and price)

This transparency allows you to understand and debug routing decisions.

<Callout type="info">
  Using model IDs without a provider prefix automatically routes to the optimal
  provider based on reliability, speed, and cost. The system continuously learns
  and adapts based on real-time performance metrics.
</Callout>

<Callout type="success">
  Smart routing prioritizes reliability over cost, ensuring your requests are
  routed to providers with proven uptime and performance, while still
  considering cost efficiency.
</Callout>

### Provider-Specific Routing

To route a request to a specific provider instead of letting the gateway choose one, prefix the model name with the provider name followed by a slash:

```bash
# Use OpenAI specifically
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# Use DeepSeek provider specifically
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek/deepseek-v3.2",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

#### Low-Uptime Protection

When you specify a provider explicitly, LLMGateway checks the provider's recent uptime (last 5 minutes). If the uptime falls below 90%, the system automatically routes your request to the best available alternative provider to ensure reliability. This protects your application from providers experiencing temporary issues.

<Callout type="info">
  If the requested provider has low uptime but no alternative providers are
  available for that model, the request will still be sent to the originally
  requested provider.
</Callout>

#### Disabling Fallback with X-No-Fallback Header

If you need to bypass this protection and always use the exact provider you specified regardless of its current uptime, you can use the `X-No-Fallback` header:

```bash
# Force use of a specific provider even if it has low uptime
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-No-Fallback: true" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

<Callout type="warn">
  Using `X-No-Fallback: true` disables automatic provider failover. Your
  requests will be sent to the specified provider even if it is experiencing
  issues, which may result in higher error rates.
</Callout>

When the `X-No-Fallback` header is used, the routing metadata in logs will include `noFallback: true` to indicate that fallback was disabled for that request.
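
The decision described in this section can be summarized in a few lines. This is an illustrative sketch only, not the gateway's actual code: the function name `resolve_provider`, its parameters, and the idea that alternatives arrive sorted best-first are assumptions. The 90% threshold and the `X-No-Fallback` behavior come from the text above.

```python
from typing import Sequence

UPTIME_THRESHOLD = 0.90  # from the docs: uptime measured over the last 5 minutes


def resolve_provider(
    requested: str,
    recent_uptime: float,
    alternatives: Sequence[str],
    no_fallback: bool = False,
) -> str:
    """Return the provider that would serve an explicitly prefixed request.

    Hypothetical helper: `alternatives` is assumed to be sorted best-first.
    """
    if no_fallback:
        # X-No-Fallback: true always pins the request to the named provider.
        return requested
    if recent_uptime < UPTIME_THRESHOLD and alternatives:
        # Low uptime and another provider serves the model: reroute.
        return alternatives[0]
    # Healthy provider, or nothing else to fall back to.
    return requested
```

For example, `resolve_provider("openai", 0.80, [])` still returns `"openai"`, which matches the callout about requests with no alternative provider.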

## Automatic Retry & Fallback

When using model ID routing (without a provider prefix), LLM Gateway automatically retries failed requests on alternate providers. This happens transparently within the same API call, so your application receives the successful response as if nothing went wrong.

### How Retry Works

1. Your request is routed to the best available provider using the smart routing algorithm
2. If that provider returns a server error (5xx), times out, or has a connection failure, the gateway marks the provider as failed
3. The next best available provider is selected and the request is retried
4. Up to **2 retries** are attempted before returning an error to the client

```
Request → Provider A (500 error) → Provider B (200 OK) → Response
```

Both streaming and non-streaming requests support automatic retry.
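
The retry steps above can be sketched as a small loop. This is a simplified model under stated assumptions, not the gateway's implementation: `run_with_retry` and the `call` callback are hypothetical names, providers are assumed to be passed in best-first order, and the returned attempt records are only loosely modeled on the gateway's routing metadata.

```python
from typing import Callable, Dict, List, Sequence, Tuple

MAX_RETRIES = 2  # from the docs: up to 2 retries after the first attempt


def run_with_retry(
    providers: Sequence[str],
    call: Callable[[str], int],
) -> Tuple[bool, List[Dict[str, object]]]:
    """Try providers best-first; stop on success or a non-retryable status.

    `call` is a hypothetical stand-in that returns an HTTP status code and
    raises TimeoutError or ConnectionError for transport failures.
    """
    attempts: List[Dict[str, object]] = []
    for provider in providers[: MAX_RETRIES + 1]:
        try:
            status = call(provider)
        except (TimeoutError, ConnectionError):
            status = None  # no HTTP status: timeout or connection failure
        succeeded = status is not None and 200 <= status < 300
        attempts.append(
            {"provider": provider, "status_code": status, "succeeded": succeeded}
        )
        if succeeded:
            return True, attempts
        if status is not None and status < 500:
            break  # client errors are returned as-is, not retried
    return False, attempts
```

With three or more candidate providers that all return 503, the loop stops after three attempts (one initial attempt plus two retries).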

### What Triggers a Retry

Retries are triggered by **server-side failures** only:

* **5xx errors** (500 Internal Server Error, 502 Bad Gateway, 503 Service Unavailable, etc.)
* **Timeouts** (upstream provider took too long to respond)
* **Connection failures** (network errors, DNS failures, etc.)

Retries are **not** triggered by:

* **4xx client errors** (400 Bad Request, 401 Unauthorized, 403 Forbidden, 422 Unprocessable Entity)
* **Content filter responses** (Azure ResponsibleAI, etc.)

### When Retry Is Disabled

Automatic retry is disabled when:

* The `X-No-Fallback: true` header is set
* A specific provider is requested (e.g., `openai/gpt-4o`)
* No alternative providers are available for the requested model
* The maximum retry count (2) has been exhausted

### Routing Transparency

Every provider attempt — both failed and successful — is recorded in the `routing` array in the response metadata and activity logs:

```json
{
	"metadata": {
		"routing": [
			{
				"provider": "openai",
				"model": "gpt-4o",
				"status_code": 500,
				"error_type": "server_error",
				"succeeded": false
			},
			{
				"provider": "azure",
				"model": "gpt-4o",
				"status_code": 200,
				"error_type": "none",
				"succeeded": true
			}
		]
	}
}
```
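
A client can read this array to see which provider served the request. The sketch below assumes the response body has already been parsed into a dictionary and that `metadata` sits at the top level, as in the snippet above. The helper name `summarize_routing` and the shape of its return value are illustrative.

```python
from typing import Any, Dict, List, Optional


def summarize_routing(response: Dict[str, Any]) -> Dict[str, Any]:
    """Summarize provider attempts from a parsed response body."""
    attempts: List[Dict[str, Any]] = response.get("metadata", {}).get("routing", [])
    served_by: Optional[str] = next(
        (a["provider"] for a in attempts if a.get("succeeded")), None
    )
    failed = [a["provider"] for a in attempts if not a.get("succeeded")]
    return {
        "served_by": served_by,
        "failed_providers": failed,
        "retries": max(len(attempts) - 1, 0),
    }
```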

### Retried Log Tracking

Each provider attempt creates its own log entry. Failed attempts that were retried are marked with:

* **`retried: true`** — indicates this failed request was retried on another provider
* **`retriedByLogId`** — the ID of the final successful log entry

This allows you to distinguish between unrecovered failures and failures that were transparently recovered via retry. In the dashboard, retried logs display a "Retried" badge with a link to the successful log.
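
If you export log entries for your own analysis, the two fields above are enough to separate recovered failures from unrecovered ones. In this sketch, `retried` and `retriedByLogId` come from the description above, while the `hasError` flag used to spot failed entries and the `id` field are hypothetical names chosen for illustration.

```python
from typing import Any, Dict, Iterable, List, Tuple


def split_failures(
    logs: Iterable[Dict[str, Any]],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Split failed log entries into (recovered, unrecovered).

    `hasError` is an assumed field name marking a failed attempt.
    """
    recovered: List[Dict[str, Any]] = []
    unrecovered: List[Dict[str, Any]] = []
    for entry in logs:
        if not entry.get("hasError"):
            continue  # successful attempt, nothing to classify
        if entry.get("retried"):
            recovered.append(entry)  # a later attempt succeeded
        else:
            unrecovered.append(entry)  # the client saw this failure
    return recovered, unrecovered
```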

### Impact on Provider Health

Failed attempts still count against the provider's uptime score, even when the request was successfully retried on another provider. This means:

* A provider that keeps failing will see its uptime score drop
* The exponential uptime penalty kicks in below 95% (see [Smart Routing Algorithm](#smart-routing-algorithm))
* Future requests are automatically routed away from unreliable providers
* Your application stays reliable without any code changes on your side

<Callout type="success">
  Automatic retry and fallback works together with smart routing to provide
  self-healing behavior. Failing providers are automatically avoided, and your
  requests are transparently recovered on reliable alternatives.
</Callout>

## Optimized Auto Routing

Auto routing selects the best model for each request, so you do not have to choose a specific model yourself.

### Current Implementation

The auto routing system currently:

* **Chooses cost-effective models** by default for optimal price-to-performance ratio
* **Automatically scales to more powerful models** based on your request's context size
* **Handles large contexts intelligently** by selecting models with appropriate context windows

```bash
# Let LLM Gateway choose the optimal model
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Your request here..."}]
  }'
```

### Free Models Only

When using auto routing, you can restrict the selection to only free models (models with zero input and output pricing) by setting the `free_models_only` parameter to `true`:

```bash
# Auto route to free models only
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Hello!"}],
    "free_models_only": true
  }'
```

<Callout type="success">
  Adding even a small amount of credits to your account (e.g., $5) will
  immediately upgrade your free model rate limits from 5 requests per 10 minutes
  to 20 requests per minute.
</Callout>

<Callout type="info">
  The `free_models_only` parameter only works with auto routing
  (`"model": "auto"`). If no free models are available that meet your request
  requirements, the API will return an error.
</Callout>

### Reasoning Models Only

Set the `reasoning_effort` parameter to restrict selection to models that support reasoning. This parameter is not specific to auto routing; it can also be sent with an explicitly named model.

```bash
# Auto route only to reasoning models
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Hello!"}],
    "reasoning_effort": "medium"
  }'
```

### Exclude Reasoning Models

When using auto routing, you can exclude reasoning models from selection by setting the `no_reasoning` parameter to `true`. This is useful when you want faster responses or need to avoid the additional cost and latency of reasoning models:

```bash
# Auto route excluding reasoning models
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Hello!"}],
    "no_reasoning": true
  }'
```

<Callout type="info">
  The `no_reasoning` parameter only works with auto routing (`"model": "auto"`).
  If no non-reasoning models are available that meet your request requirements,
  the API will return an error.
</Callout>

<Callout type="success">
  Auto routing analyzes your payload and automatically chooses between
  cost-effective models for simple requests and more powerful models for complex
  or large-context requests.
</Callout>
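
The auto routing options above can be combined in a small request builder. This is a minimal sketch: the parameter names match the request fields documented above, but the helper `build_auto_request` itself and its client-side conflict check are assumptions, not gateway behavior.

```python
from typing import Any, Dict, List, Optional


def build_auto_request(
    messages: List[Dict[str, Any]],
    free_models_only: bool = False,
    no_reasoning: bool = False,
    reasoning_effort: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a chat completions body that uses auto routing."""
    if no_reasoning and reasoning_effort is not None:
        # Client-side guard (an assumption): the two options pull in
        # opposite directions, so refuse to send both.
        raise ValueError("no_reasoning and reasoning_effort conflict")
    body: Dict[str, Any] = {"model": "auto", "messages": messages}
    if free_models_only:
        body["free_models_only"] = True
    if no_reasoning:
        body["no_reasoning"] = True
    if reasoning_effort is not None:
        body["reasoning_effort"] = reasoning_effort
    return body
```

The resulting dictionary can be serialized to JSON and sent to `/v1/chat/completions` with the same headers as the curl examples.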

### Coming Soon: Advanced Optimization

We're continuously improving our auto routing capabilities. Soon you'll benefit from:

* **Tool call optimization**: Automatically select models that excel at function calling and structured outputs
* **Content-aware routing**: Analyze message content to determine the best model for specific types of requests (coding, creative writing, analysis, etc.)
* **Performance-based routing**: Route based on historical performance data for similar requests
* **Multi-model orchestration**: Intelligently combine multiple models for complex workflows

### How It Works

1. **Request Analysis**: The system analyzes your request including message content, context size, and any special parameters
2. **Model Selection**: Based on the analysis, it selects the most appropriate model considering cost, performance, and capabilities
3. **Transparent Routing**: Your request is seamlessly routed to the chosen model and provider
4. **Optimized Response**: You receive the best possible response while maintaining cost efficiency

<Callout type="info">
  Auto routing decisions are transparent in your usage logs, so you can always
  see which model was selected for each request.
</Callout>

## Best Practices

### For Development

* Use specific model names during development and testing
* Leverage auto routing for production workloads to optimize costs

### For Production

* Use auto routing (`"model": "auto"`) for the best balance of cost and performance
* Monitor your usage patterns through the dashboard to understand routing decisions
* Set up provider keys for multiple providers to maximize routing options

### For Cost Optimization

* Let auto routing handle model selection to automatically use the most cost-effective options
* Use model IDs without provider prefixes to always get the cheapest available provider
* Monitor your usage analytics to track cost savings from intelligent routing


# Source Attribution
URL: /features/source
# Source Attribution

The `X-Source` header allows you to identify your domain when making requests to LLM Gateway. This information is used to generate public usage statistics showing how LLM Gateway is being used across different websites and applications.

## X-Source Header

Include the `X-Source` header with your domain name in your requests:

```bash
curl -X POST https://api.llmgateway.io/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "X-Source: example.com" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": "Hello, how are you?"
      }
    ]
  }'
```

## Domain Format

The `X-Source` header accepts domain names in various formats. All of the following are valid and will be normalized to the same domain:

* `example.com`
* `https://example.com`
* `https://www.example.com`
* `www.example.com`

All variations will be stripped down to the base domain (`example.com`) for aggregation purposes.
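
The normalization described above can be approximated as follows. This is a sketch of the documented behavior, not the gateway's own code: the function name `normalize_source` is hypothetical, and lowercasing the value and dropping any path are additional assumptions.

```python
import re


def normalize_source(value: str) -> str:
    """Reduce an X-Source value to its base domain, e.g. example.com."""
    domain = value.strip().lower()  # lowercasing is an assumption
    domain = re.sub(r"^[a-z][a-z0-9+.-]*://", "", domain)  # drop the protocol
    domain = domain.split("/", 1)[0]  # drop any path (assumption)
    if domain.startswith("www."):
        domain = domain[len("www."):]
    return domain
```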

## Public Statistics

Data from the `X-Source` header is used to generate public statistics about LLM Gateway usage, including:

* **Popular Domains**: Which websites and applications are using LLM Gateway most frequently
* **Model Usage**: What models are being used by different domains
* **Geographic Distribution**: Where requests are coming from across different sources
* **Growth Trends**: How usage is growing over time for different domains

These statistics help demonstrate the adoption and impact of LLM Gateway across the ecosystem.

## Privacy Considerations

### What's Public

* Domain names (stripped of protocol and www prefixes)
* Aggregated request counts and model usage
* General geographic regions (country-level data)

### What's Private

* Individual request content or responses
* User identifiers or personal information
* Detailed usage patterns beyond aggregated counts
* API keys or authentication details

## Benefits

Including the `X-Source` header provides several benefits:

### For Your Project

* **Recognition**: Your domain will appear in public usage statistics
* **Credibility**: Demonstrates real-world usage of your application
* **Community**: Contributes to the broader LLM Gateway ecosystem

### For the Community

* **Transparency**: Shows real adoption and usage patterns
* **Inspiration**: Other developers can see successful implementations
* **Growth**: Helps demonstrate the value of open-source LLM infrastructure

## Optional but Recommended

While the `X-Source` header is optional, we strongly encourage its use to:

* Support transparency in the LLM Gateway ecosystem
* Help showcase successful integrations
* Contribute to understanding of LLM usage patterns
* Demonstrate the real-world impact of your application

Your participation helps build a more transparent and collaborative LLM ecosystem.


# Vision Support
URL: /features/vision
import { Callout } from "fumadocs-ui/components/callout";

# Vision Support

LLM Gateway supports vision-enabled models that can analyze and describe images. You can provide images via HTTPS URLs or inline base64-encoded data.

## Vision-Enabled Models

You can find all vision-enabled models on our [models page with vision filter](https://llmgateway.io/models?filters=1\&vision=true). These models can process both text and image content in the same request.

## Image Formats

### Using HTTPS URLs

You can provide any publicly accessible HTTPS URL pointing to an image:

```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What do you see in this image?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://example.com/image.jpg"
            }
          }
        ]
      }
    ]
  }'
```

### Using Base64 Inline Data

You can also provide images as base64-encoded data URIs:

```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQEASABIAAD..."
            }
          }
        ]
      }
    ]
  }'
```

## Content Array Format

When using vision models, the `content` field should be an array containing both text and image content blocks:

* **Text content**: `{"type": "text", "text": "Your message"}`
* **Image content**: `{"type": "image_url", "image_url": {"url": "image_url_or_data_uri"}}`
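
When the image is a local file rather than a URL, the data URI has to be assembled by the client. The sketch below builds the two content block types listed above using only the Python standard library. The helper names `image_block_from_bytes` and `vision_message` are illustrative.

```python
import base64
from typing import Any, Dict, List


def image_block_from_bytes(data: bytes, mime_type: str = "image/jpeg") -> Dict[str, Any]:
    """Wrap raw image bytes in an `image_url` content block (data URI)."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


def vision_message(text: str, images: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build a user message with one text block followed by image blocks."""
    return {"role": "user", "content": [{"type": "text", "text": text}, *images]}
```

A typical call reads the file with `open(path, "rb").read()` and passes the bytes along with the matching MIME type, such as `image/png`.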

## Multiple Images

You can include multiple images in a single request:

```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Compare these two images"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://example.com/image1.jpg"
            }
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://example.com/image2.jpg"
            }
          }
        ]
      }
    ]
  }'
```

## Simple String Content

<Callout type="info">
  For vision models, you can still use simple string content for text-only
  messages. The array format is only required when including images.
</Callout>

```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": "Hello! How can you help me today?"
      }
    ]
  }'
```

## Supported Image Types

Vision models typically support common image formats including:

* JPEG (.jpg, .jpeg)
* PNG (.png)
* WebP (.webp)
* GIF (.gif)

The specific formats supported may vary by model provider. Check the individual model documentation for format limitations and file size restrictions.

## Error Handling

If an image URL is inaccessible or the image format is unsupported, the gateway will handle the error gracefully and may substitute a placeholder or error message in the request to the underlying model.


# Native Web Search
URL: /features/web-search
import { Callout } from "fumadocs-ui/components/callout";

# Native Web Search

LLM Gateway supports native web search capabilities that allow models to access real-time information from the internet. This feature is useful for answering questions about current events, recent news, live data, and other time-sensitive information that may not be in the model's training data.

## How It Works

When you include the `web_search` tool in your request, the model can search the web to gather relevant information before generating a response:

1. You send a request with the `web_search` tool enabled
2. The model determines if web search is needed based on the query
3. If needed, the model performs web searches to gather current information
4. The model synthesizes the search results and generates a response
5. Citations are included in the response to show information sources

## Supported Providers

Native web search is available on select models. See all models with native web search support on our [models page](https://llmgateway.io/models?filters=1\&webSearch=true).

## Basic Usage

To enable web search, add the `web_search` tool to your request:

```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.2",
    "messages": [
      {
        "role": "user",
        "content": "What is the current weather in San Francisco?"
      }
    ],
    "tools": [
      {
        "type": "web_search"
      }
    ]
  }'
```

### Example Response

```json
{
	"id": "chatcmpl-abc123",
	"object": "chat.completion",
	"created": 1234567890,
	"model": "openai/gpt-5.2",
	"choices": [
		{
			"index": 0,
			"message": {
				"role": "assistant",
				"content": "The current weather in San Francisco is 57°F (14°C) with mostly cloudy skies...",
				"annotations": [
					{
						"type": "url_citation",
						"url": "https://weather.com/...",
						"title": "San Francisco Weather"
					}
				]
			},
			"finish_reason": "stop"
		}
	],
	"usage": {
		"prompt_tokens": 15,
		"completion_tokens": 150,
		"total_tokens": 165,
		"cost_usd_total": 0.0315
	}
}
```

## Web Search Options

The `web_search` tool accepts optional configuration parameters:

### User Location

Provide location context to get more relevant local search results:

```json
{
	"type": "web_search",
	"user_location": {
		"city": "San Francisco",
		"region": "California",
		"country": "US",
		"timezone": "America/Los_Angeles"
	}
}
```

### Search Context Size

Control the amount of web content retrieved (OpenAI only):

```json
{
	"type": "web_search",
	"search_context_size": "medium"
}
```

Available values:

* `low` - Minimal search context, faster responses
* `medium` - Balanced context (default)
* `high` - Maximum search context, more comprehensive

### Max Uses

Limit the number of searches per request (provider-dependent):

```json
{
	"type": "web_search",
	"max_uses": 3
}
```
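
The three options can be combined in a single tool entry. The builder below is a sketch: the option names and the `low`/`medium`/`high` values come from this section, while the helper `web_search_tool` and its client-side checks are assumptions rather than documented gateway validation.

```python
from typing import Any, Dict, Optional

CONTEXT_SIZES = {"low", "medium", "high"}  # values listed in the docs


def web_search_tool(
    user_location: Optional[Dict[str, str]] = None,
    search_context_size: Optional[str] = None,
    max_uses: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a `web_search` tool entry, including only the options set."""
    if search_context_size is not None and search_context_size not in CONTEXT_SIZES:
        raise ValueError(f"search_context_size must be one of {sorted(CONTEXT_SIZES)}")
    if max_uses is not None and max_uses < 1:
        # Client-side sanity check; an assumption, not documented behavior.
        raise ValueError("max_uses must be a positive integer")
    tool: Dict[str, Any] = {"type": "web_search"}
    if user_location:
        tool["user_location"] = dict(user_location)
    if search_context_size is not None:
        tool["search_context_size"] = search_context_size
    if max_uses is not None:
        tool["max_uses"] = max_uses
    return tool
```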

## Using with SDKs

### OpenAI SDK (Python)

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.llmgateway.io/v1",
    api_key="your-api-key"
)

response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[
        {"role": "user", "content": "What are the latest news headlines today?"}
    ],
    tools=[{"type": "web_search"}]
)

print(response.choices[0].message.content)
```

### OpenAI SDK (TypeScript)

```typescript
import OpenAI from "openai";

const client = new OpenAI({
	baseURL: "https://api.llmgateway.io/v1",
	apiKey: "your-api-key",
});

const response = await client.chat.completions.create({
	model: "gpt-5.2",
	messages: [{ role: "user", content: "What are the latest tech news?" }],
	tools: [{ type: "web_search" }],
});

console.log(response.choices[0].message.content);
```

## Streaming

Web search works with streaming responses. Citations are included in the final chunks:

```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.2",
    "messages": [
      {"role": "user", "content": "What is the current stock price of Apple?"}
    ],
    "tools": [{"type": "web_search"}],
    "stream": true
  }'
```

## Citations and Sources

Web search responses include citations to show where information was sourced from. These appear in the `annotations` field of the message:

```json
{
	"annotations": [
		{
			"type": "url_citation",
			"url": "https://example.com/article",
			"title": "Article Title",
			"start_index": 0,
			"end_index": 50
		}
	]
}
```

<Callout type="info">
  Citation format may vary slightly between providers, but LLM Gateway
  normalizes them into a consistent structure.
</Callout>
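
To display sources alongside an answer, the annotations can be collected from a parsed response. This sketch assumes the response shape shown in the example response earlier on this page. The helper name `extract_citations` and the de-duplication by URL are illustrative choices.

```python
from typing import Any, Dict, List


def extract_citations(response: Dict[str, Any]) -> List[Dict[str, str]]:
    """Collect unique `url_citation` annotations from every choice."""
    seen = set()
    citations: List[Dict[str, str]] = []
    for choice in response.get("choices", []):
        message = choice.get("message", {})
        for annotation in message.get("annotations") or []:
            if annotation.get("type") != "url_citation":
                continue
            url = annotation.get("url")
            if not url or url in seen:
                continue  # skip duplicates and malformed entries
            seen.add(url)
            citations.append({"url": url, "title": annotation.get("title", "")})
    return citations
```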

## Cost Tracking

Web search costs are tracked separately from token costs in the usage object:

```json
{
	"usage": {
		"prompt_tokens": 15,
		"completion_tokens": 150,
		"total_tokens": 165,
		"cost_usd_total": 0.0215,
		"cost_usd_input": 0.0015,
		"cost_usd_output": 0.01,
		"cost_usd_web_search": 0.01
	}
}
```

The `cost_usd_web_search` field shows the cost incurred specifically for web search queries. Web search is billed at $0.01 per search call for reasoning models (GPT-5, o-series) and $0.025 per call for non-reasoning models.
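
For budgeting, the per-call prices quoted above translate into a simple estimate. This is a sketch that assumes you already know how many search calls a request made and whether the model counts as a reasoning model. The helper name `web_search_cost` is hypothetical, and `Decimal` is used to avoid floating-point rounding in currency arithmetic.

```python
from decimal import Decimal

# Per-call prices quoted in the paragraph above.
PRICE_REASONING = Decimal("0.01")
PRICE_NON_REASONING = Decimal("0.025")


def web_search_cost(search_calls: int, reasoning_model: bool) -> Decimal:
    """Estimate the web search charge in USD for a number of search calls."""
    if search_calls < 0:
        raise ValueError("search_calls must be non-negative")
    price = PRICE_REASONING if reasoning_model else PRICE_NON_REASONING
    return price * search_calls
```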

## Combining with Function Tools

You can use web search alongside regular function tools:

```json
{
	"tools": [
		{ "type": "web_search" },
		{
			"type": "function",
			"function": {
				"name": "get_weather",
				"description": "Get weather for a location",
				"parameters": {
					"type": "object",
					"properties": {
						"location": { "type": "string" }
					}
				}
			}
		}
	]
}
```

<Callout type="warn">
  Some dedicated search models only support web search and do not support
  additional function tools. Use `gpt-5.2` or other GPT-5 series models if you
  need both web search and function tools.
</Callout>

## Use Cases

### Current Events and News

```json
{
	"messages": [
		{ "role": "user", "content": "What are the major news stories today?" }
	],
	"tools": [{ "type": "web_search" }]
}
```

### Real-Time Data

```json
{
	"messages": [
		{ "role": "user", "content": "What is the current price of Bitcoin?" }
	],
	"tools": [{ "type": "web_search" }]
}
```

### Research and Fact-Checking

```json
{
	"messages": [
		{
			"role": "user",
			"content": "What are the latest findings on climate change?"
		}
	],
	"tools": [{ "type": "web_search" }]
}
```

### Local Information

```json
{
	"messages": [
		{
			"role": "user",
			"content": "What restaurants are open near me right now?"
		}
	],
	"tools": [
		{
			"type": "web_search",
			"user_location": {
				"city": "New York",
				"country": "US"
			}
		}
	]
}
```

## Best Practices

1. **Use GPT-5.2**: For the best web search experience with full tool support, use `gpt-5.2`
2. **Provide location context**: When queries are location-dependent, include `user_location` for more relevant results
3. **Monitor costs**: Web search incurs per-query costs in addition to token costs
4. **Check citations**: Always review the citations in responses to verify information sources
5. **Use streaming**: For user-facing applications, enable streaming to show responses as they're generated

## Error Handling

If you try to use web search with a model that doesn't support it:

```json
{
	"error": {
		"message": "Model gpt-4o does not support native web search. Remove the web_search tool or use a model that supports it. See https://llmgateway.io/models?features=webSearch for supported models.",
		"type": "invalid_request_error"
	}
}
```

To avoid this error, only use the `web_search` tool with [native web search enabled models](https://llmgateway.io/models?filters=1\&webSearch=true).
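
A client that switches between models may want to detect this error and retry without the tool. The sketch below matches on the error `type` and a fragment of the message shown above. Matching on message text is fragile and is an assumption made for illustration, as are the helper names.

```python
from typing import Any, Dict


def is_web_search_unsupported(body: Dict[str, Any]) -> bool:
    """True if an error body matches the unsupported-web-search error above."""
    error = body.get("error") or {}
    return (
        error.get("type") == "invalid_request_error"
        and "does not support native web search" in str(error.get("message", ""))
    )


def without_web_search(request: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the request with `web_search` tools removed."""
    cleaned = dict(request)
    tools = [t for t in request.get("tools", []) if t.get("type") != "web_search"]
    if tools:
        cleaned["tools"] = tools
    else:
        cleaned.pop("tools", None)  # drop an empty tools list entirely
    return cleaned
```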


# AWS Bedrock Integration
URL: /integrations/aws-bedrock
import { Step, Steps } from "fumadocs-ui/components/steps";

AWS Bedrock is Amazon's fully managed service that provides access to foundation models from leading AI companies. This guide shows how to create AWS Bedrock Long-Term API Keys and integrate them with LLM Gateway.

## Prerequisites

* An AWS account with Bedrock access enabled
* LLM Gateway account or self-hosted instance

## Overview

AWS Bedrock supports **Long-Term API Keys** for simplified authentication. These keys provide direct API access without requiring IAM credentials or complex authentication flows.

## Create AWS Bedrock Long-Term API Key

<Steps>
  <Step>
    ### Enable Model Access in Bedrock

    1. Log into the **AWS Console**
    2. Navigate to **AWS Bedrock** service
    3. Go to **Model access** in the left sidebar
    4. Click **Manage model access**
    5. Enable the models you want to use (e.g., Claude 3.5, Llama 3)
    6. Wait for access to be granted (usually instant for most models)
  </Step>

  <Step>
    ### Create Long-Term API Key

    1. In AWS Bedrock console, navigate to **API Keys** in the left sidebar
    2. Click **Create Long-Term API Key**
    3. Set expiry date ("Never expires" is recommended)
    4. Click **Generate**
    5. **Important**: Copy the API key immediately - it's only shown once!
  </Step>
</Steps>

## Add to LLM Gateway

<Steps>
  <Step>
    ### Navigate to Provider Keys

    1. Log into [LLM Gateway Dashboard](https://llmgateway.io/dashboard)
    2. Select your organization and project
    3. Go to **Provider Keys** in the sidebar
  </Step>

  <Step>
    ### Add AWS Bedrock Provider Key

    1. Click **Add** for **AWS Bedrock**
    2. Paste your Long-Term API Key
    3. **Select Region Prefix** based on where you want to use your models:
       * **us.** - For US regions (`us-east-1`, `us-west-2`)
       * **eu.** - For European regions (`eu-central-1`, `eu-west-1`)
       * **global.** - For global/cross-region endpoints
    4. Click **Add Key**

    The system will validate your key and confirm the connection.
  </Step>

  <Step>
    ### Test the Integration

    Test your integration with a simple API call:

    ```bash
    curl -X POST https://api.llmgateway.io/v1/chat/completions \
      -H "Authorization: Bearer YOUR_LLMGATEWAY_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "aws-bedrock/claude-3-5-sonnet",
        "messages": [
          {
            "role": "user",
            "content": "Hello from AWS Bedrock!"
          }
        ]
      }'
    ```

    Replace `YOUR_LLMGATEWAY_API_KEY` with your LLM Gateway API key.
  </Step>
</Steps>

## Available Models

Once configured, you can access all AWS Bedrock models through LLM Gateway:

* **Anthropic Claude**: `aws-bedrock/claude-3-5-sonnet`, `aws-bedrock/claude-3-5-haiku`
* **Meta Llama**: `aws-bedrock/llama-3-2-90b`, `aws-bedrock/llama-3-2-11b`
* **Amazon Titan**: `aws-bedrock/amazon.titan-text-express-v1`
* **And more...**

Browse all available models at [llmgateway.io/models](https://llmgateway.io/models?provider=aws-bedrock)

## Troubleshooting

### "Model not available" error

* Verify you've enabled model access in AWS Bedrock console
* Check that the region where you created your key has access to the model
* Some models are only available in specific regions

### Rate limiting

* AWS Bedrock has request quotas per model and region
* Monitor usage in AWS Bedrock console
* Consider requesting quota increases for high-volume workloads


# Azure Integration
URL: /integrations/azure
import { Step, Steps } from "fumadocs-ui/components/steps";
import { Callout } from "fumadocs-ui/components/callout";

Azure provides access to OpenAI's powerful language models through Microsoft's enterprise cloud infrastructure. This guide shows how to create an Azure resource, deploy models, and integrate them with LLM Gateway.

<Callout type="info">
  Only OpenAI models are supported via Azure at this time. [Open an
  issue](https://github.com/theopenco/llmgateway/issues/new) to request support
  for other model types.
</Callout>

## Prerequisites

* An Azure account with an active subscription
* LLM Gateway account or self-hosted instance

## Overview

Azure provides enterprise-grade access to OpenAI models with enhanced security, compliance, and regional availability. LLM Gateway integrates seamlessly with Azure deployments.

## Create Azure Resource

<Steps>
  <Step>
    ### Create an Azure OpenAI Resource

    1. Log into the **Azure Portal** ([https://portal.azure.com](https://portal.azure.com))
    2. Click **Create a resource**
    3. Search for **Azure OpenAI** and select it
    4. Click **Create**
    5. Configure the resource:
       * **Subscription**: Select your Azure subscription
       * **Resource group**: Create new or select existing
       * **Region**: Choose a region (e.g., East US, West Europe)
       * **Name**: Enter a unique resource name (this will be your `<resource-name>`)
       * **Pricing tier**: Select Standard S0
    6. Click **Review + create**, then **Create**
    7. Wait for deployment to complete

    **Important**: Note your resource name - it will be used in the base URL: `https://<resource-name>.openai.azure.com`
  </Step>

  <Step>
    ### Deploy Models

    1. Navigate to your Azure resource in the Azure Portal
    2. Click **Go to Azure OpenAI Studio** or visit [https://oai.azure.com](https://oai.azure.com)
    3. In Azure Studio, select **Deployments** from the left sidebar
    4. Click **Create new deployment**
    5. Configure your deployment:
       * **Model**: Select a model (e.g., gpt-4o, gpt-4o-mini, gpt-4-turbo)
       * **Deployment name**: Enter a name (this must match the model identifier you'll use – use the pre-filled name)
       * **Model version**: Select the latest version
       * **Deployment type**: Global Standard
    6. Click **Create**
    7. Repeat for additional models you want to use

    **Note**: The deployment name must match the expected model name, for example:

    * For `gpt-4o-mini` → deployment name should be `gpt-4o-mini`
    * For `gpt-35-turbo` → deployment name should be `gpt-35-turbo`
  </Step>

  <Step>
    ### Get API Key

    1. In the Azure Portal, go to your Azure resource
    2. Click **Keys and Endpoint** in the left sidebar
    3. Copy **Key 1** or **Key 2**
    4. Note your **Endpoint** URL (should be `https://<resource-name>.openai.azure.com`)

    **Important**: Keep your API key secure - it provides access to your Azure deployments.
  </Step>
</Steps>

## Add to LLM Gateway

<Steps>
  <Step>
    ### Navigate to Provider Keys

    1. Log into [LLM Gateway Dashboard](https://llmgateway.io/dashboard)
    2. Select your organization and project
    3. Go to **Provider Keys** in the sidebar
  </Step>

  <Step>
    ### Add Azure Provider Key

    1. Click **Add** for **Azure**
    2. Enter your **API Key** from Azure Portal
    3. Enter your **Resource Name** (the name from your Azure endpoint URL)
       * Example: If your endpoint is `https://my-openai-resource.openai.azure.com`, enter `my-openai-resource`
    4. Select your preferred **type** (Azure OpenAI or AI Foundry)
    5. Set the **Validation Model** to a model that you have already deployed and that is available.
       This is a one-time check to ensure the API key is valid and the model can be accessed.
    6. Click **Add Key**

    The system will validate your key and confirm the connection.
  </Step>

  <Step>
    ### Test the Integration

    Test your integration with a simple API call:

    ```bash
    curl -X POST https://api.llmgateway.io/v1/chat/completions \
      -H "Authorization: Bearer YOUR_LLMGATEWAY_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "azure/gpt-4o-mini",
        "messages": [
          {
            "role": "user",
            "content": "Hello from Azure!"
          }
        ]
      }'
    ```

    Replace `YOUR_LLMGATEWAY_API_KEY` with your LLM Gateway API key.
  </Step>
</Steps>
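
If you script the setup, the **Resource Name** can be derived from the endpoint URL copied from **Keys and Endpoint**. This sketch handles only the `https://<resource-name>.openai.azure.com` form shown in this guide; other endpoint formats (for example AI Foundry) are not covered. The helper name `resource_name_from_endpoint` is hypothetical.

```python
from urllib.parse import urlparse

AZURE_OPENAI_SUFFIX = ".openai.azure.com"


def resource_name_from_endpoint(endpoint: str) -> str:
    """Extract `<resource-name>` from an Azure OpenAI endpoint URL."""
    host = urlparse(endpoint).hostname or ""
    if not host.endswith(AZURE_OPENAI_SUFFIX) or host == AZURE_OPENAI_SUFFIX:
        raise ValueError(f"not an Azure OpenAI endpoint: {endpoint!r}")
    return host[: -len(AZURE_OPENAI_SUFFIX)]
```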

## Available Models

Once configured, you can access your Azure deployments through LLM Gateway:

* **GPT-4o**: `azure/gpt-4o`
* **GPT-4o Mini**: `azure/gpt-4o-mini`
* **GPT-3.5 Turbo**: `azure/gpt-3.5-turbo` (note: the LLM Gateway model name is `gpt-3.5-turbo`, even though the Azure deployment name is `gpt-35-turbo`)

**Note**: Only models you have deployed in Azure Studio will be available. Ensure your deployment names match the expected model identifiers.
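
Because the Azure deployment name and the LLM Gateway model name differ for GPT-3.5 Turbo, a small lookup can keep client code consistent. This sketch contains only the one mapping named in this guide. The helper `gateway_model_id` and the pass-through behavior for other names are assumptions.

```python
# Only the mapping named in this guide; extend it if you rely on others.
DEPLOYMENT_TO_GATEWAY = {"gpt-35-turbo": "gpt-3.5-turbo"}


def gateway_model_id(deployment_name: str) -> str:
    """Map an Azure deployment name to the `azure/...` model ID for requests."""
    model = DEPLOYMENT_TO_GATEWAY.get(deployment_name, deployment_name)
    return f"azure/{model}"
```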

Browse all available models at [llmgateway.io/models](https://llmgateway.io/models?provider=azure)
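
If your code already holds Azure-style model names, a small lookup can translate them before calling the gateway. This is a minimal sketch: the `AZURE_TO_GATEWAY` table and the `gateway_model_id` helper are hypothetical, and only the `gpt-35-turbo` case comes from this guide.

```python
# Azure model names that differ from the LLM Gateway model name.
# Only the gpt-35-turbo entry is documented above; extend as needed.
AZURE_TO_GATEWAY = {"gpt-35-turbo": "gpt-3.5-turbo"}


def gateway_model_id(azure_model: str) -> str:
    """Return the `azure/...` identifier to send to LLM Gateway."""
    name = AZURE_TO_GATEWAY.get(azure_model, azure_model)
    return f"azure/{name}"
```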

## Troubleshooting

### "Deployment not found" error

* Verify you've created a deployment in Azure Studio
* Ensure the deployment name exactly matches the model name you're requesting
* Check that the deployment is in the same resource as your API key

### "Resource not found" error

* Verify the resource name is correct (check your Azure Portal endpoint URL)
* Ensure your API key belongs to the correct Azure resource
* Confirm the resource is in an active state in Azure Portal
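
A common cause of this error is pasting the full endpoint URL where only the resource name is expected. The sketch below derives the resource name from an endpoint URL, assuming the name is the first label of the hostname as in the `my-openai-resource` example earlier on this page. The function name is hypothetical.

```python
from urllib.parse import urlparse


def resource_name_from_endpoint(endpoint: str) -> str:
    """Return the first hostname label of an Azure endpoint URL."""
    host = urlparse(endpoint).hostname
    if not host:
        # No scheme or host, e.g. a bare resource name was passed in.
        raise ValueError(f"not a valid endpoint URL: {endpoint!r}")
    return host.split(".")[0]
```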

### Rate limiting

* Azure has Tokens Per Minute (TPM) quotas per deployment
* Monitor usage in Azure Studio under **Quotas**
* Request quota increases through Azure Portal if needed for high-volume workloads
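
On the client side, retrying with exponential backoff can smooth over short quota bursts. The following is a sketch only: it assumes rate limiting surfaces as an HTTP 429 response, and the helper names and default delays are illustrative rather than values recommended by Azure or LLM Gateway.

```python
import time
import urllib.error
import urllib.request


def backoff_delays(max_retries: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Exponential delays: base, 2*base, 4*base, ... capped at `cap` seconds."""
    return [min(cap, base * 2**attempt) for attempt in range(max_retries)]


def send_with_retry(request: urllib.request.Request, max_retries: int = 4) -> bytes:
    """Send a request, retrying only on HTTP 429 (assumed rate-limit status)."""
    for delay in backoff_delays(max_retries):
        try:
            with urllib.request.urlopen(request) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise  # other errors are not retried
            time.sleep(delay)
    # Final attempt: let any error propagate to the caller.
    with urllib.request.urlopen(request) as response:
        return response.read()
```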

### Region availability

* Not all models are available in all Azure regions
* Check [Azure model availability](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models#model-summary-table-and-region-availability) for your region
* Consider creating resources in multiple regions for better availability


# Activity
URL: /learn/activity
import { ThemedImage } from "@/components/themed-image";

The Activity page shows a real-time log of every API request routed through LLM Gateway. Use it to debug requests, monitor performance, and track costs per call.

<ThemedImage alt="Activity Logs" basePath="/learn/activity" />

## Filters

Filter the activity log using the controls at the top:

| Filter                      | Description                                             |
| --------------------------- | ------------------------------------------------------- |
| **Time range**              | Filter by a specific time period                        |
| **Unified reasons**         | Filter by completion reason (e.g., stop, length, error) |
| **Providers**               | Show requests for specific providers only               |
| **Models**                  | Show requests for specific models only                  |
| **Custom header key/value** | Filter by custom metadata headers attached to requests  |

## Activity List

Each activity entry shows:

* **Status icon** — Green checkmark for completed, red circle for errors
* **Response preview** — First line of the model's response (when available)
* **Model** — The provider and model used (e.g., `google-vertex/gemini-3-pro-image-preview`)
* **Cache status** — Whether the response was served from cache
* **Tokens** — Total tokens consumed (input + output)
* **Duration** — How long the request took
* **Cost** — Inference cost for the request
* **Source** — Where the request originated from
* **Discount** — Any discount applied (e.g., "20% off")
* **Status badge** — `completed`, `upstream_error`, `gateway_error`, etc.
* **Timestamp** — Relative time (e.g., "about 4 hours ago")

### Actions per Entry

* **Open in new tab** — View the full request detail in a new browser tab
* **Expand** — Expand inline to see more details

## Activity Detail

Click on any activity entry to view its full detail page.

### Summary Cards

Five cards at the top provide a quick overview:

| Card               | Description                     |
| ------------------ | ------------------------------- |
| **Duration**       | Total request time in seconds   |
| **Tokens**         | Total tokens consumed           |
| **Throughput**     | Tokens per second               |
| **Inference Cost** | Cost charged for this request   |
| **Cache**          | Whether the response was cached |
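
As a rough guide to how the Throughput card relates to the others, tokens per second can be reproduced from the Tokens and Duration values. This sketch assumes throughput is total tokens divided by total duration; the dashboard may compute it differently (for example, from completion tokens only).

```python
def throughput(total_tokens: int, duration_seconds: float) -> float:
    """Tokens per second; returns 0.0 when the duration is not positive."""
    if duration_seconds <= 0:
        return 0.0
    return total_tokens / duration_seconds
```

For example, 1,200 tokens over 4 seconds gives 300 tokens per second under this assumption.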

### Request Section

Details about the original request:

* **Requested Model** — The model ID sent in the API call
* **Used Model** — The actual model that served the request
* **Model Mapping** — The underlying model identifier
* **Provider** — The provider that handled the request
* **Requested Provider** — The provider specified in the request
* **Streamed** — Whether the response was streamed
* **Canceled** — Whether the request was canceled
* **Source** — The application or service that made the request

### Tokens Section

A detailed token breakdown:

* Prompt Tokens, Completion Tokens, Total Tokens
* Reasoning Tokens (for reasoning models)
* Image Input/Output Tokens (for vision/image models)
* Response Size

### Routing Section

How LLM Gateway routed the request:

* **Selection** — The routing strategy used (e.g., `direct-provider-specified`)
* **Available** — Providers that were available for this model
* **Provider Scores** — Scoring breakdown showing availability, uptime, and latency for each provider

### Parameters Section

The model parameters sent with the request:

* Temperature, Max Tokens, Top P
* Frequency Penalty, Reasoning Effort
* Response Format


# API Keys
URL: /learn/api-keys
import { ThemedImage } from "@/components/themed-image";

The API Keys page lets you create, view, and manage the API keys used to authenticate requests to LLM Gateway.

<ThemedImage alt="API Keys" basePath="/learn/api-keys" />

## Creating an API Key

Click the **Create API Key** button to generate a new key. The number of keys you can create depends on your plan:

* **Free** — Limited number of keys
* **Pro** — Higher key limit
* **Enterprise** — Custom limits

When creating a key, you can assign it a name to help identify its purpose (e.g., "Production", "Development", "CI/CD").

## API Keys List

Each key in the list shows:

| Field         | Description                                                    |
| ------------- | -------------------------------------------------------------- |
| **Name**      | The label you assigned to the key                              |
| **Key**       | A masked preview of the key (only last few characters visible) |
| **Created**   | When the key was created                                       |
| **Last used** | When the key was last used in a request                        |

## Actions

For each API key you can:

* **View** — See the full key (only available once after creation)
* **Edit** — Update the key name
* **Rotate** — Generate a new key value while keeping the same configuration
* **Delete** — Permanently remove the key

## Plan Limits

The page shows your current key count vs. the maximum allowed by your plan. If you've reached your limit, the Create button will be disabled and you'll need to upgrade your plan or delete unused keys.


# Audit Logs
URL: /learn/audit-logs
import { Callout } from "fumadocs-ui/components/callout";
import { ThemedImage } from "@/components/themed-image";

The Audit Logs page provides a complete history of all actions performed within your organization, essential for compliance and security monitoring.

<ThemedImage alt="Audit Logs" basePath="/learn/audit-logs" />

<Callout type="info">
  Audit Logs are available on the [**Enterprise
  plan**](https://llmgateway.io/enterprise). Owner or Admin role is required.
</Callout>

## Filters

Narrow down the log entries:

* **Action** — Filter by action type (create, delete, update, etc.)
* **Resource type** — Filter by resource (API, IAM, API Keys, etc.)

Both filters are populated dynamically based on the actions recorded in your organization.

## Audit Log Entries

Each log entry shows:

| Field             | Description                                                  |
| ----------------- | ------------------------------------------------------------ |
| **Timestamp**     | Exact time of the action (formatted as MMM d, yyyy HH:mm:ss) |
| **User**          | Name and email of the person who performed the action        |
| **Action**        | What was done (e.g., "API Keys → create")                    |
| **Resource type** | The type of resource affected (shown as a badge)             |
| **Resource ID**   | Identifier of the affected resource (with copy button)       |
| **Details**       | Additional metadata about the action                         |
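
If you need to produce timestamps in the same `MMM d, yyyy HH:mm:ss` layout, for instance to match exported entries against your own logs, the sketch below shows one way to do it in Python. The function name is hypothetical, and the month abbreviation depends on the process locale (English by default).

```python
from datetime import datetime


def format_audit_timestamp(moment: datetime) -> str:
    """Render a datetime as `MMM d, yyyy HH:mm:ss` (day is not zero-padded)."""
    # %d would zero-pad the day, so the day number is inserted directly.
    return f"{moment:%b} {moment.day}, {moment:%Y %H:%M:%S}"
```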

## Pagination

Entries are sorted newest first. Older entries load on demand: click **Load More** to fetch the next batch.


# Billing
URL: /learn/billing
import { ThemedImage } from "@/components/themed-image";

The Billing page is your central hub for managing credits, plans, and payment methods.

<ThemedImage alt="Billing" basePath="/learn/billing" />

## Credits

This section displays your current credit balance. Credits are consumed as you make API requests through the gateway. Click **Top Up Credits** to add more credits to your account.

## Plan Management

View and manage your subscription:

* See your current plan (Free, Pro, or Enterprise)
* View billing cycle information
* Click **Manage Subscription** to upgrade, downgrade, or cancel

## Payment Methods

Manage your saved payment methods:

* Add a new credit card or payment method
* View existing payment methods
* Update billing information

## Auto Top-up Settings

Configure automatic credit top-ups so you never run out:

* **Enable/disable** auto top-up
* **Threshold** — The credit balance that triggers a top-up
* **Amount** — How many credits to add when the threshold is reached

This ensures uninterrupted service by automatically replenishing your credits when they run low.
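
The interaction between the threshold and the amount can be sketched as follows. This is an illustration of the settings, not the billing implementation: it assumes a top-up fires when the balance is at or below the threshold, and the function name is hypothetical.

```python
def balance_after_auto_top_up(
    balance: float, threshold: float, amount: float, enabled: bool = True
) -> float:
    """Return the balance after applying the auto top-up rule once."""
    # Assumption: reaching the threshold exactly also triggers a top-up.
    if enabled and balance <= threshold:
        return balance + amount
    return balance
```

With a threshold of 5 and an amount of 20, a balance of 4 becomes 24, while a balance of 10 is left unchanged.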


# Dashboard
URL: /learn/dashboard
import { ThemedImage } from "@/components/themed-image";

The Dashboard is the first page you see after logging in. It provides a high-level overview of your project's LLM usage, costs, and performance at a glance.

<ThemedImage alt="Dashboard" basePath="/learn/dashboard" />

## Date Range

At the top of the page, you can toggle the date range for all dashboard metrics:

* **7 days** — Last 7 days of data (default)
* **30 days** — Last 30 days of data
* **Custom** — Pick a custom start and end date

## Stat Cards

The dashboard displays eight metric cards in two rows:

### Top Row

| Card                     | Description                                                              |
| ------------------------ | ------------------------------------------------------------------------ |
| **Organization Credits** | Your current available credit balance                                    |
| **Total Requests**       | Number of API requests in the selected period, with cache hit percentage |
| **Total Cost**           | Total inference cost for the period, including storage costs             |
| **Total Savings**        | Savings from discounts during the selected period                        |

### Bottom Row

| Card                     | Description                                                         |
| ------------------------ | ------------------------------------------------------------------- |
| **Input Tokens & Cost**  | Total prompt tokens sent and their associated cost                  |
| **Output Tokens & Cost** | Total completion tokens received and their associated cost          |
| **Cached Tokens & Cost** | Tokens served from cache (if caching is enabled) and the cost saved |
| **Most Used Model**      | The model with the highest request count, along with its provider   |

## Usage Overview Chart

Below the stat cards, a chart visualizes your usage over time. You can toggle between two views using the dropdown:

* **Costs** — Shows input, output, and cached input costs as a stacked area chart
* **Requests** — Shows request volume over time

The chart is filtered by the currently selected project.

## Quick Actions

A sidebar panel provides shortcuts to common tasks:

* **Manage API Keys** — Go to the API Keys page
* **Provider Keys** — Configure your own provider keys
* **View Activity** — See detailed request logs
* **Usage & Metrics** — Dive into usage analytics
* **Model Usage** — View per-model usage breakdown

## Header Actions

Two buttons in the top-right corner:

* **Create API Key** — Quickly create a new API key for your project
* **Top Up Credits** — Add credits to your organization balance


# Guardrails
URL: /learn/guardrails
import { Callout } from "fumadocs-ui/components/callout";
import { ThemedImage } from "@/components/themed-image";

The Guardrails page lets you configure content safety rules that automatically scan and filter API requests before they reach the LLM provider.

<ThemedImage alt="Guardrails" basePath="/learn/guardrails" />

<Callout type="info">
  Guardrails are available on the [**Enterprise
  plan**](https://llmgateway.io/enterprise). Owner or Admin role is required.
</Callout>

## Main Toggle

A global toggle at the top enables or disables all guardrails for your organization. Click **Save Changes** to apply.

## System Rules

Six built-in rules with individual enable/disable toggles:

| Rule                            | Description                                                          |
| ------------------------------- | -------------------------------------------------------------------- |
| **Prompt Injection Detection**  | Detects attempts to override or manipulate system instructions       |
| **Jailbreak Prevention**        | Identifies attempts to bypass safety measures                        |
| **PII Detection**               | Identifies personal information like emails, phone numbers, and SSNs |
| **Secrets Detection**           | Detects API keys, passwords, and credentials                         |
| **File Type Restrictions**      | Controls which file types can be uploaded                            |
| **Document Leakage Prevention** | Detects attempts to extract confidential documents                   |

Each rule has an action dropdown to configure the response:

* **Block** — Reject the request entirely
* **Redact** — Remove or mask sensitive content, then continue
* **Warn** — Log the violation but allow the request

## File Restrictions

Configure file upload limits:

* **Max file size** — Set the maximum file size in MB
* **Allowed file types** — Add or remove permitted MIME types

## Custom Rules

Create organization-specific rules by clicking **Add Rule**:

* **Blocked Terms** — Block specific words or phrases
* **Custom Regex** — Match patterns with regular expressions
* **Topic Restriction** — Restrict content related to specific topics

Each custom rule can be individually enabled/disabled or deleted.
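
To make the three actions concrete, here is a minimal sketch of how a **Custom Regex** rule could behave under Block, Redact, and Warn. It illustrates the semantics described above and is not the gateway's implementation; `RuleResult`, `apply_regex_rule`, and the `[REDACTED]` placeholder are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class RuleResult:
    allowed: bool    # False only when the request is blocked
    content: str     # content that would be forwarded to the provider
    violation: bool  # True when the pattern matched


def apply_regex_rule(content: str, pattern: str, action: str) -> RuleResult:
    """Apply one regex rule with a block, redact, or warn action."""
    matcher = re.compile(pattern)
    if not matcher.search(content):
        return RuleResult(True, content, False)
    if action == "block":
        return RuleResult(False, content, True)
    if action == "redact":
        # Mask every match, then let the request continue.
        return RuleResult(True, matcher.sub("[REDACTED]", content), True)
    if action == "warn":
        # Record the violation but forward the content unchanged.
        return RuleResult(True, content, True)
    raise ValueError(f"unknown action: {action!r}")
```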

Learn more about guardrails in the [Guardrails feature docs](/features/guardrails).


# Introduction
URL: /learn
The LLM Gateway dashboard gives you full control over your LLM API usage, costs, and configuration. This section walks you through every page in the dashboard so you can get the most out of the platform.

## Project Pages

These pages are scoped to a specific project within your organization:

* [**Dashboard**](/learn/dashboard) — Overview of your usage, costs, and quick actions
* [**Activity**](/learn/activity) — Detailed logs of every API request
* [**Model Usage**](/learn/model-usage) — Usage breakdown by model
* [**Usage & Metrics**](/learn/usage-metrics) — Requests, errors, cache rates, and cost trends
* [**API Keys**](/learn/api-keys) — Create and manage your API keys
* [**Preferences**](/learn/preferences) — Project-level settings like caching and mode

## Organization Pages

These pages apply to your entire organization:

* [**Provider Keys**](/learn/provider-keys) — Bring your own provider API keys
* [**Guardrails**](/learn/guardrails) — Content safety rules and filters
* [**Security Events**](/learn/security-events) — Monitor guardrail violations
* [**Billing**](/learn/billing) — Credits, plans, and payment methods
* [**Transactions**](/learn/transactions) — Payment and credit history
* [**Referrals**](/learn/referrals) — Earn credits by referring others
* [**Policies**](/learn/policies) — Data retention configuration
* [**Org Preferences**](/learn/org-preferences) — Organization name and billing email
* [**Team**](/learn/team) — Manage team members and roles
* [**Audit Logs**](/learn/audit-logs) — Complete history of organization actions

## Playground

Interactive tools for testing and experimenting with LLM models:

* [**Chat Playground**](/learn/playground) — Test models with an interactive chat interface
* [**Group Chat**](/learn/playground-group) — Compare responses from multiple models side by side
* [**Image Studio**](/learn/playground-image) — Generate and edit images using AI models


# Model Usage
URL: /learn/model-usage
import { ThemedImage } from "@/components/themed-image";

The Model Usage page shows how your API requests are distributed across different LLM models over time.

<ThemedImage alt="Model Usage" basePath="/learn/model-usage" />

## Filters

Two filters let you narrow down the data:

* **API Key** — Select a specific API key or view usage across all keys
* **Date range** — Choose a time period to analyze

## Usage Chart

The main chart displays a time-series breakdown of requests per model. Each model is represented by a different color, making it easy to see:

* Which models are used most frequently
* How usage patterns change over time
* Whether usage is concentrated on a single model or spread across many

This page is useful for understanding your model distribution and identifying opportunities to optimize costs by switching to more cost-effective models for certain workloads.


# Org Preferences
URL: /learn/org-preferences
import { ThemedImage } from "@/components/themed-image";

The Org Preferences page contains basic settings for your organization.

<ThemedImage alt="Org Preferences" basePath="/learn/org-preferences" />

## Organization Name

Update your organization's display name. This name appears throughout the dashboard and in billing communications.

## Billing Email

Set or update the email address used for billing-related communications, including receipts, invoices, and payment notifications.


# Group Chat
URL: /learn/playground-group
import { ThemedImage } from "@/components/themed-image";

The Group Chat page lets you send a single prompt to multiple models simultaneously and compare their responses side by side. This is useful for evaluating model quality, speed, and cost.

<ThemedImage alt="Group Chat" basePath="/learn/playground-group" />

## How It Works

1. Select two or more models from the model picker
2. Type your prompt in the input field
3. All selected models receive the same prompt at once
4. Responses stream in parallel, displayed in separate columns

## Use Cases

* **Model evaluation** — Compare output quality across providers
* **Cost optimization** — See which models give the best results for the price
* **Speed comparison** — Observe latency differences between models
* **Migration testing** — Verify that a new model produces equivalent results


# Image Studio
URL: /learn/playground-image
import { ThemedImage } from "@/components/themed-image";

The Image Studio lets you generate images using AI models through an intuitive interface. Select a model, describe what you want, and get results instantly.

<ThemedImage alt="Image Studio" basePath="/learn/playground-image" />

## Model Selection

Choose from supported image generation models in the dropdown. Each model has different capabilities, resolutions, and pricing.

## Generating Images

1. Select an image generation model
2. Type a description of the image you want
3. Click send to generate
4. Generated images appear in the conversation

## Image Count

You can generate 1, 2, or 4 images at once. Multiple images are displayed in a grid layout.

## Resolution Options

Available resolutions depend on the selected model. Common options include 1K, 2K, and 4K.


# Chat Playground
URL: /learn/playground
import { ThemedImage } from "@/components/themed-image";

The Chat Playground is a standalone app for testing LLM models through a conversational interface. You can select any supported model, adjust parameters, and see responses in real time.

<ThemedImage alt="Chat Playground" basePath="/learn/playground" />

## Model Selection

Use the dropdown at the top to pick a model and provider. The **Auto Route** option automatically selects the best provider based on availability and cost.

## Chat Interface

* Type your message in the input field at the bottom
* Click the send button or press Enter to submit
* Responses stream in real time
* Previous conversations appear in the sidebar

## Prompt Suggestions

When starting a new chat, category tabs help you pick a prompt:

* **Create** — Content generation prompts
* **Explore** — Research and analysis prompts
* **Code** — Programming and development prompts
* **Image gen** — Image generation prompts

## Sidebar

The left sidebar shows your chat history. Click **+ New Chat** to start a fresh conversation, or select a previous chat to continue it.

## Comparison Mode

Toggle **Comparison mode** in the top-right to send the same prompt to multiple models side by side. See the [Group Chat](/learn/playground-group) page for details.

## Image Studio

Click **Image Studio** in the sidebar to switch to the image generation interface. See the [Image Studio](/learn/playground-image) page for details.


# Policies
URL: /learn/policies
import { ThemedImage } from "@/components/themed-image";

The Policies page lets you configure organization-wide policies that govern how your data is handled.

<ThemedImage alt="Policies" basePath="/learn/policies" />

## Data Retention

Control how long your request logs and activity data are stored. The retention period depends on your plan:

| Plan           | Retention Period |
| -------------- | ---------------- |
| **Free**       | 3 days           |
| **Pro**        | 7 days           |
| **Enterprise** | 90 days          |

After the retention period expires, request logs and associated data are automatically deleted.
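
If you export logs before they expire, it helps to know when a given entry becomes eligible for deletion. The sketch below applies the retention periods from the table above; the function name is hypothetical, and the exact time of deletion on the platform may differ from this calculation.

```python
from datetime import datetime, timedelta

# Retention periods in days, taken from the table above.
RETENTION_DAYS = {"free": 3, "pro": 7, "enterprise": 90}


def deletion_due(logged_at: datetime, plan: str) -> datetime:
    """Return the earliest time a log entry may be deleted for a plan."""
    return logged_at + timedelta(days=RETENTION_DAYS[plan.lower()])
```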

Learn more about data retention in the [Data Retention feature docs](/features/data-retention).


# Preferences
URL: /learn/preferences
import { ThemedImage } from "@/components/themed-image";

The Preferences page contains project-level settings that control how your project behaves.

<ThemedImage alt="Preferences" basePath="/learn/preferences" />

## Project Name

Update the display name for your project. This name appears in the sidebar and throughout the dashboard.

## Project Mode

Configure how your organization handles projects. This setting determines the routing and isolation behavior for API requests within the project.

## Caching

Enable or configure response caching for API requests. When enabled, identical requests will return cached responses instead of making new calls to the provider, saving both time and cost.

Learn more about caching in the [Caching feature docs](/features/caching).
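
One way to picture what "identical requests" means is a cache key derived from the request body. The sketch below is an illustration only: this page does not specify how the gateway compares requests, so the canonical-JSON hashing shown here and the `cache_key` name are assumptions.

```python
import hashlib
import json


def cache_key(request_body: dict) -> str:
    """Hash a request body so that equal bodies map to the same key."""
    # Sorting keys makes the key independent of field order.
    canonical = json.dumps(request_body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Under this scheme, two requests with the same model, messages, and parameters share a key, while any change to the prompt or a parameter produces a different one.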

## Danger Zone

The Danger Zone section contains irreversible actions:

* **Archive Project** — Permanently archive the project. This action cannot be undone. Archived projects stop processing requests and their API keys become inactive.


# Provider Keys
URL: /learn/provider-keys
import { Callout } from "fumadocs-ui/components/callout";
import { ThemedImage } from "@/components/themed-image";

The Provider Keys page lets you add your own API keys from LLM providers (OpenAI, Anthropic, Google, etc.) to route requests directly through your accounts without additional gateway fees.

<ThemedImage alt="Provider Keys" basePath="/learn/provider-keys" />

## Adding a Provider Key

Click **Add Provider Key** to configure a new key:

* **Provider** — Select which provider this key belongs to
* **Custom name** — An optional label to identify the key
* **API key** — Your provider's API key
* **Base URL** — Optional custom endpoint (useful for Azure OpenAI or custom deployments)

## Provider Keys List

Each configured key shows:

| Field           | Description                                        |
| --------------- | -------------------------------------------------- |
| **Provider**    | The LLM provider (e.g., OpenAI, Anthropic)         |
| **Custom name** | Your label for the key                             |
| **Status**      | Active, inactive, or deleted                       |
| **Base URL**    | Custom endpoint if configured                      |
| **Token**       | Masked key with only the last 4 characters visible |

## Actions

For each provider key:

* **Edit** — Update the key name, value, or base URL
* **Deactivate** — Temporarily disable the key without deleting it
* **Delete** — Permanently remove the key

<Callout type="info">
  When you use your own provider keys, requests are routed directly to the
  provider. You are only charged the provider's standard rates with no
  additional gateway markup.
</Callout>


# Referrals
URL: /learn/referrals
import { ThemedImage } from "@/components/themed-image";

The Referrals page lets you earn credits by inviting others to use LLM Gateway.

<ThemedImage alt="Referrals" basePath="/learn/referrals" />

## Eligibility

To unlock the referral program, your organization must have at least **$100 in total credit top-ups**. Before reaching this threshold, the page shows:

* A progress bar showing your progress toward $100
* The remaining amount needed to unlock
* An explanation of the 1% earnings model

## Referral Dashboard

Once eligible, the page shows:

### Your Referral Link

A unique shareable link tied to your organization. Click the copy button to copy it to your clipboard and share it with others.

### Your Stats

| Stat               | Description                                           |
| ------------------ | ----------------------------------------------------- |
| **Users Referred** | Total number of users who signed up through your link |
| **Total Earnings** | Total credit amount earned from referrals             |

### How It Works

1. **Share Your Link** — Send your referral link to others
2. **They Sign Up** — They create an LLM Gateway account using your link
3. **Earn Credits** — You earn 1% of their spending as credits

Credits are automatically added to your organization balance.
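
The two figures on this page, the $100 unlock threshold and the 1% earnings rate, can be worked through with a short sketch. The function names are hypothetical, and rounding to two decimals is an assumption made for display purposes.

```python
UNLOCK_THRESHOLD = 100.0  # total top-ups required, in dollars
EARNINGS_RATE = 0.01      # 1% of referred users' spending


def remaining_to_unlock(total_top_ups: float) -> float:
    """Dollars still needed before the referral program unlocks."""
    return max(0.0, UNLOCK_THRESHOLD - total_top_ups)


def referral_earnings(referred_spending: float) -> float:
    """Credits earned from the given amount of referred spending."""
    return round(referred_spending * EARNINGS_RATE, 2)
```

For example, an organization with $64.50 in top-ups needs $35.50 more to unlock the program, and $250 of referred spending earns $2.50 in credits.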


# Security Events
URL: /learn/security-events
import { Callout } from "fumadocs-ui/components/callout";
import { ThemedImage } from "@/components/themed-image";

The Security Events page shows all guardrail violations detected across your organization, helping you monitor content safety and policy enforcement.

<ThemedImage alt="Security Events" basePath="/learn/security-events" />

<Callout type="info">
  Security Events are available on the [**Enterprise
  plan**](https://llmgateway.io/enterprise). Owner or Admin role is required.
</Callout>

## Stats Cards

Four summary cards at the top:

| Card                 | Description                                   |
| -------------------- | --------------------------------------------- |
| **Total Violations** | All-time violation count                      |
| **Last 24 Hours**    | Violations in the past day                    |
| **Blocked**          | Number of requests that were blocked          |
| **Redacted**         | Number of requests where content was redacted |

## Filters

Narrow down the events list:

* **Action** — Filter by Blocked, Redacted, Warned, or All actions
* **Category** — Filter by Prompt Injection, Jailbreak, PII Detection, Secrets, Blocked Terms, Custom Regex, or Topic Restriction

## Violations List

Each violation entry shows:

| Field               | Description                                          |
| ------------------- | ---------------------------------------------------- |
| **Timestamp**       | When the violation occurred                          |
| **Rule name**       | Which guardrail rule was triggered                   |
| **Category**        | The type of violation (shown as a badge)             |
| **Action**          | What action was taken (Blocked, Redacted, or Warned) |
| **Matched pattern** | The content that triggered the rule                  |

The list supports pagination with a **Load More** button for viewing older events.


# Team
URL: /learn/team
import { ThemedImage } from "@/components/themed-image";

The Team page lets you invite team members, assign roles, and control access to your organization.

<ThemedImage alt="Team" basePath="/learn/team" />

## Adding Members

Click **Add Member** to invite someone by email. You'll need to:

1. Enter their email address
2. Select a role (Developer, Admin, or Owner)

Your plan includes up to **5 team seats**. The current count is displayed, and the Add button is disabled when all seats are used. Contact sales for additional seats.

## Team Members List

Each member shows:

| Field     | Description                                      |
| --------- | ------------------------------------------------ |
| **Name**  | The member's display name                        |
| **Email** | Their email address                              |
| **Role**  | Their current role (can be changed via dropdown) |

## Actions

* **Update role** — Change a member's role using the dropdown
* **Remove** — Remove a member from the organization (requires confirmation)

## Role Permissions

| Role          | Permissions                                                                                           |
| ------------- | ----------------------------------------------------------------------------------------------------- |
| **Owner**     | Full access to all settings, billing, team management, and all projects                               |
| **Admin**     | Can manage team members, projects, and API keys, but cannot access billing or delete the organization |
| **Developer** | View and use resources only. Cannot modify settings or manage team                                    |

Developers can also be given **restricted access** at the API key level, limiting which keys they can view and use.


# Transactions
URL: /learn/transactions
import { ThemedImage } from "@/components/themed-image";

The Transactions page shows a complete history of all financial transactions in your organization.

<ThemedImage alt="Transactions" basePath="/learn/transactions" />

## Transaction History

Each transaction entry includes:

| Field           | Description                              |
| --------------- | ---------------------------------------- |
| **Date**        | When the transaction occurred            |
| **Type**        | The transaction type (see below)         |
| **Credits**     | Number of credits added or deducted      |
| **Total Paid**  | The dollar amount charged                |
| **Status**      | Current state of the transaction         |
| **Description** | Additional details about the transaction |

## Transaction Types

| Type                    | Description                         |
| ----------------------- | ----------------------------------- |
| **Credit Top-up**       | Manual or automatic credit purchase |
| **Credit Refund**       | Credits refunded to your account    |
| **Subscription Start**  | New plan subscription started       |
| **Subscription Cancel** | Plan subscription canceled          |
| **Subscription End**    | Plan subscription period ended      |

## Status Badges

* **Completed** — Transaction processed successfully
* **Pending** — Transaction is being processed
* **Failed** — Transaction could not be completed


# Usage & Metrics
URL: /learn/usage-metrics
import { ThemedImage } from "@/components/themed-image";

The Usage & Metrics page provides comprehensive analytics through five tabs, giving you deep insight into your LLM API usage patterns.

<ThemedImage alt="Usage & Metrics" basePath="/learn/usage-metrics" />

## Filters

* **API Key** — Filter metrics by a specific API key or view all
* **Date range** — Select the time period (defaults to last 7 days)

## Tabs

### Requests

A time-series chart showing request volume over the selected period. Use this to identify traffic patterns, peak usage times, and growth trends.

### Models

A table showing your top-used models ranked by request count. For each model you can see:

* Total requests
* Token consumption
* Associated costs

This helps you understand which models drive the most usage and cost.

### Errors

A chart showing error rates over time. Track:

* Error frequency and trends
* Spikes that may indicate provider issues
* Overall reliability of your API calls

### Cache

A chart showing your cache hit rate over time. Monitor:

* How effectively caching is reducing redundant requests
* Cache hit vs. miss ratios
* The cost savings from cached responses

### Costs

A cost breakdown chart showing spending patterns. Analyze:

* Cost trends over time
* Cost distribution by provider or model
* Opportunities to reduce spending


# Migrate from LiteLLM
URL: /migrations/litellm
import { Step, Steps } from "fumadocs-ui/components/steps";
import { Callout } from "fumadocs-ui/components/callout";

Running your own LiteLLM proxy works until it doesn't: scaling, monitoring, and keeping it running become another job. LLM Gateway gives you the same unified API with built-in analytics, caching, and a dashboard, without the infrastructure overhead.

## Quick Migration

Both services use OpenAI-compatible endpoints, so migration is a two-line change:

```diff
- const baseURL = "http://localhost:4000/v1";  // LiteLLM proxy
+ const baseURL = "https://api.llmgateway.io/v1";

- const apiKey = process.env.LITELLM_API_KEY;
+ const apiKey = process.env.LLM_GATEWAY_API_KEY;
```

## Why Teams Switch to LLM Gateway

| What You Get             | LiteLLM (Self-Hosted) | LLM Gateway          |
| ------------------------ | --------------------- | -------------------- |
| OpenAI-compatible API    | Yes                   | Yes                  |
| Infrastructure to manage | Yes (you run it)      | No (we run it)       |
| Managed cloud option     | No                    | Yes                  |
| Analytics dashboard      | Basic                 | Per-request detail   |
| Response caching         | Manual setup          | Built-in, automatic  |
| Cost tracking            | Via callbacks         | Native, real-time    |
| Provider key management  | Config file           | Web UI with rotation |
| Uptime & scaling         | You handle it         | 99.9% SLA (Pro/Ent)  |

Still want to self-host? LLM Gateway is [open source under AGPLv3](https://llmgateway.io/blog/how-to-self-host-llm-gateway)—same features, your infrastructure.

For a detailed breakdown, see [LLM Gateway vs LiteLLM](https://llmgateway.io/compare/litellm).

## Migration Steps

<Steps>
  <Step>
    ### Get Your LLM Gateway API Key

    Sign up at [llmgateway.io/signup](https://llmgateway.io/signup) and create an API key from your dashboard.
  </Step>

  <Step>
    ### Map Your Models

    LLM Gateway supports two model ID formats:

    **Root Model IDs** (without provider prefix) - Uses smart routing to automatically select the best provider based on uptime, throughput, price, and latency:

    ```
    gpt-5.2
    claude-opus-4-5-20251101
    gemini-3-flash-preview
    ```

    **Provider-Prefixed Model IDs** - Routes to a specific provider with automatic failover if uptime drops below 90%:

    ```
    openai/gpt-5.2
    anthropic/claude-opus-4-5-20251101
    google-ai-studio/gemini-3-flash-preview
    ```

    This means many LiteLLM model names work directly with LLM Gateway:

    | LiteLLM Model                    | LLM Gateway Model                                                 |
    | -------------------------------- | ----------------------------------------------------------------- |
    | gpt-5.2                          | gpt-5.2 or openai/gpt-5.2                                         |
    | claude-opus-4-5-20251101         | claude-opus-4-5-20251101 or anthropic/claude-opus-4-5-20251101    |
    | gemini/gemini-3-flash-preview    | gemini-3-flash-preview or google-ai-studio/gemini-3-flash-preview |
    | bedrock/claude-opus-4-5-20251101 | claude-opus-4-5-20251101 or aws-bedrock/claude-opus-4-5-20251101  |

    For more details on routing behavior, see the [routing documentation](/features/routing).
  </Step>

  <Step>
    ### Update Your Code

    #### Python with OpenAI SDK

    ```python
    import os

    from openai import OpenAI

    # Before (LiteLLM proxy)
    client = OpenAI(
        base_url="http://localhost:4000/v1",
        api_key=os.environ["LITELLM_API_KEY"]
    )

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hello!"}]
    )

    # After (LLM Gateway) - model name can stay the same!
    client = OpenAI(
        base_url="https://api.llmgateway.io/v1",
        api_key=os.environ["LLM_GATEWAY_API_KEY"]
    )

    response = client.chat.completions.create(
        model="gpt-4",  # or "openai/gpt-4" to target a specific provider
        messages=[{"role": "user", "content": "Hello!"}]
    )
    ```

    #### Python with LiteLLM Library

    If you're using the LiteLLM library directly, you can point it to LLM Gateway:

    ```python
    import os

    import litellm

    # Before (direct LiteLLM)
    response = litellm.completion(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hello!"}]
    )

    # After (via LLM Gateway) - same model name works
    response = litellm.completion(
        model="gpt-4",  # or "openai/gpt-4" to target a specific provider
        messages=[{"role": "user", "content": "Hello!"}],
        api_base="https://api.llmgateway.io/v1",
        api_key=os.environ["LLM_GATEWAY_API_KEY"]
    )
    ```

    #### TypeScript/JavaScript

    ```typescript
    import OpenAI from "openai";

    // Before (LiteLLM proxy)
    const client = new OpenAI({
    	baseURL: "http://localhost:4000/v1",
    	apiKey: process.env.LITELLM_API_KEY,
    });

    // After (LLM Gateway) - same model name works
    const client = new OpenAI({
    	baseURL: "https://api.llmgateway.io/v1",
    	apiKey: process.env.LLM_GATEWAY_API_KEY,
    });

    const completion = await client.chat.completions.create({
    	model: "gpt-4", // or "openai/gpt-4" to target a specific provider
    	messages: [{ role: "user", content: "Hello!" }],
    });
    ```

    #### cURL

    ```bash
    # Before (LiteLLM proxy)
    curl http://localhost:4000/v1/chat/completions \
      -H "Authorization: Bearer $LITELLM_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "gpt-4",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'

    # After (LLM Gateway) - same model name works
    curl https://api.llmgateway.io/v1/chat/completions \
      -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "gpt-4",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
    # Use "openai/gpt-4" to target a specific provider
    ```
  </Step>

  <Step>
    ### Migrate Configuration

    #### LiteLLM Config (Before)

    ```yaml
    # litellm_config.yaml
    model_list:
      - model_name: gpt-4
        litellm_params:
          model: gpt-4
          api_key: sk-...
      - model_name: claude-3
        litellm_params:
          model: claude-3-sonnet-20240229
          api_key: sk-ant-...
    ```

    #### LLM Gateway (After)

    With LLM Gateway, you don't need a config file. Provider keys are managed in the web dashboard, or you can use the default LLM Gateway keys.

    If you want to use your own provider keys, configure them in the dashboard under Settings > Provider Keys.
  </Step>
</Steps>
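
The model mapping from the steps above can be sketched as a small helper. This is a minimal illustration, not part of either project: `to_gateway_model` and `PREFIX_MAP` are hypothetical names, and the prefix pairs are taken only from the mapping table above (`gemini/` to `google-ai-studio/`, `bedrock/` to `aws-bedrock/`).

```python
# Hypothetical helper: translate LiteLLM model names into LLM Gateway model IDs.
# The prefix pairs below come from the mapping table above; extend as needed.
PREFIX_MAP = {
    "gemini": "google-ai-studio",
    "bedrock": "aws-bedrock",
}


def to_gateway_model(litellm_model: str, pin_provider: bool = False) -> str:
    """Return a root model ID, or a provider-prefixed ID when pin_provider is True."""
    provider, sep, name = litellm_model.partition("/")
    if not sep:
        # No provider prefix: the name is already a root model ID.
        return litellm_model
    if not pin_provider:
        # Drop the prefix so smart routing can pick the provider.
        return name
    # Keep the request on one provider, renaming the prefix where the table says so.
    return f"{PREFIX_MAP.get(provider, provider)}/{name}"
```

A name without a prefix is returned unchanged even with `pin_provider=True`, because the helper has no way to know which provider to pin.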

## Streaming Support

LLM Gateway supports streaming identically to LiteLLM:

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.llmgateway.io/v1",
    api_key=os.environ["LLM_GATEWAY_API_KEY"]
)

stream = client.chat.completions.create(
    model="openai/gpt-4",
    messages=[{"role": "user", "content": "Write a story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```
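
If you need the full text rather than incremental output, the same loop can collect the deltas. A minimal sketch, assuming chunks shaped like the OpenAI SDK's streaming chunks; `collect_stream` is a hypothetical helper name:

```python
from typing import Iterable


def collect_stream(stream: Iterable) -> str:
    """Join the text deltas of a chat-completions stream into one string."""
    parts = []
    for chunk in stream:
        # Some chunks carry no choices or no text (for example role-only chunks).
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
    return "".join(parts)
```

Pass it the `stream` object returned by `client.chat.completions.create(..., stream=True)`.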

## Function/Tool Calling

LLM Gateway supports function calling:

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.llmgateway.io/v1",
    api_key=os.environ["LLM_GATEWAY_API_KEY"]
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"}
            },
            "required": ["location"]
        }
    }
}]

response = client.chat.completions.create(
    model="openai/gpt-4",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools
)
```
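
The request above only asks the model to choose a tool. Handling the model's reply could look like the sketch below. It assumes the response follows the OpenAI chat-completions shape (`message.tool_calls`, with JSON-encoded `function.arguments`); `get_weather`, `TOOL_HANDLERS`, and `run_tool_calls` are hypothetical names, and the weather data is a placeholder.

```python
import json


def get_weather(location: str) -> dict:
    # Placeholder implementation; a real one would call a weather service.
    return {"location": location, "temperature": 72, "condition": "sunny"}


# Maps tool names from the `tools` list to local Python callables.
TOOL_HANDLERS = {"get_weather": get_weather}


def run_tool_calls(message) -> list[dict]:
    """Execute each tool call on an assistant message and build `tool` messages."""
    results = []
    for call in message.tool_calls or []:
        handler = TOOL_HANDLERS[call.function.name]
        # Arguments arrive as a JSON-encoded string.
        args = json.loads(call.function.arguments)
        results.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(handler(**args)),
        })
    return results
```

To get the final answer, append `response.choices[0].message` and the returned `tool` messages to `messages`, then call `client.chat.completions.create` again.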

## Removing LiteLLM Infrastructure

After verifying LLM Gateway works for your use case, you can decommission your LiteLLM proxy:

1. Update all clients to use LLM Gateway endpoints
2. Monitor the LLM Gateway dashboard for successful requests
3. Shut down your LiteLLM proxy server
4. Remove LiteLLM configuration files
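
Before step 3, a quick smoke test can complement the dashboard check. This is a sketch under assumptions: `smoke_test` is a hypothetical helper, and `client` is any object with the OpenAI SDK's `chat.completions.create` interface, already pointed at LLM Gateway.

```python
def smoke_test(client, model: str = "gpt-4") -> bool:
    """Send one tiny request through the client and report whether it succeeded."""
    try:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": "ping"}],
        )
    except Exception as exc:  # network, auth, or provider errors
        print(f"smoke test failed: {exc}")
        return False
    # A successful completion contains at least one choice.
    return bool(response.choices)
```

Run it once per environment (staging, production) after switching the base URL, and only shut down the proxy when every environment passes.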

## What Changes After Migration

* **No servers to babysit** — We handle scaling, uptime, and updates
* **Real-time cost visibility** — See what every request costs, broken down by model
* **Automatic caching** — Repeated requests hit cache, reducing your spend
* **Web-based management** — No more editing YAML files for config changes
* **New models immediately** — Access new releases within 48 hours, no deployment needed

## Self-Hosting LLM Gateway

If you prefer self-hosting like LiteLLM, LLM Gateway is available under AGPLv3:

```bash
git clone https://github.com/llmgateway/llmgateway
cd llmgateway
pnpm install
pnpm setup
pnpm dev
```

This gives you the same benefits as LiteLLM's self-hosted proxy with LLM Gateway's analytics and caching features.

## Full Comparison

Want to see a detailed breakdown of all features? Check out our [LLM Gateway vs LiteLLM comparison page](https://llmgateway.io/compare/litellm).

## Need Help?

* Browse available models at [llmgateway.io/models](https://llmgateway.io/models)
* Read the [API documentation](https://docs.llmgateway.io)
* Contact support at [contact@llmgateway.io](mailto:contact@llmgateway.io)


# Migrate from OpenRouter
URL: /migrations/openrouter
import { Step, Steps } from "fumadocs-ui/components/steps";
import { Callout } from "fumadocs-ui/components/callout";

LLM Gateway works much like OpenRouter: the same OpenAI-compatible API format and largely compatible model names, plus built-in analytics and the option to self-host. Migration takes two lines of code.

## Quick Migration

Change your base URL and API key:

```diff
- const baseURL = "https://openrouter.ai/api/v1";
- const apiKey = process.env.OPENROUTER_API_KEY;
+ const baseURL = "https://api.llmgateway.io/v1";
+ const apiKey = process.env.LLM_GATEWAY_API_KEY;
```

## Migration Steps

<Steps>
  <Step>
    ### Get Your LLM Gateway API Key

    Sign up at [llmgateway.io/signup](https://llmgateway.io/signup) and create an API key from your dashboard.
  </Step>

  <Step>
    ### Update Environment Variables

    ```bash
    # Remove OpenRouter credentials
    # OPENROUTER_API_KEY=sk-or-...

    # Add LLM Gateway credentials
    LLM_GATEWAY_API_KEY=llmgtwy_your_key_here
    ```
  </Step>

  <Step>
    ### Update Your Code

    #### Using fetch/axios

    ```typescript
    // Before (OpenRouter)
    const response = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    	method: "POST",
    	headers: {
    		Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
    		"Content-Type": "application/json",
    	},
    	body: JSON.stringify({
    		model: "openai/gpt-5.2",
    		messages: [{ role: "user", content: "Hello!" }],
    	}),
    });

    // After (LLM Gateway)
    const response = await fetch("https://api.llmgateway.io/v1/chat/completions", {
    	method: "POST",
    	headers: {
    		Authorization: `Bearer ${process.env.LLM_GATEWAY_API_KEY}`,
    		"Content-Type": "application/json",
    	},
    	body: JSON.stringify({
    		model: "gpt-5.2",
    		messages: [{ role: "user", content: "Hello!" }],
    	}),
    });
    ```

    #### Using OpenAI SDK

    ```typescript
    import OpenAI from "openai";

    // Before (OpenRouter)
    const client = new OpenAI({
    	baseURL: "https://openrouter.ai/api/v1",
    	apiKey: process.env.OPENROUTER_API_KEY,
    });

    // After (LLM Gateway)
    const client = new OpenAI({
    	baseURL: "https://api.llmgateway.io/v1",
    	apiKey: process.env.LLM_GATEWAY_API_KEY,
    });

    // Usage remains the same
    const completion = await client.chat.completions.create({
    	model: "anthropic/claude-3-5-sonnet-20241022",
    	messages: [{ role: "user", content: "Hello!" }],
    });
    ```

    #### Using Vercel AI SDK

    Both OpenRouter and LLM Gateway have native AI SDK providers, making migration straightforward:

    ```typescript
    import { generateText } from "ai";

    // Before (OpenRouter AI SDK Provider)
    import { createOpenRouter } from "@openrouter/ai-sdk-provider";

    const openrouter = createOpenRouter({
    	apiKey: process.env.OPENROUTER_API_KEY,
    });

    const { text } = await generateText({
    	model: openrouter("openai/gpt-5.2"),
    	prompt: "Hello!",
    });

    // After (LLM Gateway AI SDK Provider)
    import { createLLMGateway } from "@llmgateway/ai-sdk-provider";

    const llmgateway = createLLMGateway({
    	apiKey: process.env.LLM_GATEWAY_API_KEY,
    });

    const { text } = await generateText({
    	model: llmgateway("gpt-5.2"),
    	prompt: "Hello!",
    });
    ```
  </Step>
</Steps>

## Model Name Mapping

Most model names are compatible, but here are some common mappings:

| OpenRouter Model                 | LLM Gateway Model                                                 |
| -------------------------------- | ----------------------------------------------------------------- |
| openai/gpt-5.2                   | gpt-5.2 or openai/gpt-5.2                                         |
| gemini/gemini-3-flash-preview    | gemini-3-flash-preview or google-ai-studio/gemini-3-flash-preview |
| bedrock/claude-opus-4-5-20251101 | claude-opus-4-5-20251101 or aws-bedrock/claude-opus-4-5-20251101  |

Check the [models page](https://llmgateway.io/models) for the full list of available models.

## Streaming Support

LLM Gateway supports streaming responses identically to OpenRouter:

```typescript
const stream = await client.chat.completions.create({
	model: "anthropic/claude-3-5-sonnet-20241022",
	messages: [{ role: "user", content: "Write a story" }],
	stream: true,
});

for await (const chunk of stream) {
	process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
```

## Full Comparison

Want to see a detailed breakdown of all features? Check out our [LLM Gateway vs OpenRouter comparison page](https://llmgateway.io/compare/open-router).

## Need Help?

* Browse available models at [llmgateway.io/models](https://llmgateway.io/models)
* Read the [API documentation](https://docs.llmgateway.io)
* Contact support at [contact@llmgateway.io](mailto:contact@llmgateway.io)


# Migrate from Vercel AI Gateway
URL: /migrations/vercel-ai-gateway
import { Step, Steps } from "fumadocs-ui/components/steps";
import { Callout } from "fumadocs-ui/components/callout";

## Quick Migration

Swap your provider imports—your AI SDK code stays the same:

```diff
- import { openai } from "@ai-sdk/openai";
- import { anthropic } from "@ai-sdk/anthropic";
import { generateText } from "ai";
+ import { createLLMGateway } from "@llmgateway/ai-sdk-provider";

+ const llmgateway = createLLMGateway({
+   apiKey: process.env.LLM_GATEWAY_API_KEY
+ });

const { text } = await generateText({
-   model: openai("gpt-5.2"),
+   model: llmgateway("gpt-5.2"),
  prompt: "Hello!"
});
```

The key difference: one provider, one API key, all models—with caching and analytics built in.

## Migration Steps

<Steps>
  <Step>
    ### Get Your LLM Gateway API Key

    Sign up at [llmgateway.io/signup](https://llmgateway.io/signup) and create an API key from your dashboard.
  </Step>

  <Step>
    ### Install the LLM Gateway AI SDK Provider

    Install the native LLM Gateway provider for the Vercel AI SDK:

    ```bash
    pnpm add @llmgateway/ai-sdk-provider
    ```

    This package provides full compatibility with the Vercel AI SDK and supports all LLM Gateway features.
  </Step>

  <Step>
    ### Update Your Code

    #### Basic Text Generation

    ```typescript
    // Before (Vercel AI Gateway with native providers)
    import { openai } from "@ai-sdk/openai";
    import { anthropic } from "@ai-sdk/anthropic";
    import { generateText } from "ai";

    const { text: openaiText } = await generateText({
    	model: openai("gpt-4o"),
    	prompt: "Hello!",
    });

    const { text: claudeText } = await generateText({
    	model: anthropic("claude-3-5-sonnet-20241022"),
    	prompt: "Hello!",
    });

    // After (LLM Gateway - single provider for all models)
    import { createLLMGateway } from "@llmgateway/ai-sdk-provider";
    import { generateText } from "ai";

    const llmgateway = createLLMGateway({
    	apiKey: process.env.LLM_GATEWAY_API_KEY,
    });

    const { text: openaiText } = await generateText({
    	model: llmgateway("openai/gpt-4o"),
    	prompt: "Hello!",
    });

    const { text: claudeText } = await generateText({
    	model: llmgateway("anthropic/claude-3-5-sonnet-20241022"),
    	prompt: "Hello!",
    });
    ```

    #### Streaming Responses

    ```typescript
    import { createLLMGateway } from "@llmgateway/ai-sdk-provider";
    import { streamText } from "ai";

    const llmgateway = createLLMGateway({
    	apiKey: process.env.LLM_GATEWAY_API_KEY,
    });

    const { textStream } = await streamText({
    	model: llmgateway("anthropic/claude-3-5-sonnet-20241022"),
    	prompt: "Write a poem about coding",
    });

    for await (const text of textStream) {
    	process.stdout.write(text);
    }
    ```

    #### Using in Next.js API Routes

    ```typescript
    // app/api/chat/route.ts
    import { createLLMGateway } from "@llmgateway/ai-sdk-provider";
    import { streamText } from "ai";

    const llmgateway = createLLMGateway({
    	apiKey: process.env.LLM_GATEWAY_API_KEY,
    });

    export async function POST(req: Request) {
    	const { messages } = await req.json();

    	const result = await streamText({
    		model: llmgateway("openai/gpt-4o"),
    		messages,
    	});

    	return result.toDataStreamResponse();
    }
    ```

    #### Alternative: Using OpenAI SDK Adapter

    If you prefer not to install a new package, you can use `@ai-sdk/openai` with a custom base URL:

    ```typescript
    import { createOpenAI } from "@ai-sdk/openai";
    import { generateText } from "ai";

    const llmgateway = createOpenAI({
    	baseURL: "https://api.llmgateway.io/v1",
    	apiKey: process.env.LLM_GATEWAY_API_KEY,
    });

    const { text } = await generateText({
    	model: llmgateway("openai/gpt-4o"),
    	prompt: "Hello!",
    });
    ```
  </Step>

  <Step>
    ### Update Environment Variables

    ```bash
    # Remove individual provider keys (optional - can keep as backup)
    # OPENAI_API_KEY=sk-...
    # ANTHROPIC_API_KEY=sk-ant-...

    # Add LLM Gateway key
    export LLM_GATEWAY_API_KEY=llmgtwy_your_key_here
    ```
  </Step>
</Steps>

## Model Name Format

LLM Gateway supports two model ID formats:

**Root Model IDs** (without provider prefix) - Uses smart routing to automatically select the best provider based on uptime, throughput, price, and latency:

```
gpt-4o
claude-3-5-sonnet-20241022
gemini-1.5-pro
```

**Provider-Prefixed Model IDs** - Routes to a specific provider with automatic failover if uptime drops below 90%:

```
openai/gpt-4o
anthropic/claude-3-5-sonnet-20241022
google-ai-studio/gemini-1.5-pro
```

For more details on routing behavior, see the [routing documentation](/features/routing).

### Model Mapping Examples

| Vercel AI SDK                             | LLM Gateway                                                                                        |
| ----------------------------------------- | -------------------------------------------------------------------------------------------------- |
| `openai("gpt-4o")`                        | `llmgateway("gpt-4o")` or `llmgateway("openai/gpt-4o")`                                            |
| `anthropic("claude-3-5-sonnet-20241022")` | `llmgateway("claude-3-5-sonnet-20241022")` or `llmgateway("anthropic/claude-3-5-sonnet-20241022")` |
| `google("gemini-1.5-pro")`                | `llmgateway("gemini-1.5-pro")` or `llmgateway("google-ai-studio/gemini-1.5-pro")`                  |

Check the [models page](https://llmgateway.io/models) for the full list of available models.

## Tool Calling

LLM Gateway supports tool calling through the AI SDK:

```typescript
import { createLLMGateway } from "@llmgateway/ai-sdk-provider";
import { generateText, tool } from "ai";
import { z } from "zod";

const llmgateway = createLLMGateway({
	apiKey: process.env.LLM_GATEWAY_API_KEY,
});

const { text, toolResults } = await generateText({
	model: llmgateway("openai/gpt-4o"),
	tools: {
		weather: tool({
			description: "Get the weather for a location",
			parameters: z.object({
				location: z.string(),
			}),
			execute: async ({ location }) => {
				return { temperature: 72, condition: "sunny" };
			},
		}),
	},
	prompt: "What's the weather in San Francisco?",
});
```

## Self-Hosting LLM Gateway

If you prefer self-hosting, LLM Gateway is available under AGPLv3:

```bash
git clone https://github.com/llmgateway/llmgateway
cd llmgateway
pnpm install
pnpm setup
pnpm dev
```

This gives you the same features as the hosted service, with full control over your infrastructure.

## Need Help?

* Browse available models at [llmgateway.io/models](https://llmgateway.io/models)
* Read the [API documentation](https://docs.llmgateway.io)
* Contact support at [contact@llmgateway.io](mailto:contact@llmgateway.io)


# Rate Limits
URL: /resources/rate-limits
import { Callout } from "fumadocs-ui/components/callout";

LLM Gateway implements rate limits to ensure fair usage and optimal performance for all users. The rate limits differ based on your account status and the type of models you're using.

## Free Models

Free models (models with zero input and output pricing) have rate limits that depend on your account's credit status:

### Base Rate Limits

For organizations with **zero credits**:

* **5 requests per 10 minutes**
* Applies to all free model requests
* Resets every 10 minutes

### Elevated Rate Limits

For organizations that have **purchased credits**:

* **20 requests per minute**
* Applies to all free model requests
* Resets every minute

<Callout type="info">
  When using free models with elevated limits, your credits will **not** be
  deducted. The elevated rate limits are simply a benefit for users who have
  added credits to their account.
</Callout>
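
To stay under these limits proactively, a client can throttle itself. The sketch below is a hypothetical client-side sliding-window limiter, not the gateway's own algorithm (which is not specified here); a sliding window is a conservative choice because it never allows more requests in any span than the documented limit.

```python
from collections import deque


class WindowLimiter:
    """Allow at most `max_requests` calls within any `window_seconds` span."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._sent = deque()  # timestamps of requests still inside the window

    def try_acquire(self, now: float) -> bool:
        """Record a request at time `now` if the window has room."""
        # Forget requests that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) >= self.max_requests:
            return False
        self._sent.append(now)
        return True


# Limits documented above for free models.
BASE_LIMITER = WindowLimiter(max_requests=5, window_seconds=600)
ELEVATED_LIMITER = WindowLimiter(max_requests=20, window_seconds=60)
```

Call `try_acquire(time.time())` before each free-model request and delay the request when it returns `False`.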

## Paid Models

**Paid AI models are not currently rate limited.** You can make as many requests as needed to paid models, subject only to your account's credit balance and any provider-specific limits.

## Rate Limit Headers

All API responses include rate limit information in the headers:

```http
X-RateLimit-Limit: 20
X-RateLimit-Remaining: 19
X-RateLimit-Reset: 1640995200
```

* `X-RateLimit-Limit`: Maximum number of requests allowed in the current window
* `X-RateLimit-Remaining`: Number of requests remaining in the current window
* `X-RateLimit-Reset`: Unix timestamp when the rate limit window resets
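
These headers can be read into a small structure on the client. A minimal sketch: `RateLimitInfo` and `parse_rate_limit` are hypothetical names, and `headers` is any mapping of response header names to string values.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitInfo:
    limit: int
    remaining: int
    reset_at: int  # Unix timestamp, in seconds

    def seconds_until_reset(self, now: Optional[float] = None) -> float:
        """Seconds left in the current window (never negative)."""
        current = time.time() if now is None else now
        return max(0.0, float(self.reset_at) - current)


def parse_rate_limit(headers: Mapping[str, str]) -> Optional[RateLimitInfo]:
    """Read the X-RateLimit-* headers; return None if any are missing or invalid."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    try:
        return RateLimitInfo(
            limit=int(lowered["x-ratelimit-limit"]),
            remaining=int(lowered["x-ratelimit-remaining"]),
            reset_at=int(lowered["x-ratelimit-reset"]),
        )
    except (KeyError, ValueError):
        return None
```

When `remaining` reaches 0, waiting `seconds_until_reset()` before the next request avoids a 429.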

## Rate Limit Exceeded

When you exceed your rate limit, you'll receive a `429 Too Many Requests` response:

```json
{
	"error": {
		"message": "Rate limit exceeded. Try again later.",
		"type": "rate_limit_error",
		"code": "rate_limit_exceeded"
	}
}
```

## Best Practices

### Upgrading Your Limits

To unlock elevated rate limits for free models:

1. Add credits to your account through the dashboard
2. Your rate limits will automatically increase to 20 requests per minute
3. Free model usage will still not deduct from your credits

### Handling Rate Limits

* Implement exponential backoff when you receive 429 responses
* Monitor the `X-RateLimit-Remaining` header to avoid hitting limits
* Consider using paid models for high-volume applications
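
The exponential backoff mentioned above can be sketched as follows. The delays, attempt count, and jitter are illustrative defaults rather than values recommended by LLM Gateway, and `RateLimited` is a hypothetical exception that your request code would raise on a 429 response.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical exception: raise it when a response has status 429."""


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on RateLimited, doubling the wait after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of retries; surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 10% jitter so parallel clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable: the loop always returns or raises")
```

With the OpenAI Python SDK, a 429 surfaces as `openai.RateLimitError`, so you can catch that exception in place of `RateLimited`.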

### Cost Optimization

* Use free models for development and testing
* Switch to paid models for production workloads requiring higher throughput
* Monitor your usage patterns through the dashboard

<Callout type="success">
  Adding even a small amount of credits to your account (e.g., $5) will
  immediately upgrade your free model rate limits from 5 requests per 10 minutes
  to 20 requests per minute.
</Callout>