# Introduction

URL: /

import { FeatureCards } from "@/components/feature-cards";
import { AIToolingCards } from "@/components/ai-tooling-cards";

LLM Gateway is an open-source API gateway that sits between your applications and LLM providers like OpenAI, Anthropic, Google AI Studio, and more. It provides a unified, OpenAI-compatible API with built-in cost tracking, caching, and intelligent routing.

## Features

## AI Tooling

LLM Gateway is built to work seamlessly with AI agents and development tools.

## Next Steps

* [**Quickstart**](/quick-start) — Get up and running in minutes
* [**Overview**](/overview) — Learn more about what LLM Gateway offers
* [**Self-Host**](/self-host) — Deploy on your own infrastructure

# Overview

URL: /overview

# LLM Gateway

LLM Gateway is an open-source API gateway for Large Language Models (LLMs). It acts as middleware between your applications and various LLM providers, allowing you to:

* Route requests to multiple LLM providers (OpenAI, Anthropic, Google AI Studio, and others)
* Manage API keys for different providers in one place
* Track token usage and costs across all your LLM interactions
* Analyze performance metrics to optimize your LLM usage

## Analyzing Your LLM Requests

LLM Gateway provides detailed insight into your LLM usage:

* **Usage Metrics**: Track the number of requests, tokens used, and response times
* **Cost Analysis**: Monitor spending across different models and providers
* **Performance Tracking**: Identify patterns and optimize your prompts based on actual usage data
* **Breakdown by Model**: Compare different models' performance and cost-effectiveness

All of this data is collected automatically and presented in an intuitive dashboard, helping you make informed decisions about your LLM strategy.

## Getting Started

Using LLM Gateway is simple.
Just swap out your current LLM provider URL with the LLM Gateway API endpoint:

```bash
curl -X POST https://api.llmgateway.io/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'
```

LLM Gateway maintains compatibility with the OpenAI API format, making migration seamless.

## Hosted vs. Self-Hosted

You can use LLM Gateway in two ways:

* **Hosted Version**: For immediate use without setup, visit [llmgateway.io](https://llmgateway.io) to create an account and get an API key.
* **Self-Hosted**: Deploy LLM Gateway on your own infrastructure for complete control over your data and configuration.

The self-hosted version offers additional customization options and, if desired, ensures your LLM traffic never leaves your infrastructure.

# Quickstart

URL: /quick-start

import { Accordion, Accordions } from "fumadocs-ui/components/accordion";
import { Tabs, Tab } from "fumadocs-ui/components/tabs";
import { DynamicCodeBlock } from "fumadocs-ui/components/dynamic-codeblock";

# 🚀 Quickstart

Welcome to **LLM Gateway**, a single drop-in endpoint that lets you call today's best large language models while keeping **your existing code** and development workflow intact.

> **TL;DR** — Point your HTTP requests to `https://api.llmgateway.io/v1/…`, supply your `LLM_GATEWAY_API_KEY`, and you're done.

***

## 1 · Get an API key

1. Sign in to the dashboard.
2. Create a new Project, then copy the key.
3.
Export it in your shell (or a `.env` file):

```bash
export LLM_GATEWAY_API_KEY="llmgtwy_XXXXXXXXXXXXXXXX"
```

***

## 2 · Pick your language

```tsx title="ChatComponent.tsx"
import { useState } from "react";

function ChatComponent() {
  const [response, setResponse] = useState("");
  const [loading, setLoading] = useState(false);

  const sendMessage = async () => {
    setLoading(true);
    try {
      const res = await fetch("https://api.llmgateway.io/v1/chat/completions", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          Authorization: `Bearer ${process.env.REACT_APP_LLM_GATEWAY_API_KEY}`,
        },
        body: JSON.stringify({
          model: "gpt-4o",
          messages: [{ role: "user", content: "Hello, how are you?" }],
        }),
      });
      if (!res.ok) {
        throw new Error(`HTTP error! status: ${res.status}`);
      }
      const data = await res.json();
      setResponse(data.choices[0].message.content);
    } catch (error) {
      console.error("Error:", error);
    } finally {
      setLoading(false);
    }
  };

  return (
    <div>
      <button onClick={sendMessage} disabled={loading}>
        {loading ? "Sending..." : "Send"}
      </button>
      {response && <p>{response}</p>}
    </div>
  );
}

export default ChatComponent;
```

```java title="Chat.java"
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class Chat {
  public static void main(String[] args) throws Exception {
    String apiKey = System.getenv("LLM_GATEWAY_API_KEY");
    String body = """
        {
          "model": "gpt-4o",
          "messages": [{"role": "user", "content": "Hello, how are you?"}]
        }""";

    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("https://api.llmgateway.io/v1/chat/completions"))
        .header("Content-Type", "application/json")
        .header("Authorization", "Bearer " + apiKey)
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build();

    HttpResponse<String> response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println(response.body());
  }
}
```

```rust title="main.rs"
use reqwest::Client;
use serde_json::json;
use std::env;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = Client::new();
    let api_key = env::var("LLM_GATEWAY_API_KEY")?;

    let response = client
        .post("https://api.llmgateway.io/v1/chat/completions")
        .header("Content-Type", "application/json")
        .header("Authorization", format!("Bearer {}", api_key))
        .json(&json!({
            "model": "gpt-4o",
            "messages": [
                {"role": "user", "content": "Hello, how are you?"}
            ]
        }))
        .send()
        .await?;

    let result: serde_json::Value = response.json().await?;
    println!("{}", result["choices"][0]["message"]["content"]);
    Ok(())
}
```

```php title="chat.php"
<?php
$apiKey = getenv('LLM_GATEWAY_API_KEY');

$data = [
    'model' => 'gpt-4o',
    'messages' => [
        ['role' => 'user', 'content' => 'Hello, how are you?']
    ]
];

$options = [
    'http' => [
        'header' => [
            'Content-Type: application/json',
            'Authorization: Bearer ' . $apiKey
        ],
        'method' => 'POST',
        'content' => json_encode($data)
    ]
];

$context = stream_context_create($options);
$response = file_get_contents(
    'https://api.llmgateway.io/v1/chat/completions',
    false,
    $context
);

if ($response === FALSE) {
    throw new Exception('Request failed');
}

$result = json_decode($response, true);
echo $result['choices'][0]['message']['content'];
?>
```
***

## 3 · SDK integrations

```ts title="ai-sdk.ts"
import { llmgateway } from "@llmgateway/ai-sdk-provider";
import { generateText } from "ai";

const { text } = await generateText({
  model: llmgateway("gpt-4o"),
  prompt: "Write a vegetarian lasagna recipe for 4 people.",
});
```

```ts title="vercel-ai-sdk.ts"
import { createOpenAI } from "@ai-sdk/openai";
import { generateText } from "ai";

const llmgateway = createOpenAI({
  baseURL: "https://api.llmgateway.io/v1",
  apiKey: process.env.LLM_GATEWAY_API_KEY!,
});

const { text } = await generateText({
  model: llmgateway("gpt-4o"),
  prompt: "Hello, how are you?",
});

console.log(text);
```

```ts title="openai-sdk.ts"
import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "https://api.llmgateway.io/v1",
  apiKey: process.env.LLM_GATEWAY_API_KEY,
});

const completion = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello, how are you?" }],
});

console.log(completion.choices[0].message.content);
```

***

## 4 · Going further

* **Streaming**: pass `stream: true` to any request; the gateway will proxy the event stream unchanged.
* **Monitoring**: every call appears in the dashboard with latency, cost, and provider breakdown.

***

## 5 · FAQ

**Which models are supported?**

See the [Models page](https://llmgateway.io/models).

**How does LLM Gateway compare to OpenRouter?**

Unlike OpenRouter, we offer:

* Full self-hosting capabilities, giving you complete control over your infrastructure
* Enhanced analytics with deeper insights into your model usage and performance
* No fees when using your own provider keys, maximizing cost efficiency
* Greater flexibility and customization options for enterprise deployments

**How does pricing work?**

Our pricing structure is designed to be flexible and cost-effective: see the [Pricing section](https://llmgateway.io#pricing).
***

## 6 · Next steps

* Read the [self-hosting guide](/self-host).
* Drop into our [GitHub](https://github.com/theopenco/llmgateway) for help or feature requests.

Happy building! ✨

# Self Host LLMGateway

URL: /self-host

# Self Host LLMGateway

LLMGateway is a self-hostable platform that provides a unified API gateway for multiple LLM providers. This guide offers two simple options to get started.

## Prerequisites

* Latest Docker
* API keys for the LLM providers you want to use (OpenAI, Anthropic, etc.)

## Option 1: Unified Docker Image (Simplest)

This option uses a single Docker container that includes all services (UI, API, Gateway, Database, Redis).

```bash
# Run the container
docker run -d \
  --name llmgateway \
  --restart unless-stopped \
  -p 3002:3002 \
  -p 3003:3003 \
  -p 3005:3005 \
  -p 3006:3006 \
  -p 4001:4001 \
  -p 4002:4002 \
  -v ~/llmgateway_data:/var/lib/postgresql/data \
  -e AUTH_SECRET=your-secret-key-here \
  ghcr.io/theopenco/llmgateway-unified:latest
```

Note: instead of `latest`, it is recommended to pin the most recent version tag from the [releases page](https://github.com/theopenco/llmgateway/releases).

### Using Docker Compose (Alternative for unified image)

```bash
# Download the compose file
curl -O https://raw.githubusercontent.com/theopenco/llmgateway/main/infra/docker-compose.unified.yml
curl -O https://raw.githubusercontent.com/theopenco/llmgateway/main/.env.example

# Configure environment
cp .env.example .env
# Edit .env with your configuration

# Start the service
docker compose -f docker-compose.unified.yml up -d
```

Note: it is recommended to replace the `latest` tag in the image with the most recent version from the [releases page](https://github.com/theopenco/llmgateway/releases).

## Option 2: Separate Services with Docker Compose

This option uses separate containers for each service, offering more flexibility.
```bash
# Clone the repository
git clone https://github.com/theopenco/llmgateway.git
cd llmgateway

# Configure environment
cp .env.example .env
# Edit .env with your configuration

# Start the services
docker compose -f infra/docker-compose.split.yml up -d
```

Note: it is recommended to replace the `latest` tag on all images in the compose file with the most recent version from the [releases page](https://github.com/theopenco/llmgateway/releases).

## Accessing Your LLMGateway

After starting either option, you can access:

* **Web Interface**: [http://localhost:3002](http://localhost:3002)
* **Documentation**: [http://localhost:3005](http://localhost:3005)
* **API Endpoint**: [http://localhost:4002](http://localhost:4002)
* **Gateway Endpoint**: [http://localhost:4001](http://localhost:4001)

## Required Configuration

At minimum, you need to set these environment variables:

```bash
# Database (change the password!)
POSTGRES_PASSWORD=your_secure_password_here

# Authentication
AUTH_SECRET=your-secret-key-here

# LLM Provider API Keys (add the ones you need)
LLM_OPENAI_API_KEY=sk-...
LLM_ANTHROPIC_API_KEY=sk-ant-...
```

## Basic Management Commands

### For Unified Docker (Option 1)

```bash
# View logs
docker logs llmgateway

# Restart container
docker restart llmgateway

# Stop container
docker stop llmgateway
```

### For Docker Compose (Option 2)

```bash
# View logs
docker compose -f infra/docker-compose.split.yml logs -f

# Restart services
docker compose -f infra/docker-compose.split.yml restart

# Stop services
docker compose -f infra/docker-compose.split.yml down
```

## Build locally

To build locally, use the `*.local.yml` compose files in the `infra` directory, which build the images from source.
## All provider API keys

You can set any of the following API keys:

```text
LLM_OPENAI_API_KEY=
LLM_ANTHROPIC_API_KEY=
```

## Multiple API Keys and Load Balancing

LLMGateway supports multiple API keys per provider for load balancing and increased availability. Simply provide comma-separated values for your API keys:

```bash
# Multiple OpenAI keys for load balancing
LLM_OPENAI_API_KEY=sk-key1,sk-key2,sk-key3

# Multiple Anthropic keys
LLM_ANTHROPIC_API_KEY=sk-ant-key1,sk-ant-key2
```

### Health-Aware Routing

The gateway automatically tracks the health of each API key and routes requests to healthy keys. If a key experiences consecutive errors, it is temporarily skipped. Keys that return authentication errors (401/403) are permanently blacklisted until restart.

### Related Configuration Values

For providers that require additional configuration (like Google Vertex), you can specify multiple values that correspond to each API key. The gateway always uses the matching index:

```bash
# Multiple Google Vertex configurations
LLM_GOOGLE_VERTEX_API_KEY=key1,key2,key3
LLM_GOOGLE_CLOUD_PROJECT=project-a,project-b,project-c
LLM_GOOGLE_VERTEX_REGION=us-central1,europe-west1,asia-east1
```

When the gateway selects `key2`, it automatically uses `project-b` and `europe-west1`. If you have fewer configuration values than keys, the last value is reused for the remaining keys.

## Next Steps

Once your LLMGateway is running:

1. **Open the web interface** at [http://localhost:3002](http://localhost:3002)
2. **Create your first organization** and project
3. **Generate API keys** for your applications
4. **Test the gateway** by making API calls to [http://localhost:4001](http://localhost:4001)

# Health check

URL: /health

{/* This file was generated by Fumadocs. Do not edit this file directly. Any changes should be made by running the generation command again. */}

Health check endpoint.

# Chat Completions

URL: /v1_chat_completions

{/* This file was generated by Fumadocs.
Do not edit this file directly. Any changes should be made by running the generation command again. */}

Create a completion for the chat conversation.

# Anthropic Messages

URL: /v1_messages

{/* This file was generated by Fumadocs. Do not edit this file directly. Any changes should be made by running the generation command again. */}

Create a message using Anthropic's API format.

# Models

URL: /v1_models

{/* This file was generated by Fumadocs. Do not edit this file directly. Any changes should be made by running the generation command again. */}

List all available models.

# Agent Skills

URL: /guides/agent-skills

import { Step, Steps } from "fumadocs-ui/components/steps";
import { Callout } from "fumadocs-ui/components/callout";

**Agent Skills** are structured guidelines for AI coding agents, optimized for use with LLM Gateway and the AI SDK. They provide best practices and reusable instructions that help AI agents generate higher-quality code.

## What Are Agent Skills?

Agent Skills are packaged sets of rules and guidelines that teach AI coding agents how to implement specific features correctly.
Each skill covers:

* API integration patterns
* Frontend rendering best practices
* Error handling strategies
* Performance optimization techniques

## Available Skills

### Image Generation

The Image Generation skill teaches AI agents how to properly implement image generation features:

* **API Integration** — correctly calling image generation APIs
* **Frontend Rendering** — displaying generated images efficiently
* **Error Handling** — graceful degradation and retry logic
* **Performance** — caching, lazy loading, and optimization

## Installation

### Prerequisites

Ensure you have Node.js 18+ and pnpm 9+ installed:

```bash
node --version  # v18.0.0 or higher
pnpm --version  # 9.0.0 or higher
```

### Clone the Repository

```bash
git clone https://github.com/theopenco/agent-skills.git
cd agent-skills
```

### Install Dependencies

```bash
pnpm install
```

### Build Skills

Build all skills to generate the documentation:

```bash
pnpm build:all
```

Or build a specific skill:

```bash
pnpm build
```

## Using Skills in Your Project

After building, each skill generates an `AGENTS.md` file that can be used with AI coding agents like Claude, Cursor, or Copilot.

### With Claude Code

Add the generated `AGENTS.md` content to your project's `CLAUDE.md` file:

```bash
cat skills/image-generation/AGENTS.md >> CLAUDE.md
```

### With Cursor

Add the skill content to your `.cursorrules` file:

```bash
cat skills/image-generation/AGENTS.md >> .cursorrules
```

### With Other AI Agents

Most AI coding tools support custom instructions. Copy the skill content into your tool's configuration.
## Project Structure

```
agent-skills/
├── packages/
│   └── skills-build/          # Build tooling
├── skills/
│   └── image-generation/      # Individual skill
│       ├── rules/             # Rule files
│       ├── AGENTS.md          # Generated documentation
│       └── metadata.json      # Skill metadata
└── package.json
```

## Contributing

### Adding New Rules

### Fork and Clone

Fork the repository and create a feature branch:

```bash
git checkout -b feat/new-rule
```

### Create a Rule File

Rules follow a standardized template with YAML frontmatter containing `title`, `impact` (high/medium/low), and `tags`. The body includes sections for Context, Incorrect examples, and Correct examples with TypeScript code blocks. See existing rules in `skills/image-generation/rules/` for reference.

### Validate and Build

```bash
pnpm validate
pnpm build:all
```

### Submit a Pull Request

Push your changes and open a PR.

### Impact Levels

When creating rules, use these impact levels:

* **high** — Critical for correctness or security
* **medium** — Important for quality and maintainability
* **low** — Nice-to-have improvements

## Development Commands

| Command          | Description                 |
| ---------------- | --------------------------- |
| `pnpm install`   | Install dependencies        |
| `pnpm build:all` | Build all skills            |
| `pnpm build`     | Build a specific skill      |
| `pnpm validate`  | Validate rule files         |
| `pnpm dev`       | Development mode with watch |

## More Resources

* [GitHub Repository](https://github.com/theopenco/agent-skills) — Source code and contributions
* [LLM Gateway CLI](/guides/cli) — Project scaffolding tool
* [Templates](https://llmgateway.io/templates) — Production-ready starter projects

Want to contribute a new skill or rule? Check out the [contribution guidelines](https://github.com/theopenco/agent-skills#contributing) on GitHub.
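Based on the rule-file template described above (YAML frontmatter with `title`, `impact`, and `tags`, plus Context, Incorrect, and Correct sections), a rule might look like the following. The rule name and the `generateImage`/`renderPlaceholder` helpers are purely illustrative:

````md
---
title: Handle image generation failures gracefully
impact: high
tags: [error-handling, image-generation]
---

## Context

Image generation requests can fail transiently, so frontends should degrade gracefully.

## Incorrect

```ts
const image = await generateImage(prompt); // a failure crashes the UI
```

## Correct

```ts
try {
  const image = await generateImage(prompt);
  render(image);
} catch {
  renderPlaceholder(); // retry or show a fallback
}
```
````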
# Autohand Integration

URL: /guides/autohand

import { Step, Steps } from "fumadocs-ui/components/steps";
import { Callout } from "fumadocs-ui/components/callout";

Autohand is an autonomous AI coding agent that works in your terminal, IDE, and Slack. With LLM Gateway, you can route all Autohand requests through a single gateway: use any of 180+ models from 60+ providers, with full cost tracking and smart routing.

## Setup

### Sign Up for LLM Gateway

[Sign up free](https://llmgateway.io/signup) — no credit card required. Copy your API key from the dashboard.

### Set Environment Variables

Configure Autohand to use LLM Gateway:

```bash
export OPENAI_BASE_URL=https://api.llmgateway.io/v1
export OPENAI_API_KEY=llmgtwy_your_api_key_here
```

### Run Autohand

```bash
autohand
```

All requests will now be routed through LLM Gateway.

## Why Use LLM Gateway with Autohand

* **180+ models** — GPT-5, Claude Opus, Gemini, Llama, and more from 60+ providers
* **Smart routing** — Automatically selects the best provider based on uptime, throughput, price, and latency
* **Cost tracking** — Monitor exactly how much each autonomous session costs
* **Single bill** — No need to manage multiple API provider accounts
* **Response caching** — Repeated requests hit cache automatically
* **Automatic failover** — If one provider is down, requests route to another

## Configuration File

You can also configure LLM Gateway in Autohand's config file:

```json
{
  "provider": {
    "llmgateway": {
      "baseUrl": "https://api.llmgateway.io/v1",
      "apiKey": "llmgtwy_your_api_key_here"
    }
  },
  "model": "gpt-5"
}
```

## Choosing Models

You can use any model from the [models page](https://llmgateway.io/models).
| Model               | Best For                                    |
| ------------------- | ------------------------------------------- |
| `gpt-5`             | Latest OpenAI flagship, highest quality     |
| `claude-opus-4-6`   | Anthropic's most capable model              |
| `claude-sonnet-4-6` | Fast reasoning with extended thinking       |
| `gemini-2.5-pro`    | Google's latest flagship, 1M context window |
| `o3`                | Advanced reasoning tasks                    |
| `gpt-5-mini`        | Cost-effective, quick responses             |
| `gemini-2.5-flash`  | Fast responses, good for high volume        |
| `deepseek-v3.1`     | Open-source with vision and tools           |

## Autohand Features with LLM Gateway

### Terminal (CLI)

Autohand CLI works seamlessly with LLM Gateway. Set the environment variables and use all Autohand commands as normal: multi-file editing, agentic search, and autonomous code generation all work out of the box.

### IDE Integration

Autohand's VS Code and Zed extensions respect the same environment variables. Set them in your shell profile and the IDE integration will automatically route through LLM Gateway.

### Slack Integration

When using Autohand through Slack, configure the LLM Gateway base URL in your Autohand server settings to route all Slack-triggered coding tasks through the gateway.

## Monitoring Usage

Once configured, all Autohand requests appear in your LLM Gateway dashboard:

* **Request logs** — See every prompt and response
* **Cost breakdown** — Track spending by model and time period
* **Usage analytics** — Understand your AI usage patterns

View all available models on the [models page](https://llmgateway.io/models).

Need help? Join our [Discord community](https://llmgateway.io/discord) for support and troubleshooting assistance.

# Claude Code Integration

URL: /guides/claude-code

import { Step, Steps } from "fumadocs-ui/components/steps";
import { Callout } from "fumadocs-ui/components/callout";

Claude Code is locked to Anthropic's API by default.
With LLM Gateway, you can point it at any model — GPT-5, Gemini, Llama, or 180+ others — while keeping the same Anthropic API format Claude Code expects. Three environment variables. No code changes. Full cost tracking in your dashboard.

## Setup

### Sign Up for LLM Gateway

[Sign up free](https://llmgateway.io/signup) — no credit card required. Copy your API key from the dashboard.

### Set Environment Variables

Configure Claude Code to use LLM Gateway:

```bash
export ANTHROPIC_BASE_URL=https://api.llmgateway.io
export ANTHROPIC_AUTH_TOKEN=llmgtwy_your_api_key_here
# optional: specify a model, otherwise it uses the default Claude model
export ANTHROPIC_MODEL=gpt-5  # or any model from our catalog
```

### Run Claude Code

```bash
claude
```

All requests will now be routed through LLM Gateway.

## Why This Works

LLM Gateway's `/v1/messages` endpoint speaks Anthropic's API format natively. We handle the translation to each provider behind the scenes. This means:

* **Use any model** — GPT-5, Gemini, Llama, or Claude itself
* **Keep your workflow** — Claude Code doesn't know the difference
* **Track costs** — Every request appears in your LLM Gateway dashboard
* **Automatic caching** — Repeated requests hit cache, saving money

## Choosing Models

You can use any model from the [models page](https://llmgateway.io/models).

### Use OpenAI's Latest Models

```bash
# Use the latest GPT model
export ANTHROPIC_MODEL=gpt-5

# Use a cost-effective alternative
export ANTHROPIC_MODEL=gpt-5-mini
```

### Use Google's Gemini

```bash
export ANTHROPIC_MODEL=gemini-2.5-pro
```

### Use Anthropic's Claude Models

```bash
export ANTHROPIC_MODEL=anthropic/claude-3-5-sonnet-20241022
```

## Environment Variables

### ANTHROPIC\_MODEL

Specifies the main model to use for primary requests.
```bash
export ANTHROPIC_MODEL=gpt-5
```

### Complete Configuration Example

```bash
export ANTHROPIC_BASE_URL=https://api.llmgateway.io
export ANTHROPIC_AUTH_TOKEN=llmgtwy_your_api_key_here
export ANTHROPIC_MODEL=gpt-5
export ANTHROPIC_SMALL_FAST_MODEL=gpt-5-nano
```

## Making Manual API Requests

If you want to test the endpoint directly, you can make manual requests:

```bash
curl -X POST "https://api.llmgateway.io/v1/messages" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ],
    "max_tokens": 100
  }'
```

### Response Format

The endpoint returns responses in Anthropic's message format:

```json
{
  "id": "msg_abc123",
  "type": "message",
  "role": "assistant",
  "model": "gpt-5",
  "content": [
    {
      "type": "text",
      "text": "Hello! I'm doing well, thank you for asking. How can I help you today?"
    }
  ],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 13,
    "output_tokens": 20
  }
}
```

## What You Get

* **Any model in Claude Code** — GPT-5 for heavy lifting, GPT-4o Mini for routine tasks
* **Cost visibility** — See exactly what each coding session costs
* **One bill** — Stop managing separate accounts for OpenAI, Anthropic, Google
* **Response caching** — Repeated requests (like linting the same file) hit cache
* **Discounts** — Check [discounted models](https://llmgateway.io/models?discounted=true) for savings of up to 90%

View all available models on the [models page](https://llmgateway.io/models).

Need help? Join our [Discord community](https://llmgateway.io/discord) for support and troubleshooting assistance.
# LLM Gateway CLI

URL: /guides/cli

import { Step, Steps } from "fumadocs-ui/components/steps";
import { Callout } from "fumadocs-ui/components/callout";
import { Tabs, Tab } from "fumadocs-ui/components/tabs";

The **LLM Gateway CLI** (`@llmgateway/cli`) is a command-line utility for scaffolding projects, managing AI applications, and discovering models.

## Installation

Run commands directly without installation:

```bash
npx @llmgateway/cli init
```

Install globally for faster access:

```bash
npm install -g @llmgateway/cli
```

Then run commands directly:

```bash
llmgateway init
```

## Quick Start

### Initialize a Project

Create a new project from a template:

```bash
npx @llmgateway/cli init
```

Or specify the template and name directly:

```bash
npx @llmgateway/cli init --template image-generation --name my-ai-app
```

### Configure Authentication

Log in to save your API key locally:

```bash
npx @llmgateway/cli auth login
```

This opens a browser window to authenticate with LLM Gateway. Your credentials are stored in `~/.llmgateway/config.json`. Alternatively, set the `LLMGATEWAY_API_KEY` environment variable, which takes precedence over the config file.

### Start Development

Navigate to your project and start the development server:

```bash
cd my-ai-app
npx @llmgateway/cli dev
```

Or specify a custom port:

```bash
npx @llmgateway/cli dev --port 3000
```

## Commands

### `init`

Initialize a new project from a template.

```bash
npx @llmgateway/cli init [options]
```

**Options:**

* `--template <name>` — Template to use (e.g., `image-generation`, `weather-agent`)
* `--name <name>` — Project name

**Examples:**

```bash
# Interactive mode
npx @llmgateway/cli init

# With options
npx @llmgateway/cli init --template image-generation --name my-app
```

### `list`

Display available project templates.

```bash
npx @llmgateway/cli list
```

**Options:**

* `--json` — Output in JSON format

### `models`

Browse and filter available AI models.
```bash
npx @llmgateway/cli models [options]
```

**Options:**

* `--capability <capability>` — Filter by capability (e.g., `chat`, `image`, `embedding`)
* `--provider <provider>` — Filter by provider (e.g., `openai`, `anthropic`, `google`)
* `--search <query>` — Search models by name

**Examples:**

```bash
# List all models
npx @llmgateway/cli models

# Filter by provider
npx @llmgateway/cli models --provider openai

# Search models
npx @llmgateway/cli models --search gpt
```

### `add`

Add tools or API routes to an existing project.

```bash
npx @llmgateway/cli add
```

**Tools available:**

* `weather` — Weather lookup functionality
* `search` — Web search capability
* `calculator` — Mathematical operations

**API routes available:**

* `generate` — Text generation endpoint
* `chat` — Chat completion endpoint

### `auth`

Manage API authentication.

```bash
# Login via browser
npx @llmgateway/cli auth login

# Check authentication status
npx @llmgateway/cli auth status

# Logout
npx @llmgateway/cli auth logout
```

### `dev`

Start the local development server.

```bash
npx @llmgateway/cli dev [options]
```

**Options:**

* `--port <port>` — Port to run on (default: 3000)

### `upgrade`

Update LLM Gateway dependencies in your project.

```bash
npx @llmgateway/cli upgrade [options]
```

**Options:**

* `--dry-run` — Show what would be updated without making changes

### `docs`

Open the documentation in your browser.

```bash
npx @llmgateway/cli docs
```

## Available Templates

### Image Generation

A full-stack application for AI image generation.

* **Stack:** Next.js 16, React 19, TypeScript
* **Features:** Multi-provider support (DALL-E, Stable Diffusion), unified API
* **Use case:** Image generation apps, creative tools

```bash
npx @llmgateway/cli init --template image-generation
```

### QA Agent

An AI-powered QA testing agent that uses browser automation to test your web app.
* **Stack:** Next.js 16, React 19, TypeScript, Agent Browser
* **Features:** Natural language testing, real-time action timeline, live browser preview
* **Use case:** Automated QA testing, regression testing, user flow validation

```bash
npx @llmgateway/cli init --template qa-agent
```

### Weather Agent

A CLI agent demonstrating tool-calling capabilities.

* **Stack:** TypeScript, AI SDK, OpenAI
* **Features:** Tool calling, real-time data, natural language
* **Use case:** Learning tool usage, building CLI agents

```bash
npx @llmgateway/cli init --template weather-agent
```

## Configuration

The CLI stores configuration in `~/.llmgateway/config.json`:

```json
{
  "apiKey": "llmgtwy_...",
  "defaultTemplate": "image-generation"
}
```

### Environment Variables

The `LLMGATEWAY_API_KEY` environment variable takes precedence over the config file:

```bash
export LLMGATEWAY_API_KEY="llmgtwy_..."
```

## More Resources

* [Agents](https://llmgateway.io/agents) — Pre-built AI agents
* [Templates](https://llmgateway.io/templates) — Production-ready starter projects
* [GitHub Repository](https://github.com/theopenco/llmgateway-templates) — Source code and issues

Need help or want to request a feature? Open an issue on [GitHub](https://github.com/theopenco/llmgateway-templates/issues).

# Cline Integration

URL: /guides/cline

import { Step, Steps } from "fumadocs-ui/components/steps";
import { Callout } from "fumadocs-ui/components/callout";

[Cline](https://cline.bot) is an autonomous AI coding assistant that lives in your VS Code editor. It can create and edit files, run terminal commands, and help you build complex projects. You can configure Cline to use LLM Gateway for access to multiple AI providers with unified billing and cost tracking.

## Prerequisites

* A VS Code-based IDE installed
* An LLM Gateway API key

## Setup

Cline supports OpenAI-compatible API endpoints, making it straightforward to integrate with LLM Gateway.

### Install Cline Extension

1. Open VS Code
2.
   Go to the Extensions view (Cmd/Ctrl + Shift + X)
3. Search for "Cline"
4. Click **Install** on the Cline extension

### Open Cline Settings

1. Click on the Cline icon in the VS Code sidebar
2. Click the settings gear icon in the Cline panel

### Configure API Provider

1. In the API Provider dropdown, select **OpenAI Compatible**
2. Enter the following details:
   * **Base URL**: `https://api.llmgateway.io/v1`
   * **API Key**: Your LLM Gateway API key
   * **Model ID**: Choose a model (e.g., `claude-opus-4-5-20251101`, `gpt-5.2`, `gemini-3-pro-preview`, `deepseek-3.2`). See [provider-specific routing](/features/routing#provider-specific-routing) for more options.

### Test the Integration

1. Open a project in VS Code
2. Click on the Cline icon in the sidebar
3. Type a message like "Create a hello world function in Python"
4. Cline should respond and offer to create the file

All requests will now be routed through LLM Gateway. View all available models on the [models page](https://llmgateway.io/models).

## Features

Once configured, you can use all of Cline's features with LLM Gateway:

### Autonomous Coding

* Create new files and projects from scratch
* Edit existing code based on natural language instructions
* Refactor and improve code quality

### Terminal Commands

* Run build commands, tests, and scripts
* Install dependencies
* Execute any terminal operation

### File Management

* Create, read, and modify files
* Navigate your codebase
* Search for relevant code

## Model Selection Tips

### Using Provider-Specific Models

To use a specific provider's version of a model, prefix the model ID with the provider name. See [provider-specific routing](/features/routing#provider-specific-routing) for more options.

### Using Discounted Models

LLM Gateway offers discounted access to some models. Find them on the [models page](https://llmgateway.io/models?view=grid\&filters=1\&discounted=true) and copy the model ID.
### Using Free Models Some models are available for free. Browse them on the [models page](https://llmgateway.io/models?view=grid\&filters=1\&free=true). Need help? Join our [Discord community](https://llmgateway.io/discord) for support and troubleshooting assistance. ## Benefits of Using LLM Gateway with Cline * **Multi-Provider Access**: Use models from OpenAI, Anthropic, Google, and more through a single API * **Cost Control**: Track and limit your AI spending with detailed usage analytics * **Unified Billing**: One account for all providers instead of managing multiple API keys * **Caching**: Reduce costs with response caching for repeated requests * **Analytics**: Monitor usage patterns and costs in the dashboard # Codex CLI Integration URL: /guides/codex-cli import { Step, Steps } from "fumadocs-ui/components/steps"; import { Callout } from "fumadocs-ui/components/callout"; Codex CLI is OpenAI's open-source terminal coding agent. By default it connects to OpenAI's API, but with LLM Gateway you can route it through a single gateway—use GPT-5.3 Codex, Gemini, Claude, or any of 180+ models while keeping full cost visibility. One config file. No code changes. Full cost tracking in your dashboard. ## Setup ### Sign Up for LLM Gateway [Sign up free](https://llmgateway.io/signup) — no credit card required. Copy your API key from the dashboard. ### Set Your API Key Set your LLM Gateway API key as the OpenAI key: ```bash export OPENAI_API_KEY=llmgtwy_your_api_key_here ``` ### Create Config File Create or edit `~/.codex/config.toml`: ```toml openai_base_url = "https://api.llmgateway.io/v1" model = "auto" model_reasoning_effort = "high" [tui] show_tooltips = false [model_providers.openai] name = "OpenAI" base_url = "https://api.llmgateway.io/v1" ``` ### Run Codex CLI ```bash codex ``` All requests will now be routed through LLM Gateway. ## Why This Works LLM Gateway's `/v1` endpoint is fully OpenAI-compatible.
Codex CLI sends requests to our gateway instead of OpenAI directly, and we route them to the right provider behind the scenes. This means: * **Use any model** — GPT-5.3 Codex, Gemini, Claude, or 180+ others * **Keep your workflow** — Codex CLI doesn't know the difference * **Track costs** — Every request appears in your LLM Gateway dashboard * **Automatic caching** — Repeated requests hit cache, saving money ## Configuration Explained ### Base URL The `openai_base_url` and `base_url` fields point Codex CLI to LLM Gateway instead of OpenAI: ```toml openai_base_url = "https://api.llmgateway.io/v1" ``` ### Model Selection Use `auto` to let LLM Gateway pick the best model, or set a specific one from the [models page](https://llmgateway.io/models): ```toml model = "auto" # or pick a specific model model = "gpt-5.3-codex" ``` ### Reasoning Effort Control how much reasoning the model uses. Options are `low`, `medium`, and `high`: ```toml model_reasoning_effort = "high" ``` ## What You Get * **Any model in Codex CLI** — GPT-5.3 Codex for heavy lifting, lighter models for routine tasks * **Cost visibility** — See exactly what each coding session costs * **One bill** — Stop managing separate accounts for OpenAI, Anthropic, Google * **Response caching** — Repeated requests hit cache automatically * **Discounts** — Check [discounted models](https://llmgateway.io/models?discounted=true) for savings up to 90% ## Troubleshooting ### Authentication errors Make sure your `OPENAI_API_KEY` environment variable is set to your LLM Gateway API key (starts with `llmgtwy_`). ### Model not found Verify the model ID matches exactly what's listed on the [models page](https://llmgateway.io/models).
Model IDs are case-sensitive. ### Connection issues Check that `base_url` is set to `https://api.llmgateway.io/v1` (note the `/v1` at the end). View all available models on the [models page](https://llmgateway.io/models). Need help? Join our [Discord community](https://llmgateway.io/discord) for support and troubleshooting assistance. # Cursor Integration URL: /guides/cursor import { Step, Steps } from "fumadocs-ui/components/steps"; import { Callout } from "fumadocs-ui/components/callout"; Cursor is an AI-powered code editor built on VS Code. You can configure Cursor to use LLM Gateway for enhanced AI capabilities, access to multiple models, and better cost control. Cursor with LLM Gateway ## Prerequisites * An LLM Gateway account with an API key * Cursor IDE installed * Basic understanding of Cursor's AI features ## Setup Cursor supports OpenAI-compatible API endpoints, making it easy to integrate with LLM Gateway. ### Get Your API Key 1. Log in to your [LLM Gateway dashboard](https://llmgateway.io/dashboard) 2. Navigate to **API Keys** section 3. Create a new API key and copy the key LLM Gateway API Keys ### Configure Cursor Settings 1. Open Cursor, go to **Settings**, then click **Cursor Settings** 2. Click **Models** Cursor Settings 3. Scroll down to the **OpenAI API Key** section 4. Click **Add OpenAI API Key** Cursor API Key Input 5. Enter your LLM Gateway API key 6. In the same Models settings, find the **Override OpenAI Base URL** option 7. Enable the override option 8. Enter the LLM Gateway endpoint: `https://api.llmgateway.io/v1` ### Select Models 1. In the **Models** section, you can now select from available models 2. Choose any [LLM Gateway supported model](https://llmgateway.io/models): Cursor Model Selection * For chat: Use models like `gpt-5`, `gpt-4o`, `claude-sonnet-4-5` * For custom models: Add the provider name before the model name (e.g.
`custom/my-model`) * For discounted models: copy the IDs from the [models page](https://llmgateway.io/models?view=grid\&filters=1\&discounted=true) * For free models: copy the IDs from the [models page](https://llmgateway.io/models?view=grid\&filters=1\&free=true) * For reasoning models: copy the IDs from the [models page](https://llmgateway.io/models?view=grid\&filters=1\&reasoning=true) ### Test the Integration 1. Open any code file in Cursor 2. Try using the AI chat (Cmd/Ctrl + L) 3. Or test the autocomplete feature while typing Cursor AI Chat Cursor AI Chat 2 All AI requests will now be routed through LLM Gateway. ## Features Once configured, you can use all of Cursor's AI features with LLM Gateway: ### AI Chat (Cmd/Ctrl + L) * Ask questions about your code * Request code explanations * Get debugging help * Generate new code ### Inline Edit (Cmd/Ctrl + K) * Edit code with natural language instructions * Refactor functions * Add features to existing code ### Autocomplete * Get intelligent code suggestions as you type * Context-aware completions based on your codebase ## Advanced Configuration ### Using Different Models for Different Features Cursor allows you to configure different models for different features: 1. **Chat Model**: Use a powerful model like `gpt-5` or `claude-sonnet-4-5` 2. **Autocomplete Model**: Use a faster, cost-effective model like `gpt-4o-mini` 3. **Custom Model**: Use a custom model like `custom/my-model` 4. **Reasoning Model**: Use a reasoning model like `canopywave/kimi-k2-thinking` [with a 75% discount](https://llmgateway.io/changelog/canopywave-kimi-k2-thinking-discount) This gives you the best balance of performance and cost.
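One way to picture the feature/model split above is as a simple task-to-model map: heavyweight work goes to a capable model, high-frequency work to a cheap, fast one. The helper below is purely illustrative (Cursor applies this through its settings UI, not through code), with model IDs taken from the list above:

```python
# Illustrative task-to-model mapping; Cursor configures this in its UI,
# this sketch only shows the idea behind the split.
TASK_MODELS = {
    "chat": "claude-sonnet-4-5",                  # powerful model for conversations
    "autocomplete": "gpt-4o-mini",                # fast, cost-effective completions
    "reasoning": "canopywave/kimi-k2-thinking",   # discounted reasoning model
}

def model_for(task: str) -> str:
    """Pick a model ID for a task type, defaulting to the chat model."""
    return TASK_MODELS.get(task, TASK_MODELS["chat"])

print(model_for("autocomplete"))  # gpt-4o-mini
```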
### Model Routing With LLM Gateway's [routing features](/features/routing), you can: * **Choose cost-effective models** by default for an optimal price-to-performance ratio * **Automatically scale to more powerful models** based on your request's context size * **Handle large contexts intelligently** by selecting models with appropriate context windows ## Troubleshooting ### Authentication Errors If you see authentication errors: * Verify your API key is correct * Check that the base URL is set to `https://api.llmgateway.io/v1` * Ensure your LLM Gateway account has sufficient credits ### Model Not Found If you see "model not found" errors: * Verify the model ID exists on the [models page](https://llmgateway.io/models) * Check that you're using the correct model name format * Some models may require specific provider configurations in your LLM Gateway dashboard ### Slow Responses If responses are slow: * Check your internet connection * Monitor your usage in the LLM Gateway dashboard * Consider using faster models for autocomplete features Need help? Join our [Discord community](https://llmgateway.io/discord) for support and troubleshooting assistance. ## Benefits of Using LLM Gateway with Cursor * **Multi-Provider Access**: Use models from OpenAI, Anthropic, Google, open-source providers, and more * **Cost Control**: Track and limit your AI spending with detailed usage analytics * **Caching**: Reduce costs with response caching * **Analytics**: Monitor usage patterns and costs # Model Context Protocol (MCP) URL: /guides/mcp import { Step, Steps } from "fumadocs-ui/components/steps"; import { Callout } from "fumadocs-ui/components/callout"; import { Tabs, Tab } from "fumadocs-ui/components/tabs"; LLM Gateway provides a Model Context Protocol (MCP) server that enables AI assistants like Claude Code to access multiple LLM providers through a unified interface. This allows you to use any model from OpenAI, Anthropic, Google, and more directly from your AI coding assistant.
## What is MCP? The Model Context Protocol (MCP) is an open standard that allows AI assistants to connect with external tools and data sources. LLM Gateway's MCP server exposes tools for: * **Chat completions** - Send messages to any supported LLM * **Image generation** - Generate images using models like Qwen Image * **Nano Banana image generation** - Generate images with Gemini 3 Pro Image Preview and optionally save to disk * **Model discovery** - List available models with capabilities and pricing ## Available Tools ### `chat` Send a message to any LLM and get a response. **Parameters:** * `model` (string) - The model to use (e.g., `"gpt-4o"`, `"claude-sonnet-4-20250514"`) * `messages` (array) - Array of messages with `role` and `content` * `temperature` (number, optional) - Sampling temperature (0-2) * `max_tokens` (number, optional) - Maximum tokens to generate **Example:** ```json { "model": "gpt-4o", "messages": [{ "role": "user", "content": "Explain quantum computing" }], "temperature": 0.7 } ``` ### `generate-image` Generate images from text prompts using AI image models. **Parameters:** * `prompt` (string) - Text description of the image to generate * `model` (string, optional) - Image model (default: `"qwen-image-plus"`) * `size` (string, optional) - Image size (default: `"1024x1024"`) * `n` (number, optional) - Number of images (1-4, default: 1) **Example:** ```json { "prompt": "A serene mountain landscape at sunset", "model": "qwen-image-max", "size": "1024x1024" } ``` ### `generate-nano-banana` Generate an image using Gemini 3 Pro Image Preview ("Nano Banana"). Returns an inline image preview, and optionally saves the image to disk when the server is configured with an upload directory. 
**Parameters:** * `prompt` (string) - Text description of the image to generate * `filename` (string, optional) - Filename for the saved image, no path separators allowed (default: `nano-banana-{timestamp}.png`) * `aspect_ratio` (string, optional) - Aspect ratio: `"1:1"`, `"16:9"`, `"4:3"`, or `"5:4"` **Example:** ```json { "prompt": "A pixel-art cat sitting on a rainbow", "filename": "hero-image.png", "aspect_ratio": "16:9" } ``` **Saving images to disk** requires the `UPLOAD_DIR` environment variable to be set on the MCP server. When set, images are saved to that directory. Without it, images are returned inline only — no files are written to disk. See [Enabling local image saving](#enabling-local-image-saving) for setup instructions. ### `list-models` List available LLM models with capabilities and pricing. **Parameters:** * `include_deactivated` (boolean, optional) - Include deactivated models * `exclude_deprecated` (boolean, optional) - Exclude deprecated models * `limit` (number, optional) - Maximum models to return (default: 20) * `family` (string, optional) - Filter by family (e.g., `"openai"`, `"anthropic"`) ### `list-image-models` List all available image generation models. **Example output:** ``` # Image Generation Models ## Qwen Image Plus - **Model ID:** `qwen-image-plus` - **Description:** Text-to-image with excellent text rendering - **Price:** $0.03 per request ## Qwen Image Max - **Model ID:** `qwen-image-max` - **Description:** Highest quality text-to-image - **Price:** $0.075 per request ``` ## Setup ### Get Your API Key 1. Log in to your [LLM Gateway dashboard](https://llmgateway.io/dashboard) 2. Navigate to **API Keys** section 3.
Create a new API key and copy it ### Configure Claude Code Run the following command in your terminal: ```bash claude mcp add --transport http --scope user llmgateway https://api.llmgateway.io/mcp \ --header "Authorization: Bearer your-api-key-here" ``` **Alternative: Manual configuration** You can also add the MCP server manually by editing `~/.claude.json` (user scope) or `.mcp.json` in your project root (project scope): ```json { "mcpServers": { "llmgateway": { "url": "https://api.llmgateway.io/mcp", "headers": { "Authorization": "Bearer your-api-key-here" } } } } ``` Restart Claude Code after manual configuration changes. ### Test the Integration Try using the tools in Claude Code: * "Use the chat tool to ask GPT-4o about TypeScript best practices" * "Generate an image of a futuristic city using the generate-image tool" * "Use generate-nano-banana to create a hero image for my landing page" * "List all available models from Anthropic" ### Get Your API Key 1. Log in to your [LLM Gateway dashboard](https://llmgateway.io/dashboard) 2. Navigate to **API Keys** section 3. Create a new API key and copy it 4. Set it as an environment variable: `export LLM_GATEWAY_API_KEY="your-api-key-here"` ### Configure Codex Run the following command in your terminal: ```bash codex mcp add llmgateway --url https://api.llmgateway.io/mcp \ --bearer-token-env-var LLM_GATEWAY_API_KEY ``` **Alternative: Manual configuration** You can also add the MCP server manually by editing `~/.codex/config.toml`: ```toml [mcp_servers.llmgateway] url = "https://api.llmgateway.io/mcp" bearer_token_env_var = "LLM_GATEWAY_API_KEY" ``` ### Test the Integration Run `/mcp` in the Codex TUI to confirm the `llmgateway` server is connected. 
Try: * "Use the chat tool to ask GPT-4o about TypeScript best practices" * "Generate an image of a futuristic city using the generate-image tool" * "Use generate-nano-banana to create a hero image for my landing page" * "List all available models from Anthropic" ### Get Your API Key 1. Log in to your [LLM Gateway dashboard](https://llmgateway.io/dashboard) 2. Navigate to **API Keys** section 3. Create a new API key and copy it ### Configure Cursor Add the following to your Cursor MCP configuration file (`~/.cursor/mcp.json`): ```json { "mcpServers": { "llmgateway": { "url": "https://api.llmgateway.io/mcp", "headers": { "Authorization": "Bearer your-api-key-here" } } } } ``` Or open the Command Palette (`Cmd/Ctrl + Shift + P`), search for **"Cursor Settings"**, then go to **Tools & Integrations** > **Add Custom MCP** and paste the configuration above. Cursor v0.48.0+ is required for Streamable HTTP MCP support. ### Test the Integration Open a chat in **Agent Mode**, click the **Select Tools** icon, and verify the LLM Gateway tools appear. Try: * "Use the chat tool to ask GPT-4o about TypeScript best practices" * "Generate an image of a futuristic city using the generate-image tool" * "Use generate-nano-banana to create a hero image for my landing page" * "List all available models from Anthropic" LLM Gateway's MCP server supports the standard HTTP Streamable transport. 
Configure your client with: * **Endpoint:** `https://api.llmgateway.io/mcp` * **Authentication:** Bearer token via `Authorization` header or `x-api-key` header * **Protocol Version:** 2024-11-05 **Direct HTTP Example:** ```bash curl -X POST https://api.llmgateway.io/mcp \ -H "Content-Type: application/json" \ -H "Authorization: Bearer your-api-key" \ -d '{ "jsonrpc": "2.0", "id": 1, "method": "tools/list" }' ``` **Server-Sent Events (SSE):** For real-time updates, connect with `Accept: text/event-stream`: ```bash curl -N https://api.llmgateway.io/mcp \ -H "Accept: text/event-stream" \ -H "Authorization: Bearer your-api-key" ``` ## Use Cases ### Multi-Model Access in Claude Code Use Claude Code to interact with models it doesn't natively support: ``` Use the chat tool with model "gpt-4o" to analyze this code for security issues. ``` ### Image Generation Generate images directly from your AI assistant: ``` Use generate-image to create a logo for my new startup. It should be minimalist, blue and white, representing AI and cloud computing. ``` ### Nano Banana (Gemini Image Generation) Generate images with Gemini 3 Pro for use in your project: ``` Use generate-nano-banana to create a hero image for my landing page with a 16:9 aspect ratio. ``` ### Cost-Effective Model Selection Query available models to find the best option for your task: ``` List models from OpenAI and Anthropic, then use the cheapest one for this simple task. ``` ## Authentication The MCP server supports two authentication methods: 1. **Bearer Token** - `Authorization: Bearer your-api-key` 2. **API Key Header** - `x-api-key: your-api-key` Your API key is the same one you use for the REST API and works across all LLM Gateway services. 
## OAuth Support For applications that prefer OAuth authentication, LLM Gateway's MCP server implements OAuth 2.0: * **Authorization Endpoint:** `/oauth/authorize` * **Token Endpoint:** `/oauth/token` * **Registration Endpoint:** `/oauth/register` * **Supported Flows:** Authorization Code, Client Credentials ## Enabling Local Image Saving By default, `generate-nano-banana` returns images inline without writing to disk. To enable saving generated images to the server filesystem, the `UPLOAD_DIR` environment variable must be set on the **gateway host** at startup. This is a server-side setting — it cannot be configured from the client. This is only possible for **self-hosted** MCP deployments. Configure `UPLOAD_DIR` using your deployment method: * **Docker:** Pass `-e UPLOAD_DIR=/data/images` or add it to your `docker-compose.yml` environment section. * **systemd:** Add `Environment=UPLOAD_DIR=/data/images` to your service unit file. * **.env file:** Add `UPLOAD_DIR=/data/images` to the `.env` file loaded by your gateway process. The shared hosted endpoint (`api.llmgateway.io`) does not support configuring `UPLOAD_DIR`. On the hosted service, images are always returned inline — no files are written to disk. To enable server-side image saving, you must self-host the MCP server and set `UPLOAD_DIR` at startup. ## Troubleshooting ### Connection Errors If you're having trouble connecting: 1. Verify your API key is valid 2. Check the endpoint URL is correct: `https://api.llmgateway.io/mcp` 3. Ensure your firewall allows outbound HTTPS connections ### Tool Not Found If tools aren't appearing: 1. Restart your MCP client 2. Check the configuration syntax 3. Verify the MCP server is responding: `GET https://api.llmgateway.io/mcp` ### Rate Limiting The MCP server respects your account's rate limits. If you're hitting limits: 1. Check your usage in the dashboard 2. Consider upgrading your plan 3. Implement request queuing in your application Need help?
Join our [Discord community](https://llmgateway.io/discord) for support. ## Benefits * **Unified Access** - Use 200+ models from 20+ providers through one interface * **Cost Tracking** - Monitor usage and costs in the LLM Gateway dashboard * **Caching** - Automatic response caching reduces costs and latency * **Fallback** - Automatic provider failover ensures reliability * **Image Generation** - Generate images directly from your AI assistant # n8n Integration URL: /guides/n8n import { Step, Steps } from "fumadocs-ui/components/steps"; import { Callout } from "fumadocs-ui/components/callout"; n8n is a powerful workflow automation tool that can be enhanced with AI capabilities through LLM Gateway. This guide shows how to integrate LLM Gateway into your n8n workflows. n8n workflow with LLM Gateway ## Prerequisites * An LLM Gateway account with an API key * n8n instance (self-hosted or cloud) * Basic understanding of n8n workflows ## Setup The easiest way to use LLM Gateway with n8n is through the OpenAI node with custom configuration. ### Add OpenAI Credentials 1. In n8n, go to **Settings** → **Credentials** n8n credentials 2. Click **Add Credential** → **OpenAI** n8n credentials 3. Configure as follows: * **API Key**: Your LLM Gateway API key * **Base URL**: `https://api.llmgateway.io/v1` * **Organization ID**: Leave blank n8n credentials ### Configure OpenAI Node 1. Add an **AI Agent** node to your workflow 2. Add a **Chat Model** edge to the node n8n credentials 3. Configure the node to use the LLM Gateway provider n8n credentials Note: You have to toggle off the Responses API, as LLM Gateway does not support it. responses api 4. Select your desired options * **Model**: Use any [LLM Gateway model](https://llmgateway.io/models) ID (e.g., `gpt-5`) * **Options**: Optionally, configure LLM parameters n8n credentials ### Test Workflow Finally, try running your workflow with a test prompt.
n8n credentials # OpenClaw Integration URL: /guides/openclaw import { Step, Steps } from "fumadocs-ui/components/steps"; import { Callout } from "fumadocs-ui/components/callout"; [OpenClaw](https://docs.openclaw.ai/) is a self-hosted gateway that connects your favorite chat apps—WhatsApp, Telegram, Discord, iMessage, and more—to AI coding agents. With LLM Gateway as a custom provider, you can route all your OpenClaw traffic through a single API, use any of 180+ models, and keep full visibility into usage and costs. ## Setup ### Sign Up for LLM Gateway [Sign up free](https://llmgateway.io/signup) — no credit card required. Copy your API key from the dashboard. ### Set Your API Key ```bash export LLMGATEWAY_API_KEY=llmgtwy_your_api_key_here ``` ### Configure OpenClaw Add LLM Gateway as a custom provider in your `~/.openclaw/openclaw.json`: ```json { "models": { "mode": "merge", "providers": { "llmgateway": { "baseUrl": "https://api.llmgateway.io/v1", "apiKey": "${LLMGATEWAY_API_KEY}", "api": "openai-completions", "models": [ { "id": "gpt-5.4", "name": "GPT-5.4", "contextWindow": 128000, "maxTokens": 32000 }, { "id": "claude-opus-4-6", "name": "Claude Opus 4.6", "contextWindow": 200000, "maxTokens": 8192 }, { "id": "gemini-3-1-pro-preview", "name": "Gemini 3.1 Pro", "contextWindow": 1000000, "maxTokens": 8192 } ] } } }, "agents": { "defaults": { "model": { "primary": "llmgateway/gpt-5.4" } } } } ``` ### Start Chatting Launch OpenClaw and start chatting across your connected channels. All requests will be routed through LLM Gateway.
## Why Use LLM Gateway with OpenClaw * **Model flexibility** — Switch between GPT-5.4, Claude Opus, Gemini, or any of 180+ models * **Cost tracking** — Monitor exactly how much your chat agents cost to run * **Single bill** — No need to manage multiple API provider accounts * **Response caching** — Repeated queries hit cache, reducing costs * **Rate limit handling** — Automatic fallback between providers ## Switching Models Change the primary model in your config to switch between any model: ```json { "agents": { "defaults": { "model": { "primary": "llmgateway/claude-opus-4-6" } } } } ``` ## Model Fallback Chain OpenClaw supports fallback models. If the primary model is unavailable, it automatically falls back: ```json { "agents": { "defaults": { "model": { "primary": "llmgateway/gpt-5.4", "fallbacks": ["llmgateway/claude-opus-4-6"] } } } } ``` ## Available Models LLM Gateway uses root model IDs with smart routing—automatically selecting the best provider based on uptime, throughput, price, and latency. You can use any model from the [models page](https://llmgateway.io/models). Flagship models include: | Model | Best For | | ------------------------ | ------------------------------------------- | | `gpt-5.4` | Latest OpenAI flagship, highest quality | | `claude-opus-4-6` | Anthropic's most capable model | | `claude-sonnet-4-6` | Fast reasoning with extended thinking | | `gemini-3-1-pro-preview` | Google's latest flagship, 1M context window | | `o3` | Advanced reasoning tasks | | `gpt-5.4-pro` | Premium tier with extended reasoning | | `gemini-2.5-flash` | Fast responses, good for high-volume | | `claude-haiku-4-5` | Cost-effective, quick responses | | `grok-3` | xAI flagship | | `deepseek-v3.1` | Open-source with vision and tools | For more details on routing behavior, see [routing](/features/routing). View all available models on the [models page](https://llmgateway.io/models). ## Tips for Chat Agents ### Optimize Costs 1.
**Use smaller models for simple tasks** — Claude Haiku or Gemini Flash handle basic Q\&A well 2. **Enable caching** — LLM Gateway caches identical requests automatically 3. **Set token limits** — Configure max tokens to prevent runaway costs ### Improve Response Quality 1. **Choose the right model** — Claude Opus excels at nuanced conversation, GPT-5.4 at general tasks 2. **Use system prompts** — Configure your agent's personality and capabilities 3. **Test multiple models** — LLM Gateway makes it easy to A/B test different providers Need help? Join our [Discord community](https://llmgateway.io/discord) for support and troubleshooting assistance. # OpenCode Integration URL: /guides/opencode import { Step, Steps } from "fumadocs-ui/components/steps"; import { Callout } from "fumadocs-ui/components/callout"; [OpenCode](https://opencode.ai) is an open-source AI coding agent for your terminal, IDE, or desktop. This guide shows you how to connect it to LLM Gateway—giving you access to 180+ models from 60+ providers, all tracked in one dashboard. ## Prerequisites * OpenCode installed — visit the [OpenCode download page](https://opencode.ai/download) for your platform * An LLM Gateway API key ## Setup ### Create Configuration File Create `config.json` in your OpenCode configuration directory: **macOS/Linux:** `~/.config/opencode/config.json` **Windows:** `C:\Users\YourUsername\.config\opencode\config.json` ```json { "provider": { "llmgateway": { "npm": "@ai-sdk/openai-compatible", "name": "LLM Gateway", "options": { "baseURL": "https://api.llmgateway.io/v1" }, "models": { "gpt-5": { "name": "GPT-5" }, "gpt-5-mini": { "name": "GPT-5 Mini" }, "gemini-2.5-pro": { "name": "Gemini 2.5 Pro" }, "claude-3-5-sonnet-20241022": { "name": "Claude 3.5 Sonnet" } } } }, "model": "llmgateway/gpt-5" } ``` ### Launch OpenCode and Connect Provider Start OpenCode from your terminal: ```bash opencode ``` **In VS Code/Cursor:** 1.
Install the OpenCode extension from the marketplace 2. Open Command Palette (Ctrl+Shift+P or Cmd+Shift+P) 3. Type "OpenCode" and select "Open opencode" Once OpenCode launches, run the `/connect` command to connect to LLM Gateway. ### Select LLM Gateway Provider In the provider list, scroll down to find "LLM Gateway" under the "Other" section and select it. ### Enter Your API Key OpenCode will prompt you for your API key. Enter your LLM Gateway API key and press Enter. OpenCode will automatically save your credentials securely. [Sign up for LLM Gateway](https://llmgateway.io/signup) and create an API key from your dashboard. ### Start Using OpenCode You're all set! OpenCode is now connected to LLM Gateway. You can start asking questions and building with AI. ## Why Use LLM Gateway with OpenCode * **180+ models** — GPT-5, Claude, Gemini, Llama, and more from 60+ providers * **One API key** — Stop juggling credentials for every provider * **Cost tracking** — See what each coding session costs in your dashboard * **Response caching** — Repeated requests hit cache automatically * **Volume discounts** — The more you use, the more you save ## Adding More Models You can add any model from the [models page](https://llmgateway.io/models) to your configuration. Simply add more entries to the `models` object in your `config.json`: ```json { "provider": { "llmgateway": { "models": { "gpt-5": { "name": "GPT-5" }, "gpt-5-mini": { "name": "GPT-5 Mini" }, "deepseek/deepseek-chat": { "name": "DeepSeek Chat" }, "meta/llama-3.3-70b": { "name": "Llama 3.3 70B" } } } } } ``` After updating `config.json`, restart OpenCode to see the new models. ## Switching Models To change your default model, update the `model` field in your configuration: ```json { "model": "llmgateway/gpt-5-mini" } ``` Or select a different model directly in the OpenCode interface. View all available models on the [models page](https://llmgateway.io/models).
## Troubleshooting ### OpenCode asks for API key every time Make sure the provider ID in your `config.json` matches exactly: `"llmgateway"` (all lowercase, no spaces). ### 404 Not Found errors Verify your `baseURL` is set to `https://api.llmgateway.io/v1` (note the `/v1` at the end). ### Models not showing up After editing `config.json`, restart OpenCode completely for changes to take effect. ### Connection timeout Check that you have an active internet connection and that your API key is valid from the [dashboard](https://llmgateway.io/dashboard). ## Configuration Tips * **Global configuration**: Use `~/.config/opencode/config.json` to apply settings across all projects * **Project-specific**: Place `opencode.json` in your project root to override global settings for that project * **Model selection**: You can specify different models for different types of tasks using OpenCode's agent configuration Need help? Join our [Discord community](https://llmgateway.io/discord) for support and troubleshooting assistance. # Anthropic API Compatibility URL: /features/anthropic-endpoint import { Callout } from "fumadocs-ui/components/callout"; # Anthropic API Compatibility LLM Gateway provides a native Anthropic-compatible endpoint at `/v1/messages` that allows you to use any model in our catalog while maintaining the familiar Anthropic API format. This is especially useful for applications designed for Claude that you want to extend to use other models. Enjoy a 50% discount on our Anthropic models for a limited time. ## Overview The Anthropic endpoint transforms requests from Anthropic's message format to the OpenAI-compatible format used by LLM Gateway, then transforms the responses back to Anthropic's format.
This means you can: * Use **any model** available in LLM Gateway with Anthropic's API format * Maintain existing code that uses Anthropic's SDK or API format * Access models from OpenAI, Google, Cohere, and other providers through the Anthropic interface * Leverage LLM Gateway's routing, caching, and cost optimization features ## Basic Usage ## Configuration for Claude Code This endpoint is perfect for configuring Claude Code to use any model available in LLM Gateway: ```bash export ANTHROPIC_BASE_URL=https://api.llmgateway.io export ANTHROPIC_AUTH_TOKEN=llmgtwy_your_api_key_here # optional: specify a model, otherwise it uses the default Claude model export ANTHROPIC_MODEL=gpt-5 # or any model from our catalog # now run claude! claude ``` ### Choosing Models You can use any model from the [models page](https://llmgateway.io/models). Popular options for Claude Code include: ```bash # Use OpenAI's latest model export ANTHROPIC_MODEL=gpt-5 # Use a cost-effective alternative export ANTHROPIC_MODEL=gpt-5-mini # Use Google's Gemini export ANTHROPIC_MODEL=gemini-2.5-pro # Use Anthropic's actual Claude models export ANTHROPIC_MODEL=claude-3-5-sonnet-20241022 ``` ## Environment Variables When configuring Claude Code or other Anthropic-compatible applications, you can use these environment variables: ### ANTHROPIC\_MODEL Specifies the main model to use for primary requests. * **Default**: `claude-sonnet-4-20250514` * **Example**: `export ANTHROPIC_MODEL=gpt-5` ### ANTHROPIC\_SMALL\_FAST\_MODEL Specifies a smaller, faster model used for background functionality and internal operations.
* **Default**: `claude-3-5-haiku-20241022` * **Example**: `export ANTHROPIC_SMALL_FAST_MODEL=gpt-5-nano` ```bash # Example configuration export ANTHROPIC_BASE_URL=https://api.llmgateway.io export ANTHROPIC_AUTH_TOKEN=llmgtwy_your_api_key_here export ANTHROPIC_MODEL=gpt-5 export ANTHROPIC_SMALL_FAST_MODEL=gpt-5-nano ``` ## Advanced Features ### Making a manual request ```bash curl -X POST "https://api.llmgateway.io/v1/messages" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "gpt-5", "messages": [ {"role": "user", "content": "Hello, how are you?"} ], "max_tokens": 100 }' ``` ### Response Format The endpoint returns responses in Anthropic's message format: ```json { "id": "msg_abc123", "type": "message", "role": "assistant", "model": "gpt-5", "content": [ { "type": "text", "text": "Hello! I'm doing well, thank you for asking. How can I help you today?" } ], "stop_reason": "end_turn", "stop_sequence": null, "usage": { "input_tokens": 13, "output_tokens": 20 } } ``` # API Keys & IAM Rules URL: /features/api-keys import { Tabs, Tab } from "fumadocs-ui/components/tabs"; import { Callout } from "fumadocs-ui/components/callout"; # API Keys & IAM Rules API keys are the primary method for authenticating with the LLM Gateway. This guide covers creating API keys, managing them, and configuring IAM rules for fine-grained access control. ## Overview LLM Gateway provides comprehensive API key management with the following features: * **Basic API Key Management**: Create, list, update, and delete API keys * **Usage Limits**: Set spending limits on individual API keys * **IAM Rules**: Fine-grained access control for models, providers, and pricing * **Usage Tracking**: Monitor API key usage and costs * **Status Management**: Enable/disable keys without deletion ## Creating API Keys ### Via Dashboard At this time, API keys can only be created via the dashboard. 1. Navigate to your project in the LLM Gateway dashboard 2. 
Go to the **API Keys** section 3. Click **Create API Key** 4. Provide a description for your key 5. Optionally set a usage limit 6. Click **Create** API keys are shown in full only once during creation. Make sure to copy and store them securely. ## Using API Keys Once you have an API key, use it in the `Authorization` header of your requests: ```bash curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer llmgtwy_your_api_key_here" \ -H "Content-Type: application/json" \ -d '{ "model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}] }' ``` ## API Key Management ## Disabling/Enabling API Keys You can disable an API key to stop it from being used, but the key is not deleted and can be re-enabled later. ## Usage Limits Usage is tracked per API key and shown on the API Keys page. Usage includes both costs from LLM Gateway credits and usage from your own provider keys when applicable, giving you complete visibility into total spending per key. You can set a maximum usage limit for each API key. When the limit is reached, requests using that key will return an error. ## IAM Rules IAM (Identity and Access Management) rules provide fine-grained access control over what models, providers, and pricing tiers an API key can access.
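To make the allow/deny semantics concrete, here is a minimal sketch of how a model-access rule could be evaluated. This is purely illustrative: enforcement happens inside LLM Gateway, and both the `IamModelRule` shape and the deny-over-allow precedence shown here are assumptions for the sketch, not the documented schema.

```typescript
// Hypothetical rule shape -- the real IAM rule schema is managed in the dashboard.
interface IamModelRule {
  allowModels?: string[]; // if present, only these models are permitted
  denyModels?: string[]; // these models are always blocked
}

// Assumed convention for this sketch: deny rules take precedence over allow rules.
function isModelAllowed(model: string, rule: IamModelRule): boolean {
  if (rule.denyModels?.includes(model)) {
    return false;
  }
  if (rule.allowModels && !rule.allowModels.includes(model)) {
    return false;
  }
  return true;
}
```

A key with no rules behaves as unrestricted, matching the backward-compatibility behavior described below for legacy keys.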
### Rule Types #### Model Access Rules Control access to specific models: * **Allow Models**: Only allow access to specific models * **Deny Models**: Block access to specific models #### Provider Access Rules Control access to specific providers: * **Allow Providers**: Only allow access to specific providers * **Deny Providers**: Block access to specific providers #### Pricing Rules Control access based on model pricing: * **Allow Pricing**: Set constraints on what pricing tiers are allowed * **Deny Pricing**: Block specific pricing tiers * **Free vs Paid**: Allow or deny access to free vs paid models ## Error Handling When API keys encounter IAM rule violations, the API returns specific error messages: ```json { "error": true, "status": 403, "message": "Access denied: Model gpt-4 is not in the allowed models list" } ``` Common error scenarios: * Model not allowed by IAM rules * Provider blocked by IAM rules * Pricing limits exceeded * API key disabled or deleted * Usage limit reached ## Migration from Legacy Keys If you have existing API keys without IAM rules: 1. **Backward Compatibility**: Existing keys continue to work without restrictions 2. **Gradual Migration**: Add IAM rules incrementally 3. **Testing**: Test IAM rules in development before applying to production 4. **Monitoring**: Monitor for access denied errors after implementing rules API keys without IAM rules have unrestricted access to all models and providers. # Audit Logs URL: /features/audit-logs import { Callout } from "fumadocs-ui/components/callout"; # Audit Logs Audit logs provide complete visibility into all actions within your organization. Track who did what, when, and to which resource. Audit logs are available on the [**Enterprise plan**](https://llmgateway.io/enterprise) for organization owners and admins. 
## What's Tracked Every significant action is logged with detailed metadata: | Field | Description | | ----------------- | -------------------------------------------------------- | | **Timestamp** | When the action occurred | | **User** | Who performed the action (name and email) | | **Action** | What was done (e.g., `api_key.create`, `project.update`) | | **Resource Type** | Category of the affected resource | | **Resource ID** | Unique identifier of the affected resource | | **Details** | Additional context like resource names or changed fields | ## Tracked Actions ### Organization Management * `organization.update` β€” Organization settings changed * `organization.delete` β€” Organization deleted ### Project Management * `project.create` β€” New project created * `project.update` β€” Project settings changed * `project.delete` β€” Project deleted ### Team Management * `team_member.add` β€” New member invited * `team_member.update` β€” Member role changed * `team_member.remove` β€” Member removed ### API Key Management * `api_key.create` β€” New API key created * `api_key.update_status` β€” API key enabled/disabled * `api_key.update_limit` β€” Usage limit changed * `api_key.delete` β€” API key deleted * `api_key.iam_rule.create` β€” IAM rule added * `api_key.iam_rule.update` β€” IAM rule modified * `api_key.iam_rule.delete` β€” IAM rule removed ### Provider Key Management * `provider_key.create` β€” Provider key added * `provider_key.update` β€” Provider key status changed * `provider_key.delete` β€” Provider key removed ### Billing Events * `subscription.create` β€” Subscription started * `subscription.cancel` β€” Subscription cancelled * `subscription.resume` β€” Subscription resumed * `payment.credit_topup` β€” Credits purchased ## Filtering and Search Filter logs by: * **Action** β€” Specific action type * **Resource Type** β€” Category of resource * **User** β€” Who performed the action * **Date Range** β€” Time period ## Data Retention Audit logs are 
retained for **90 days** on the Enterprise plan. ## Access Control Only organization **owners** and **admins** can view audit logs. This ensures sensitive activity data is only visible to authorized personnel. ## Get Started Audit logs are an Enterprise feature. [Contact us](https://llmgateway.io/enterprise) to enable Enterprise for your organization. # Caching URL: /features/caching import { Callout } from "fumadocs-ui/components/callout"; # Caching LLM Gateway provides intelligent response caching that can significantly reduce your API costs and response latency. When caching is enabled, identical requests are served from cache instead of making redundant calls to LLM providers. ## How It Works When you make an API request: 1. LLM Gateway generates a cache key based on the request parameters 2. If a matching cached response exists, it's returned immediately 3. If no cache exists, the request is forwarded to the provider 4. The response is cached for future identical requests This means repeated identical requests are served instantly from cache without incurring additional provider costs. ## Cost Savings Caching can dramatically reduce costs for applications with repetitive requests: | Scenario | Without Caching | With Caching | Savings | | --------------------------- | --------------- | ------------ | ------- | | 1,000 identical requests | $10.00 | $0.01 | 99.9% | | 50% duplicate rate | $10.00 | $5.00 | 50% | | Retry after transient error | $0.02 | $0.01 | 50% | Cached responses are free from provider costs. You only pay for the initial request that populates the cache. ## Requirements Caching requires [Data Retention](/features/data-retention) to be enabled with "Retain All Data" level. This allows LLM Gateway to store and retrieve response payloads. To use caching: 1. Enable **Data Retention** in your organization settings with "Retain All Data" level 2. Enable **Caching** in your project settings under Preferences 3. 
Configure the cache duration (TTL) as needed 4. Make requests as normalβ€”caching is automatic ## Cache Key Generation The cache key is generated from these request parameters: * Model identifier * Messages array (roles and content) * Temperature * Max tokens * Top P * Tools/functions * Tool choice * Response format * System prompt * Other model-specific parameters Requests with different parameter values, even slight variations, will not share cache entries. ## Cache Behavior ### Cache Hits When a cache hit occurs: * Response is returned immediately (sub-millisecond latency) * No provider API call is made * No inference costs are incurred ### Cache Misses When a cache miss occurs: * Request is forwarded to the LLM provider * Response is stored in cache * Normal inference costs apply * Future identical requests will hit the cache ## Streaming and Caching Caching works with both streaming and non-streaming requests: * **Non-streaming**: Full response is cached and returned * **Streaming**: The complete response is reconstructed from cache and streamed back ## Cache TTL (Time-to-Live) Cache duration is configurable per project in your project settings. You can set the cache TTL from 10 seconds up to 1 year (31,536,000 seconds). The default cache duration is 60 seconds. Adjust this based on your use caseβ€”longer durations work well for static content, while shorter durations are better for frequently changing data. 
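The key derivation described under Cache Key Generation can be pictured as hashing a canonical serialization of the keyed parameters. The function below is illustrative only (the gateway's actual algorithm is internal); it simply demonstrates why identical requests share a cache entry while any parameter change, however slight, produces a miss.

```typescript
import { createHash } from "node:crypto";

// Subset of the keyed parameters listed above (illustrative only).
interface CacheKeyParams {
  model: string;
  messages: { role: string; content: string }[];
  temperature?: number;
  max_tokens?: number;
  top_p?: number;
}

// Serialize fields in a fixed order so identical requests always
// produce identical strings, then hash the result into a cache key.
function cacheKey(params: CacheKeyParams): string {
  const canonical = JSON.stringify([
    params.model,
    params.messages,
    params.temperature ?? null,
    params.max_tokens ?? null,
    params.top_p ?? null,
  ]);
  return createHash("sha256").update(canonical).digest("hex");
}
```

Two byte-identical requests map to the same key (a hit); changing `temperature` from `0` to `0.1`, or a single character of a prompt, yields a different key (a miss).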
## Identifying Cached Responses Cached responses show zero or minimal token usage since no inference occurred: ```json { "usage": { "prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0, "cost_usd_total": 0 } } ``` ## Use Cases ### Development and Testing During development, you often send the same prompts repeatedly: ```typescript // This prompt will only incur costs once const response = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "Explain quantum computing" }], }); ``` ### Chatbots with Common Questions FAQ-style interactions often have repeated questions: ```typescript // Common questions are served from cache const faqs = [ "What are your business hours?", "How do I reset my password?", "What is your return policy?", ]; ``` ### Batch Processing Processing large datasets with potentially duplicate items: ```typescript // Duplicate items in batch are served from cache for (const item of items) { const response = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: `Classify: ${item}` }], }); } ``` ## Best Practices ### Maximize Cache Hits * Use consistent prompt formatting * Normalize input data before sending * Use deterministic parameters (temperature: 0) * Avoid including timestamps or random values in prompts ### Appropriate Use Cases Caching is most effective for: * Static knowledge queries * Classification tasks * FAQ responses * Development/testing * Retry scenarios ### When to Avoid Caching Caching may not be suitable for: * Real-time data requirements * Highly personalized responses * Time-sensitive information * Creative tasks requiring variety ## Storage Costs Since caching requires data retention, storage costs apply: * **Rate**: $0.01 per 1 million tokens * **Applies to**: All tokens in cached requests and responses See [Data Retention](/features/data-retention) for complete pricing details. 
The cost savings from caching typically far outweigh the storage costs, especially for applications with high request duplication. # Cost Breakdown URL: /features/cost-breakdown import { Callout } from "fumadocs-ui/components/callout"; # Cost Breakdown LLM Gateway provides real-time cost information for each API request directly in the response's `usage` object. This allows you to track costs programmatically without needing to query the dashboard. Cost breakdown is available for all users on both hosted and self-hosted deployments. ## Response Format When cost breakdown is enabled, your API responses will include additional cost fields in the `usage` object: ```json { "id": "chatcmpl-123", "object": "chat.completion", "created": 1234567890, "model": "openai/gpt-4o", "choices": [ { "index": 0, "message": { "role": "assistant", "content": "Hello! How can I help you today?" }, "finish_reason": "stop" } ], "usage": { "prompt_tokens": 10, "completion_tokens": 15, "total_tokens": 25, "cost_usd_total": 0.000125, "cost_usd_input": 0.000025, "cost_usd_output": 0.0001, "cost_usd_cached_input": 0, "cost_usd_request": 0, "cost_usd_data_storage": 0.00000025 } } ``` ## Cost Fields | Field | Description | | ----------------------- | ---------------------------------------------------------------------------------- | | `cost_usd_total` | Total inference cost for the request in USD (excludes storage) | | `cost_usd_input` | Cost for input/prompt tokens in USD | | `cost_usd_output` | Cost for output/completion tokens in USD | | `cost_usd_cached_input` | Cost for cached input tokens in USD (discounted rate) | | `cost_usd_request` | Per-request cost in USD (for models with request-based pricing) | | `cost_usd_data_storage` | LLM Gateway storage cost in USD ($0.01 per 1M tokens, only when retention enabled) | **Note:** `cost_usd_total` includes only provider/inference costs. 
Data storage costs (`cost_usd_data_storage`) are billed separately by LLM Gateway when data retention is enabled in organization policies. ## Streaming Responses Cost information is also available in streaming responses. The cost fields are included in the final usage chunk sent before the `[DONE]` message: ``` data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[...],"usage":{"prompt_tokens":10,"completion_tokens":15,"total_tokens":25,"cost_usd_total":0.000125,"cost_usd_input":0.000025,"cost_usd_output":0.0001}} data: [DONE] ``` ## Example: Tracking Costs in Code Here's an example of how to track costs programmatically using the cost breakdown feature: ```typescript import OpenAI from "openai"; const client = new OpenAI({ apiKey: process.env.LLM_GATEWAY_API_KEY, baseURL: "https://api.llmgateway.io/v1", }); async function trackCosts() { const response = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "Hello!" }], }); const usage = response.usage as any; if (usage.cost_usd_total !== undefined) { console.log(`Request cost: $${usage.cost_usd_total.toFixed(6)}`); console.log(` Input: $${usage.cost_usd_input.toFixed(6)}`); console.log(` Output: $${usage.cost_usd_output.toFixed(6)}`); if (usage.cost_usd_cached_input > 0) { console.log(` Cached: $${usage.cost_usd_cached_input.toFixed(6)}`); } } return response; } ``` ## Use Cases ### Budget Monitoring Track costs in real-time and implement budget limits in your application: ```typescript let totalSpent = 0; const BUDGET_LIMIT = 10.0; // $10 budget async function makeRequest(messages: Message[]) { const response = await client.chat.completions.create({ model: "gpt-4o", messages, }); const cost = (response.usage as any).cost_usd_total || 0; totalSpent += cost; if (totalSpent > BUDGET_LIMIT) { throw new Error(`Budget exceeded: $${totalSpent.toFixed(2)}`); } return response; } ``` ### Per-User Cost Allocation Track costs per user for billing or analytics: 
```typescript const userCosts: Map<string, number> = new Map(); async function makeRequestForUser(userId: string, messages: Message[]) { const response = await client.chat.completions.create({ model: "gpt-4o", messages, }); const cost = (response.usage as any).cost_usd_total || 0; const currentCost = userCosts.get(userId) || 0; userCosts.set(userId, currentCost + cost); return response; } ``` ### Cost Analytics Aggregate costs by model, time period, or any other dimension: ```typescript interface CostEntry { timestamp: Date; model: string; inputCost: number; outputCost: number; totalCost: number; } const costLog: CostEntry[] = []; async function loggedRequest(model: string, messages: Message[]) { const response = await client.chat.completions.create({ model, messages, }); const usage = response.usage as any; costLog.push({ timestamp: new Date(), model: response.model, inputCost: usage.cost_usd_input || 0, outputCost: usage.cost_usd_output || 0, totalCost: usage.cost_usd_total || 0, }); return response; } ``` ## Data Storage Costs When data retention is enabled in organization policies, LLM Gateway stores full request and response payloads for the configured retention period. This storage incurs a small additional cost: * **Rate**: $0.01 per 1 million tokens * **Applies to**: Input, cached, output, and reasoning tokens * **When charged**: Only when retention level is set to "Retain All Data" * **Billing mode**: In API keys mode, only storage costs are deducted from credits (inference costs are billed to your provider keys) Storage costs are displayed separately from inference costs in the dashboard and usage breakdown to maintain transparency between provider costs and LLM Gateway platform costs. Enable [auto top-up](/dashboard) in billing settings to prevent request failures when storage costs deplete your credits. ## Self-Hosted Deployments If you're running a self-hosted LLM Gateway deployment, cost breakdown is always included in API responses regardless of plan.
This allows you to track internal costs and allocate them across teams or projects. # Custom Providers URL: /features/custom-providers import { Callout } from "fumadocs-ui/components/callout"; # Custom Providers LLMGateway supports integrating custom OpenAI-compatible providers, allowing you to use any API that follows the OpenAI chat completions format. This feature is perfect for: * Private or self-hosted LLM deployments * Specialized AI providers not natively supported * Internal AI services within your organization * Testing against different model endpoints Custom providers must be OpenAI-compatible, supporting the `/v1/chat/completions` endpoint format. ## Quick Setup ### 1. Add a Custom Provider Key Navigate to your organization's provider settings and add a custom provider via the UI. Provide a lowercase name, OpenAI-compatible base URL, and API token for the custom provider. ### 2. Make Requests Once configured, make requests using the format `{customName}/{modelName}`: ```bash curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "mycompany/custom-gpt-4", "messages": [ { "role": "user", "content": "Hello from my custom provider!" } ] }' ``` ## Configuration Requirements ### Custom Provider Name * **Format**: Lowercase letters only (`a-z`) * **Examples**: `mycompany`, `internal`, `testing` * **Invalid**: `MyCompany`, `my-company`, `my_company`, `123test` The custom provider name must match the regex pattern `/^[a-z]+$/` exactly. 
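The naming constraint can be checked with the documented pattern itself. The helper below simply applies it; the function name is ours for illustration, not part of any SDK.

```typescript
// Custom provider names must match the documented pattern exactly:
// lowercase letters only -- no digits, hyphens, underscores, or uppercase.
const CUSTOM_PROVIDER_NAME = /^[a-z]+$/;

function isValidProviderName(name: string): boolean {
  return CUSTOM_PROVIDER_NAME.test(name);
}
```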
### Base URL * Must be a valid HTTPS URL * Should point to your provider's base endpoint * LLMGateway will append `/v1/chat/completions` automatically * **Example**: `https://api.example.com` β†’ `https://api.example.com/v1/chat/completions` ### API Token * Provider-specific authentication token * Used in the `Authorization: Bearer {token}` header Unlike built-in providers, custom provider models are not validated, giving you complete flexibility. ## Supported Features Custom providers inherit full LLMGateway functionality. # Data Retention URL: /features/data-retention import { Callout } from "fumadocs-ui/components/callout"; # Data Retention LLM Gateway offers configurable data retention policies that allow you to store full request and response payloads. This enables powerful debugging capabilities, detailed analytics, and compliance with data governance requirements. ## Retention Levels LLM Gateway supports two retention levels that can be configured per organization: | Level | Description | Storage Cost | | ------------------- | ---------------------------------------------------------------------------------------------- | --------------- | | **Metadata Only** | Stores request metadata (timestamps, model, tokens, costs) without full payloads. Default. | Free | | **Retain All Data** | Stores complete request and response payloads including messages, tool calls, and attachments. | $0.01/1M tokens | Metadata-only retention is enabled by default and provides usage analytics without additional storage costs. ## Storage Pricing When full data retention is enabled, storage is billed at **$0.01 per 1 million tokens**. This rate applies to: * Input tokens (prompt) * Cached input tokens * Output tokens (completion) * Reasoning tokens Storage costs are calculated per request and displayed in the `cost_usd_data_storage` field of the response. See [Cost Breakdown](/features/cost-breakdown) for details on tracking costs programmatically. 
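The rate translates into a simple formula. The helper below is a sketch for estimating storage costs client-side; the authoritative figure is always the `cost_usd_data_storage` field the gateway returns.

```typescript
// Storage is billed at $0.01 per 1 million tokens; input, cached input,
// output, and reasoning tokens all count toward the total.
const STORAGE_RATE_USD_PER_MILLION_TOKENS = 0.01;

function estimateStorageCostUsd(totalTokens: number): number {
  return (totalTokens / 1_000_000) * STORAGE_RATE_USD_PER_MILLION_TOKENS;
}
```

For example, a request totaling 1,500 tokens incurs roughly $0.000015 in storage cost.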
### Example Cost Calculation For a request with: * 1,000 input tokens * 500 output tokens * 1,500 total tokens Storage cost = 1,500 / 1,000,000 × $0.01 = **$0.000015** ## Configuring Retention Data retention is configured at the organization level in your dashboard settings: 1. Navigate to **Organization Settings** → **Policies** 2. Select your preferred **Data Retention Level** 3. Save changes Changing retention settings applies to new requests only. Existing stored data follows the retention period active when it was created. ## Retention Periods Data is retained for 30 days for all users. Enterprise plans can have custom retention periods. After the retention period expires, data is automatically deleted. ## Accessing Stored Data When data retention is enabled, you can access your stored requests through the dashboard: * View request history with full payload inspection * Filter by model and date range * Inspect complete request and response payloads ## Use Cases ### Debugging Full data retention enables you to: * Inspect exact prompts sent to models * Review complete responses including tool calls * Trace conversation histories * Identify issues in production ### Analytics With stored payloads, you can: * Analyze prompt patterns and effectiveness * Track response quality over time * Build custom dashboards and reports * Measure model performance across use cases ### Compliance Data retention helps meet compliance requirements by: * Maintaining audit trails of AI interactions * Enabling data governance policies * Supporting incident investigation * Providing records for regulatory requirements ## Billing Considerations ### Credit Usage In **API keys mode** (using your own provider keys): * Only storage costs are deducted from LLM Gateway credits * Inference costs are billed directly to your provider In **credits mode**: * Both inference and storage costs are deducted from credits ### Monitoring Storage Costs Storage costs appear in: * The
`cost_usd_data_storage` field in API responses * Usage dashboard under "Storage" category * Billing invoices as a separate line item Enable [auto top-up](/dashboard) in billing settings to ensure uninterrupted service when storage costs accumulate. ## Self-Hosted Deployments Self-hosted deployments have full control over data retention: * Configure retention periods in environment variables * Data is stored in your own PostgreSQL database * No additional storage costs (you manage your own infrastructure) ## Privacy and Security * All stored data is encrypted at rest * Access is restricted to organization members with appropriate permissions * Data is automatically deleted after the retention period * You can request immediate deletion of specific records through support # Guardrails URL: /features/guardrails import { Callout } from "fumadocs-ui/components/callout"; # Guardrails Guardrails protect your organization by automatically detecting and blocking harmful content in LLM requests before they reach the model. Guardrails are available on the [**Enterprise plan**](https://llmgateway.io/enterprise). ## Overview Guardrails run on every API request, scanning message content for: * Security threats (prompt injection, jailbreak attempts) * Sensitive data (PII, secrets, credentials) * Policy violations (blocked terms, restricted topics) When a violation is detected, you control what happens: block the request, redact the content, or log a warning. ## System Rules Built-in rules protect against common threats: ### Prompt Injection Detection Detects attempts to override or manipulate system instructions. 
Common patterns include: * "Ignore all previous instructions" * "You are now a different AI" * Hidden instructions in encoded text ### Jailbreak Detection Identifies attempts to bypass safety measures: * DAN (Do Anything Now) prompts * Roleplay-based bypasses * Instruction override attempts ### PII Detection Identifies personal information: * Email addresses * Phone numbers * Social Security Numbers * Credit card numbers * IP addresses When the action is set to **redact**, PII is replaced with placeholders like `[EMAIL_REDACTED]`. ### Secrets Detection Detects credentials and API keys: * AWS access keys and secrets * Generic API keys * Passwords in common formats * Private keys ### File Type Restrictions Control which file types can be uploaded: * Configure allowed MIME types * Set maximum file size limits * Block potentially dangerous file types ### Document Leakage Prevention Detects attempts to extract confidential documents or internal data. ## Configurable Actions For each rule, choose how to respond: | Action | Behavior | | ---------- | --------------------------------------------------- | | **Block** | Reject the request with a content policy error | | **Redact** | Remove or mask the sensitive content, then continue | | **Warn** | Log the violation but allow the request to proceed | ## Custom Rules Create organization-specific rules for your use case: ### Blocked Terms Prevent specific words or phrases from being used: * Match type: exact, contains, or regex * Case-sensitive matching option * Multiple terms per rule ### Custom Regex Match patterns unique to your organization: * Internal project codenames * Customer identifiers * Domain-specific sensitive data ### Topic Restrictions Block content related to specific topics: * Define restricted topics * Keyword-based detection ## Security Events Dashboard Monitor all guardrail violations with a dedicated dashboard: * **Total violations** β€” Overall count and trends * **By action** β€” Breakdown of blocked, 
redacted, and warned * **By category** — Which rules are being triggered * **Detailed logs** — Individual violations with timestamps and matched patterns ## How It Works ``` Request → Guardrails Check → Action Based on Rules → Forward to Model (if allowed) ↓ Log Violation ``` 1. **Request received** — API request comes in with messages 2. **Content scanned** — All text content is checked against enabled rules 3. **Violations detected** — Matches are identified and logged 4. **Action taken** — Based on rule configuration (block/redact/warn) 5. **Request proceeds** — If not blocked, the (potentially redacted) request continues ## Best Practices 1. **Start with warnings** — Enable rules in warn mode first to understand your traffic patterns 2. **Review violations** — Check the Security Events dashboard regularly 3. **Tune custom rules** — Adjust blocked terms and regex patterns based on false positives 4. **Layer defenses** — Use multiple rule types together for comprehensive protection ## Get Started Guardrails are an Enterprise feature. [Contact us](https://llmgateway.io/enterprise) to enable Enterprise for your organization. # Image Generation URL: /features/image-generation import { Callout } from "fumadocs-ui/components/callout"; # Image Generation LLMGateway supports image generation through three endpoints: 1. **`/v1/images/generations`** — OpenAI-compatible images endpoint (recommended for simple image generation) 2. **`/v1/images/edits`** — OpenAI-compatible image editing endpoint 3. **`/v1/chat/completions`** — Chat completions with image generation models (for conversational image generation and editing) ## Available Models You can find all available image generation models on our [models page](https://llmgateway.io/models?filters=1\&imageGeneration=true). ## OpenAI Images API The `/v1/images/generations` endpoint provides a drop-in replacement for OpenAI's image generation API. It works with any OpenAI-compatible client library.
### Parameters | Parameter | Type | Default | Description | | ----------------- | ------- | ------------ | ---------------------------------------------------------------------------------------------------------------- | | `prompt` | string | required | A text description of the desired image(s) | | `model` | string | `"auto"` | The model to use. `auto` resolves to `gemini-3-pro-image-preview` | | `n` | integer | `1` | Number of images to generate (1-10) | | `size` | string | β€” | Image dimensions. Supported sizes depend on the model/provider β€” see [Image Configuration](#image-configuration) | | `quality` | string | β€” | Image quality. Supported values depend on the model/provider β€” see [Image Configuration](#image-configuration) | | `response_format` | string | `"b64_json"` | Only `b64_json` is supported | | `style` | string | β€” | Image style: `vivid` or `natural` | ### curl ```bash curl -X POST "https://api.llmgateway.io/v1/images/generations" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "gemini-3-pro-image-preview", "prompt": "A cute cat wearing a tiny top hat", "n": 1, "size": "1024x1024" }' ``` ### OpenAI SDK Works with the standard OpenAI client library β€” just point the base URL to LLMGateway. ```ts import OpenAI from "openai"; import { writeFileSync } from "fs"; const client = new OpenAI({ baseURL: "https://api.llmgateway.io/v1", apiKey: process.env.LLM_GATEWAY_API_KEY, }); const response = await client.images.generate({ model: "gemini-3-pro-image-preview", prompt: "A futuristic city skyline at sunset with flying cars", n: 1, size: "1024x1024", }); response.data.forEach((image, i) => { if (image.b64_json) { const buf = Buffer.from(image.b64_json, "base64"); writeFileSync(`image-${i}.png`, buf); } }); ``` ### Vercel AI SDK Use the `@llmgateway/ai-sdk-provider` with `generateImage`. 
```ts import { createLLMGateway } from "@llmgateway/ai-sdk-provider"; import { generateImage } from "ai"; import { writeFileSync } from "fs"; const llmgateway = createLLMGateway({ apiKey: process.env.LLM_GATEWAY_API_KEY, }); const result = await generateImage({ model: llmgateway.image("gemini-3-pro-image-preview"), prompt: "A cozy cabin in a snowy mountain landscape at night with aurora borealis", size: "1024x1024", n: 1, }); result.images.forEach((image, i) => { const buf = Buffer.from(image.base64, "base64"); writeFileSync(`image-${i}.png`, buf); }); ``` ## OpenAI Images Edit API The `/v1/images/edits` endpoint is OpenAI-compatible and supports a focused subset of `images.edit` parameters. ### Parameters | Parameter | Type | Required | Description | | -------------------- | ------------------------ | -------- | ------------------------------------------------------------------ | | `images` | array of `{ image_url }` | yes | Input images. `image_url` supports HTTPS URLs and base64 data URLs | | `prompt` | string | yes | A text description of the desired image edit | | `model` | string | no | Image editing model | | `background` | enum | no | `transparent`, `opaque`, or `auto` | | `input_fidelity` | enum | no | `high` or `low` | | `n` | integer | no | Number of edited images to generate | | `output_format` | enum | no | `png`, `jpeg`, or `webp` | | `output_compression` | integer | no | Compression level for `jpeg`/`webp` | | `quality` | enum | no | `low`, `medium`, `high`, or `auto` | | `size` | enum | no | `auto`, `1024x1024`, `1536x1024`, `1024x1536` | `mask` is not supported yet on `/v1/images/edits`. 
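Because the endpoint takes a JSON body, assembling a request is just building an object with the fields from the table. The helper below is an illustrative sketch, not an official SDK function; the field names follow the parameter table.

```typescript
// Shape of a /v1/images/edits request body, per the parameter table above.
interface ImageEditRequest {
  images: { image_url: string }[]; // HTTPS URLs or base64 data URLs
  prompt: string;
  model?: string;
  quality?: "low" | "medium" | "high" | "auto";
  size?: "auto" | "1024x1024" | "1536x1024" | "1024x1536";
}

// Illustrative helper: wraps each URL in the { image_url } object the API expects.
function buildEditRequest(
  imageUrls: string[],
  prompt: string,
  options: Omit<ImageEditRequest, "images" | "prompt"> = {},
): ImageEditRequest {
  return {
    images: imageUrls.map((url) => ({ image_url: url })),
    prompt,
    ...options,
  };
}
```

The resulting object can be passed directly as the JSON body of a POST to `/v1/images/edits`, as in the curl examples below.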
### curl (HTTPS image URL) ```bash curl -X POST "https://api.llmgateway.io/v1/images/edits" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "images": [ { "image_url": "https://example.com/source-image.png" } ], "prompt": "Add a watercolor effect to this image", "model": "gemini-3-pro-image-preview", "quality": "high", "size": "1024x1024" }' ``` ### curl (base64 data URL) ```bash curl -X POST "https://api.llmgateway.io/v1/images/edits" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "images": [ { "image_url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA..." } ], "prompt": "Turn this into a pixel-art style image" }' ``` ## Chat Completions API Image generation also works through the `/v1/chat/completions` endpoint, which is useful for conversational image generation, image editing with vision, and multi-turn interactions. ### Making Requests Simply use an image generation model and provide a text prompt describing the image you want to create. 
```bash curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "gemini-3-pro-image-preview", "messages": [ { "role": "user", "content": "Generate an image of a cute golden retriever puppy playing in a sunny meadow" } ] }' ``` ### Response Format Image generation models return responses in the standard chat completions format, with generated images included in the `images` array within the assistant message: ```json { "id": "chatcmpl-1756234109285", "object": "chat.completion", "created": 1756234109, "model": "gemini-3-pro-image-preview", "choices": [ { "index": 0, "message": { "role": "assistant", "content": "Here's an image of a cute dog for you: ", "images": [ { "type": "image_url", "image_url": { "url": "data:image/png;base64," } } ] }, "finish_reason": "stop" } ], "usage": { "prompt_tokens": 8, "completion_tokens": 1303, "total_tokens": 1311 } } ``` ### Vision support You can edit or modify images by combining image generation with [vision models](/features/vision) by including the image in the `messages` array. ### Response Structure #### Images Array The `images` array contains one or more generated images with the following structure: * `type`: Always `"image_url"` for generated images * `image_url.url`: A data URL containing the base64-encoded image data (format: `data:image/png;base64,`) #### Content Field The `content` field may contain descriptive text about the generated image, depending on the model's behavior. ### AI SDK (Chat Completions) You can use the AI SDK to generate images with your existing generateText or streamText calls using the LLMGateway provider. 
#### Example ```ts title="/api/chat/route.ts" import { streamText, type UIMessage, convertToModelMessages } from "ai"; import { createLLMGateway } from "@llmgateway/ai-sdk-provider"; interface ChatRequestBody { messages: UIMessage[]; } export async function POST(req: Request) { const body = await req.json(); const { messages }: ChatRequestBody = body; const llmgateway = createLLMGateway({ apiKey: "llmgateway_api_key", baseUrl: "https://api.llmgateway.io/v1", }); try { const result = streamText({ model: llmgateway.chat("gemini-3-pro-image-preview"), messages: convertToModelMessages(messages), }); return result.toUIMessageStreamResponse(); } catch { return new Response( JSON.stringify({ error: "LLM Gateway Chat request failed" }), { status: 500, }, ); } } ``` Then you can render the image in your frontend using the `Image` component from the [ai-elements](https://ai-sdk.dev/elements/components/image). Here is a full example of how to use the AI SDK to generate images in your frontend: ```tsx title="/app/page.tsx" "use client"; import { useState, useRef } from "react"; import { useChat } from "@ai-sdk/react"; import { parseImagePartToDataUrl } from "@/lib/image-utils"; import { PromptInput, PromptInputBody, PromptInputButton, PromptInputSubmit, PromptInputTextarea, PromptInputToolbar, } from "@/components/ai-elements/prompt-input"; import { Conversation, ConversationContent, } from "@/components/ai-elements/conversation"; import { Image } from "@/components/ai-elements/image"; import { Loader } from "@/components/ai-elements/loader"; import { Message, MessageContent } from "@/components/ai-elements/message"; import { Response } from "@/components/ai-elements/response"; export const ChatUI = () => { const textareaRef = useRef(null); const [text, setText] = useState(""); const { messages, status, stop, regenerate, sendMessage } = useChat(); return ( <>
        <Conversation>
          <ConversationContent>
            {messages.length === 0 ? (
              <Message from="assistant">
                <MessageContent>How can I help you?</MessageContent>
              </Message>
            ) : (
              messages.map((m, messageIndex) => {
                const isLastMessage = messageIndex === messages.length - 1;
                if (m.role === "assistant") {
                  const textContent = m.parts
                    .filter((p) => p.type === "text")
                    .map((p) => p.text)
                    .join("");
                  // Collect the file parts that carry image data
                  const imageParts = m.parts.filter(
                    (p) => p.type === "file" && p.mediaType?.startsWith("image/"),
                  );
                  return (
                    <Message from="assistant" key={m.id}>
                      <MessageContent>
                        {textContent ? <Response>{textContent}</Response> : null}
                        {imageParts.length > 0 ? (
                          <div>
                            {imageParts.map((part, idx: number) => {
                              const { base64Only, mediaType } =
                                parseImagePartToDataUrl(part);
                              if (!base64Only) {
                                return null;
                              }
                              return (
                                <Image
                                  key={idx}
                                  base64={base64Only}
                                  mediaType={mediaType}
                                  alt={part.name}
                                />
                              );
                            })}
                          </div>
                        ) : null}
                        {isLastMessage &&
                          (status === "submitted" || status === "streaming") && (
                            <Loader />
                          )}
                      </MessageContent>
                    </Message>
                  );
                } else {
                  return (
                    <Message from="user" key={m.id}>
                      <MessageContent>
                        {m.parts.map((p, i) => {
                          if (p.type === "text") {
                            return <Response key={i}>{p.text}</Response>;
                          }
                          return null;
                        })}
                      </MessageContent>
                      {isLastMessage &&
                        (status === "submitted" || status === "streaming") && (
                          <Loader />
                        )}
                    </Message>
                  );
                }
              })
            )}
          </ConversationContent>
        </Conversation>
        <PromptInput
          onSubmit={(message) => {
            if (status === "streaming") {
              return;
            }
            try {
              const textContent = message.text ?? "";
              if (!textContent.trim()) {
                return;
              }
              setText(""); // Clear input immediately
              const parts = [{ type: "text", text: textContent }];
              // Call sendMessage which will handle adding the user message and API request
              sendMessage({
                role: "user",
                parts,
              });
            } catch (error) {
              // Handle the error here
            }
          }}
        >
          <PromptInputBody>
            <PromptInputTextarea
              ref={textareaRef}
              value={text}
              onChange={(e) => setText(e.currentTarget.value)}
              placeholder="Message"
            />
          </PromptInputBody>
          <PromptInputToolbar>
            {status === "streaming" ? (
              <PromptInputButton onClick={() => stop()} variant="ghost">
                Stop
              </PromptInputButton>
            ) : null}
          </PromptInputToolbar>
        </PromptInput>
      </>
); }; ``` ```ts title="/lib/image-utils.ts" /** * Parses a file object containing image data and returns a properly formatted data URL * and normalized media type. * * Handles: * - Normalizing mediaType from various property names (mediaType, mime_type) * - Detecting existing data: URLs * - Detecting base64-looking content * - Stripping whitespace from base64 content * - Building proper data:...;base64,... URLs */ export function parseImageFile(file: { url?: string; mediaType?: string; mime_type?: string; }): { dataUrl: string; mediaType: string } { const mediaType = file.mediaType || file.mime_type || "image/png"; let url = String(file.url || ""); const isDataUrl = url.startsWith("data:"); const looksLikeBase64 = !isDataUrl && /^[A-Za-z0-9+/=\s]+$/.test(url.slice(0, 200)); if (looksLikeBase64) { url = url.replace(/\s+/g, ""); } const dataUrl = isDataUrl ? url : looksLikeBase64 ? `data:${mediaType};base64,${url}` : url; return { dataUrl, mediaType }; } /** * Extracts base64-only content from a data URL. * Returns empty string if the input is not a valid data URL. */ export function extractBase64FromDataUrl(dataUrl: string): string { if (!dataUrl.startsWith("data:")) { return ""; } const comma = dataUrl.indexOf(","); return comma >= 0 ? dataUrl.slice(comma + 1) : ""; } /** * Parses an image part (either image_url or file type) and returns * dataUrl, base64Only, and mediaType ready for rendering. * * Handles error cases gracefully by returning empty base64Only string * when parsing fails, allowing the renderer to skip invalid images. 
*/ export function parseImagePartToDataUrl(part: any): { dataUrl: string; base64Only: string; mediaType: string; } { try { // Handle image_url parts if (part.type === "image_url" && part.image_url?.url) { const url = part.image_url.url; const mediaType = "image/png"; // Default for image_url parts if (url.startsWith("data:")) { // Extract media type from data URL if present const match = url.match(/data:([^;]+)/); const extractedMediaType = match?.[1] || mediaType; return { dataUrl: url, base64Only: extractBase64FromDataUrl(url), mediaType: extractedMediaType, }; } return { dataUrl: url, base64Only: "", mediaType, }; } // Handle file parts (AI SDK format) if (part.type === "file") { const { dataUrl, mediaType } = parseImageFile(part); return { dataUrl, base64Only: extractBase64FromDataUrl(dataUrl), mediaType, }; } return { dataUrl: "", base64Only: "", mediaType: "image/png", }; } catch { return { dataUrl: "", base64Only: "", mediaType: "image/png", }; } } ``` ## Image Configuration You can customize the generated image using the optional `image_config` parameter (for chat completions) or `size`/`quality`/`style` parameters (for the images API). The supported parameters vary by provider. ### Google Models Available Google models: | Model | Description | | -------------------------------- | ----------------------------------------------------------------------------------- | | `gemini-3-pro-image-preview` | Gemini 3 Pro with native image generation. Supports aspect ratios and 1K–4K sizes. | | `gemini-3.1-flash-image-preview` | Gemini 3.1 Flash with native image generation. Supports 0.5K–4K sizes (default 1K). 
| #### gemini-3-pro-image-preview ```bash curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "gemini-3-pro-image-preview", "messages": [ { "role": "user", "content": "Generate an image of a mountain landscape at sunset" } ], "image_config": { "aspect_ratio": "16:9", "image_size": "4K" } }' ``` | Parameter | Type | Description | | -------------- | ------ | --------------------------------------------------------------------------------------------------------------------------------------------- | | `aspect_ratio` | string | The aspect ratio of the generated image. Options: `"1:1"`, `"2:3"`, `"3:2"`, `"3:4"`, `"4:3"`, `"4:5"`, `"5:4"`, `"9:16"`, `"16:9"`, `"21:9"` | | `image_size` | string | The resolution of the generated image. Options: `"1K"` (1024x1024), `"2K"` (2048x2048), `"4K"` (4096x4096) | #### gemini-3.1-flash-image-preview ```bash curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "gemini-3.1-flash-image-preview", "messages": [ { "role": "user", "content": "Generate an image of a mountain landscape at sunset" } ], "image_config": { "image_size": "1K" } }' ``` | Parameter | Type | Description | | -------------- | ------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `aspect_ratio` | string | The aspect ratio of the generated image. Options: `"1:1"`, `"1:4"`, `"1:8"`, `"2:3"`, `"3:2"`, `"3:4"`, `"4:1"`, `"4:3"`, `"4:5"`, `"5:4"`, `"8:1"`, `"9:16"`, `"16:9"`, `"21:9"` | | `image_size` | string | The resolution of the generated image. 
Options: `"0.5K"` (512x512), `"1K"` (1024x1024, default), `"2K"` (2048x2048), `"4K"` (4096x4096) | `gemini-3.1-flash-image-preview` uniquely supports `"0.5K"` resolution, which is not available on other Google image models. ### Alibaba Models ```bash curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "alibaba/qwen-image-plus", "messages": [ { "role": "user", "content": "Generate an image of a mountain landscape at sunset" } ], "image_config": { "image_size": "1024x1536", "n": 1, "seed": 42 } }' ``` | Parameter | Type | Description | | ------------ | ------- | ------------------------------------------------------------------------------------------------ | | `image_size` | string | Image dimensions in `WIDTHxHEIGHT` format. Examples: `"1024x1024"`, `"1024x1536"`, `"1536x1024"` | | `n` | integer | Number of images to generate (1-4) | | `seed` | integer | Random seed for reproducible generation | Available Alibaba models: | Model | Price | Description | | ------------------------- | ------------ | --------------------------------- | | `alibaba/qwen-image` | $0.035/image | Standard quality image generation | | `alibaba/qwen-image-plus` | $0.03/image | Good balance of quality and cost | | `alibaba/qwen-image-max` | $0.075/image | Highest quality image generation | Alibaba models use explicit pixel dimensions (e.g., `"1024x1536"`) instead of aspect ratios. For portrait orientation use `"1024x1536"`, for landscape use `"1536x1024"`. 
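Since chat-completion image responses arrive as base64 data URLs (see the response format above), decoding them in Node is straightforward. A sketch using `image_config` with the Alibaba example from this section; the `dataUrlToBuffer` helper is hypothetical:

```typescript
import { writeFileSync } from "fs";

// Hypothetical helper: strips the "data:image/...;base64," prefix and decodes.
function dataUrlToBuffer(dataUrl: string): Buffer {
	const comma = dataUrl.indexOf(",");
	return Buffer.from(
		comma >= 0 ? dataUrl.slice(comma + 1) : dataUrl,
		"base64",
	);
}

async function generateWithConfig(): Promise<void> {
	const res = await fetch("https://api.llmgateway.io/v1/chat/completions", {
		method: "POST",
		headers: {
			Authorization: `Bearer ${process.env.LLM_GATEWAY_API_KEY}`,
			"Content-Type": "application/json",
		},
		body: JSON.stringify({
			model: "alibaba/qwen-image-plus",
			messages: [
				{
					role: "user",
					content: "Generate an image of a mountain landscape at sunset",
				},
			],
			image_config: { image_size: "1024x1536", n: 1, seed: 42 },
		}),
	});
	const json = await res.json();
	const images = json.choices?.[0]?.message?.images ?? [];
	images.forEach((img: { image_url: { url: string } }, i: number) => {
		writeFileSync(`qwen-image-${i}.png`, dataUrlToBuffer(img.image_url.url));
	});
}
```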
### Z.AI Models ```bash curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "zai/cogview-4", "messages": [ { "role": "user", "content": "Generate an image of a futuristic city skyline" } ], "image_config": { "image_size": "1024x1024" } }' ``` | Parameter | Type | Description | | ------------ | ------- | ------------------------------------------------------------------------------------------------ | | `image_size` | string | Image dimensions in `WIDTHxHEIGHT` format. Examples: `"1024x1024"`, `"2048x1024"`, `"1024x2048"` | | `n` | integer | Number of images to generate | Available Z.AI models: | Model | Price | Description | | --------------- | ------------ | ------------------------------------------------------------------------------------------------------------------- | | `zai/cogview-4` | $0.01/image | CogView-4 with bilingual support and excellent text rendering | | `zai/glm-image` | $0.015/image | GLM-Image with hybrid auto-regressive architecture, excellent for text-rendering and knowledge-intensive generation | CogView-4 supports both Chinese and English prompts and excels at generating images with embedded text. ### ByteDance Models ```bash curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "bytedance/seedream-4-5", "messages": [ { "role": "user", "content": "Generate an image of a futuristic cyberpunk city at night" } ], "image_config": { "image_size": "2048x2048" } }' ``` | Parameter | Type | Description | | ------------ | ------ | ------------------------------------------------------------------------------------------------ | | `image_size` | string | Image dimensions in `WIDTHxHEIGHT` format. 
Examples: `"1024x1024"`, `"2048x2048"`, `"4096x4096"` | Available ByteDance models: | Model | Price | Description | | ------------------------ | ------------ | --------------------------------------------------------------- | | `bytedance/seedream-4-0` | $0.035/image | High-quality text-to-image generation with 2K default output | | `bytedance/seedream-4-5` | $0.045/image | Enhanced quality and consistency with improved prompt adherence | Seedream models support 2–10 reference images for multi-image fusion and generation. The default output resolution is 2048×2048 (2K), with support up to 4096×4096 (4K). ## Usage Notes Image generation models typically have higher token costs compared to text-only models due to the computational requirements of image synthesis. Generated images are returned as base64-encoded data URLs, which can be large. Consider the payload size when integrating image generation into your applications. # Metadata URL: /features/metadata # Metadata LLM Gateway supports sending additional metadata with your requests using custom headers. This allows you to include information like user sessions, application versions, tenant IDs, or other contextual data that can be useful for analytics and monitoring. You can later filter requests by these values, for example to view activity for a specific user or session. Additionally, in the future, you will be able to segment your analytics and monitoring based on this metadata. For example, you could show cost and latency breakdowns per user, application, country, feature, or any other dimension you want to track. 
## Custom Headers You can include custom headers with the `X-LLMGateway-` prefix to send metadata alongside your LLM requests: ```bash curl -X POST https://api.llmgateway.io/v1/chat/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "X-LLMGateway-Country: US" \ -H "X-LLMGateway-User-ID: 9403f741-a524-4b18-b1b2-dbb71cdff2a4" \ -d '{ "model": "gpt-4o", "messages": [ { "role": "user", "content": "Hello, how are you?" } ] }' ``` ## Best Practices ### Header Naming * Use the `X-LLMGateway-` prefix for all custom metadata * Use descriptive, consistent naming conventions * Avoid special characters; use hyphens to separate words ### Data Privacy * Be mindful of sensitive data in headers * Consider hashing or anonymizing user identifiers * Follow your organization's data privacy policies ### Performance * Keep header values reasonably short * Avoid sending unnecessary metadata that won't be used for analytics * Consider the impact on request size, especially for high-volume applications ## Example: Multi-tenant Application For a multi-tenant application, you might use metadata headers like this: ```bash curl -X POST https://api.llmgateway.io/v1/chat/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "X-LLMGateway-Tenant-ID: acme-corp" \ -H "X-LLMGateway-User-ID: user-12345" \ -H "X-LLMGateway-App-Version: 2.1.4" \ -H "X-LLMGateway-Feature: chat-assistant" \ -d '{ "model": "gpt-4o", "messages": [ { "role": "user", "content": "Summarize this document..." } ] }' ``` This allows you to track usage and costs per tenant, user, application version, and feature, providing detailed insights into how your LLM integration is being used across your platform. 
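The same headers can be attached from application code. A minimal sketch with `fetch`; the `metadataHeaders` helper is hypothetical, shown only to enforce the `X-LLMGateway-` prefix consistently:

```typescript
// Hypothetical helper: prefixes every metadata key with X-LLMGateway-.
function metadataHeaders(
	meta: Record<string, string>,
): Record<string, string> {
	const headers: Record<string, string> = {};
	for (const [key, value] of Object.entries(meta)) {
		headers[`X-LLMGateway-${key}`] = value;
	}
	return headers;
}

async function chatWithMetadata(): Promise<unknown> {
	const res = await fetch("https://api.llmgateway.io/v1/chat/completions", {
		method: "POST",
		headers: {
			Authorization: `Bearer ${process.env.LLM_GATEWAY_API_KEY}`,
			"Content-Type": "application/json",
			...metadataHeaders({
				"Tenant-ID": "acme-corp",
				"User-ID": "user-12345",
				"App-Version": "2.1.4",
				Feature: "chat-assistant",
			}),
		},
		body: JSON.stringify({
			model: "gpt-4o",
			messages: [{ role: "user", content: "Summarize this document..." }],
		}),
	});
	return res.json();
}
```

Centralizing the prefix in one helper keeps header naming consistent across your codebase, which matters later when you filter analytics by these values.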
# Reasoning URL: /features/reasoning import { Callout } from "fumadocs-ui/components/callout"; # Reasoning LLMGateway supports reasoning-capable models that can show their step-by-step thought process before providing a final answer. This feature is particularly useful for complex problem-solving tasks, mathematical calculations, and logical reasoning. ## Reasoning-Enabled Models You can find all reasoning-enabled models on our [models page with reasoning filter](https://llmgateway.io/models?filters=1\&reasoning=true). These models include: * OpenAI's GPT-5 series (e.g., `gpt-5`, `gpt-5-mini`) * Note: GPT-5 models use reasoning but currently do not return the reasoning content in the response. * Anthropic's Claude 3.7 Sonnet * Google's Gemini 2.0 Flash Thinking and Gemini 2.5 Pro * GPT OSS models such as `gpt-oss-120b` and `gpt-oss-20b` * Z.AI's reasoning models Some models may reason internally even if the `reasoning_effort` parameter is not specified. ## Using the Reasoning Parameter There are two ways to control reasoning effort: ### Option 1: Top-level `reasoning_effort` Add the `reasoning_effort` parameter directly to your request: * `minimal` - Fastest reasoning with minimal thought process (only for GPT-5 models) * `low` - Light reasoning for simpler tasks * `medium` - Balanced reasoning for most tasks * `high` - Deep reasoning for complex problems * `xhigh` - Maximum reasoning depth for the most complex problems ```bash curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "gpt-oss-120b", "messages": [ { "role": "user", "content": "What is 2/3 + 1/4 + 5/6?" 
} ], "reasoning_effort": "medium" }' ``` ### Option 2: Using the `reasoning` object Use the unified `reasoning` configuration object with an `effort` field: * `none` - Disable reasoning * `minimal` - Fastest reasoning with minimal thought process * `low` - Light reasoning for simpler tasks * `medium` - Balanced reasoning for most tasks * `high` - Deep reasoning for complex problems * `xhigh` - Maximum reasoning depth for the most complex problems ```bash curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "gpt-5", "messages": [ { "role": "user", "content": "What is 2/3 + 1/4 + 5/6?" } ], "reasoning": { "effort": "medium" } }' ``` You cannot use both `reasoning_effort` and `reasoning.effort` in the same request. Choose one approach. However, you can combine `reasoning_effort` or `reasoning.effort` with `reasoning.max_tokens` β€” when `max_tokens` is specified, it takes priority over the effort level. ### Example Response The response will include a `reasoning` field in the message object containing the model's step-by-step thought process: ```json { "id": "chatcmpl-abc123", "object": "chat.completion", "created": 1234567890, "model": "gpt-oss-120b", "choices": [ { "index": 0, "message": { "role": "assistant", "content": "The answer is 1.75 or 7/4.", "reasoning": "First, I need to find a common denominator for 2/3, 1/4, and 5/6. The LCD is 12. Converting: 2/3 = 8/12, 1/4 = 3/12, 5/6 = 10/12. Adding: 8/12 + 3/12 + 10/12 = 21/12 = 1.75 or 7/4." }, "finish_reason": "completed" } ], "usage": { "prompt_tokens": 20, "completion_tokens": 45, "reasoning_tokens": 35, "total_tokens": 65 } } ``` ## Specifying Reasoning Token Budget For models that support it, you can specify an exact token budget for reasoning using the `reasoning` object with `max_tokens`. This gives you precise control over how many tokens the model allocates to its thinking process. 
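The interaction between these fields can be sketched client-side. The `resolveReasoning` helper below is purely illustrative (it is not part of the gateway API); it encodes the rules documented above: the two effort fields are mutually exclusive, and `max_tokens` takes priority over either of them:

```typescript
type Effort = "none" | "minimal" | "low" | "medium" | "high" | "xhigh";

interface ReasoningRequest {
	reasoning_effort?: Effort;
	reasoning?: { effort?: Effort; max_tokens?: number };
}

// Illustrative only: mirrors the documented precedence rules.
// Throws on the disallowed combination of both effort fields.
function resolveReasoning(
	req: ReasoningRequest,
):
	| { kind: "max_tokens"; value: number }
	| { kind: "effort"; value: Effort }
	| null {
	if (req.reasoning_effort && req.reasoning?.effort) {
		throw new Error(
			"Use either reasoning_effort or reasoning.effort, not both",
		);
	}
	// max_tokens takes priority over any effort level
	if (req.reasoning?.max_tokens !== undefined) {
		return { kind: "max_tokens", value: req.reasoning.max_tokens };
	}
	const effort = req.reasoning_effort ?? req.reasoning?.effort;
	return effort ? { kind: "effort", value: effort } : null;
}
```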
When `reasoning.max_tokens` is specified, it overrides `reasoning.effort` and `reasoning_effort`. Supported by Anthropic Claude and Google Gemini thinking models. ### Example Request ```bash curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "anthropic/claude-sonnet-4-20250514", "messages": [ { "role": "user", "content": "Explain the P vs NP problem and why it matters." } ], "reasoning": { "max_tokens": 8000 } }' ``` ### Supported Models The `reasoning.max_tokens` parameter is supported by: * **Anthropic Claude**: Claude 3.7 Sonnet, Claude Sonnet 4, Claude Opus 4, Claude Opus 4.5 * **Google Gemini**: Gemini 2.5 Pro, Gemini 2.5 Flash, Gemini 3 Pro Preview When using auto-routing or root models with `reasoning.max_tokens`, only providers that support this feature will be considered. ### Provider-Specific Constraints * **Anthropic**: Reasoning budget must be between 1,024 and 128,000 tokens. Values outside this range are automatically clamped. * **Google**: No specific constraints on the reasoning budget. ### Error Handling If you specify `reasoning.max_tokens` for a model that doesn't support it, you'll receive an error: ```json { "error": { "message": "Model gpt-4o does not support reasoning.max_tokens. Remove the reasoning parameter or use a model that supports explicit reasoning token budgets.", "type": "invalid_request_error", "code": "model_not_supported" } } ``` ## Streaming Reasoning Content When streaming is enabled, reasoning content will be streamed as part of the response chunks: ```bash curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "gpt-oss-120b", "messages": [ { "role": "user", "content": "Solve this logic puzzle: If all roses are flowers and some flowers fade quickly, can we conclude that some roses fade quickly?" 
} ], "reasoning_effort": "high", "stream": true }' ``` The reasoning content will appear in the stream chunks before the final answer, allowing you to display the model's thought process in real-time. Example: ``` data: { "id": "chatcmpl-fb266880-1016-4797-9a70-f21a538edaf6", "object": "chat.completion.chunk", "created": 1761048126, "model": "openai/gpt-oss-20b", "choices": [ { "index": 0, "delta": { "reasoning": "It's ", "role": "assistant" }, "finish_reason": null } ] } ``` ## Usage Tracking ### Response Payload The `usage` object in the response includes reasoning-specific token counts: * `reasoning_tokens` - Number of tokens used for the reasoning process * `completion_tokens` - Number of tokens in the final answer * `prompt_tokens` - Number of tokens in the input * `total_tokens` - Sum of all token counts ### Logs and Analytics All requests using the `reasoning_effort` parameter are tracked in your dashboard logs with: * The `reasoningContent` field containing the full reasoning text * Separate token counts for reasoning vs. completion * Performance metrics for reasoning-enabled requests You can view detailed logs for each request in the [dashboard](https://llmgateway.io/dashboard) to analyze how models are reasoning through problems. ## Auto-Routing with Reasoning When using auto-routing (specifying a model like `gpt-5` without a specific version), LLMGateway will: 1. Automatically set `reasoning_effort` to `minimal` for GPT-5 models 2. Set `reasoning_effort` to `low` for other auto-routed reasoning models 3. Only route to providers that support reasoning when `reasoning_effort` is specified This ensures optimal performance and cost when using auto-routing with reasoning-capable models. ## Model-Specific Behavior Not all reasoning models return reasoning content in the same way. Some models (like OpenAI models) may reason internally but not expose the reasoning content in the response. 
LLMGateway makes sure the response is unified across different providers, but the depth and format of reasoning may vary. ## Best Practices 1. **Choose appropriate reasoning effort**: Use `low` or `minimal` for simple tasks, `medium` for most tasks, and `high` only for complex problems that require deep reasoning 2. **Monitor token usage**: Reasoning can significantly increase token consumption - monitor your `reasoning_tokens` in the usage object 3. **Stream for better UX**: When building user-facing applications, enable streaming to show the reasoning process in real-time 4. **Check logs**: Review the `reasoningContent` in your dashboard logs to understand how models are solving problems ## Error Handling If you specify `reasoning_effort` for a model that doesn't support reasoning, you'll receive an error: ```json { "error": { "message": "Model gpt-4o does not support reasoning. Remove the reasoning_effort parameter or use a reasoning-capable model.", "type": "invalid_request_error", "code": "model_not_supported" } } ``` To avoid this error, only use the `reasoning_effort` parameter with [reasoning-enabled models](https://llmgateway.io/models?filters=1\&reasoning=true). # Response Healing URL: /features/response-healing import { Callout } from "fumadocs-ui/components/callout"; # Response Healing Response Healing is a plugin that automatically validates and repairs malformed JSON responses from AI models. When enabled, LLM Gateway ensures that API responses conform to your specified schemas even when the model's formatting is imperfect. ## Why Response Healing? 
Large language models occasionally produce invalid JSON, especially in complex scenarios: * **Markdown wrapping**: Models often wrap JSON in code blocks like \`\`\`json...\`\`\` * **Mixed content**: JSON may be preceded or followed by explanatory text * **Syntax errors**: Trailing commas, unquoted keys, or single quotes instead of double quotes * **Truncated output**: Token limits may cut off responses mid-JSON Response Healing automatically detects and fixes these issues, saving you from implementing error handling for every possible malformed response. ## Enabling Response Healing To enable Response Healing, add `response-healing` to the `plugins` array in your request: ```bash curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "gpt-4o", "messages": [{"role": "user", "content": "Return a JSON object with name and age"}], "response_format": {"type": "json_object"}, "plugins": [{"id": "response-healing"}] }' ``` Response Healing only activates when `response_format` is set to `json_object` or `json_schema`. For regular text responses, the plugin has no effect. ## How It Works When Response Healing is enabled, LLM Gateway applies a series of repair strategies to malformed JSON responses: ### 1. Markdown Extraction Extracts JSON from markdown code blocks: ```text Here's the data: \`\`\`json {"name": "Alice", "age": 30} \`\`\` ``` Becomes: ```json { "name": "Alice", "age": 30 } ``` ### 2. Mixed Content Extraction Separates JSON from surrounding text: ```text Sure! Here is the JSON you requested: {"name": "Alice", "age": 30} Let me know if you need anything else. ``` Becomes: ```json { "name": "Alice", "age": 30 } ``` ### 3. 
Syntax Fixes Repairs common JSON syntax violations: | Issue | Before | After | | --------------- | ------------------- | ------------------- | | Trailing commas | `{"a": 1,}` | `{"a": 1}` | | Unquoted keys | `{name: "Alice"}` | `{"name": "Alice"}` | | Single quotes | `{'name': 'Alice'}` | `{"name": "Alice"}` | ### 4. Truncation Completion Adds missing closing brackets for truncated responses: ```text {"name": "Alice", "data": {"nested": true ``` Becomes: ```json { "name": "Alice", "data": { "nested": true } } ``` ## Usage Examples ### With JSON Object Format Request a structured response with automatic healing: ```typescript const response = await fetch("https://api.llmgateway.io/v1/chat/completions", { method: "POST", headers: { Authorization: `Bearer ${process.env.LLM_GATEWAY_API_KEY}`, "Content-Type": "application/json", }, body: JSON.stringify({ model: "gpt-4o", messages: [ { role: "user", content: "Return a JSON object with fields: name (string) and age (number)", }, ], response_format: { type: "json_object" }, plugins: [{ id: "response-healing" }], }), }); const result = await response.json(); // Response is guaranteed to be valid JSON const data = JSON.parse(result.choices[0].message.content); ``` ### With JSON Schema For stricter validation, combine with `json_schema`: ```typescript const response = await fetch("https://api.llmgateway.io/v1/chat/completions", { method: "POST", headers: { Authorization: `Bearer ${process.env.LLM_GATEWAY_API_KEY}`, "Content-Type": "application/json", }, body: JSON.stringify({ model: "gpt-4o", messages: [ { role: "user", content: "Generate a user profile", }, ], response_format: { type: "json_schema", json_schema: { name: "user_profile", schema: { type: "object", required: ["name", "email"], properties: { name: { type: "string" }, email: { type: "string" }, age: { type: "number" }, }, }, }, }, plugins: [{ id: "response-healing" }], }), }); const result = await response.json(); ``` ## Healing Metadata When a response is healed, 
the healing method is logged for debugging. The following healing methods may be applied: | Method | Description | | -------------------------- | ------------------------------------------- | | `markdown_extraction` | JSON extracted from markdown code blocks | | `mixed_content_extraction` | JSON extracted from surrounding text | | `syntax_fix` | Trailing commas, quotes, or keys were fixed | | `truncation_completion` | Missing closing brackets were added | | `combined_strategies` | Multiple strategies were applied | ## Limitations Response Healing is only available for non-streaming requests. Streaming responses are returned as-is without healing. Response Healing works best for: * Simple to moderately complex JSON structures * Common formatting issues from LLMs It may not be able to repair: * Severely corrupted or nonsensical output * Complex nested structures with multiple issues * Responses that don't contain any recognizable JSON ## Best Practices ### Use with Structured Prompts Combine Response Healing with clear instructions for best results: ```typescript const response = await fetch("https://api.llmgateway.io/v1/chat/completions", { method: "POST", headers: { Authorization: `Bearer ${process.env.LLM_GATEWAY_API_KEY}`, "Content-Type": "application/json", }, body: JSON.stringify({ model: "gpt-4o", messages: [ { role: "system", content: "Always respond with valid JSON. 
No explanations.", }, { role: "user", content: "List three colors as a JSON array", }, ], response_format: { type: "json_object" }, plugins: [{ id: "response-healing" }], }), }); const result = await response.json(); ``` ### Validate Critical Data For critical applications, validate the healed JSON in your code: ```typescript const result = await response.json(); const content = result.choices[0].message.content; const data = JSON.parse(content); // Add your own validation if (!data.name || typeof data.name !== "string") { throw new Error("Invalid response: missing name"); } ``` ### Monitor Healing Rates If you notice frequent healing in your logs, consider: * Improving your prompts to request cleaner JSON * Using models with better JSON output (e.g., GPT-4o, Claude 3.5) * Adding explicit JSON examples in your prompts # Routing URL: /features/routing import { Callout } from "fumadocs-ui/components/callout"; # Routing LLMGateway provides flexible and intelligent routing options to help you get the best performance and cost efficiency from your AI applications. Whether you want to pin specific models or providers, or let our system automatically optimize your requests, we've got you covered. LLMGateway also includes **automatic retry and fallback** — if a provider fails, your request is seamlessly retried on the next best provider, all within the same API call. ## Model Selection ### Any Model Name You can use any model name from our [models page](https://llmgateway.io/models) or discover available models programmatically through the [/v1/models endpoint](/v1_models). ```bash curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}] }' ``` ### Model ID Routing Choose a specific model ID to route to the **best available provider** for that model.
LLMGateway's smart routing algorithm considers multiple factors to find the optimal provider across all configured options. #### Smart Routing Algorithm When you use a model ID without a provider prefix, LLMGateway's intelligent routing system analyzes multiple factors to select the best provider: **Weighted Scoring System** (based on last 5 minutes of metrics): * **Uptime (50%)** - Prioritizes providers with high reliability and low error rates * **Throughput (20%)** - Favors providers with higher tokens per second generation speed * **Price (20%)** - Considers cost efficiency while maintaining quality * **Latency (10%)** - Considers time to first token (only applied for streaming requests) The algorithm calculates a weighted score for each available provider and selects the one with the lowest (best) score. All metrics are normalized to ensure fair comparison across providers. **Latency Weight for Non-Streaming Requests**: For non-streaming requests, the latency weight (10%) is redistributed proportionally to the other factors since time-to-first-token is less relevant when waiting for the complete response. **Exponential Uptime Penalty**: Providers with uptime below 95% receive an additional exponential penalty that increases rapidly as uptime drops: * 95-100% uptime: No penalty * 90% uptime: \~0.07 penalty * 80% uptime: \~0.62 penalty * 70% uptime: \~1.73 penalty * 50% uptime: \~5.61 penalty This ensures providers experiencing significant issues are strongly deprioritized while minor fluctuations have minimal impact. **Epsilon-Greedy Exploration** (1% of requests): To solve the "cold start problem" where new or unused providers never get traffic to build up metrics, the system randomly explores different providers 1% of the time. 
This ensures: * All providers periodically receive traffic * New providers can prove their reliability * The system adapts to changing provider performance * You benefit from improved routing decisions over time **Routing Metadata**: Every request includes detailed routing metadata in the logs, showing: * Available providers that were considered * Selected provider and selection reason * Scores for each provider (including uptime, throughput, latency, and price) This transparency allows you to understand and debug routing decisions. Using model IDs without a provider prefix automatically routes to the optimal provider based on reliability, speed, and cost. The system continuously learns and adapts based on real-time performance metrics. Smart routing prioritizes reliability over cost, ensuring your requests are routed to providers with proven uptime and performance, while still considering cost efficiency. ### Provider-Specific Routing To use a specific provider without any fallbacks, prefix the model name with the provider name followed by a slash: ```bash # Use OpenAI specifically curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "openai/gpt-4o", "messages": [{"role": "user", "content": "Hello!"}] }' # Use DeepSeek provider specifically curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "deepseek/deepseek-v3.2", "messages": [{"role": "user", "content": "Hello!"}] }' ``` #### Low-Uptime Protection When you specify a provider explicitly, LLMGateway checks the provider's recent uptime (last 5 minutes). If the uptime falls below 90%, the system automatically routes your request to the best available alternative provider to ensure reliability. This protects your application from providers experiencing temporary issues. 
If the requested provider has low uptime but no alternative providers are available for that model, the request will still be sent to the originally requested provider. #### Disabling Fallback with X-No-Fallback Header If you need to bypass this protection and always use the exact provider you specified regardless of its current uptime, you can use the `X-No-Fallback` header: ```bash # Force use of a specific provider even if it has low uptime curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -H "X-No-Fallback: true" \ -d '{ "model": "openai/gpt-4o", "messages": [{"role": "user", "content": "Hello!"}] }' ``` Using `X-No-Fallback: true` disables automatic provider failover. Your requests will be sent to the specified provider even if it is experiencing issues, which may result in higher error rates. When the `X-No-Fallback` header is used, the routing metadata in logs will include `noFallback: true` to indicate that fallback was disabled for that request. ## Automatic Retry & Fallback When using model ID routing (without a provider prefix), LLMGateway automatically retries failed requests on alternate providers. This happens transparently within the same API call — your application receives the successful response as if nothing went wrong. ### How Retry Works 1. Your request is routed to the best available provider using the smart routing algorithm 2. If that provider returns a server error (5xx), times out, or has a connection failure, the gateway marks the provider as failed 3. The next best available provider is selected and the request is retried 4. Up to **2 retries** are attempted before returning an error to the client ``` Request → Provider A (500 error) → Provider B (200 OK) → Response ``` Both streaming and non-streaming requests support automatic retry.
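The retry sequence above can be sketched as a loop over ranked providers. This is an illustrative TypeScript sketch, not the gateway's actual implementation: `providers` is assumed to be ordered best-first by the smart routing algorithm, and `call` stands in for sending the request to a single provider.

```typescript
type Attempt = { status: number; body: string };

// Illustrative sketch of the retry behavior described above.
// `providers` is ranked best-first; `call` is a hypothetical function
// that sends the request to one provider and returns its response.
async function routeWithRetry(
  providers: string[],
  call: (provider: string) => Promise<Attempt>,
  maxRetries = 2,
): Promise<{ provider: string; body: string }> {
  // One initial attempt plus up to `maxRetries` retries.
  for (let i = 0; i < providers.length && i <= maxRetries; i++) {
    const attempt = await call(providers[i]);
    // Only server-side failures (5xx, timeouts, connection errors)
    // move on to the next provider; 2xx and 4xx responses are
    // returned to the caller as-is.
    if (attempt.status < 500) {
      return { provider: providers[i], body: attempt.body };
    }
  }
  throw new Error("All providers failed");
}
```

Note that a 4xx response exits the loop immediately: client errors are returned unchanged rather than retried, matching the trigger rules in the next section.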
### What Triggers a Retry Retries are triggered by **server-side failures** only: * **5xx errors** (500 Internal Server Error, 502 Bad Gateway, 503 Service Unavailable, etc.) * **Timeouts** (upstream provider took too long to respond) * **Connection failures** (network errors, DNS failures, etc.) Retries are **not** triggered by: * **4xx client errors** (400 Bad Request, 401 Unauthorized, 403 Forbidden, 422 Unprocessable Entity) * **Content filter responses** (Azure ResponsibleAI, etc.) ### When Retry Is Disabled Automatic retry is disabled when: * The `X-No-Fallback: true` header is set * A specific provider is requested (e.g., `openai/gpt-4o`) * No alternative providers are available for the requested model * The maximum retry count (2) has been exhausted ### Routing Transparency Every provider attempt — both failed and successful — is recorded in the `routing` array in the response metadata and activity logs: ```json { "metadata": { "routing": [ { "provider": "openai", "model": "gpt-4o", "status_code": 500, "error_type": "server_error", "succeeded": false }, { "provider": "azure", "model": "gpt-4o", "status_code": 200, "error_type": "none", "succeeded": true } ] } } ``` ### Retried Log Tracking Each provider attempt creates its own log entry. Failed attempts that were retried are marked with: * **`retried: true`** — indicates this failed request was retried on another provider * **`retriedByLogId`** — the ID of the final successful log entry This allows you to distinguish between unrecovered failures and failures that were transparently recovered via retry. In the dashboard, retried logs display a "Retried" badge with a link to the successful log. ### Impact on Provider Health Failed attempts still count against the provider's uptime score, even when the request was successfully retried on another provider.
This means: * A provider that keeps failing will see its uptime score drop * The exponential uptime penalty kicks in below 95% (see [Smart Routing Algorithm](#smart-routing-algorithm)) * Future requests are automatically routed away from unreliable providers * Your application stays reliable without any code changes on your side Automatic retry and fallback works together with smart routing to provide self-healing behavior. Failing providers are automatically avoided, and your requests are transparently recovered on reliable alternatives. ## Optimized Auto Routing Auto routing automatically selects the best model for your specific use case without you having to specify a model at all. ### Current Implementation The auto routing system currently: * **Chooses cost-effective models** by default for optimal price-to-performance ratio * **Automatically scales to more powerful models** based on your request's context size * **Handles large contexts intelligently** by selecting models with appropriate context windows ```bash # Let LLMGateway choose the optimal model curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "auto", "messages": [{"role": "user", "content": "Your request here..."}] }' ``` ### Free Models Only When using auto routing, you can restrict the selection to only free models (models with zero input and output pricing) by setting the `free_models_only` parameter to `true`: ```bash # Auto route to free models only curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "auto", "messages": [{"role": "user", "content": "Hello!"}], "free_models_only": true }' ``` Adding even a small amount of credits to your account (e.g., $5) will immediately upgrade your free model rate limits from 5 requests per 10 minutes to 20 requests per minute. 
The `free_models_only` parameter only works with auto routing (`"model": "auto"`). If no free models are available that meet your request requirements, the API will return an error. ### Reasoning Models Only Specify a `reasoning_effort` value and only models that support reasoning will be selected. This parameter is not specific to the auto model. ```bash # Auto route only to reasoning models curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "auto", "messages": [{"role": "user", "content": "Hello!"}], "reasoning_effort": "medium" }' ``` ### Exclude Reasoning Models When using auto routing, you can exclude reasoning models from selection by setting the `no_reasoning` parameter to `true`. This is useful when you want faster responses or need to avoid the additional cost and latency of reasoning models: ```bash # Auto route excluding reasoning models curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "auto", "messages": [{"role": "user", "content": "Hello!"}], "no_reasoning": true }' ``` The `no_reasoning` parameter only works with auto routing (`"model": "auto"`). If no non-reasoning models are available that meet your request requirements, the API will return an error. Auto routing analyzes your payload and automatically chooses between cost-effective models for simple requests and more powerful models for complex or large-context requests. ### Coming Soon: Advanced Optimization We're continuously improving our auto routing capabilities. Soon you'll benefit from: * **Tool call optimization**: Automatically select models that excel at function calling and structured outputs * **Content-aware routing**: Analyze message content to determine the best model for specific types of requests (coding, creative writing, analysis, etc.)
* **Performance-based routing**: Route based on historical performance data for similar requests * **Multi-model orchestration**: Intelligently combine multiple models for complex workflows ### How It Works 1. **Request Analysis**: The system analyzes your request including message content, context size, and any special parameters 2. **Model Selection**: Based on the analysis, it selects the most appropriate model considering cost, performance, and capabilities 3. **Transparent Routing**: Your request is seamlessly routed to the chosen model and provider 4. **Optimized Response**: You receive the best possible response while maintaining cost efficiency Auto routing decisions are transparent in your usage logs, so you can always see which model was selected for each request. ## Best Practices ### For Development * Use specific model names during development and testing * Leverage auto routing for production workloads to optimize costs ### For Production * Use auto routing (`"model": "auto"`) for the best balance of cost and performance * Monitor your usage patterns through the dashboard to understand routing decisions * Set up provider keys for multiple providers to maximize routing options ### For Cost Optimization * Let auto routing handle model selection to automatically use the most cost-effective options * Use model IDs without provider prefixes to always get the cheapest available provider * Monitor your usage analytics to track cost savings from intelligent routing # Source Attribution URL: /features/source # Source Attribution The `X-Source` header allows you to identify your domain when making requests to LLM Gateway. This information is used to generate public usage statistics showing how LLM Gateway is being used across different websites and applications. 
## X-Source Header Include the `X-Source` header with your domain name in your requests: ```bash curl -X POST https://api.llmgateway.io/v1/chat/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "X-Source: example.com" \ -d '{ "model": "gpt-4o", "messages": [ { "role": "user", "content": "Hello, how are you?" } ] }' ``` ## Domain Format The `X-Source` header accepts domain names in various formats. All of the following are valid and will be normalized to the same domain: * `example.com` * `https://example.com` * `https://www.example.com` * `www.example.com` All variations will be stripped down to the base domain (`example.com`) for aggregation purposes. ## Public Statistics Data from the `X-Source` header is used to generate public statistics about LLM Gateway usage, including: * **Popular Domains**: Which websites and applications are using LLM Gateway most frequently * **Model Usage**: What models are being used by different domains * **Geographic Distribution**: Where requests are coming from across different sources * **Growth Trends**: How usage is growing over time for different domains These statistics help demonstrate the adoption and impact of LLM Gateway across the ecosystem. 
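The domain normalization described under Domain Format behaves roughly like this (an illustrative sketch, not the gateway's actual code):

```typescript
// Illustrative: reduce any accepted X-Source value to its base domain.
function normalizeSource(source: string): string {
  // Add a scheme if one is missing so bare domains like "example.com" parse.
  const withScheme = source.includes("://") ? source : `https://${source}`;
  const host = new URL(withScheme).hostname;
  // Strip a leading "www." prefix.
  return host.replace(/^www\./, "");
}
```

All four variants listed above (`example.com`, `https://example.com`, `https://www.example.com`, `www.example.com`) normalize to `example.com`.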
## Privacy Considerations ### What's Public * Domain names (stripped of protocol and www prefixes) * Aggregated request counts and model usage * General geographic regions (country-level data) ### What's Private * Individual request content or responses * User identifiers or personal information * Detailed usage patterns beyond aggregated counts * API keys or authentication details ## Benefits Including the `X-Source` header provides several benefits: ### For Your Project * **Recognition**: Your domain will appear in public usage statistics * **Credibility**: Demonstrates real-world usage of your application * **Community**: Contributes to the broader LLM Gateway ecosystem ### For the Community * **Transparency**: Shows real adoption and usage patterns * **Inspiration**: Other developers can see successful implementations * **Growth**: Helps demonstrate the value of open-source LLM infrastructure ## Optional but Recommended While the `X-Source` header is optional, we strongly encourage its use to: * Support transparency in the LLM Gateway ecosystem * Help showcase successful integrations * Contribute to understanding of LLM usage patterns * Demonstrate the real-world impact of your application Your participation helps build a more transparent and collaborative LLM ecosystem. # Vision Support URL: /features/vision import { Callout } from "fumadocs-ui/components/callout"; # Vision Support LLMGateway supports vision-enabled models that can analyze and describe images. You can provide images via HTTPS URLs or inline base64-encoded data. ## Vision-Enabled Models You can find all vision-enabled models on our [models page with vision filter](https://llmgateway.io/models?filters=1\&vision=true). These models can process both text and image content in the same request. 
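When your image is a local file rather than a hosted URL, you inline it as a base64 data URI. A minimal Node.js sketch (the `toDataUri` helper is our own illustration, not part of any SDK):

```typescript
// Illustrative: turn raw image bytes into a data URI usable in the
// image_url field of a vision request.
function toDataUri(bytes: Buffer, mimeType = "image/jpeg"): string {
  return `data:${mimeType};base64,${bytes.toString("base64")}`;
}

// Typical usage with a file on disk:
// import fs from "node:fs";
// const uri = toDataUri(fs.readFileSync("photo.jpg"));
```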
## Image Formats ### Using HTTPS URLs You can provide any publicly accessible HTTPS URL pointing to an image: ```bash curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "gpt-4o", "messages": [ { "role": "user", "content": [ { "type": "text", "text": "What do you see in this image?" }, { "type": "image_url", "image_url": { "url": "https://example.com/image.jpg" } } ] } ] }' ``` ### Using Base64 Inline Data You can also provide images as base64-encoded data URIs: ```bash curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "gpt-4o", "messages": [ { "role": "user", "content": [ { "type": "text", "text": "Describe this image" }, { "type": "image_url", "image_url": { "url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQEASABIAAD..." } } ] } ] }' ``` ## Content Array Format When using vision models, the `content` field should be an array containing both text and image content blocks: * **Text content**: `{"type": "text", "text": "Your message"}` * **Image content**: `{"type": "image_url", "image_url": {"url": "image_url_or_data_uri"}}` ## Multiple Images You can include multiple images in a single request: ```bash curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "gpt-4o", "messages": [ { "role": "user", "content": [ { "type": "text", "text": "Compare these two images" }, { "type": "image_url", "image_url": { "url": "https://example.com/image1.jpg" } }, { "type": "image_url", "image_url": { "url": "https://example.com/image2.jpg" } } ] } ] }' ``` ## Simple String Content For vision models, you can still use simple string content for text-only messages. The array format is only required when including images. 
```bash curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "gpt-4o", "messages": [ { "role": "user", "content": "Hello! How can you help me today?" } ] }' ``` ## Supported Image Types Vision models typically support common image formats including: * JPEG (.jpg, .jpeg) * PNG (.png) * WebP (.webp) * GIF (.gif) The specific formats supported may vary by model provider. Check the individual model documentation for format limitations and file size restrictions. ## Error Handling If an image URL is inaccessible or the image format is unsupported, the gateway will handle the error gracefully and may substitute a placeholder or error message in the request to the underlying model. # Native Web Search URL: /features/web-search import { Callout } from "fumadocs-ui/components/callout"; # Native Web Search LLM Gateway supports native web search capabilities that allow models to access real-time information from the internet. This feature is useful for answering questions about current events, recent news, live data, and other time-sensitive information that may not be in the model's training data. ## How It Works When you include the `web_search` tool in your request, the model can search the web to gather relevant information before generating a response: 1. You send a request with the `web_search` tool enabled 2. The model determines if web search is needed based on the query 3. If needed, the model performs web searches to gather current information 4. The model synthesizes the search results and generates a response 5. Citations are included in the response to show information sources ## Supported Providers Native web search is available on select models. See all models with native web search support on our [models page](https://llmgateway.io/models?filters=1\&webSearch=true). 
## Basic Usage To enable web search, add the `web_search` tool to your request: ```bash curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "gpt-5.2", "messages": [ { "role": "user", "content": "What is the current weather in San Francisco?" } ], "tools": [ { "type": "web_search" } ] }' ``` ### Example Response ```json { "id": "chatcmpl-abc123", "object": "chat.completion", "created": 1234567890, "model": "openai/gpt-5.2", "choices": [ { "index": 0, "message": { "role": "assistant", "content": "The current weather in San Francisco is 57°F (14°C) with mostly cloudy skies...", "annotations": [ { "type": "url_citation", "url": "https://weather.com/...", "title": "San Francisco Weather" } ] }, "finish_reason": "stop" } ], "usage": { "prompt_tokens": 15, "completion_tokens": 150, "total_tokens": 165, "cost_usd_total": 0.0315 } } ``` ## Web Search Options The `web_search` tool accepts optional configuration parameters: ### User Location Provide location context to get more relevant local search results: ```json { "type": "web_search", "user_location": { "city": "San Francisco", "region": "California", "country": "US", "timezone": "America/Los_Angeles" } } ``` ### Search Context Size Control the amount of web content retrieved (OpenAI only): ```json { "type": "web_search", "search_context_size": "medium" } ``` Available values: * `low` - Minimal search context, faster responses * `medium` - Balanced context (default) * `high` - Maximum search context, more comprehensive ### Max Uses Limit the number of searches per request (provider-dependent): ```json { "type": "web_search", "max_uses": 3 } ``` ## Using with SDKs ### OpenAI SDK (Python) ```python from openai import OpenAI client = OpenAI( base_url="https://api.llmgateway.io/v1", api_key="your-api-key" ) response = client.chat.completions.create( model="gpt-5.2", messages=[ {"role": "user", "content": "What
are the latest news headlines today?"} ], tools=[{"type": "web_search"}] ) print(response.choices[0].message.content) ``` ### OpenAI SDK (TypeScript) ```typescript import OpenAI from "openai"; const client = new OpenAI({ baseURL: "https://api.llmgateway.io/v1", apiKey: "your-api-key", }); const response = await client.chat.completions.create({ model: "gpt-5.2", messages: [{ role: "user", content: "What are the latest tech news?" }], tools: [{ type: "web_search" }], }); console.log(response.choices[0].message.content); ``` ## Streaming Web search works with streaming responses. Citations are included in the final chunks: ```bash curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "gpt-5.2", "messages": [ {"role": "user", "content": "What is the current stock price of Apple?"} ], "tools": [{"type": "web_search"}], "stream": true }' ``` ## Citations and Sources Web search responses include citations to show where information was sourced from. These appear in the `annotations` field of the message: ```json { "annotations": [ { "type": "url_citation", "url": "https://example.com/article", "title": "Article Title", "start_index": 0, "end_index": 50 } ] } ``` Citation format may vary slightly between providers, but LLM Gateway normalizes them into a consistent structure. ## Cost Tracking Web search costs are tracked separately from token costs in the usage object: ```json { "usage": { "prompt_tokens": 15, "completion_tokens": 150, "total_tokens": 165, "cost_usd_total": 0.0215, "cost_usd_input": 0.0015, "cost_usd_output": 0.01, "cost_usd_web_search": 0.01 } } ``` The `cost_usd_web_search` field shows the cost incurred specifically for web search queries. Web search is billed at $0.01 per search call for reasoning models (GPT-5, o-series) and $0.025 per call for non-reasoning models.
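Based on the per-call rates above, the web-search portion of a request's cost can be estimated up front (an illustrative helper; actual spend is reported in `cost_usd_web_search`):

```typescript
// Illustrative: estimate web search cost from the quoted per-call rates:
// $0.01 per call for reasoning models (GPT-5, o-series), $0.025 otherwise.
function estimateWebSearchCost(searchCalls: number, reasoningModel: boolean): number {
  const perCall = reasoningModel ? 0.01 : 0.025;
  return searchCalls * perCall;
}
```

For example, three searches on a non-reasoning model cost roughly $0.075 on top of token costs.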
## Combining with Function Tools You can use web search alongside regular function tools: ```json { "tools": [ { "type": "web_search" }, { "type": "function", "function": { "name": "get_weather", "description": "Get weather for a location", "parameters": { "type": "object", "properties": { "location": { "type": "string" } } } } } ] } ``` Some dedicated search models only support web search and do not support additional function tools. Use `gpt-5.2` or other GPT-5 series models if you need both web search and function tools. ## Use Cases ### Current Events and News ```json { "messages": [ { "role": "user", "content": "What are the major news stories today?" } ], "tools": [{ "type": "web_search" }] } ``` ### Real-Time Data ```json { "messages": [ { "role": "user", "content": "What is the current price of Bitcoin?" } ], "tools": [{ "type": "web_search" }] } ``` ### Research and Fact-Checking ```json { "messages": [ { "role": "user", "content": "What are the latest findings on climate change?" } ], "tools": [{ "type": "web_search" }] } ``` ### Local Information ```json { "messages": [ { "role": "user", "content": "What restaurants are open near me right now?" } ], "tools": [ { "type": "web_search", "user_location": { "city": "New York", "country": "US" } } ] } ``` ## Best Practices 1. **Use GPT-5.2**: For the best web search experience with full tool support, use `gpt-5.2` 2. **Provide location context**: When queries are location-dependent, include `user_location` for more relevant results 3. **Monitor costs**: Web search incurs per-query costs in addition to token costs 4. **Check citations**: Always review the citations in responses to verify information sources 5. **Use streaming**: For user-facing applications, enable streaming to show responses as they're generated ## Error Handling If you try to use web search with a model that doesn't support it: ```json { "error": { "message": "Model gpt-4o does not support native web search. 
Remove the web_search tool or use a model that supports it. See https://llmgateway.io/models?features=webSearch for supported models.", "type": "invalid_request_error" } } ``` To avoid this error, only use the `web_search` tool with [native web search enabled models](https://llmgateway.io/models?filters=1\&webSearch=true). # AWS Bedrock Integration URL: /integrations/aws-bedrock import { Step, Steps } from "fumadocs-ui/components/steps"; AWS Bedrock is Amazon's fully managed service that provides access to foundation models from leading AI companies. This guide shows how to create AWS Bedrock Long-Term API Keys and integrate them with LLM Gateway. ## Prerequisites * An AWS account with Bedrock access enabled * LLM Gateway account or self-hosted instance ## Overview AWS Bedrock supports **Long-Term API Keys** for simplified authentication. These keys provide direct API access without requiring IAM credentials or complex authentication flows. ## Create AWS Bedrock Long-Term API Key ### Enable Model Access in Bedrock 1. Log into the **AWS Console** 2. Navigate to **AWS Bedrock** service 3. Go to **Model access** in the left sidebar 4. Click **Manage model access** 5. Enable the models you want to use (e.g., Claude 3.5, Llama 3) 6. Wait for access to be granted (usually instant for most models) ### Create Long-Term API Key 1. In AWS Bedrock console, navigate to **API Keys** in the left sidebar 2. Click **Create Long-Term API Key** 3. Set expiry date ("Never expires" is recommended) 4. Click **Generate** 5. **Important**: Copy the API key immediately - it's only shown once! ## Add to LLM Gateway ### Navigate to Provider Keys 1. Log into [LLM Gateway Dashboard](https://llmgateway.io/dashboard) 2. Select your organization and project 3. Go to **Provider Keys** in the sidebar ### Add AWS Bedrock Provider Key 1. Click **Add** for **AWS Bedrock** 2. Paste your Long-Term API Key 3. 
**Select Region Prefix** based on where you want to use your models: * **us.** - For US regions (`us-east-1`, `us-west-2`) * **eu.** - For European regions (`eu-central-1`, `eu-west-1`) * **global.** - For global/cross-region endpoints 4. Click **Add Key** The system will validate your key and confirm the connection. ### Test the Integration Test your integration with a simple API call: ```bash curl -X POST https://api.llmgateway.io/v1/chat/completions \ -H "Authorization: Bearer YOUR_LLMGATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "aws-bedrock/claude-3-5-sonnet", "messages": [ { "role": "user", "content": "Hello from AWS Bedrock!" } ] }' ``` Replace `YOUR_LLMGATEWAY_API_KEY` with your LLM Gateway API key. ## Available Models Once configured, you can access all AWS Bedrock models through LLM Gateway: * **Anthropic Claude**: `aws-bedrock/claude-3-5-sonnet`, `aws-bedrock/claude-3-5-haiku` * **Meta Llama**: `aws-bedrock/llama-3-2-90b`, `aws-bedrock/llama-3-2-11b` * **Amazon Titan**: `aws-bedrock/amazon.titan-text-express-v1` * **And more...** Browse all available models at [llmgateway.io/models](https://llmgateway.io/models?provider=aws-bedrock) ## Troubleshooting ### "Model not available" error * Verify you've enabled model access in AWS Bedrock console * Check that the region where you created your key has access to the model * Some models are only available in specific regions ### Rate limiting * AWS Bedrock has request quotas per model and region * Monitor usage in AWS Bedrock console * Consider requesting quota increases for high-volume workloads # Azure Integration URL: /integrations/azure import { Step, Steps } from "fumadocs-ui/components/steps"; Azure provides access to OpenAI's powerful language models through Microsoft's enterprise cloud infrastructure. This guide shows how to create an Azure resource, deploy models, and integrate them with LLM Gateway. Only OpenAI models are supported via Azure at this time. 
[Open an issue](https://github.com/theopenco/llmgateway/issues/new) to request support for other model types. ## Prerequisites * An Azure account with an active subscription * LLM Gateway account or self-hosted instance ## Overview Azure provides enterprise-grade access to OpenAI models with enhanced security, compliance, and regional availability. LLM Gateway integrates seamlessly with Azure deployments. ## Create Azure Resource ### Create an Azure OpenAI Resource 1. Log into the **Azure Portal** ([https://portal.azure.com](https://portal.azure.com)) 2. Click **Create a resource** 3. Search for **Azure OpenAI** and select it 4. Click **Create** 5. Configure the resource: * **Subscription**: Select your Azure subscription * **Resource group**: Create new or select existing * **Region**: Choose a region (e.g., East US, West Europe) * **Name**: Enter a unique resource name (this will be your `<resource-name>`) * **Pricing tier**: Select Standard S0 6. Click **Review + create**, then **Create** 7. Wait for deployment to complete **Important**: Note your resource name - it will be used in the base URL: `https://<resource-name>.openai.azure.com` ### Deploy Models 1. Navigate to your Azure resource in the Azure Portal 2. Click **Go to Azure OpenAI Studio** or visit [https://oai.azure.com](https://oai.azure.com) 3. In Azure Studio, select **Deployments** from the left sidebar 4. Click **Create new deployment** 5. Configure your deployment: * **Model**: Select a model (e.g., gpt-4o, gpt-4o-mini, gpt-4-turbo) * **Deployment name**: Enter a name (this must match the model identifier you'll use – use the pre-filled name) * **Model version**: Select the latest version * **Deployment type**: Global Standard 6. Click **Create** 7. Repeat for additional models you want to use **Note**: The deployment name must match the expected model name: * For `gpt-4o-mini` → deployment name should be `gpt-4o-mini` * For `gpt-35-turbo` → deployment name should be `gpt-35-turbo`, and so on ### Get API Key 1.
In the Azure Portal, go to your Azure resource 2. Click **Keys and Endpoint** in the left sidebar 3. Copy **Key 1** or **Key 2** 4. Note your **Endpoint** URL (should be `https://.openai.azure.com`) **Important**: Keep your API key secure - it provides access to your Azure deployments. ## Add to LLM Gateway ### Navigate to Provider Keys 1. Log into [LLM Gateway Dashboard](https://llmgateway.io/dashboard) 2. Select your organization and project 3. Go to **Provider Keys** in the sidebar ### Add Azure Provider Key 1. Click **Add** for **Azure** 2. Enter your **API Key** from Azure Portal 3. Enter your **Resource Name** (the name from your Azure endpoint URL) * Example: If your endpoint is `https://my-openai-resource.openai.azure.com`, enter `my-openai-resource` 4. Select your preferred **type** (Azure OpenAI or AI Foundry) 5. Adapt the **Validation Model** to a model that you already deployed and is available This is a one time check to ensure the API key is valid and the model can be accessed. 6. Click **Add Key** The system will validate your key and confirm the connection. ### Test the Integration Test your integration with a simple API call: ```bash curl -X POST https://api.llmgateway.io/v1/chat/completions \ -H "Authorization: Bearer YOUR_LLMGATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "azure/gpt-4o-mini", "messages": [ { "role": "user", "content": "Hello from Azure!" } ] }' ``` Replace `YOUR_LLMGATEWAY_API_KEY` with your LLM Gateway API key. ## Available Models Once configured, you can access your Azure deployments through LLM Gateway: * **GPT-4o**: `azure/gpt-4o` * **GPT-4o Mini**: `azure/gpt-4o-mini` * **GPT-3.5 Turbo**: `azure/gpt-3.5-turbo` (note: use gpt-3.5-turbo as llmgateway model name instead of gpt-35-turbo) **Note**: Only models you have deployed in Azure Studio will be available. Ensure your deployment names match the expected model identifiers. 
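Because Azure deployment names and LLM Gateway model IDs can differ (Azure writes `gpt-35-turbo`, while the gateway expects `azure/gpt-3.5-turbo`), it can help to keep that mapping in one place in client code. A minimal sketch in Python — the helper name and the special-case table are illustrative, not part of any SDK:

```python
# Map an Azure deployment name to the model ID LLM Gateway expects.
# SPECIAL_CASES covers deployments whose Azure name differs from the
# gateway's model name; extend it for your own deployments as needed.
SPECIAL_CASES = {
    "gpt-35-turbo": "gpt-3.5-turbo",  # Azure drops the dot; the gateway keeps it
}

def gateway_model(deployment_name: str) -> str:
    """Return the `azure/...` model ID for a given Azure deployment."""
    return "azure/" + SPECIAL_CASES.get(deployment_name, deployment_name)
```

For example, `gateway_model("gpt-4o-mini")` yields `"azure/gpt-4o-mini"`, matching the curl example above, while `gateway_model("gpt-35-turbo")` yields `"azure/gpt-3.5-turbo"`.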
Browse all available models at [llmgateway.io/models](https://llmgateway.io/models?provider=azure)

## Troubleshooting

### "Deployment not found" error

* Verify you've created a deployment in Azure Studio
* Ensure the deployment name exactly matches the model name you're requesting
* Check that the deployment is in the same resource as your API key

### "Resource not found" error

* Verify the resource name is correct (check your Azure Portal endpoint URL)
* Ensure your API key belongs to the correct Azure resource
* Confirm the resource is in an active state in Azure Portal

### Rate limiting

* Azure has Tokens Per Minute (TPM) quotas per deployment
* Monitor usage in Azure Studio under **Quotas**
* Request quota increases through Azure Portal if needed for high-volume workloads

### Region availability

* Not all models are available in all Azure regions
* Check [Azure model availability](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models#model-summary-table-and-region-availability) for your region
* Consider creating resources in multiple regions for better availability

# Activity

URL: /learn/activity

import { ThemedImage } from "@/components/themed-image";

The Activity page shows a real-time log of every API request routed through LLM Gateway. Use it to debug requests, monitor performance, and track costs per call.
## Filters

Filter the activity log using the controls at the top:

| Filter                      | Description                                             |
| --------------------------- | ------------------------------------------------------- |
| **Time range**              | Filter by a specific time period                        |
| **Unified reasons**         | Filter by completion reason (e.g., stop, length, error) |
| **Providers**               | Show requests for specific providers only               |
| **Models**                  | Show requests for specific models only                  |
| **Custom header key/value** | Filter by custom metadata headers attached to requests  |

## Activity List

Each activity entry shows:

* **Status icon** — Green checkmark for completed, red circle for errors
* **Response preview** — First line of the model's response (when available)
* **Model** — The provider and model used (e.g., `google-vertex/gemini-3-pro-image-preview`)
* **Cache status** — Whether the response was served from cache
* **Tokens** — Total tokens consumed (input + output)
* **Duration** — How long the request took
* **Cost** — Inference cost for the request
* **Source** — Where the request originated from
* **Discount** — Any discount applied (e.g., "20% off")
* **Status badge** — `completed`, `upstream_error`, `gateway_error`, etc.
* **Timestamp** — Relative time (e.g., "about 4 hours ago")

### Actions per Entry

* **Open in new tab** — View the full request detail in a new browser tab
* **Expand** — Expand inline to see more details

## Activity Detail

Click on any activity entry to view its full detail page.
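A few of the numbers on the detail page are simple derived values; throughput, for example, is total tokens divided by request duration. A quick sketch of that arithmetic (the function name is illustrative, not part of any API):

```python
def throughput(total_tokens: int, duration_seconds: float) -> float:
    """Tokens per second, as shown on the detail page's Throughput card."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return total_tokens / duration_seconds
```

So a 1,500-token request that took 3 seconds works out to a throughput of 500 tokens per second.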
### Summary Cards

Five cards at the top provide a quick overview:

| Card               | Description                     |
| ------------------ | ------------------------------- |
| **Duration**       | Total request time in seconds   |
| **Tokens**         | Total tokens consumed           |
| **Throughput**     | Tokens per second               |
| **Inference Cost** | Cost charged for this request   |
| **Cache**          | Whether the response was cached |

### Request Section

Details about the original request:

* **Requested Model** — The model ID sent in the API call
* **Used Model** — The actual model that served the request
* **Model Mapping** — The underlying model identifier
* **Provider** — The provider that handled the request
* **Requested Provider** — The provider specified in the request
* **Streamed** — Whether the response was streamed
* **Canceled** — Whether the request was canceled
* **Source** — The application or service that made the request

### Tokens Section

A detailed token breakdown:

* Prompt Tokens, Completion Tokens, Total Tokens
* Reasoning Tokens (for reasoning models)
* Image Input/Output Tokens (for vision/image models)
* Response Size

### Routing Section

How LLM Gateway routed the request:

* **Selection** — The routing strategy used (e.g., `direct-provider-specified`)
* **Available** — Providers that were available for this model
* **Provider Scores** — Scoring breakdown showing availability, uptime, and latency for each provider

### Parameters Section

The model parameters sent with the request:

* Temperature, Max Tokens, Top P
* Frequency Penalty, Reasoning Effort
* Response Format

# API Keys

URL: /learn/api-keys

import { ThemedImage } from "@/components/themed-image";

The API Keys page lets you create, view, and manage the API keys used to authenticate requests to LLM Gateway.

## Creating an API Key

Click the **Create API Key** button to generate a new key.
The number of keys you can create depends on your plan:

* **Free** — Limited number of keys
* **Pro** — Higher key limit
* **Enterprise** — Custom limits

When creating a key, you can assign it a name to help identify its purpose (e.g., "Production", "Development", "CI/CD").

## API Keys List

Each key in the list shows:

| Field         | Description                                                    |
| ------------- | -------------------------------------------------------------- |
| **Name**      | The label you assigned to the key                              |
| **Key**       | A masked preview of the key (only last few characters visible) |
| **Created**   | When the key was created                                       |
| **Last used** | When the key was last used in a request                        |

## Actions

For each API key you can:

* **View** — See the full key (only available once after creation)
* **Edit** — Update the key name
* **Rotate** — Generate a new key value while keeping the same configuration
* **Delete** — Permanently remove the key

## Plan Limits

The page shows your current key count vs. the maximum allowed by your plan. If you've reached your limit, the Create button will be disabled and you'll need to upgrade your plan or delete unused keys.

# Audit Logs

URL: /learn/audit-logs

import { Callout } from "fumadocs-ui/components/callout";
import { ThemedImage } from "@/components/themed-image";

The Audit Logs page provides a complete history of all actions performed within your organization, essential for compliance and security monitoring.

Audit Logs are available on the [**Enterprise plan**](https://llmgateway.io/enterprise). Owner or Admin role is required.

## Filters

Narrow down the log entries:

* **Action** — Filter by action type (create, delete, update, etc.)
* **Resource type** — Filter by resource (API, IAM, API Keys, etc.)

Both filters are populated dynamically based on the actions recorded in your organization.
## Audit Log Entries

Each log entry shows:

| Field             | Description                                                  |
| ----------------- | ------------------------------------------------------------ |
| **Timestamp**     | Exact time of the action (formatted as MMM d, yyyy HH:mm:ss) |
| **User**          | Name and email of the person who performed the action        |
| **Action**        | What was done (e.g., "API Keys → create")                    |
| **Resource type** | The type of resource affected (shown as a badge)             |
| **Resource ID**   | Identifier of the affected resource (with copy button)       |
| **Details**       | Additional metadata about the action                         |

## Pagination

The log supports infinite scrolling with a **Load More** button to view older entries. Entries are sorted newest first.

# Billing

URL: /learn/billing

import { ThemedImage } from "@/components/themed-image";

The Billing page is your central hub for managing credits, plans, and payment methods.

## Credits

Displays your current credit balance. Credits are consumed as you make API requests through the gateway. Click **Top Up Credits** to add more credits to your account.

## Plan Management

View and manage your subscription:

* See your current plan (Free, Pro, or Enterprise)
* Billing cycle information
* Click **Manage Subscription** to upgrade, downgrade, or cancel

## Payment Methods

Manage your saved payment methods:

* Add a new credit card or payment method
* View existing payment methods
* Update billing information

## Auto Top-up Settings

Configure automatic credit top-ups so you never run out:

* **Enable/disable** auto top-up
* **Threshold** — The credit balance that triggers a top-up
* **Amount** — How many credits to add when the threshold is reached

This ensures uninterrupted service by automatically replenishing your credits when they run low.

# Dashboard

URL: /learn/dashboard

import { ThemedImage } from "@/components/themed-image";

The Dashboard is the first page you see after logging in.
It provides a high-level overview of your project's LLM usage, costs, and performance at a glance.

## Date Range

At the top of the page, you can toggle the date range for all dashboard metrics:

* **7 days** — Last 7 days of data (default)
* **30 days** — Last 30 days of data
* **Custom** — Pick a custom start and end date

## Stat Cards

The dashboard displays eight metric cards in two rows:

### Top Row

| Card                     | Description                                                              |
| ------------------------ | ------------------------------------------------------------------------ |
| **Organization Credits** | Your current available credit balance                                    |
| **Total Requests**       | Number of API requests in the selected period, with cache hit percentage |
| **Total Cost**           | Total inference cost for the period, including storage costs             |
| **Total Savings**        | Savings from discounts during the selected period                        |

### Bottom Row

| Card                     | Description                                                         |
| ------------------------ | ------------------------------------------------------------------- |
| **Input Tokens & Cost**  | Total prompt tokens sent and their associated cost                  |
| **Output Tokens & Cost** | Total completion tokens received and their associated cost          |
| **Cached Tokens & Cost** | Tokens served from cache (if caching is enabled) and the cost saved |
| **Most Used Model**      | The model with the highest request count, along with its provider   |

## Usage Overview Chart

Below the stat cards, a chart visualizes your usage over time. You can toggle between two views using the dropdown:

* **Costs** — Shows input, output, and cached input costs as a stacked area chart
* **Requests** — Shows request volume over time

The chart is filtered by the currently selected project.
## Quick Actions

A sidebar panel provides shortcuts to common tasks:

* **Manage API Keys** — Go to the API Keys page
* **Provider Keys** — Configure your own provider keys
* **View Activity** — See detailed request logs
* **Usage & Metrics** — Dive into usage analytics
* **Model Usage** — View per-model usage breakdown

## Header Actions

Two buttons in the top-right corner:

* **Create API Key** — Quickly create a new API key for your project
* **Top Up Credits** — Add credits to your organization balance

# Guardrails

URL: /learn/guardrails

import { Callout } from "fumadocs-ui/components/callout";
import { ThemedImage } from "@/components/themed-image";

The Guardrails page lets you configure content safety rules that automatically scan and filter API requests before they reach the LLM provider.

Guardrails are available on the [**Enterprise plan**](https://llmgateway.io/enterprise). Owner or Admin role is required.

## Main Toggle

A global toggle at the top enables or disables all guardrails for your organization. Click **Save Changes** to apply.
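Conceptually, each guardrail rule is a pattern scan over the request before it leaves your organization. The sketch below is purely illustrative — it is not the gateway's implementation, and the regex is a simplified stand-in for what a Secrets Detection rule with the **Redact** action might catch:

```python
import re

# Illustrative pattern only: OpenAI-style secret keys ("sk-" followed by
# a long alphanumeric tail). Real rules are configured in the dashboard.
SECRET_KEY = re.compile(r"\bsk-[A-Za-z0-9]{20,}\b")

def redact(text: str) -> str:
    """Mimic the Redact action: mask matches, then let the request continue."""
    return SECRET_KEY.sub("[REDACTED]", text)
```

A prompt like `"my key is sk-…"` would reach the provider as `"my key is [REDACTED]"`, while text with no matches passes through unchanged.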
## System Rules

Six built-in rules with individual enable/disable toggles:

| Rule                            | Description                                                          |
| ------------------------------- | -------------------------------------------------------------------- |
| **Prompt Injection Detection**  | Detects attempts to override or manipulate system instructions       |
| **Jailbreak Prevention**        | Identifies attempts to bypass safety measures                        |
| **PII Detection**               | Identifies personal information like emails, phone numbers, and SSNs |
| **Secrets Detection**           | Detects API keys, passwords, and credentials                         |
| **File Type Restrictions**      | Controls which file types can be uploaded                            |
| **Document Leakage Prevention** | Detects attempts to extract confidential documents                   |

Each rule has an action dropdown to configure the response:

* **Block** — Reject the request entirely
* **Redact** — Remove or mask sensitive content, then continue
* **Warn** — Log the violation but allow the request

## File Restrictions

Configure file upload limits:

* **Max file size** — Set the maximum file size in MB
* **Allowed file types** — Add or remove permitted MIME types

## Custom Rules

Create organization-specific rules by clicking **Add Rule**:

* **Blocked Terms** — Block specific words or phrases
* **Custom Regex** — Match patterns with regular expressions
* **Topic Restriction** — Restrict content related to specific topics

Each custom rule can be individually enabled/disabled or deleted.

Learn more about guardrails in the [Guardrails feature docs](/features/guardrails).

# Introduction

URL: /learn

The LLM Gateway dashboard gives you full control over your LLM API usage, costs, and configuration. This section walks you through every page in the dashboard so you can get the most out of the platform.
## Project Pages

These pages are scoped to a specific project within your organization:

* [**Dashboard**](/learn/dashboard) — Overview of your usage, costs, and quick actions
* [**Activity**](/learn/activity) — Detailed logs of every API request
* [**Model Usage**](/learn/model-usage) — Usage breakdown by model
* [**Usage & Metrics**](/learn/usage-metrics) — Requests, errors, cache rates, and cost trends
* [**API Keys**](/learn/api-keys) — Create and manage your API keys
* [**Preferences**](/learn/preferences) — Project-level settings like caching and mode

## Organization Pages

These pages apply to your entire organization:

* [**Provider Keys**](/learn/provider-keys) — Bring your own provider API keys
* [**Guardrails**](/learn/guardrails) — Content safety rules and filters
* [**Security Events**](/learn/security-events) — Monitor guardrail violations
* [**Billing**](/learn/billing) — Credits, plans, and payment methods
* [**Transactions**](/learn/transactions) — Payment and credit history
* [**Referrals**](/learn/referrals) — Earn credits by referring others
* [**Policies**](/learn/policies) — Data retention configuration
* [**Org Preferences**](/learn/org-preferences) — Organization name and billing email
* [**Team**](/learn/team) — Manage team members and roles
* [**Audit Logs**](/learn/audit-logs) — Complete history of organization actions

## Playground

Interactive tools for testing and experimenting with LLM models:

* [**Chat Playground**](/learn/playground) — Test models with an interactive chat interface
* [**Group Chat**](/learn/playground-group) — Compare responses from multiple models side by side
* [**Image Studio**](/learn/playground-image) — Generate and edit images using AI models

# Model Usage

URL: /learn/model-usage

import { ThemedImage } from "@/components/themed-image";

The Model Usage page shows how your API requests are distributed across different LLM models over time.
## Filters

Two filters let you narrow down the data:

* **API Key** — Select a specific API key or view usage across all keys
* **Date range** — Choose a time period to analyze

## Usage Chart

The main chart displays a time-series breakdown of requests per model. Each model is represented by a different color, making it easy to see:

* Which models are used most frequently
* How usage patterns change over time
* Whether usage is concentrated on a single model or spread across many

This page is useful for understanding your model distribution and identifying opportunities to optimize costs by switching to more cost-effective models for certain workloads.

# Org Preferences

URL: /learn/org-preferences

import { ThemedImage } from "@/components/themed-image";

The Org Preferences page contains basic settings for your organization.

## Organization Name

Update your organization's display name. This name appears throughout the dashboard and in billing communications.

## Billing Email

Set or update the email address used for billing-related communications, including receipts, invoices, and payment notifications.

# Group Chat

URL: /learn/playground-group

import { ThemedImage } from "@/components/themed-image";

The Group Chat page lets you send a single prompt to multiple models simultaneously and compare their responses side by side. This is useful for evaluating model quality, speed, and cost.

## How It Works

1. Select two or more models from the model picker
2. Type your prompt in the input field
3. All selected models receive the same prompt at once
4. Responses stream in parallel, displayed in separate columns

## Use Cases

* **Model evaluation** — Compare output quality across providers
* **Cost optimization** — See which models give the best results for the price
* **Speed comparison** — Observe latency differences between models
* **Migration testing** — Verify that a new model produces equivalent results

# Image Studio

URL: /learn/playground-image

import { ThemedImage } from "@/components/themed-image";

The Image Studio lets you generate images using AI models through an intuitive interface. Select a model, describe what you want, and get results instantly.

## Model Selection

Choose from supported image generation models in the dropdown. Each model has different capabilities, resolutions, and pricing.

## Generating Images

1. Select an image generation model
2. Type a description of the image you want
3. Click send to generate
4. Generated images appear in the conversation

## Image Count

You can generate 1, 2, or 4 images at once. Multiple images are displayed in a grid layout.

## Resolution Options

Available resolutions depend on the selected model. Common options include 1K, 2K, and 4K.

# Chat Playground

URL: /learn/playground

import { ThemedImage } from "@/components/themed-image";

The Chat Playground is a standalone app for testing LLM models through a conversational interface. You can select any supported model, adjust parameters, and see responses in real time.

## Model Selection

Use the dropdown at the top to pick a model and provider. The **Auto Route** option automatically selects the best provider based on availability and cost.
## Chat Interface

* Type your message in the input field at the bottom
* Click the send button or press Enter to submit
* Responses stream in real time
* Previous conversations appear in the sidebar

## Prompt Suggestions

When starting a new chat, category tabs help you pick a prompt:

* **Create** — Content generation prompts
* **Explore** — Research and analysis prompts
* **Code** — Programming and development prompts
* **Image gen** — Image generation prompts

## Sidebar

The left sidebar shows your chat history. Click **+ New Chat** to start a fresh conversation, or select a previous chat to continue it.

## Comparison Mode

Toggle **Comparison mode** in the top-right to send the same prompt to multiple models side by side. See the [Group Chat](/learn/playground-group) page for details.

## Image Studio

Click **Image Studio** in the sidebar to switch to the image generation interface. See the [Image Studio](/learn/playground-image) page for details.

# Policies

URL: /learn/policies

import { ThemedImage } from "@/components/themed-image";

The Policies page lets you configure organization-wide policies that govern how your data is handled.

## Data Retention

Control how long your request logs and activity data are stored. The retention period depends on your plan:

| Plan           | Retention Period |
| -------------- | ---------------- |
| **Free**       | 3 days           |
| **Pro**        | 7 days           |
| **Enterprise** | 90 days          |

After the retention period expires, request logs and associated data are automatically deleted.

Learn more about data retention in the [Data Retention feature docs](/features/data-retention).

# Preferences

URL: /learn/preferences

import { ThemedImage } from "@/components/themed-image";

The Preferences page contains project-level settings that control how your project behaves.

## Project Name

Update the display name for your project. This name appears in the sidebar and throughout the dashboard.

## Project Mode

Configure how your organization handles projects.
This setting determines the routing and isolation behavior for API requests within the project.

## Caching

Enable or configure response caching for API requests. When enabled, identical requests will return cached responses instead of making new calls to the provider, saving both time and cost.

Learn more about caching in the [Caching feature docs](/features/caching).

## Danger Zone

The Danger Zone section contains irreversible actions:

* **Archive Project** — Permanently archive the project. This action cannot be undone. Archived projects stop processing requests and their API keys become inactive.

# Provider Keys

URL: /learn/provider-keys

import { Callout } from "fumadocs-ui/components/callout";
import { ThemedImage } from "@/components/themed-image";

The Provider Keys page lets you add your own API keys from LLM providers (OpenAI, Anthropic, Google, etc.) to route requests directly through your accounts without additional gateway fees.

## Adding a Provider Key

Click **Add Provider Key** to configure a new key:

* **Provider** — Select which provider this key belongs to
* **Custom name** — An optional label to identify the key
* **API key** — Your provider's API key
* **Base URL** — Optional custom endpoint (useful for Azure OpenAI or custom deployments)

## Provider Keys List

Each configured key shows:

| Field           | Description                                        |
| --------------- | -------------------------------------------------- |
| **Provider**    | The LLM provider (e.g., OpenAI, Anthropic)         |
| **Custom name** | Your label for the key                             |
| **Status**      | Active, inactive, or deleted                       |
| **Base URL**    | Custom endpoint if configured                      |
| **Token**       | Masked key with only the last 4 characters visible |

## Actions

For each provider key:

* **Edit** — Update the key name, value, or base URL
* **Deactivate** — Temporarily disable the key without deleting it
* **Delete** — Permanently remove the key

When you use your own provider keys, requests are routed directly to the provider.
You are only charged the provider's standard rates with no additional gateway markup.

# Referrals

URL: /learn/referrals

import { ThemedImage } from "@/components/themed-image";

The Referrals page lets you earn credits by inviting others to use LLM Gateway.

## Eligibility

To unlock the referral program, your organization must have at least **$100 in total credit top-ups**. Before reaching this threshold, the page shows:

* A progress bar showing your progress toward $100
* The remaining amount needed to unlock
* An explanation of the 1% earnings model

## Referral Dashboard

Once eligible, the page shows:

### Your Referral Link

A unique shareable link tied to your organization. Click the copy button to copy it to your clipboard and share it with others.

### Your Stats

| Stat               | Description                                           |
| ------------------ | ----------------------------------------------------- |
| **Users Referred** | Total number of users who signed up through your link |
| **Total Earnings** | Total credit amount earned from referrals             |

### How It Works

1. **Share Your Link** — Send your referral link to others
2. **They Sign Up** — They create an LLM Gateway account using your link
3. **Earn Credits** — You earn 1% of their spending as credits

Credits are automatically added to your organization balance.

# Security Events

URL: /learn/security-events

import { Callout } from "fumadocs-ui/components/callout";
import { ThemedImage } from "@/components/themed-image";

The Security Events page shows all guardrail violations detected across your organization, helping you monitor content safety and policy enforcement.

Security Events are available on the [**Enterprise plan**](https://llmgateway.io/enterprise). Owner or Admin role is required.
## Stats Cards

Four summary cards at the top:

| Card                 | Description                                   |
| -------------------- | --------------------------------------------- |
| **Total Violations** | All-time violation count                      |
| **Last 24 Hours**    | Violations in the past day                    |
| **Blocked**          | Number of requests that were blocked          |
| **Redacted**         | Number of requests where content was redacted |

## Filters

Narrow down the events list:

* **Action** — Filter by Blocked, Redacted, Warned, or All actions
* **Category** — Filter by Prompt Injection, Jailbreak, PII Detection, Secrets, Blocked Terms, Custom Regex, or Topic Restriction

## Violations List

Each violation entry shows:

| Field               | Description                                          |
| ------------------- | ---------------------------------------------------- |
| **Timestamp**       | When the violation occurred                          |
| **Rule name**       | Which guardrail rule was triggered                   |
| **Category**        | The type of violation (shown as a badge)             |
| **Action**          | What action was taken (Blocked, Redacted, or Warned) |
| **Matched pattern** | The content that triggered the rule                  |

The list supports pagination with a **Load More** button for viewing older events.

# Team

URL: /learn/team

import { ThemedImage } from "@/components/themed-image";

The Team page lets you invite team members, assign roles, and control access to your organization.

## Adding Members

Click **Add Member** to invite someone by email. You'll need to:

1. Enter their email address
2. Select a role (Developer, Admin, or Owner)

Your plan includes up to **5 team seats**. The current count is displayed, and the Add button is disabled when all seats are used. Contact sales for additional seats.
## Team Members List

Each member shows:

| Field     | Description                                      |
| --------- | ------------------------------------------------ |
| **Name**  | The member's display name                        |
| **Email** | Their email address                              |
| **Role**  | Their current role (can be changed via dropdown) |

## Actions

* **Update role** — Change a member's role using the dropdown
* **Remove** — Remove a member from the organization (requires confirmation)

## Role Permissions

| Role          | Permissions                                                                                           |
| ------------- | ----------------------------------------------------------------------------------------------------- |
| **Owner**     | Full access to all settings, billing, team management, and all projects                               |
| **Admin**     | Can manage team members, projects, and API keys, but cannot access billing or delete the organization |
| **Developer** | View and use resources only. Cannot modify settings or manage team                                    |

Developers can also be given **restricted access** at the API key level, limiting which keys they can view and use.

# Transactions

URL: /learn/transactions

import { ThemedImage } from "@/components/themed-image";

The Transactions page shows a complete history of all financial transactions in your organization.
## Transaction History

Each transaction entry includes:

| Field           | Description                              |
| --------------- | ---------------------------------------- |
| **Date**        | When the transaction occurred            |
| **Type**        | The transaction type (see below)         |
| **Credits**     | Number of credits added or deducted      |
| **Total Paid**  | The dollar amount charged                |
| **Status**      | Current state of the transaction         |
| **Description** | Additional details about the transaction |

## Transaction Types

| Type                    | Description                         |
| ----------------------- | ----------------------------------- |
| **Credit Top-up**       | Manual or automatic credit purchase |
| **Credit Refund**       | Credits refunded to your account    |
| **Subscription Start**  | New plan subscription started       |
| **Subscription Cancel** | Plan subscription canceled          |
| **Subscription End**    | Plan subscription period ended      |

## Status Badges

* **Completed** — Transaction processed successfully
* **Pending** — Transaction is being processed
* **Failed** — Transaction could not be completed

# Usage & Metrics

URL: /learn/usage-metrics

import { ThemedImage } from "@/components/themed-image";

The Usage & Metrics page provides comprehensive analytics through five tabs, giving you deep insight into your LLM API usage patterns.

## Filters

* **API Key** — Filter metrics by a specific API key or view all
* **Date range** — Select the time period (defaults to last 7 days)

## Tabs

### Requests

A time-series chart showing request volume over the selected period. Use this to identify traffic patterns, peak usage times, and growth trends.

### Models

A table showing your top-used models ranked by request count. For each model you can see:

* Total requests
* Token consumption
* Associated costs

This helps you understand which models drive the most usage and cost.

### Errors

A chart showing error rates over time.
Track:

* Error frequency and trends
* Spikes that may indicate provider issues
* Overall reliability of your API calls

### Cache

A chart showing your cache hit rate over time. Monitor:

* How effectively caching is reducing redundant requests
* Cache hit vs. miss ratios
* The cost savings from cached responses

### Costs

A cost breakdown chart showing spending patterns. Analyze:

* Cost trends over time
* Cost distribution by provider or model
* Opportunities to reduce spending

# Migrate from LiteLLM

URL: /migrations/litellm

import { Step, Steps } from "fumadocs-ui/components/steps";
import { Callout } from "fumadocs-ui/components/callout";

Running your own LiteLLM proxy works—until it doesn't. Scaling, monitoring, and keeping it running become another job. LLM Gateway gives you the same unified API with built-in analytics, caching, and a dashboard—without the infrastructure overhead.

## Quick Migration

Both services use OpenAI-compatible endpoints, so migration is a two-line change:

```diff
- const baseURL = "http://localhost:4000/v1"; // LiteLLM proxy
+ const baseURL = "https://api.llmgateway.io/v1";

- const apiKey = process.env.LITELLM_API_KEY;
+ const apiKey = process.env.LLM_GATEWAY_API_KEY;
```

## Why Teams Switch to LLM Gateway

| What You Get             | LiteLLM (Self-Hosted) | LLM Gateway          |
| ------------------------ | --------------------- | -------------------- |
| OpenAI-compatible API    | Yes                   | Yes                  |
| Infrastructure to manage | Yes (you run it)      | No (we run it)       |
| Managed cloud option     | No                    | Yes                  |
| Analytics dashboard      | Basic                 | Per-request detail   |
| Response caching         | Manual setup          | Built-in, automatic  |
| Cost tracking            | Via callbacks         | Native, real-time    |
| Provider key management  | Config file           | Web UI with rotation |
| Uptime & scaling         | You handle it         | 99.9% SLA (Pro/Ent)  |

Still want to self-host? LLM Gateway is [open source under AGPLv3](https://llmgateway.io/blog/how-to-self-host-llm-gateway)—same features, your infrastructure.
For a detailed breakdown, see [LLM Gateway vs LiteLLM](https://llmgateway.io/compare/litellm). ## Migration Steps ### Get Your LLM Gateway API Key Sign up at [llmgateway.io/signup](https://llmgateway.io/signup) and create an API key from your dashboard. ### Map Your Models LLM Gateway supports two model ID formats: **Root Model IDs** (without provider prefix) - Uses smart routing to automatically select the best provider based on uptime, throughput, price, and latency: ``` gpt-5.2 claude-opus-4-5-20251101 gemini-3-flash-preview ``` **Provider-Prefixed Model IDs** - Routes to a specific provider with automatic failover if uptime drops below 90%: ``` openai/gpt-5.2 anthropic/claude-opus-4-5-20251101 google-ai-studio/gemini-3-flash-preview ``` This means many LiteLLM model names work directly with LLM Gateway: | LiteLLM Model | LLM Gateway Model | | -------------------------------- | ----------------------------------------------------------------- | | gpt-5.2 | gpt-5.2 or openai/gpt-5.2 | | claude-opus-4-5-20251101 | claude-opus-4-5-20251101 or anthropic/claude-opus-4-5-20251101 | | gemini/gemini-3-flash-preview | gemini-3-flash-preview or google-ai-studio/gemini-3-flash-preview | | bedrock/claude-opus-4-5-20251101 | claude-opus-4-5-20251101 or aws-bedrock/claude-opus-4-5-20251101 | For more details on routing behavior, see the [routing documentation](/features/routing). ### Update Your Code #### Python with OpenAI SDK ```python import os from openai import OpenAI # Before (LiteLLM proxy) client = OpenAI( base_url="http://localhost:4000/v1", api_key=os.environ["LITELLM_API_KEY"] ) response = client.chat.completions.create( model="gpt-4", messages=[{"role": "user", "content": "Hello!"}] ) # After (LLM Gateway) - model name can stay the same!
client = OpenAI( base_url="https://api.llmgateway.io/v1", api_key=os.environ["LLM_GATEWAY_API_KEY"] ) response = client.chat.completions.create( model="gpt-4", # or "openai/gpt-4" to target a specific provider messages=[{"role": "user", "content": "Hello!"}] ) ``` #### Python with LiteLLM Library If you're using the LiteLLM library directly, you can point it to LLM Gateway: ```python import os import litellm # Before (direct LiteLLM) response = litellm.completion( model="gpt-4", messages=[{"role": "user", "content": "Hello!"}] ) # After (via LLM Gateway) - same model name works response = litellm.completion( model="gpt-4", # or "openai/gpt-4" to target a specific provider messages=[{"role": "user", "content": "Hello!"}], api_base="https://api.llmgateway.io/v1", api_key=os.environ["LLM_GATEWAY_API_KEY"] ) ``` #### TypeScript/JavaScript ```typescript import OpenAI from "openai"; // Before (LiteLLM proxy) const client = new OpenAI({ baseURL: "http://localhost:4000/v1", apiKey: process.env.LITELLM_API_KEY, }); // After (LLM Gateway) - same model name works const client = new OpenAI({ baseURL: "https://api.llmgateway.io/v1", apiKey: process.env.LLM_GATEWAY_API_KEY, }); const completion = await client.chat.completions.create({ model: "gpt-4", // or "openai/gpt-4" to target a specific provider messages: [{ role: "user", content: "Hello!"
}], }); ``` #### cURL ```bash # Before (LiteLLM proxy) curl http://localhost:4000/v1/chat/completions \ -H "Authorization: Bearer $LITELLM_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "gpt-4", "messages": [{"role": "user", "content": "Hello!"}] }' # After (LLM Gateway) - same model name works curl https://api.llmgateway.io/v1/chat/completions \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "gpt-4", "messages": [{"role": "user", "content": "Hello!"}] }' # Use "openai/gpt-4" to target a specific provider ``` ### Migrate Configuration #### LiteLLM Config (Before) ```yaml # litellm_config.yaml model_list: - model_name: gpt-4 litellm_params: model: gpt-4 api_key: sk-... - model_name: claude-3 litellm_params: model: claude-3-sonnet-20240229 api_key: sk-ant-... ``` #### LLM Gateway (After) With LLM Gateway, you don't need a config file. Provider keys are managed in the web dashboard, or you can use the default LLM Gateway keys. If you want to use your own provider keys, configure them in the dashboard under Settings > Provider Keys. 
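If some application code still refers to the old LiteLLM aliases during cutover, a small lookup table can bridge them to gateway model IDs. This is an illustrative sketch, not an LLM Gateway feature; the mappings mirror the example config above, so adjust them to your own `model_list`:

```python
# Hypothetical cutover helper: map old LiteLLM model_name aliases
# to LLM Gateway model IDs (provider-prefixed for explicit routing).
LITELLM_TO_GATEWAY = {
    "gpt-4": "openai/gpt-4",
    "claude-3": "anthropic/claude-3-sonnet-20240229",
}

def to_gateway_model(name: str) -> str:
    # Unknown names pass through unchanged: root IDs without a
    # provider prefix use LLM Gateway's smart routing instead.
    return LITELLM_TO_GATEWAY.get(name, name)
```

Call `to_gateway_model(...)` wherever the old alias was used, then delete the table once every call site passes gateway model IDs directly.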
## Streaming Support LLM Gateway supports streaming identically to LiteLLM: ```python import os from openai import OpenAI client = OpenAI( base_url="https://api.llmgateway.io/v1", api_key=os.environ["LLM_GATEWAY_API_KEY"] ) stream = client.chat.completions.create( model="openai/gpt-4", messages=[{"role": "user", "content": "Write a story"}], stream=True ) for chunk in stream: if chunk.choices[0].delta.content: print(chunk.choices[0].delta.content, end="") ``` ## Function/Tool Calling LLM Gateway supports function calling: ```python import os from openai import OpenAI client = OpenAI( base_url="https://api.llmgateway.io/v1", api_key=os.environ["LLM_GATEWAY_API_KEY"] ) tools = [{ "type": "function", "function": { "name": "get_weather", "description": "Get the weather for a location", "parameters": { "type": "object", "properties": { "location": {"type": "string"} }, "required": ["location"] } } }] response = client.chat.completions.create( model="openai/gpt-4", messages=[{"role": "user", "content": "What's the weather in Tokyo?"}], tools=tools ) ``` ## Removing LiteLLM Infrastructure After verifying LLM Gateway works for your use case, you can decommission your LiteLLM proxy: 1. Update all clients to use LLM Gateway endpoints 2. Monitor the LLM Gateway dashboard for successful requests 3. Shut down your LiteLLM proxy server 4.
Remove LiteLLM configuration files ## What Changes After Migration * **No servers to babysit** — We handle scaling, uptime, and updates * **Real-time cost visibility** — See what every request costs, broken down by model * **Automatic caching** — Repeated requests hit cache, reducing your spend * **Web-based management** — No more editing YAML files for config changes * **New models immediately** — Access new releases within 48 hours, no deployment needed ## Self-Hosting LLM Gateway If you prefer self-hosting like LiteLLM, LLM Gateway is available under AGPLv3: ```bash git clone https://github.com/llmgateway/llmgateway cd llmgateway pnpm install pnpm setup pnpm dev ``` This gives you the same benefits as LiteLLM's self-hosted proxy with LLM Gateway's analytics and caching features. ## Full Comparison Want to see a detailed breakdown of all features? Check out our [LLM Gateway vs LiteLLM comparison page](https://llmgateway.io/compare/litellm). ## Need Help? * Browse available models at [llmgateway.io/models](https://llmgateway.io/models) * Read the [API documentation](https://docs.llmgateway.io) * Contact support at [contact@llmgateway.io](mailto:contact@llmgateway.io) # Migrate from OpenRouter URL: /migrations/openrouter import { Step, Steps } from "fumadocs-ui/components/steps"; import { Callout } from "fumadocs-ui/components/callout"; LLM Gateway works just like OpenRouter—same API format, same model names—but with built-in analytics and the option to self-host. Migration takes two lines of code. ## Quick Migration Change your base URL and API key: ```diff - const baseURL = "https://openrouter.ai/api/v1"; - const apiKey = process.env.OPENROUTER_API_KEY; + const baseURL = "https://api.llmgateway.io/v1"; + const apiKey = process.env.LLM_GATEWAY_API_KEY; ``` ## Migration Steps ### Get Your LLM Gateway API Key Sign up at [llmgateway.io/signup](https://llmgateway.io/signup) and create an API key from your dashboard.
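Before updating any call sites, it can help to fail fast on a missing or malformed key. A minimal sketch; the `llmgtwy_` prefix is assumed from the key format shown in these docs, so relax the check if your keys differ:

```python
import os

def require_gateway_key(var: str = "LLM_GATEWAY_API_KEY") -> str:
    """Return the LLM Gateway API key from the environment, or raise early."""
    key = os.environ.get(var, "")
    if not key:
        raise RuntimeError(f"{var} is not set")
    if not key.startswith("llmgtwy_"):  # assumed key prefix
        raise RuntimeError(f"{var} does not look like an LLM Gateway key")
    return key
```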
### Update Environment Variables ```bash # Remove OpenRouter credentials # OPENROUTER_API_KEY=sk-or-... # Add LLM Gateway credentials LLM_GATEWAY_API_KEY=llmgtwy_your_key_here ``` ### Update Your Code #### Using fetch/axios ```typescript // Before (OpenRouter) const response = await fetch("https://openrouter.ai/api/v1/chat/completions", { method: "POST", headers: { Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`, "Content-Type": "application/json", }, body: JSON.stringify({ model: "openai/gpt-5.2", messages: [{ role: "user", content: "Hello!" }], }), }); // After (LLM Gateway) const response = await fetch("https://api.llmgateway.io/v1/chat/completions", { method: "POST", headers: { Authorization: `Bearer ${process.env.LLM_GATEWAY_API_KEY}`, "Content-Type": "application/json", }, body: JSON.stringify({ model: "gpt-5.2", messages: [{ role: "user", content: "Hello!" }], }), }); ``` #### Using OpenAI SDK ```typescript import OpenAI from "openai"; // Before (OpenRouter) const client = new OpenAI({ baseURL: "https://openrouter.ai/api/v1", apiKey: process.env.OPENROUTER_API_KEY, }); // After (LLM Gateway) const client = new OpenAI({ baseURL: "https://api.llmgateway.io/v1", apiKey: process.env.LLM_GATEWAY_API_KEY, }); // Usage remains the same const completion = await client.chat.completions.create({ model: "anthropic/claude-3-5-sonnet-20241022", messages: [{ role: "user", content: "Hello!" 
}], }); ``` #### Using Vercel AI SDK Both OpenRouter and LLM Gateway have native AI SDK providers, making migration straightforward: ```typescript import { generateText } from "ai"; // Before (OpenRouter AI SDK Provider) import { createOpenRouter } from "@openrouter/ai-sdk-provider"; const openrouter = createOpenRouter({ apiKey: process.env.OPENROUTER_API_KEY, }); const { text } = await generateText({ model: openrouter("gpt-5.2"), prompt: "Hello!", }); // After (LLM Gateway AI SDK Provider) import { createLLMGateway } from "@llmgateway/ai-sdk-provider"; const llmgateway = createLLMGateway({ apiKey: process.env.LLM_GATEWAY_API_KEY, }); const { text } = await generateText({ model: llmgateway("gpt-5.2"), prompt: "Hello!", }); ``` ## Model Name Mapping Most model names are compatible, but here are some common mappings: | OpenRouter Model | LLM Gateway Model | | -------------------------------- | ----------------------------------------------------------------- | | openai/gpt-5.2 | gpt-5.2 or openai/gpt-5.2 | | gemini/gemini-3-flash-preview | gemini-3-flash-preview or google-ai-studio/gemini-3-flash-preview | | bedrock/claude-opus-4-5-20251101 | claude-opus-4-5-20251101 or aws-bedrock/claude-opus-4-5-20251101 | Check the [models page](https://llmgateway.io/models) for the full list of available models. ## Streaming Support LLM Gateway supports streaming responses identically to OpenRouter: ```typescript const stream = await client.chat.completions.create({ model: "anthropic/claude-3-5-sonnet-20241022", messages: [{ role: "user", content: "Write a story" }], stream: true, }); for await (const chunk of stream) { process.stdout.write(chunk.choices[0]?.delta?.content || ""); } ``` ## Full Comparison Want to see a detailed breakdown of all features? Check out our [LLM Gateway vs OpenRouter comparison page](https://llmgateway.io/compare/open-router). ## Need Help?
* Browse available models at [llmgateway.io/models](https://llmgateway.io/models) * Read the [API documentation](https://docs.llmgateway.io) * Contact support at [contact@llmgateway.io](mailto:contact@llmgateway.io) # Migrate from Vercel AI Gateway URL: /migrations/vercel-ai-gateway import { Step, Steps } from "fumadocs-ui/components/steps"; import { Callout } from "fumadocs-ui/components/callout"; ## Quick Migration Swap your provider imports—your AI SDK code stays the same: ```diff - import { openai } from "@ai-sdk/openai"; - import { anthropic } from "@ai-sdk/anthropic"; import { generateText } from "ai"; + import { createLLMGateway } from "@llmgateway/ai-sdk-provider"; + const llmgateway = createLLMGateway({ + apiKey: process.env.LLM_GATEWAY_API_KEY + }); const { text } = await generateText({ - model: openai("gpt-5.2"), + model: llmgateway("gpt-5.2"), prompt: "Hello!" }); ``` The key difference: one provider, one API key, all models—with caching and analytics built in. ## Migration Steps ### Get Your LLM Gateway API Key Sign up at [llmgateway.io/signup](https://llmgateway.io/signup) and create an API key from your dashboard. ### Install the LLM Gateway AI SDK Provider Install the native LLM Gateway provider for the Vercel AI SDK: ```bash pnpm add @llmgateway/ai-sdk-provider ``` This package provides full compatibility with the Vercel AI SDK and supports all LLM Gateway features.
### Update Your Code #### Basic Text Generation ```typescript // Before (Vercel AI Gateway with native providers) import { openai } from "@ai-sdk/openai"; import { anthropic } from "@ai-sdk/anthropic"; import { generateText } from "ai"; const { text: openaiText } = await generateText({ model: openai("gpt-4o"), prompt: "Hello!", }); const { text: claudeText } = await generateText({ model: anthropic("claude-3-5-sonnet-20241022"), prompt: "Hello!", }); // After (LLM Gateway - single provider for all models) import { createLLMGateway } from "@llmgateway/ai-sdk-provider"; import { generateText } from "ai"; const llmgateway = createLLMGateway({ apiKey: process.env.LLM_GATEWAY_API_KEY, }); const { text: openaiText } = await generateText({ model: llmgateway("openai/gpt-4o"), prompt: "Hello!", }); const { text: claudeText } = await generateText({ model: llmgateway("anthropic/claude-3-5-sonnet-20241022"), prompt: "Hello!", }); ``` #### Streaming Responses ```typescript import { createLLMGateway } from "@llmgateway/ai-sdk-provider"; import { streamText } from "ai"; const llmgateway = createLLMGateway({ apiKey: process.env.LLM_GATEWAY_API_KEY, }); const { textStream } = await streamText({ model: llmgateway("anthropic/claude-3-5-sonnet-20241022"), prompt: "Write a poem about coding", }); for await (const text of textStream) { process.stdout.write(text); } ``` #### Using in Next.js API Routes ```typescript // app/api/chat/route.ts import { createLLMGateway } from "@llmgateway/ai-sdk-provider"; import { streamText } from "ai"; const llmgateway = createLLMGateway({ apiKey: process.env.LLM_GATEWAY_API_KEY, }); export async function POST(req: Request) { const { messages } = await req.json(); const result = await streamText({ model: llmgateway("openai/gpt-4o"), messages, }); return result.toDataStreamResponse(); } ``` #### Alternative: Using OpenAI SDK Adapter If you prefer not to install a new package, you can use `@ai-sdk/openai` with a custom base URL: ```typescript import { 
createOpenAI } from "@ai-sdk/openai"; import { generateText } from "ai"; const llmgateway = createOpenAI({ baseURL: "https://api.llmgateway.io/v1", apiKey: process.env.LLM_GATEWAY_API_KEY, }); const { text } = await generateText({ model: llmgateway("openai/gpt-4o"), prompt: "Hello!", }); ``` ### Update Environment Variables ```bash # Remove individual provider keys (optional - can keep as backup) # OPENAI_API_KEY=sk-... # ANTHROPIC_API_KEY=sk-ant-... # Add LLM Gateway key export LLM_GATEWAY_API_KEY=llmgtwy_your_key_here ``` ## Model Name Format LLM Gateway supports two model ID formats: **Root Model IDs** (without provider prefix) - Uses smart routing to automatically select the best provider based on uptime, throughput, price, and latency: ``` gpt-4o claude-3-5-sonnet-20241022 gemini-1.5-pro ``` **Provider-Prefixed Model IDs** - Routes to a specific provider with automatic failover if uptime drops below 90%: ``` openai/gpt-4o anthropic/claude-3-5-sonnet-20241022 google-ai-studio/gemini-1.5-pro ``` For more details on routing behavior, see the [routing documentation](/features/routing). ### Model Mapping Examples | Vercel AI SDK | LLM Gateway | | ----------------------------------------- | -------------------------------------------------------------------------------------------------- | | `openai("gpt-4o")` | `llmgateway("gpt-4o")` or `llmgateway("openai/gpt-4o")` | | `anthropic("claude-3-5-sonnet-20241022")` | `llmgateway("claude-3-5-sonnet-20241022")` or `llmgateway("anthropic/claude-3-5-sonnet-20241022")` | | `google("gemini-1.5-pro")` | `llmgateway("gemini-1.5-pro")` or `llmgateway("google-ai-studio/gemini-1.5-pro")` | Check the [models page](https://llmgateway.io/models) for the full list of available models. 
## Tool Calling LLM Gateway supports tool calling through the AI SDK: ```typescript import { createLLMGateway } from "@llmgateway/ai-sdk-provider"; import { generateText, tool } from "ai"; import { z } from "zod"; const llmgateway = createLLMGateway({ apiKey: process.env.LLM_GATEWAY_API_KEY, }); const { text, toolResults } = await generateText({ model: llmgateway("openai/gpt-4o"), tools: { weather: tool({ description: "Get the weather for a location", parameters: z.object({ location: z.string(), }), execute: async ({ location }) => { return { temperature: 72, condition: "sunny" }; }, }), }, prompt: "What's the weather in San Francisco?", }); ``` ## Self-Hosting LLM Gateway If you prefer self-hosting, LLM Gateway is available under AGPLv3: ```bash git clone https://github.com/llmgateway/llmgateway cd llmgateway pnpm install pnpm setup pnpm dev ``` This gives you the same managed experience with full control over your infrastructure. ## Need Help? * Browse available models at [llmgateway.io/models](https://llmgateway.io/models) * Read the [API documentation](https://docs.llmgateway.io) * Contact support at [contact@llmgateway.io](mailto:contact@llmgateway.io) # Rate Limits URL: /resources/rate-limits import { Callout } from "fumadocs-ui/components/callout"; # Rate Limits LLMGateway implements rate limits to ensure fair usage and optimal performance for all users. The rate limits differ based on your account status and the type of models you're using. 
## Free Models Free models (models with zero input and output pricing) have rate limits that depend on your account's credit status: ### Base Rate Limits For organizations with **zero credits**: * **5 requests per 10 minutes** * Applies to all free model requests * Resets every 10 minutes ### Elevated Rate Limits For organizations that have **purchased at least some credits**: * **20 requests per minute** * Applies to all free model requests * Resets every minute When using free models with elevated limits, your credits will **not** be deducted. The elevated rate limits are simply a benefit for users who have added credits to their account. ## Paid Models **Paid AI models are not currently rate limited.** You can make as many requests as needed to paid models, subject only to your account's credit balance and any provider-specific limits. ## Rate Limit Headers All API responses include rate limit information in the headers: ```http X-RateLimit-Limit: 20 X-RateLimit-Remaining: 19 X-RateLimit-Reset: 1640995200 ``` * `X-RateLimit-Limit`: Maximum number of requests allowed in the current window * `X-RateLimit-Remaining`: Number of requests remaining in the current window * `X-RateLimit-Reset`: Unix timestamp when the rate limit window resets ## Rate Limit Exceeded When you exceed your rate limit, you'll receive a `429 Too Many Requests` response: ```json { "error": { "message": "Rate limit exceeded. Try again later.", "type": "rate_limit_error", "code": "rate_limit_exceeded" } } ``` ## Best Practices ### Upgrading Your Limits To unlock elevated rate limits for free models: 1. Add credits to your account through the dashboard 2. Your rate limits will automatically increase to 20 requests per minute 3. 
Free model usage will still not deduct from your credits ### Handling Rate Limits * Implement exponential backoff when you receive 429 responses * Monitor the `X-RateLimit-Remaining` header to avoid hitting limits * Consider using paid models for high-volume applications ### Cost Optimization * Use free models for development and testing * Switch to paid models for production workloads requiring higher throughput * Monitor your usage patterns through the dashboard Adding even a small amount of credits to your account (e.g., $5) will immediately upgrade your free model rate limits from 5 requests per 10 minutes to 20 requests per minute.
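The exponential-backoff advice above can be sketched as a small wrapper. This is an illustrative pattern, not an LLM Gateway API; `RateLimited` here stands in for whatever exception your SDK raises on a 429 response (e.g. `openai.RateLimitError`):

```python
import random
import time

class RateLimited(Exception):
    """Stand-in for your SDK's 429 rate-limit error."""

def with_backoff(call, max_retries=5, base_delay=0.5, sleep=time.sleep):
    """Run call(), retrying on RateLimited with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimited:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the 429 to the caller
            # delays grow 0.5s, 1s, 2s, ... with jitter to avoid thundering herds
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

For example, wrap a gateway request as `with_backoff(lambda: client.chat.completions.create(model="gpt-4", messages=messages))`.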