# LLM Gateway: Full Documentation

> LLM Gateway is an open-source, OpenAI-compatible API gateway that routes, manages, and analyzes LLM requests across 20+ providers (OpenAI, Anthropic, Google, and more) through a single unified API. Switch providers without changing code, manage API keys centrally, track usage and cost, add caching and guardrails, and self-host or use the managed cloud.

API base URL: https://api.llmgateway.io/v1 · Docs: https://docs.llmgateway.io · Site: https://llmgateway.io

This file concatenates the full text of every documentation page below.

# Introduction

URL: https://docs.llmgateway.io/

LLM Gateway is an open-source API gateway that sits between your applications and LLM providers like OpenAI, Anthropic, Google AI Studio, and more. It provides a unified, OpenAI-compatible API interface with built-in cost tracking, caching, and intelligent routing.

## Features [#features]

## AI Tooling [#ai-tooling]

LLM Gateway is built to work seamlessly with AI agents and development tools.

## Next Steps [#next-steps]

* [**Quickstart**](/quick-start): Get up and running in minutes
* [**Overview**](/overview): Learn more about what LLM Gateway offers
* [**Self-Host**](/self-host): Deploy on your own infrastructure

# Overview

URL: https://docs.llmgateway.io/overview

LLM Gateway is an open-source API gateway for Large Language Models (LLMs).
It acts as middleware between your applications and various LLM providers, allowing you to:

* Route requests to multiple LLM providers (OpenAI, Anthropic, Google AI Studio, and others)
* Manage API keys for different providers in one place
* Track token usage and costs across all your LLM interactions
* Analyze performance metrics to optimize your LLM usage

## Analyzing Your LLM Requests [#analyzing-your-llm-requests]

LLM Gateway provides detailed insights into your LLM usage:

* **Usage Metrics**: Track the number of requests, tokens used, and response times
* **Cost Analysis**: Monitor spending across different models and providers
* **Performance Tracking**: Identify patterns and optimize your prompts based on actual usage data
* **Breakdown by Model**: Compare different models' performance and cost-effectiveness

All this data is collected automatically and presented in a dashboard, helping you make informed decisions about your LLM strategy.

## Getting Started [#getting-started]

To start using LLM Gateway, replace your current LLM provider URL with the LLM Gateway API endpoint:

```bash
curl -X POST https://api.llmgateway.io/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'
```

LLM Gateway maintains compatibility with the OpenAI API format, so existing OpenAI-style requests work without changes to their structure.

## Hosted vs. Self-Hosted [#hosted-vs-self-hosted]

You can use LLM Gateway in two ways:

* **Hosted Version**: For immediate use without setup, visit [llmgateway.io](https://llmgateway.io) to create an account and get an API key.
* **Self-Hosted**: Deploy LLM Gateway on your own infrastructure for complete control over your data and configuration. The self-hosted version offers additional customization options and ensures your LLM traffic never leaves your infrastructure if desired.
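As a rough illustration of the two deployment modes, the sketch below builds the same chat-completions request for either the hosted endpoint or a self-hosted gateway. The `buildChatRequest` helper, the `ChatMessage` and `ChatRequest` types, and the local base URL are assumptions made for this example: the self-host guide lists port 4001 as the gateway endpoint, and the `/v1` path on a self-hosted deployment is assumed to mirror the hosted API.

```typescript
// Hypothetical helper for illustration; not part of any LLM Gateway SDK.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

const HOSTED_BASE_URL = "https://api.llmgateway.io/v1";
// Assumption: a self-hosted gateway on port 4001 exposes the same /v1 path.
const SELF_HOSTED_BASE_URL = "http://localhost:4001/v1";

function buildChatRequest(
  baseUrl: string,
  apiKey: string,
  model: string,
  messages: ChatMessage[],
): ChatRequest {
  return {
    // Strip trailing slashes so both "…/v1" and "…/v1/" produce the same URL.
    url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}
```

Only the base URL differs between the two modes, so a call such as `fetch(req.url, req.init)` can stay the same wherever the gateway runs.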
# Quickstart

URL: https://docs.llmgateway.io/quick-start

Welcome to **LLM Gateway**, a single drop-in endpoint that lets you call today's best large language models while keeping **your existing code** and development workflow intact.

> **TL;DR**: Point your HTTP requests to `https://api.llmgateway.io/v1/…`, supply your `LLM_GATEWAY_API_KEY`, and you're done.

***

## 1 · Get an API key [#1get-an-api-key]

1. Sign in to the dashboard.
2. Create a new Project → *Copy the key*.
3. Export it in your shell (or a `.env` file):

```bash
export LLM_GATEWAY_API_KEY="llmgtwy_XXXXXXXXXXXXXXXX"
```

***

## 2 · Pick your language [#2--pick-your-language]

***

## 3 · SDK integrations [#3--sdk-integrations]

```ts title="ai-sdk.ts"
import { llmgateway } from "@llmgateway/ai-sdk-provider";
import { generateText } from "ai";

const { text } = await generateText({
  model: llmgateway("gpt-4o"),
  prompt: "Write a vegetarian lasagna recipe for 4 people.",
});
```

```ts title="vercel-ai-sdk.ts"
import { createOpenAI } from "@ai-sdk/openai";
import { generateText } from "ai";

// Point the OpenAI provider at LLM Gateway instead of api.openai.com.
const llmgateway = createOpenAI({
  baseURL: "https://api.llmgateway.io/v1",
  apiKey: process.env.LLM_GATEWAY_API_KEY!,
});

// `llmgateway.chat(...)` returns a model handle; `generateText` performs the call.
const { text } = await generateText({
  model: llmgateway.chat("gpt-4o"),
  messages: [{ role: "user", content: "Hello, how are you?" }],
});

console.log(text);
```

```ts title="openai-sdk.ts"
import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "https://api.llmgateway.io/v1",
  apiKey: process.env.LLM_GATEWAY_API_KEY,
});

const completion = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello, how are you?" }],
});

console.log(completion.choices[0].message.content);
```

***

## 4 · Going further [#4going-further]

* **Streaming**: Pass `stream: true` to any request; the gateway proxies the event stream unchanged.
* **Monitoring**: Every call appears in the dashboard with latency, cost, and provider breakdown.
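To illustrate what consuming the proxied event stream might look like, here is a minimal sketch that pulls the text deltas out of raw server-sent-event lines. It assumes the stream follows the OpenAI chat-completion chunk format (`data: {...}` lines terminated by `data: [DONE]`); `collectStreamText` and `StreamChunk` are hypothetical names introduced for this example.

```typescript
// Minimal shape of the fields this sketch reads from each streamed chunk.
interface StreamChunk {
  choices?: { delta?: { content?: string | null } }[];
}

// Hypothetical helper: concatenates the text deltas found in complete SSE lines.
function collectStreamText(rawEvents: string): string {
  let text = "";
  for (const line of rawEvents.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue; // skip blank lines and comments
    const payload = trimmed.slice("data:".length).trim();
    if (payload === "[DONE]") break; // end-of-stream marker
    let chunk: StreamChunk;
    try {
      chunk = JSON.parse(payload) as StreamChunk;
    } catch {
      continue; // ignore lines that are not valid JSON
    }
    for (const choice of chunk.choices ?? []) {
      text += choice.delta?.content ?? "";
    }
  }
  return text;
}
```

The helper expects complete lines. When reading from a live connection, a chunk can be split across network reads, so incoming bytes would need to be buffered until a newline arrives before being passed in.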
***

## 5 · FAQ [#5faq]

See the [Models page](https://llmgateway.io/models).

Unlike OpenRouter, we offer:

* Full self-hosting capabilities, giving you complete control over your infrastructure
* Enhanced analytics with deeper insights into your model usage and performance
* No fees when using your own provider keys, maximizing cost efficiency
* Greater flexibility and customization options for enterprise deployments

Our pricing structure is designed to be flexible and cost-effective: see the [Pricing section](https://llmgateway.io#pricing).

***

## 6 · Next steps [#6next-steps]

* Read the [self-host guide](/self-host).
* Drop into our [GitHub](https://github.com/theopenco/llmgateway) for help or feature requests.

Happy building! ✨

# Self Host LLMGateway

URL: https://docs.llmgateway.io/self-host

LLMGateway is a self-hostable platform that provides a unified API gateway for multiple LLM providers. This guide offers two simple options to get started.

## Prerequisites [#prerequisites]

* Latest Docker
* API keys for the LLM providers you want to use (OpenAI, Anthropic, etc.)

## Option 1: Unified Docker Image (Simplest) [#option-1-unified-docker-image-simplest]

This option uses a single Docker container that includes all services (UI, API, Gateway, Database, Redis).

```bash
# Set strong secrets first
export LLM_GATEWAY_SECRET="your-secret-key-here"
export GATEWAY_API_KEY_HASH_SECRET="your-api-key-hash-secret-here"

# Run the container
docker run -d \
  --name llmgateway \
  --restart unless-stopped \
  -p 3002:3002 \
  -p 3003:3003 \
  -p 3005:3005 \
  -p 3006:3006 \
  -p 4001:4001 \
  -p 4002:4002 \
  -v llmgateway_postgres:/var/lib/postgresql/data \
  -v llmgateway_redis:/var/lib/redis \
  -e AUTH_SECRET="$LLM_GATEWAY_SECRET" \
  -e GATEWAY_API_KEY_HASH_SECRET="$GATEWAY_API_KEY_HASH_SECRET" \
  ghcr.io/theopenco/llmgateway-unified:latest
```

Docker will create the named volumes automatically on first run. Do not bind-mount a host directory directly to `/var/lib/postgresql/data`, because PostgreSQL initialization inside the container needs to manage permissions on that path.

Note: rather than `latest`, pin the image to the most recent version tag listed at [https://github.com/theopenco/llmgateway/releases](https://github.com/theopenco/llmgateway/releases).

### Using Docker Compose (Alternative for unified image) [#using-docker-compose-alternative-for-unified-image]

```bash
# Download the compose file and the example environment file
curl -O https://raw.githubusercontent.com/theopenco/llmgateway/main/infra/docker-compose.unified.yml
curl -O https://raw.githubusercontent.com/theopenco/llmgateway/main/.env.unified.example

# Configure environment
cp .env.unified.example .env
# Edit .env with your configuration

# Start the service
docker compose -f docker-compose.unified.yml up -d
```

Note: replace the `latest` tag on the image in the compose file with the most recent version listed at [https://github.com/theopenco/llmgateway/releases](https://github.com/theopenco/llmgateway/releases).

## Option 2: Separate Services with Docker Compose [#option-2-separate-services-with-docker-compose]

This option uses separate containers for each service, offering more flexibility.

```bash
# Clone the repository
git clone https://github.com/theopenco/llmgateway.git
cd llmgateway

# Configure environment
cp .env.example .env
# Edit .env with your configuration

# Start the services
docker compose -f infra/docker-compose.split.yml up -d
```

Note: replace the `latest` tag on every image in the compose file with the most recent version listed at [https://github.com/theopenco/llmgateway/releases](https://github.com/theopenco/llmgateway/releases).

## Accessing Your LLMGateway [#accessing-your-llmgateway]

After starting either option, you can access:

* **Web Interface**: [http://localhost:3002](http://localhost:3002)
* **Documentation**: [http://localhost:3005](http://localhost:3005)
* **API Endpoint**: [http://localhost:4002](http://localhost:4002)
* **Gateway Endpoint**: [http://localhost:4001](http://localhost:4001)

## Required Configuration [#required-configuration]

At minimum, you need to set these environment variables:

```bash
# Database (change the password!)
POSTGRES_PASSWORD=your_secure_password_here

# Authentication
AUTH_SECRET=your-secret-key-here
GATEWAY_API_KEY_HASH_SECRET=your-api-key-hash-secret-here

# LLM Provider API Keys (add the ones you need)
LLM_OPENAI_API_KEY=sk-...
LLM_ANTHROPIC_API_KEY=sk-ant-...
```

## Basic Management Commands [#basic-management-commands]

### For Unified Docker (Option 1) [#for-unified-docker-option-1]

```bash
# View logs
docker logs llmgateway

# Restart container
docker restart llmgateway

# Stop container
docker stop llmgateway
```

### For Docker Compose (Option 2) [#for-docker-compose-option-2]

```bash
# View logs
docker compose -f infra/docker-compose.split.yml logs -f

# Restart services
docker compose -f infra/docker-compose.split.yml restart

# Stop services
docker compose -f infra/docker-compose.split.yml down
```

## Build locally [#build-locally]

To build locally, use the `*.local.yml` compose file in the `infra` directory, which builds the images from the source code.

## All provider API keys [#all-provider-api-keys]

You can set any of the following API keys:

```text
LLM_OPENAI_API_KEY=
LLM_ANTHROPIC_API_KEY=
```

## Multiple API Keys and Load Balancing [#multiple-api-keys-and-load-balancing]

LLMGateway supports multiple API keys per provider for load balancing and increased availability. Provide comma-separated values for your API keys:

```bash
# Multiple OpenAI keys for load balancing
LLM_OPENAI_API_KEY=sk-key1,sk-key2,sk-key3

# Multiple Anthropic keys
LLM_ANTHROPIC_API_KEY=sk-ant-key1,sk-ant-key2
```

### Health-Aware Routing [#health-aware-routing]

The gateway automatically tracks the health of each API key and routes requests to healthy keys. If a key experiences consecutive errors, it is temporarily skipped. Keys that return authentication errors (401/403) are blacklisted until the gateway restarts.

### Related Configuration Values [#related-configuration-values]

For providers that require additional configuration (like Google Vertex), you can specify multiple values that correspond to each API key. The gateway always uses the value at the matching index:

```bash
# Multiple Google Vertex configurations
LLM_GOOGLE_VERTEX_API_KEY=key1,key2,key3
LLM_GOOGLE_CLOUD_PROJECT=project-a,project-b,project-c
LLM_GOOGLE_VERTEX_REGION=us-central1,europe-west1,asia-east1
```

When the gateway selects `key2`, it automatically uses `project-b` and `europe-west1`. If you have fewer configuration values than keys, the last value is reused for the remaining keys.

## Next Steps [#next-steps]

Once your LLMGateway is running:

1. **Open the web interface** at [http://localhost:3002](http://localhost:3002)
2. **Create your first organization** and project
3. **Generate API keys** for your applications
4. **Test the gateway** by making API calls to [http://localhost:4001](http://localhost:4001)

## Helm Chart [#helm-chart]

You can also deploy LLMGateway to Kubernetes using the Helm chart, which is published as an OCI artifact on GitHub Container Registry:

```bash
helm install llmgateway oci://ghcr.io/theopenco/charts/llmgateway
```

This installs the latest published version. To pin to a specific release, append `--version <version>` (a published release tag without the `v` prefix, e.g. `1.2.3`). See the [Helm chart README](https://github.com/theopenco/llmgateway/tree/main/infra/helm) for configuration and the [list of available versions](https://github.com/theopenco/llmgateway/pkgs/container/charts%2Fllmgateway).

# Health check

URL: https://docs.llmgateway.io/health

{/* This file was generated by Fumadocs. Do not edit this file directly. Any changes should be made by running the generation command again. */}

# Prometheus metrics

URL: https://docs.llmgateway.io/metrics

# Create speech

URL: https://docs.llmgateway.io/v1_audio_speech

# Chat Completions

URL: https://docs.llmgateway.io/v1_chat_completions

# Embeddings

URL: https://docs.llmgateway.io/v1_embeddings

# Edit image

URL: https://docs.llmgateway.io/v1_images_edits

# Create image

URL: https://docs.llmgateway.io/v1_images_generations

# Anthropic Messages

URL: https://docs.llmgateway.io/v1_messages

# Models

URL: https://docs.llmgateway.io/v1_models

# Moderations

URL: https://docs.llmgateway.io/v1_moderations

# Video content

URL: https://docs.llmgateway.io/v1_videos_content

# Create video

URL: https://docs.llmgateway.io/v1_videos_create

# Video log content

URL: https://docs.llmgateway.io/v1_videos_log_content

# Retrieve video

URL: https://docs.llmgateway.io/v1_videos_retrieve

# Anthropic API Compatibility

URL: https://docs.llmgateway.io/features/anthropic-endpoint

# Anthropic API Compatibility [#anthropic-api-compatibility]

LLMGateway provides a native Anthropic-compatible endpoint at `/v1/messages` that lets you use any model in our catalog while keeping the familiar Anthropic API format. This is especially useful for applications designed for Claude that you want to extend to other models.

Enjoy a 50% discount on our Anthropic models for a limited time.

## Overview [#overview]

The Anthropic endpoint transforms requests from Anthropic's message format to the OpenAI-compatible format used by LLMGateway, then transforms the responses back to Anthropic's format. This means you can:

* Use **any model** available in LLMGateway with Anthropic's API format
* Maintain existing code that uses Anthropic's SDK or API format
* Access models from OpenAI, Google, Cohere, and other providers through the Anthropic interface
* Leverage LLMGateway's routing, caching, and cost optimization features

## Basic Usage [#basic-usage]

## Configuration for Claude Code [#configuration-for-claude-code]

This endpoint lets you configure Claude Code to use any model available in LLMGateway:

```bash
export ANTHROPIC_BASE_URL=https://api.llmgateway.io
export ANTHROPIC_AUTH_TOKEN=llmgtwy_your_api_key_here

# optional: specify a model, otherwise it uses the default Claude model
export ANTHROPIC_MODEL=gpt-5 # or any model from our catalog

# now run claude!
claude
```

### Choosing Models [#choosing-models]

You can use any model from the [models page](https://llmgateway.io/models).
Popular options for Claude Code include:

```bash
# Use OpenAI's latest model
export ANTHROPIC_MODEL=gpt-5

# Use a cost-effective alternative
export ANTHROPIC_MODEL=gpt-5-mini

# Use Google's Gemini
export ANTHROPIC_MODEL=gemini-2.5-pro

# Use Anthropic's actual Claude models
export ANTHROPIC_MODEL=claude-3-5-sonnet-20241022
```

## Environment Variables [#environment-variables]

When configuring Claude Code or other Anthropic-compatible applications, you can use these environment variables:

### ANTHROPIC\_MODEL [#anthropic_model]

Specifies the main model to use for primary requests.

* **Default**: `claude-sonnet-4-20250514`
* **Example**: `export ANTHROPIC_MODEL=gpt-5`

### ANTHROPIC\_SMALL\_FAST\_MODEL [#anthropic_small_fast_model]

Specifies a smaller, faster model used for background functionality and internal operations.

* **Default**: `claude-3-5-haiku-20241022`
* **Example**: `export ANTHROPIC_SMALL_FAST_MODEL=gpt-5-nano`

```bash
# Example configuration
export ANTHROPIC_BASE_URL=https://api.llmgateway.io
export ANTHROPIC_AUTH_TOKEN=llmgtwy_your_api_key_here
export ANTHROPIC_MODEL=gpt-5
export ANTHROPIC_SMALL_FAST_MODEL=gpt-5-nano
```

## Advanced Features [#advanced-features]

### Making a manual request [#making-a-manual-request]

```bash
curl -X POST "https://api.llmgateway.io/v1/messages" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ],
    "max_tokens": 100
  }'
```

### Response Format [#response-format]

The endpoint returns responses in Anthropic's message format:

```json
{
  "id": "msg_abc123",
  "type": "message",
  "role": "assistant",
  "model": "gpt-5",
  "content": [
    {
      "type": "text",
      "text": "Hello! I'm doing well, thank you for asking. How can I help you today?"
    }
  ],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 13,
    "output_tokens": 20
  }
}
```

# API Keys & IAM Rules

URL: https://docs.llmgateway.io/features/api-keys

# API Keys & IAM Rules [#api-keys--iam-rules]

API keys are the primary method for authenticating with the LLM Gateway. This guide covers creating API keys, managing them, and configuring IAM rules for fine-grained access control.

## Overview [#overview]

LLM Gateway provides comprehensive API key management with the following features:

* **Basic API Key Management**: Create, list, update, and delete API keys
* **Usage Limits**: Set lifetime and recurring spending limits on individual API keys
* **Expiration (TTL)**: Give a key a time-to-live so it disables itself automatically
* **IAM Rules**: Fine-grained access control for models, providers, and pricing
* **Usage Tracking**: Monitor API key usage and costs
* **Status Management**: Enable/disable keys without deletion

## Creating API Keys [#creating-api-keys]

### Via Dashboard [#via-dashboard]

At this time, API keys can only be created via the dashboard.

1. Navigate to your project in the LLM Gateway dashboard
2. Go to the **API Keys** section
3. Click **Create API Key**
4. Provide a description for your key
5. Optionally set an all-time usage limit
6. Optionally set a recurring usage limit such as `$10 / day` or `$500 / month`
7. Optionally set an expiration (TTL) such as `30 minutes`, `12 hours`, or `7 days`
8. Click **Create**

API keys are shown in full only once during creation. Make sure to copy and store them securely.

## Using API Keys [#using-api-keys]

Once you have an API key, use it in the `Authorization` header of your requests:

```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer llmgtwy_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

## Disabling/Enabling API Keys [#disablingenabling-api-keys]

You can disable an API key to stop it from being used, but the key is not deleted and can be re-enabled later.

## Expiration (TTL) [#expiration-ttl]

You can give an API key a **time-to-live (TTL)** when you create it. Set how long the key should live (in **minutes**, **hours**, or **days**), and it will be disabled automatically once that time passes. This is ideal for short-lived integrations, demos, CI jobs, and temporary access.

* A key works normally until its expiration time
* Once expired, the gateway rejects requests with that key with a `401 Unauthorized`
* A background job marks expired keys as **inactive**, so the dashboard reflects the disabled state
* Keys created without a TTL never expire (the default)

### Reactivating an Expired Key [#reactivating-an-expired-key]

An expired key is paused, not deleted. To bring it back online you must reactivate it **with a new future expiration**; an expired key cannot be re-enabled while its TTL is still in the past. Keys that have no TTL, or whose TTL is still in the future, can be enabled and disabled freely without setting a new expiration.

Expiration is independent of usage limits. A key can hit its TTL before, or instead of, reaching a spend cap.

## Usage Limits [#usage-limits]

Usage is tracked per API key on the API Keys page. Usage includes both costs from LLM Gateway credits and usage from your own provider keys when applicable, giving you complete visibility into total spending per key.

You can set two independent limits for each key:

* **All-time usage limit**: A lifetime spend cap
* **Recurring usage limit**: A spend cap that resets every configured hour, day, week, or month

When a key reaches either limit, requests using that key return `401 Unauthorized` until the key is updated or, for recurring limits, the next usage window starts. This is separate from IAM rule violations, which return `403 Forbidden`.

Recurring windows support:

* Minimum duration: **1 hour**
* Maximum duration: **12 months**
* Units: **hour**, **day**, **week**, **month**

For the dashboard walkthrough and field-by-field details, see [API Keys in Learn](/learn/api-keys).

## IAM Rules [#iam-rules]

IAM (Identity and Access Management) rules provide fine-grained access control over which models, providers, and pricing tiers an API key can access.

### Rule Types [#rule-types]

#### Model Access Rules [#model-access-rules]

Control access to specific models:

* **Allow Models**: Only allow access to specific models
* **Deny Models**: Block access to specific models

#### Provider Access Rules [#provider-access-rules]

Control access to specific providers:

* **Allow Providers**: Only allow access to specific providers
* **Deny Providers**: Block access to specific providers

#### Pricing Rules [#pricing-rules]

Control access based on model pricing:

* **Allow Pricing**: Set constraints on what pricing tiers are allowed
* **Deny Pricing**: Block specific pricing tiers
* **Free vs Paid**: Allow or deny access to free vs paid models

#### IP Address Rules [#ip-address-rules]

IP address rules are available on the **Enterprise** plan only. Contact us at [contact@llmgateway.io](mailto:contact@llmgateway.io) to enable them for your organization.

Restrict where the API key can be used from by source IP, using CIDR ranges:

* **Allow IP Ranges (CIDR)**: Only permit requests from the listed IPv4/IPv6 CIDRs
* **Deny IP Ranges (CIDR)**: Block requests from the listed IPv4/IPv6 CIDRs

Both IPv4 (e.g. `192.0.2.0/24`) and IPv6 (e.g. `2001:db8::/32`) ranges are supported, and you can mix both in a single rule. To restrict to a single address, use a `/32` (IPv4) or `/128` (IPv6) prefix.

The gateway reads the client IP from the first entry in the `X-Forwarded-For` header (set by the GCP load balancer). When an `allow_ip_cidrs` rule is configured and the gateway cannot determine the client IP, the request is denied. Invalid CIDR syntax is rejected at rule-creation time with a `400` error.

## Error Handling [#error-handling]

When a request violates an IAM rule, the API returns a `403` with the standard OpenAI error envelope:

```json
{
  "error": {
    "message": "Access denied: Model gpt-4 is not in the allowed models list",
    "type": "invalid_request_error",
    "param": null,
    "code": "permission_denied"
  }
}
```

Common error scenarios:

* Model not allowed by IAM rules
* Provider blocked by IAM rules
* Pricing limits exceeded
* API key disabled or deleted
* API key expired (TTL passed)
* Usage limit reached

## Migration from Legacy Keys [#migration-from-legacy-keys]

If you have existing API keys without IAM rules:

1. **Backward Compatibility**: Existing keys continue to work without restrictions
2. **Gradual Migration**: Add IAM rules incrementally
3. **Testing**: Test IAM rules in development before applying to production
4. **Monitoring**: Monitor for access denied errors after implementing rules

API keys without IAM rules have unrestricted access to all models and providers.

# Audit Logs

URL: https://docs.llmgateway.io/features/audit-logs

# Audit Logs [#audit-logs]

Audit logs provide complete visibility into all actions within your organization. Track who did what, when, and to which resource.

Audit logs are available on the [**Enterprise plan**](https://llmgateway.io/enterprise) for organization owners and admins.

## What's Tracked [#whats-tracked]

Every significant action is logged with detailed metadata:

| Field | Description |
| ----- | ----------- |
| **Timestamp** | When the action occurred |
| **User** | Who performed the action (name and email) |
| **Action** | What was done (e.g., `api_key.create`, `project.update`) |
| **Resource Type** | Category of the affected resource |
| **Resource ID** | Unique identifier of the affected resource |
| **Details** | Additional context like resource names or changed fields |

## Tracked Actions [#tracked-actions]

### Organization Management [#organization-management]

* `organization.update`: Organization settings changed
* `organization.delete`: Organization deleted

### Project Management [#project-management]

* `project.create`: New project created
* `project.update`: Project settings changed
* `project.delete`: Project deleted

### Team Management [#team-management]

* `team_member.add`: New member invited
* `team_member.update`: Member role changed
* `team_member.remove`: Member removed

### API Key Management [#api-key-management]

* `api_key.create`: New API key created
* `api_key.update_status`: API key enabled/disabled
* `api_key.update_limit`: Usage limit changed
* `api_key.delete`: API key deleted
* `api_key.iam_rule.create`: IAM rule added
* `api_key.iam_rule.update`: IAM rule modified
* `api_key.iam_rule.delete`: IAM rule removed

### Provider Key Management [#provider-key-management]

* `provider_key.create`: Provider key added
* `provider_key.update`: Provider key status changed
* `provider_key.delete`: Provider key removed

### Billing Events [#billing-events]

* `subscription.create`: Subscription started
* `subscription.cancel`: Subscription cancelled
* `subscription.resume`: Subscription resumed
* `payment.credit_topup`: Credits purchased

## Filtering and Search [#filtering-and-search]

Filter logs by:

* **Action**: Specific action type
* **Resource Type**: Category of resource
* **User**: Who performed the action
* **Date Range**: Time period

## Data Retention [#data-retention]

Audit logs are retained for **90 days** on the Enterprise plan.

## Access Control [#access-control]

Only organization **owners** and **admins** can view audit logs. This ensures sensitive activity data is only visible to authorized personnel.

## Get Started [#get-started]

Audit logs are an Enterprise feature. [Contact us](https://llmgateway.io/enterprise) to enable Enterprise for your organization.

# Cost Breakdown

URL: https://docs.llmgateway.io/features/cost-breakdown

# Cost Breakdown [#cost-breakdown]

LLM Gateway provides real-time cost information for each API request directly in the response's `usage` object. This allows you to track costs programmatically without needing to query the dashboard.

Cost breakdown is available for all users on both hosted and self-hosted deployments.

## Response Format [#response-format]

When cost breakdown is enabled, your API responses will include additional cost fields in the `usage` object:

```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "openai/gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 15,
    "total_tokens": 25,
    "cost": 0.000125,
    "cost_details": {
      "upstream_inference_cost": 0.000125,
      "upstream_inference_prompt_cost": 0.000025,
      "upstream_inference_completions_cost": 0.0001,
      "total_cost": 0.000125,
      "input_cost": 0.000025,
      "output_cost": 0.0001,
      "cached_input_cost": 0,
      "request_cost": 0,
      "web_search_cost": 0,
      "image_input_cost": null,
      "image_output_cost": null,
      "data_storage_cost": 0.00000025
    },
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "cache_write_tokens": 0,
      "audio_tokens": 0,
      "video_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "image_tokens": 0,
      "audio_tokens": 0
    }
  }
}
```

## Cost Fields [#cost-fields]

| Field | Description |
| ----- | ----------- |
| `cost` | Total inference cost for the request in USD |
| `cost_details.upstream_inference_cost` | Combined upstream inference cost in USD (prompt + completions) |
| `cost_details.upstream_inference_prompt_cost` | Upstream cost for prompt tokens in USD (includes cached prompt discount) |
| `cost_details.upstream_inference_completions_cost` | Upstream cost for completion tokens in USD |
| `cost_details.total_cost` | Total request cost in USD (LLM Gateway extended field) |
| `cost_details.input_cost` | Cost for non-cached prompt tokens in USD |
| `cost_details.output_cost` | Cost for completion tokens in USD |
| `cost_details.cached_input_cost` | Cost for cached prompt tokens in USD |
| `cost_details.request_cost` | Per-request flat fee in USD (when the model applies one) |
| `cost_details.web_search_cost` | Cost for web search tool calls in USD |
| `cost_details.image_input_cost` | Cost for image inputs in USD |
| `cost_details.image_output_cost` | Cost for image outputs in USD |
| `cost_details.data_storage_cost` | Storage cost for retained request/response payloads in USD |

## Token Detail Fields [#token-detail-fields]

The `usage` object also includes detailed token counters that mirror OpenAI's extended format:

| Field | Description |
| ----- | ----------- |
| `prompt_tokens_details.cached_tokens` | Number of prompt tokens served from the provider's prompt cache |
| `prompt_tokens_details.cache_write_tokens` | Number of prompt tokens written into the provider's prompt cache |
| `prompt_tokens_details.audio_tokens` | Number of audio prompt tokens |
| `prompt_tokens_details.video_tokens` | Number of video prompt tokens |
| `completion_tokens_details.reasoning_tokens` | Number of reasoning tokens produced by reasoning models |
| `completion_tokens_details.image_tokens` | Number of image tokens produced |
| `completion_tokens_details.audio_tokens` | Number of audio tokens produced |

## Streaming Responses [#streaming-responses]

Cost information is also available in streaming responses.
The cost fields are included in the final usage chunk sent before the `[DONE]` message:

```
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[...],"usage":{"prompt_tokens":10,"completion_tokens":15,"total_tokens":25,"cost":0.000125,"cost_details":{"upstream_inference_cost":0.000125,"upstream_inference_prompt_cost":0.000025,"upstream_inference_completions_cost":0.0001,"total_cost":0.000125,"input_cost":0.000025,"output_cost":0.0001,"cached_input_cost":0,"request_cost":0,"web_search_cost":0,"image_input_cost":null,"image_output_cost":null,"data_storage_cost":0.00000025}}}

data: [DONE]
```

## Example: Tracking Costs in Code [#example-tracking-costs-in-code]

Here's an example of how to track costs programmatically using the cost breakdown feature:

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.LLM_GATEWAY_API_KEY,
  baseURL: "https://api.llmgateway.io/v1",
});

async function trackCosts() {
  const response = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: "Hello!" }],
  });

  // The cost fields are gateway extensions, so they are missing from the SDK types.
  const usage = response.usage as any;
  if (usage?.cost !== undefined) {
    console.log(`Request cost: $${usage.cost.toFixed(6)}`);
    console.log(
      `  Prompt: $${usage.cost_details.upstream_inference_prompt_cost.toFixed(6)}`,
    );
    console.log(
      `  Completions: $${usage.cost_details.upstream_inference_completions_cost.toFixed(6)}`,
    );

    const cachedTokens = usage.prompt_tokens_details?.cached_tokens ?? 0;
    if (cachedTokens > 0) {
      console.log(`  Cached prompt tokens: ${cachedTokens}`);
    }
  }

  return response;
}
```

## Use Cases [#use-cases]

### Budget Monitoring [#budget-monitoring]

Track costs in real time and enforce a budget limit in your application:

```typescript
// Reuses `client` from the example above.
// `Message` is an alias for the SDK's chat message type.
type Message = OpenAI.Chat.ChatCompletionMessageParam;

let totalSpent = 0;
const BUDGET_LIMIT = 10.0; // $10 budget

async function makeRequest(messages: Message[]) {
  const response = await client.chat.completions.create({
    model: "gpt-4o",
    messages,
  });

  const cost = (response.usage as any)?.cost ?? 0;
  totalSpent += cost;

  // The check runs after the request, so the final request can overshoot the limit.
  if (totalSpent > BUDGET_LIMIT) {
    throw new Error(`Budget exceeded: $${totalSpent.toFixed(2)}`);
  }

  return response;
}
```

### Per-User Cost Allocation [#per-user-cost-allocation]

Track costs per user for billing or analytics:

```typescript
// Maps a user ID to that user's accumulated cost in USD.
const userCosts: Map<string, number> = new Map();

async function makeRequestForUser(userId: string, messages: Message[]) {
  const response = await client.chat.completions.create({
    model: "gpt-4o",
    messages,
  });

  const cost = (response.usage as any)?.cost ?? 0;
  const currentCost = userCosts.get(userId) ?? 0;
  userCosts.set(userId, currentCost + cost);

  return response;
}
```

### Cost Analytics [#cost-analytics]

Aggregate costs by model, time period, or any other dimension:

```typescript
interface CostEntry {
  timestamp: Date;
  model: string;
  promptCost: number;
  completionsCost: number;
  totalCost: number;
}

const costLog: CostEntry[] = [];

async function loggedRequest(model: string, messages: Message[]) {
  const response = await client.chat.completions.create({
    model,
    messages,
  });

  const usage = response.usage as any;
  costLog.push({
    timestamp: new Date(),
    model: response.model,
    promptCost: usage?.cost_details?.upstream_inference_prompt_cost ?? 0,
    completionsCost:
      usage?.cost_details?.upstream_inference_completions_cost ?? 0,
    totalCost: usage?.cost ?? 0,
  });

  return response;
}
```

## Self-Hosted Deployments [#self-hosted-deployments]

If you're running a self-hosted LLM Gateway deployment, cost breakdown is always included in API responses
regardless of plan. This allows you to track internal costs and allocate them across teams or projects. # Custom Providers URL: https://docs.llmgateway.io/features/custom-providers # Custom Providers [#custom-providers] LLMGateway supports integrating custom OpenAI-compatible providers, allowing you to use any API that follows the OpenAI chat completions format. This feature is perfect for: * Private or self-hosted LLM deployments * Specialized AI providers not natively supported * Internal AI services within your organization * Testing against different model endpoints Custom providers must be OpenAI-compatible, supporting the `/v1/chat/completions` endpoint format. ## Quick Setup [#quick-setup] ### 1. Add a Custom Provider Key [#1-add-a-custom-provider-key] Navigate to your organization's provider settings and add a custom provider via the UI. Provide a lowercase name, OpenAI-compatible base URL, and API token for the custom provider. ### 2. Make Requests [#2-make-requests] Once configured, make requests using the format `{customName}/{modelName}`: ```bash curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "mycompany/custom-gpt-4", "messages": [ { "role": "user", "content": "Hello from my custom provider!" } ] }' ``` ## Configuration Requirements [#configuration-requirements] ### Custom Provider Name [#custom-provider-name] * **Format**: Lowercase letters only (`a-z`) * **Examples**: `mycompany`, `internal`, `testing` * **Invalid**: `MyCompany`, `my-company`, `my_company`, `123test` The custom provider name must match the regex pattern `/^[a-z]+$/` exactly. 
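A minimal sketch of checking a name against that pattern on the client before saving it. The helper names `isValidCustomProviderName` and `buildCustomModelId` are hypothetical and not part of any LLM Gateway SDK; only the regex and the `{customName}/{modelName}` format come from this page.

```typescript
// Pattern quoted in the documentation for custom provider names.
const CUSTOM_PROVIDER_NAME_PATTERN = /^[a-z]+$/;

// Hypothetical helper: returns true when the name is lowercase a-z only.
function isValidCustomProviderName(name: string): boolean {
  return CUSTOM_PROVIDER_NAME_PATTERN.test(name);
}

// Hypothetical helper: builds the `{customName}/{modelName}` model string
// used in requests, rejecting provider names the pattern does not allow.
function buildCustomModelId(providerName: string, modelName: string): string {
  if (!isValidCustomProviderName(providerName)) {
    throw new Error(`Invalid custom provider name: "${providerName}"`);
  }
  return `${providerName}/${modelName}`;
}
```

Validating locally only catches typos early; the gateway remains the source of truth for which names it accepts.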
### Base URL [#base-url] * Must be a valid HTTPS URL * Should point to your provider's base endpoint * LLMGateway will append `/v1/chat/completions` automatically * **Example**: `https://api.example.com` → `https://api.example.com/v1/chat/completions` ### API Token [#api-token] * Provider-specific authentication token * Used in the `Authorization: Bearer {token}` header Unlike built-in providers, custom provider models are not validated, giving you complete flexibility. ## Supported Features [#supported-features] Custom providers inherit full LLMGateway functionality. # Data Retention URL: https://docs.llmgateway.io/features/data-retention # Data Retention [#data-retention] LLM Gateway offers configurable data retention policies that allow you to store full request and response payloads. This enables powerful debugging capabilities, detailed analytics, and compliance with data governance requirements. ## Retention Levels [#retention-levels] LLM Gateway supports two retention levels that can be configured per organization: | Level | Description | Storage Cost | | ------------------- | ---------------------------------------------------------------------------------------------- | --------------- | | **Metadata Only** | Stores request metadata (timestamps, model, tokens, costs) without full payloads. Default. | Free | | **Retain All Data** | Stores complete request and response payloads including messages, tool calls, and attachments. | $0.01/1M tokens | Metadata-only retention is enabled by default and provides usage analytics without additional storage costs. ## Storage Pricing [#storage-pricing] When full data retention is enabled, storage is billed at **$0.01 per 1 million tokens**. This rate applies to: * Input tokens (prompt) * Cached input tokens * Output tokens (completion) * Reasoning tokens Storage costs are calculated per request and billed separately from inference. 
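As a rough illustration of the pricing rule, the sketch below estimates the storage cost of one request. It assumes the four token categories are counted separately and simply summed, which this page does not confirm; the function and field names are hypothetical. For billing, treat the `data_storage_cost` value returned by the gateway as authoritative.

```typescript
// Rate stated in the documentation: $0.01 per 1 million stored tokens.
const STORAGE_RATE_USD_PER_MILLION_TOKENS = 0.01;

// Hypothetical input shape; the field names are illustrative only.
interface StoredTokenCounts {
  inputTokens: number;
  cachedInputTokens?: number;
  outputTokens: number;
  reasoningTokens?: number;
}

// Estimates the storage cost in USD for a single request.
// Assumption: the four categories do not overlap, so they can be added up.
function estimateStorageCostUsd(counts: StoredTokenCounts): number {
  const totalTokens =
    counts.inputTokens +
    (counts.cachedInputTokens ?? 0) +
    counts.outputTokens +
    (counts.reasoningTokens ?? 0);
  return (totalTokens / 1_000_000) * STORAGE_RATE_USD_PER_MILLION_TOKENS;
}
```

For instance, 1,000 input tokens plus 500 output tokens come to roughly $0.000015 under this estimate.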
When "Retain All Data" is enabled, each response's `usage.cost_details` object includes a `data_storage_cost` field with the per-request storage cost in USD. See [Cost Breakdown](/features/cost-breakdown) for the full list of cost fields. ### Example Cost Calculation [#example-cost-calculation] For a request with: * 1,000 input tokens * 500 output tokens * 1,500 total tokens Storage cost = 1,500 / 1,000,000 × $0.01 = **$0.000015** ## Configuring Retention [#configuring-retention] Data retention is configured at the organization level in your dashboard settings: 1. Navigate to **Organization Settings** → **Policies** 2. Select your preferred **Data Retention Level** 3. Save changes Changing retention settings applies to new requests only. Existing stored data follows the retention period active when it was created. ## Retention Periods [#retention-periods] Data is retained for 30 days for all users. Enterprise plans can have custom retention periods. After the retention period expires, data is automatically deleted. 
## Accessing Stored Data [#accessing-stored-data] When data retention is enabled, you can access your stored requests through the dashboard: * View request history with full payload inspection * Filter by model and date range * Inspect complete request and response payloads ## Use Cases [#use-cases] ### Debugging [#debugging] Full data retention enables you to: * Inspect exact prompts sent to models * Review complete responses including tool calls * Trace conversation histories * Identify issues in production ### Analytics [#analytics] With stored payloads, you can: * Analyze prompt patterns and effectiveness * Track response quality over time * Build custom dashboards and reports * Measure model performance across use cases ### Compliance [#compliance] Data retention helps meet compliance requirements by: * Maintaining audit trails of AI interactions * Enabling data governance policies * Supporting incident investigation * Providing records for regulatory requirements ## Billing Considerations [#billing-considerations] ### Credit Usage [#credit-usage] In **API keys mode** (using your own provider keys): * Only storage costs are deducted from LLM Gateway credits * Inference costs are billed directly to your provider In **credits mode**: * Both inference and storage costs are deducted from credits ### Monitoring Storage Costs [#monitoring-storage-costs] Storage costs appear in: * Usage dashboard under "Storage" category * Billing invoices as a separate line item Enable [auto top-up](/dashboard) in billing settings to ensure uninterrupted service when storage costs accumulate. 
## Self-Hosted Deployments [#self-hosted-deployments] Self-hosted deployments have full control over data retention: * Configure retention periods in environment variables * Data is stored in your own PostgreSQL database * No additional storage costs (you manage your own infrastructure) ## Privacy and Security [#privacy-and-security] * All stored data is encrypted at rest * Access is restricted to organization members with appropriate permissions * Data is automatically deleted after the retention period * You can request immediate deletion of specific records through support # Document Reading URL: https://docs.llmgateway.io/features/documents # Document Reading [#document-reading] LLMGateway supports sending documents (PDFs and other file types) to document-capable models using OpenAI's `file` content block format. The gateway forwards the document to the underlying provider so the model can read and reason over its contents. ## Document-Capable Models [#document-capable-models] Document input is currently supported on Google Gemini models via Google AI Studio. You can find document-capable models on the [models page with the document filter](https://llmgateway.io/models?filters=1\&document=true). ## Sending a Document [#sending-a-document] Add a `file` content block to a user message. The `file_data` field must be a base64-encoded data URL that includes the document's MIME type. ```bash curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "gemini-2.5-flash", "messages": [ { "role": "user", "content": [ { "type": "text", "text": "Summarize this document." }, { "type": "file", "file": { "filename": "report.pdf", "file_data": "data:application/pdf;base64,JVBERi0xLjQKJ..." } } ] } ] }' ``` ### Content Block Fields [#content-block-fields] * **`type`**: must be `"file"`. 
* **`file.filename`** *(optional)*: original filename, shown in the playground and forwarded for context.
* **`file.file_data`**: base64-encoded data URL of the form `data:<mime-type>;base64,<base64-data>`.

The `file.file_id` field (for referencing files uploaded via a provider's Files API) is accepted by the schema but not currently supported by the Google transform. Use `file_data` with an inline base64 data URL.

## Supported File Types [#supported-file-types]

The accepted MIME types depend on the target model. Gemini models commonly support:

* `application/pdf`
* `text/plain`
* `text/html`
* `text/css`
* `text/javascript`
* `text/csv`
* `text/markdown`
* `text/xml`

If the upstream provider rejects the MIME type, the gateway returns a `400` error that names the unsupported MIME type and the provider it was sent to. To use a different file type, encode the file with the matching MIME type in the data URL prefix.

## Encoding a File as a Data URL [#encoding-a-file-as-a-data-url]

Any tool that can produce base64 output works. For example, in a shell:

```bash
DATA=$(base64 -i report.pdf | tr -d '\n')
echo "data:application/pdf;base64,$DATA"
```

Or in JavaScript:

```javascript
import { readFileSync } from "node:fs";

const buffer = readFileSync("report.pdf");
const fileData = `data:application/pdf;base64,${buffer.toString("base64")}`;
```

Then pass `fileData` as the `file.file_data` value in your request.

## Multiple Documents [#multiple-documents]

You can include multiple `file` blocks in a single message, optionally mixed with text and image content:

```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-pro",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Compare these two reports." },
          {
            "type": "file",
            "file": {
              "filename": "q1.pdf",
              "file_data": "data:application/pdf;base64,JVBERi0x..."
} }, { "type": "file", "file": { "filename": "q2.pdf", "file_data": "data:application/pdf;base64,JVBERi0x..." } } ] } ] }' ``` ## Error Handling [#error-handling] The gateway returns `400` for the following document-related errors: * The selected model does not support document input. * The `file` block is missing both `file_data` and `file_id`. * `file_data` is not a valid base64 data URL. * The upstream provider rejects the document's MIME type for the selected model. # Embeddings URL: https://docs.llmgateway.io/features/embeddings # Embeddings [#embeddings] LLMGateway exposes an OpenAI-compatible `/v1/embeddings` endpoint for generating vector representations of text — useful for semantic search, clustering, recommendations, and RAG. Browse available embedding models on the [models page](https://llmgateway.io/models?filters=1\&embedding=true). ## Supported providers [#supported-providers] * **OpenAI** — `text-embedding-3-small`, `text-embedding-3-large`, `text-embedding-ada-002` * **Google AI Studio** — `gemini-embedding-2` (recommended), `gemini-embedding-001` (legacy) * **Google Vertex AI** — `gemini-embedding-001`, `text-embedding-005` The gateway translates between provider-native request/response shapes (e.g. Google's `:embedContent` / `:batchEmbedContents`) and the OpenAI-compatible payload, so you can swap models without changing your client code. ## cURL [#curl] ```bash curl -X POST "https://api.llmgateway.io/v1/embeddings" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "text-embedding-3-small", "input": "The quick brown fox jumps over the lazy dog." 
}' ``` ## OpenAI JS SDK [#openai-js-sdk] ```ts import OpenAI from "openai"; const client = new OpenAI({ apiKey: process.env.LLM_GATEWAY_API_KEY, baseURL: "https://api.llmgateway.io/v1", }); const response = await client.embeddings.create({ model: "text-embedding-3-small", input: "The quick brown fox jumps over the lazy dog.", }); console.log(response.data[0].embedding); ``` Embedding models are billed only for input tokens. There are no output tokens since embeddings are fixed-size vectors. # Guardrails URL: https://docs.llmgateway.io/features/guardrails # Guardrails [#guardrails] Guardrails protect your organization by automatically detecting and blocking harmful content in LLM requests before they reach the model. Guardrails are available on the [**Enterprise plan**](https://llmgateway.io/enterprise). ## Overview [#overview] Guardrails run on every API request, scanning message content for: * Security threats (prompt injection, jailbreak attempts) * Sensitive data (PII, secrets, credentials) * Policy violations (blocked terms, restricted topics) When a violation is detected, you control what happens: block the request, redact the content, or log a warning. ## System Rules [#system-rules] Built-in rules protect against common threats: ### Prompt Injection Detection [#prompt-injection-detection] Detects attempts to override or manipulate system instructions. Common patterns include: * "Ignore all previous instructions" * "You are now a different AI" * Hidden instructions in encoded text ### Jailbreak Detection [#jailbreak-detection] Identifies attempts to bypass safety measures: * DAN (Do Anything Now) prompts * Roleplay-based bypasses * Instruction override attempts ### PII Detection [#pii-detection] Identifies personal information: * Email addresses * Phone numbers * Social Security Numbers * Credit card numbers * IP addresses When the action is set to **redact**, PII is replaced with placeholders like `[EMAIL_REDACTED]`. 
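To make the redact action concrete, here is a minimal sketch of placeholder substitution for email addresses. It is not the gateway's detector: the regex is a deliberately simple assumption and the `redactEmails` helper is hypothetical. Only the `[EMAIL_REDACTED]` placeholder comes from this page.

```typescript
// Simplified email pattern, for illustration only. Real PII detection
// has to cover many more formats and edge cases than this.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Hypothetical helper: replaces every email-like substring with the
// placeholder named in the documentation.
function redactEmails(text: string): string {
  return text.replace(EMAIL_PATTERN, "[EMAIL_REDACTED]");
}
```

With redaction, the request still reaches the model, but the sensitive value itself is never forwarded.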
### Secrets Detection [#secrets-detection] Detects credentials and API keys: * AWS access keys and secrets * Generic API keys * Passwords in common formats * Private keys ### File Type Restrictions [#file-type-restrictions] Control which file types can be uploaded: * Configure allowed MIME types * Set maximum file size limits * Block potentially dangerous file types ### Document Leakage Prevention [#document-leakage-prevention] Detects attempts to extract confidential documents or internal data. ## Configurable Actions [#configurable-actions] For each rule, choose how to respond: | Action | Behavior | | ---------- | --------------------------------------------------- | | **Block** | Reject the request with a content policy error | | **Redact** | Remove or mask the sensitive content, then continue | | **Warn** | Log the violation but allow the request to proceed | ## Custom Rules [#custom-rules] Create organization-specific rules for your use case: ### Blocked Terms [#blocked-terms] Prevent specific words or phrases from being used: * Match type: exact, contains, or regex * Case-sensitive matching option * Multiple terms per rule ### Custom Regex [#custom-regex] Match patterns unique to your organization: * Internal project codenames * Customer identifiers * Domain-specific sensitive data ### Topic Restrictions [#topic-restrictions] Block content related to specific topics: * Define restricted topics * Keyword-based detection ## Security Events Dashboard [#security-events-dashboard] Monitor all guardrail violations with a dedicated dashboard: * **Total violations** — Overall count and trends * **By action** — Breakdown of blocked, redacted, and warned * **By category** — Which rules are being triggered * **Detailed logs** — Individual violations with timestamps and matched patterns ## How It Works [#how-it-works] ``` Request → Guardrails Check → Action Based on Rules → Forward to Model (if allowed) ↓ Log Violation ``` 1. 
**Request received**: API request comes in with messages
2. **Content scanned**: All text content is checked against enabled rules
3. **Violations detected**: Matches are identified and logged
4. **Action taken**: Based on rule configuration (block/redact/warn)
5. **Request proceeds**: If not blocked, the (potentially redacted) request continues

## Best Practices [#best-practices]

1. **Start with warnings**: Enable rules in warn mode first to understand your traffic patterns
2. **Review violations**: Check the Security Events dashboard regularly
3. **Tune custom rules**: Adjust blocked terms and regex patterns based on false positives
4. **Layer defenses**: Use multiple rule types together for comprehensive protection

## Get Started [#get-started]

Guardrails are an Enterprise feature. [Contact us](https://llmgateway.io/enterprise) to enable Enterprise for your organization.

# Image Generation

URL: https://docs.llmgateway.io/features/image-generation

# Image Generation [#image-generation]

LLMGateway supports image generation through three APIs:

1. **`/v1/images/generations`**: OpenAI-compatible images endpoint (recommended for simple image generation)
2. **`/v1/images/edits`**: OpenAI-compatible image editing endpoint
3. **`/v1/chat/completions`**: Chat completions with image generation models (for conversational image generation and editing)

For asynchronous video generation, see [Video Generation](/features/video-generation).

## Available Models [#available-models]

You can find all available image generation models on our [models page](https://llmgateway.io/models?filters=1\&imageGeneration=true).

## OpenAI Images API [#openai-images-api]

The `/v1/images/generations` endpoint provides a drop-in replacement for OpenAI's image generation API. It works with any OpenAI-compatible client library.
### Parameters [#parameters] | Parameter | Type | Default | Description | | ----------------- | ------- | ------------ | ---------------------------------------------------------------------------------------------------------------- | | `prompt` | string | required | A text description of the desired image(s) | | `model` | string | `"auto"` | The model to use. `auto` resolves to `gemini-3-pro-image-preview` | | `n` | integer | `1` | Number of images to generate (1-10) | | `size` | string | — | Image dimensions. Supported sizes depend on the model/provider — see [Image Configuration](#image-configuration) | | `quality` | string | — | Image quality. Supported values depend on the model/provider — see [Image Configuration](#image-configuration) | | `response_format` | string | `"b64_json"` | Only `b64_json` is supported | | `style` | string | — | Image style: `vivid` or `natural` | ### curl [#curl] ```bash curl -X POST "https://api.llmgateway.io/v1/images/generations" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "gemini-3-pro-image-preview", "prompt": "A cute cat wearing a tiny top hat", "n": 1, "size": "1024x1024" }' ``` ### OpenAI SDK [#openai-sdk] Works with the standard OpenAI client library — just point the base URL to LLMGateway. ```ts import OpenAI from "openai"; import { writeFileSync } from "fs"; const client = new OpenAI({ baseURL: "https://api.llmgateway.io/v1", apiKey: process.env.LLM_GATEWAY_API_KEY, }); const response = await client.images.generate({ model: "gemini-3-pro-image-preview", prompt: "A futuristic city skyline at sunset with flying cars", n: 1, size: "1024x1024", }); response.data.forEach((image, i) => { if (image.b64_json) { const buf = Buffer.from(image.b64_json, "base64"); writeFileSync(`image-${i}.png`, buf); } }); ``` ### Vercel AI SDK [#vercel-ai-sdk] Use the `@llmgateway/ai-sdk-provider` with `generateImage`. 
```ts import { createLLMGateway } from "@llmgateway/ai-sdk-provider"; import { generateImage } from "ai"; import { writeFileSync } from "fs"; const llmgateway = createLLMGateway({ apiKey: process.env.LLM_GATEWAY_API_KEY, }); const result = await generateImage({ model: llmgateway.image("gemini-3-pro-image-preview"), prompt: "A cozy cabin in a snowy mountain landscape at night with aurora borealis", size: "1024x1024", n: 1, // aspectRatio and quality are model-specific — only some providers honor them. // aspectRatio works on Gemini image models; OpenAI gpt-image-2 ignores it // (use a literal WxH `size` instead). aspectRatio: "16:9", // quality works on OpenAI gpt-image-2 ("low" | "medium" | "high" | "auto"). // The AI SDK only forwards it through providerOptions. providerOptions: { llmgateway: { quality: "high" }, }, }); result.images.forEach((image, i) => { const buf = Buffer.from(image.base64, "base64"); writeFileSync(`image-${i}.png`, buf); }); ``` ## OpenAI Images Edit API [#openai-images-edit-api] The `/v1/images/edits` endpoint is OpenAI-compatible and supports a focused subset of `images.edit` parameters. ### Parameters [#parameters-1] | Parameter | Type | Required | Description | | -------------------- | ------------------------ | -------- | ------------------------------------------------------------------ | | `images` | array of `{ image_url }` | yes | Input images. `image_url` supports HTTPS URLs and base64 data URLs | | `prompt` | string | yes | A text description of the desired image edit | | `model` | string | no | Image editing model | | `background` | enum | no | `transparent`, `opaque`, or `auto` | | `input_fidelity` | enum | no | `high` or `low` | | `n` | integer | no | Number of edited images to generate | | `output_format` | enum | no | `png`, `jpeg`, or `webp` | | `output_compression` | integer | no | Compression level for `jpeg`/`webp` | | `quality` | enum | no | `low`, `medium`, `high`, or `auto` | | `size` | string | no | Output size. 
Examples: `1024x1024`, `1536x1024`, `1K`, `2K`, `4K` | | `aspect_ratio` | string | no | Aspect ratio override. Examples: `1:1`, `16:9`, `4:3`, `5:4` | `mask` is not supported yet on `/v1/images/edits`. ### curl (HTTPS image URL) [#curl-https-image-url] ```bash curl -X POST "https://api.llmgateway.io/v1/images/edits" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "images": [ { "image_url": "https://example.com/source-image.png" } ], "prompt": "Add a watercolor effect to this image", "model": "gemini-3-pro-image-preview", "aspect_ratio": "16:9", "quality": "high", "size": "4K" }' ``` ### curl (base64 data URL) [#curl-base64-data-url] ```bash curl -X POST "https://api.llmgateway.io/v1/images/edits" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "images": [ { "image_url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA..." } ], "prompt": "Turn this into a pixel-art style image" }' ``` ## Chat Completions API [#chat-completions-api] Image generation also works through the `/v1/chat/completions` endpoint, which is useful for conversational image generation, image editing with vision, and multi-turn interactions. ### Making Requests [#making-requests] Simply use an image generation model and provide a text prompt describing the image you want to create. 
```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-3-pro-image-preview",
    "messages": [
      {
        "role": "user",
        "content": "Generate an image of a cute golden retriever puppy playing in a sunny meadow"
      }
    ]
  }'
```

### Response Format [#response-format]

Image generation models return responses in the standard chat completions format, with generated images included in the `images` array within the assistant message:

```json
{
  "id": "chatcmpl-1756234109285",
  "object": "chat.completion",
  "created": 1756234109,
  "model": "gemini-3-pro-image-preview",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here's an image of a cute dog for you: ",
        "images": [
          {
            "type": "image_url",
            "image_url": {
              "url": "data:image/png;base64,<base64-encoded-image-data>"
            }
          }
        ]
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "completion_tokens": 1303,
    "total_tokens": 1311
  }
}
```

### Vision support [#vision-support]

You can edit or modify an existing image by including it in the `messages` array, which combines image generation with [vision models](/features/vision).

### Response Structure [#response-structure]

#### Images Array [#images-array]

The `images` array contains one or more generated images with the following structure:

* `type`: Always `"image_url"` for generated images
* `image_url.url`: A data URL containing the base64-encoded image data (format: `data:image/png;base64,<base64-encoded-image-data>`)

#### Content Field [#content-field]

The `content` field may contain descriptive text about the generated image, depending on the model's behavior.

### AI SDK (Chat Completions) [#ai-sdk-chat-completions]

You can use the AI SDK to generate images with your existing `generateText` or `streamText` calls using the LLMGateway provider.
#### Example [#example] ```ts title="/api/chat/route.ts" import { streamText, type UIMessage, convertToModelMessages } from "ai"; import { createLLMGateway } from "@llmgateway/ai-sdk-provider"; interface ChatRequestBody { messages: UIMessage[]; } export async function POST(req: Request) { const body = await req.json(); const { messages }: ChatRequestBody = body; const llmgateway = createLLMGateway({ apiKey: "llmgateway_api_key", baseUrl: "https://api.llmgateway.io/v1", }); try { const result = streamText({ model: llmgateway.chat("gemini-3-pro-image-preview"), messages: convertToModelMessages(messages), }); return result.toUIMessageStreamResponse(); } catch { return new Response( JSON.stringify({ error: "LLM Gateway Chat request failed" }), { status: 500, }, ); } } ``` Then you can render the image in your frontend using the `Image` component from the [ai-elements](https://ai-sdk.dev/elements/components/image). Here is a full example of how to use the AI SDK to generate images in your frontend: ```tsx title="/app/page.tsx" "use client"; import { useState, useRef } from "react"; import { useChat } from "@ai-sdk/react"; import { parseImagePartToDataUrl } from "@/lib/image-utils"; import { PromptInput, PromptInputBody, PromptInputButton, PromptInputSubmit, PromptInputTextarea, PromptInputToolbar, } from "@/components/ai-elements/prompt-input"; import { Conversation, ConversationContent, } from "@/components/ai-elements/conversation"; import { Image } from "@/components/ai-elements/image"; import { Loader } from "@/components/ai-elements/loader"; import { Message, MessageContent } from "@/components/ai-elements/message"; import { Response } from "@/components/ai-elements/response"; export const ChatUI = () => { const textareaRef = useRef(null); const [text, setText] = useState(""); const { messages, status, stop, regenerate, sendMessage } = useChat(); return ( <>
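      {/* Wrapper elements here are illustrative; style them to fit your app. */}
      <Conversation>
        <ConversationContent>
          {messages.length === 0 ? (
            <div>
              <h2>How can I help you?</h2>
            </div>
          ) : (
            messages.map((m, messageIndex) => {
              const isLastMessage = messageIndex === messages.length - 1;

              if (m.role === "assistant") {
                const textContent = m.parts
                  .filter((p) => p.type === "text")
                  .map((p) => p.text)
                  .join("");
                // Collect the image parts (file parts with an image media type)
                const imageParts = m.parts.filter(
                  (p) => p.type === "file" && p.mediaType?.startsWith("image/"),
                );

                return (
                  <div key={m.id}>
                    {textContent ? <Response>{textContent}</Response> : null}
                    {imageParts.length > 0 ? (
                      <div>
                        {imageParts.map((part, idx: number) => {
                          const { base64Only, mediaType } =
                            parseImagePartToDataUrl(part);
                          if (!base64Only) {
                            return null;
                          }
                          return (
                            <Image
                              key={idx}
                              base64={base64Only}
                              mediaType={mediaType}
                              alt={part.name || "Generated image"}
                            />
                          );
                        })}
                      </div>
                    ) : null}
                    {isLastMessage &&
                      (status === "submitted" || status === "streaming") && (
                        <Loader />
                      )}
                  </div>
                );
              } else {
                return (
                  <Message key={m.id} from={m.role}>
                    <MessageContent>
                      {m.parts.map((p, i) => {
                        if (p.type === "text") {
                          return <div key={i}>{p.text}</div>;
                        }
                        return null;
                      })}
                    </MessageContent>
                    {isLastMessage &&
                      (status === "submitted" || status === "streaming") && (
                        <Loader />
                      )}
                  </Message>
                );
              }
            })
          )}
        </ConversationContent>
      </Conversation>
      <PromptInput
        onSubmit={(message) => {
          if (status === "streaming") {
            return;
          }
          try {
            const textContent = message.text ?? "";
            if (!textContent.trim()) {
              return;
            }
            setText(""); // Clear input immediately
            const parts = [{ type: "text", text: textContent }];
            // sendMessage adds the user message and issues the API request
            sendMessage({
              role: "user",
              parts,
            });
          } catch (error) {
            // Surface the error to the user here
          }
        }}
      >
        <PromptInputBody>
          <PromptInputTextarea
            ref={textareaRef}
            value={text}
            onChange={(e) => setText(e.currentTarget.value)}
            placeholder="Message"
          />
        </PromptInputBody>
        <PromptInputToolbar>
          {status === "streaming" ? (
            <PromptInputButton onClick={() => stop()} variant="ghost">
              Stop
            </PromptInputButton>
          ) : null}
          <PromptInputSubmit status={status} />
        </PromptInputToolbar>
      </PromptInput>
    </>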
); }; ``` ```ts title="/lib/image-utils.ts" /** * Parses a file object containing image data and returns a properly formatted data URL * and normalized media type. * * Handles: * - Normalizing mediaType from various property names (mediaType, mime_type) * - Detecting existing data: URLs * - Detecting base64-looking content * - Stripping whitespace from base64 content * - Building proper data:...;base64,... URLs */ export function parseImageFile(file: { url?: string; mediaType?: string; mime_type?: string; }): { dataUrl: string; mediaType: string } { const mediaType = file.mediaType || file.mime_type || "image/png"; let url = String(file.url || ""); const isDataUrl = url.startsWith("data:"); const looksLikeBase64 = !isDataUrl && /^[A-Za-z0-9+/=\s]+$/.test(url.slice(0, 200)); if (looksLikeBase64) { url = url.replace(/\s+/g, ""); } const dataUrl = isDataUrl ? url : looksLikeBase64 ? `data:${mediaType};base64,${url}` : url; return { dataUrl, mediaType }; } /** * Extracts base64-only content from a data URL. * Returns empty string if the input is not a valid data URL. */ export function extractBase64FromDataUrl(dataUrl: string): string { if (!dataUrl.startsWith("data:")) { return ""; } const comma = dataUrl.indexOf(","); return comma >= 0 ? dataUrl.slice(comma + 1) : ""; } /** * Parses an image part (either image_url or file type) and returns * dataUrl, base64Only, and mediaType ready for rendering. * * Handles error cases gracefully by returning empty base64Only string * when parsing fails, allowing the renderer to skip invalid images. 
*/ export function parseImagePartToDataUrl(part: any): { dataUrl: string; base64Only: string; mediaType: string; } { try { // Handle image_url parts if (part.type === "image_url" && part.image_url?.url) { const url = part.image_url.url; const mediaType = "image/png"; // Default for image_url parts if (url.startsWith("data:")) { // Extract media type from data URL if present const match = url.match(/data:([^;]+)/); const extractedMediaType = match?.[1] || mediaType; return { dataUrl: url, base64Only: extractBase64FromDataUrl(url), mediaType: extractedMediaType, }; } return { dataUrl: url, base64Only: "", mediaType, }; } // Handle file parts (AI SDK format) if (part.type === "file") { const { dataUrl, mediaType } = parseImageFile(part); return { dataUrl, base64Only: extractBase64FromDataUrl(dataUrl), mediaType, }; } return { dataUrl: "", base64Only: "", mediaType: "image/png", }; } catch { return { dataUrl: "", base64Only: "", mediaType: "image/png", }; } } ``` ## Image Configuration [#image-configuration] You can customize the generated image using the optional `image_config` parameter (for chat completions) or `size`/`quality`/`style` parameters (for the images API). The supported parameters vary by provider. ### Google Models [#google-models] Available Google models: | Model | Description | | -------------------------------- | ----------------------------------------------------------------------------------- | | `gemini-3-pro-image-preview` | Gemini 3 Pro with native image generation. Supports aspect ratios and 1K–4K sizes. | | `gemini-3.1-flash-image-preview` | Gemini 3.1 Flash with native image generation. Supports 0.5K–4K sizes (default 1K). 
| #### gemini-3-pro-image-preview [#gemini-3-pro-image-preview] ```bash curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "gemini-3-pro-image-preview", "messages": [ { "role": "user", "content": "Generate an image of a mountain landscape at sunset" } ], "image_config": { "aspect_ratio": "16:9", "image_size": "4K" } }' ``` | Parameter | Type | Description | | -------------- | ------ | --------------------------------------------------------------------------------------------------------------------------------------------- | | `aspect_ratio` | string | The aspect ratio of the generated image. Options: `"1:1"`, `"2:3"`, `"3:2"`, `"3:4"`, `"4:3"`, `"4:5"`, `"5:4"`, `"9:16"`, `"16:9"`, `"21:9"` | | `image_size` | string | The resolution of the generated image. Options: `"1K"` (1024x1024), `"2K"` (2048x2048), `"4K"` (4096x4096) | #### gemini-3.1-flash-image-preview [#gemini-31-flash-image-preview] ```bash curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "gemini-3.1-flash-image-preview", "messages": [ { "role": "user", "content": "Generate an image of a mountain landscape at sunset" } ], "image_config": { "image_size": "1K" } }' ``` | Parameter | Type | Description | | -------------- | ------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `aspect_ratio` | string | The aspect ratio of the generated image. Options: `"1:1"`, `"1:4"`, `"1:8"`, `"2:3"`, `"3:2"`, `"3:4"`, `"4:1"`, `"4:3"`, `"4:5"`, `"5:4"`, `"8:1"`, `"9:16"`, `"16:9"`, `"21:9"` | | `image_size` | string | The resolution of the generated image. 
Options: `"0.5K"` (512x512), `"1K"` (1024x1024, default), `"2K"` (2048x2048), `"4K"` (4096x4096) | `gemini-3.1-flash-image-preview` uniquely supports `"0.5K"` resolution, which is not available on other Google image models. ### Alibaba Models [#alibaba-models] ```bash curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "alibaba/qwen-image-plus", "messages": [ { "role": "user", "content": "Generate an image of a mountain landscape at sunset" } ], "image_config": { "image_size": "1024x1536", "n": 1, "seed": 42 } }' ``` | Parameter | Type | Description | | ------------ | ------- | ------------------------------------------------------------------------------------------------ | | `image_size` | string | Image dimensions in `WIDTHxHEIGHT` format. Examples: `"1024x1024"`, `"1024x1536"`, `"1536x1024"` | | `n` | integer | Number of images to generate (1-4) | | `seed` | integer | Random seed for reproducible generation | Available Alibaba models: | Model | Price | Description | | ------------------------- | ------------ | --------------------------------- | | `alibaba/qwen-image` | $0.035/image | Standard quality image generation | | `alibaba/qwen-image-plus` | $0.03/image | Good balance of quality and cost | | `alibaba/qwen-image-max` | $0.075/image | Highest quality image generation | Alibaba models use explicit pixel dimensions (e.g., `"1024x1536"`) instead of aspect ratios. For portrait orientation use `"1024x1536"`, for landscape use `"1536x1024"`. 
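The size and parameter rules above can be captured in a small helper. The TypeScript below is a minimal sketch and not part of any LLM Gateway SDK: `alibabaImageConfig` and the `Orientation` type are hypothetical names, and mapping "square" to `"1024x1024"` is an assumption. Only the three example sizes, the 1-4 range for `n`, and the `seed` field come from the table above.

```ts
// Hypothetical helper (not part of any LLM Gateway SDK): builds an
// `image_config` object for Alibaba image models from an orientation.
type Orientation = "square" | "portrait" | "landscape";

interface AlibabaImageConfig {
  image_size: string; // WIDTHxHEIGHT, e.g. "1024x1536"
  n?: number; // number of images to generate, 1-4
  seed?: number; // fixed seed for reproducible generation
}

// Sizes taken from the examples in the parameter table.
const SIZES: Record<Orientation, string> = {
  square: "1024x1024",
  portrait: "1024x1536",
  landscape: "1536x1024",
};

function alibabaImageConfig(
  orientation: Orientation,
  options: { n?: number; seed?: number } = {},
): AlibabaImageConfig {
  const { n, seed } = options;
  if (n !== undefined && (!Number.isInteger(n) || n < 1 || n > 4)) {
    throw new RangeError("n must be an integer between 1 and 4");
  }
  if (seed !== undefined && !Number.isInteger(seed)) {
    throw new TypeError("seed must be an integer");
  }
  const config: AlibabaImageConfig = { image_size: SIZES[orientation] };
  if (n !== undefined) config.n = n;
  if (seed !== undefined) config.seed = seed;
  return config;
}

// Example: the same request body as the curl call above.
const requestBody = {
  model: "alibaba/qwen-image-plus",
  messages: [
    { role: "user", content: "Generate an image of a mountain landscape at sunset" },
  ],
  image_config: alibabaImageConfig("portrait", { n: 1, seed: 42 }),
};
console.log(JSON.stringify(requestBody.image_config));
```

Validating `n` on the client side is a design choice for faster feedback; the gateway remains the authority on which values a given model accepts.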
### Z.AI Models [#zai-models] ```bash curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "zai/cogview-4", "messages": [ { "role": "user", "content": "Generate an image of a futuristic city skyline" } ], "image_config": { "image_size": "1024x1024" } }' ``` | Parameter | Type | Description | | ------------ | ------- | ------------------------------------------------------------------------------------------------ | | `image_size` | string | Image dimensions in `WIDTHxHEIGHT` format. Examples: `"1024x1024"`, `"2048x1024"`, `"1024x2048"` | | `n` | integer | Number of images to generate | Available Z.AI models: | Model | Price | Description | | --------------- | ------------ | ------------------------------------------------------------------------------------------------------------------- | | `zai/cogview-4` | $0.01/image | CogView-4 with bilingual support and excellent text rendering | | `zai/glm-image` | $0.015/image | GLM-Image with hybrid auto-regressive architecture, excellent for text-rendering and knowledge-intensive generation | CogView-4 supports both Chinese and English prompts and excels at generating images with embedded text. ### OpenAI Models [#openai-models] ```bash curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "openai/gpt-image-2", "messages": [ { "role": "user", "content": "Generate a photo-real cinematic landscape at golden hour" } ], "image_config": { "image_size": "3072x2160", "image_quality": "low" } }' ``` | Parameter | Type | Description | | --------------- | ------ | ------------------------------------------------------------------------------------- | | `image_size` | string | Image dimensions in `WIDTHxHEIGHT` format, or `"auto"` to let the model choose. 
| | `image_quality` | string | One of `"low"`, `"medium"`, `"high"`, or `"auto"`. Defaults to `"auto"` when omitted. | OpenAI image models do **not** accept `aspect_ratio`. Specify `image_size` as `WIDTHxHEIGHT` (e.g. `"1024x1024"`, `"3072x2160"`) or as `"auto"`. OpenAI requires both width and height to be divisible by 16, the longest edge to be ≤ 3840, and the total pixel count to fit within the model's pixel budget; requests outside these bounds are rejected with HTTP 400. Available OpenAI image models: | Model | Description | | -------------------- | ------------------------------------------------------------------------------------------------------------ | | `openai/gpt-image-2` | OpenAI's next-generation image model with improved quality and prompt adherence, supporting text and vision. | ### ByteDance Models [#bytedance-models] ```bash curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "bytedance/seedream-4-5", "messages": [ { "role": "user", "content": "Generate an image of a futuristic cyberpunk city at night" } ], "image_config": { "image_size": "2048x2048" } }' ``` | Parameter | Type | Description | | ------------ | ------ | ------------------------------------------------------------------------------------------------ | | `image_size` | string | Image dimensions in `WIDTHxHEIGHT` format. Examples: `"1024x1024"`, `"2048x2048"`, `"4096x4096"` | Available ByteDance models: | Model | Price | Description | | ------------------------ | ------------ | --------------------------------------------------------------- | | `bytedance/seedream-4-0` | $0.035/image | High-quality text-to-image generation with 2K default output | | `bytedance/seedream-4-5` | $0.045/image | Enhanced quality and consistency with improved prompt adherence | Seedream models support 2-10 reference images for multi-image fusion and generation.
The default output resolution is 2048×2048 (2K), with support for up to 4096×4096 (4K). ## Usage Notes [#usage-notes] Image generation models typically have higher token costs compared to text-only models due to the computational requirements of image synthesis. Generated images are returned as base64-encoded data URLs, which can be large. Consider the payload size when integrating image generation into your applications. # LLM SDK URL: https://docs.llmgateway.io/features/llm-sdk # LLM SDK [#llm-sdk] The LLM SDK lets you drop **AI + in-app credit purchases** into your product the same way Stripe Elements lets you drop in payments. Your end-users get their **own wallet**, buy credits **inside your app**, and chat with any model the gateway supports. LLM Gateway is the merchant of record; you set a markup and keep the margin. It ships as three packages: | Package | Runs in | Use it for | | ---------------------- | ------------------------- | ------------------------------------------------------------------------------------ | | `@llmgateway/server` | Your backend (secret key) | Mint end-user sessions, manage wallets/customers, verify webhooks, trigger payouts | | `@llmgateway/client` | Browser (headless) | Framework-agnostic chat/image/embeddings + balance/top-up, with auto session refresh | | `@llmgateway/elements` | React | Drop-in `<Chat>`, `<CreditBalance>`, `<BuyCredits>` + hooks | A complete, runnable Next.js example lives in the templates repo: [**LLM SDK credits template**](https://github.com/theopenco/llmgateway-templates/tree/main/templates/embeddable-credits). ## How it works [#how-it-works] ``` Your backend ──(secret key sk_)──▶ POST /v1/sessions ──▶ ephemeral session token (es_, ~15 min) │ │ └────────── returns es_ to your frontend ◀────────────────┘ │ Browser (es_ + pk_) ──▶ chat / images / embeddings ──▶ debits the end-user wallet └──▶ buy credits (Stripe Elements) ─▶ credits land in the wallet ``` * Your **secret key** (`sk_…`) never leaves your backend.
It mints short-lived **ephemeral session tokens** (`es_…`) scoped to one end-user wallet. * The **browser** only ever holds the `es_…` token (and a publishable Stripe key). It calls the gateway directly; usage is billed to that user's wallet. * **Markup is applied at top-up time**: if you set a 20% markup and a user buys $10, their wallet is credited the net spend power and your **margin accrues to your organization** for later payout. ## Set up in the dashboard [#set-up-in-the-dashboard] Before you write any code, configure the project you want to embed: 1. Open the LLM Gateway dashboard and select your project. 2. Go to **Settings → SDK** and turn on **End-user sessions**. 3. *(Optional)* Set a **markup percent** — the margin you earn on every top-up. 4. Add the browser origins allowed to call the gateway, one per line (e.g. `https://app.example.com`), then click **Save Settings**. 5. Under **Platform Secret Keys**, click **Create Live Key** (or **Create Test Key**) and copy the `sk_…` value immediately. 6. Store it as a server-side environment variable, for example `LLMGATEWAY_SECRET_KEY`. The platform secret key (`sk_…`) is different from a regular gateway API key (`llmgtwy_…`): it mints end-user sessions and must only ever be used from your backend. **Test mode.** A `sk_test_…` key is a sandbox key: end-user wallet top-ups go through Stripe's sandbox (use Stripe [test cards](https://docs.stripe.com/testing), no real charges), and its wallets are fully segregated from live ones — the same end-user gets independent test and live wallets. To keep sandbox money from buying real inference, **test-mode wallets can only call free models**: use the `auto` route (it picks a free model automatically) or a free model id; paid models return a `403`. Pair a test secret key on your backend with `mode="test"` on `` (see below) — the two must match. The platform secret key is shown only once. Do not put it in frontend code, browser bundles, mobile apps, or public repos. ## 1. 
Install [#1-install] ```bash # backend npm install @llmgateway/server # frontend (pick one) npm install @llmgateway/elements # React drop-in components npm install @llmgateway/client # headless / non-React ``` ## 2. Mint a session on your backend [#2-mint-a-session-on-your-backend] Identify your signed-in user and mint a session bound to their wallet. Scope which models they may call. ```ts // app/api/llmgateway/session/route.ts (Next.js Route Handler) import { LLMGateway } from "@llmgateway/server"; const lg = new LLMGateway({ secretKey: process.env.LLMGATEWAY_SECRET_KEY! }); export async function POST() { const session = await lg.sessions.create({ customer: { externalId: "user_123" }, // your stable user id scope: { models: ["openai/gpt-4o-mini"] }, // lock down what they can call ttlSeconds: 900, // optional, default 15 min }); return Response.json(session); // { sessionToken, walletId, endCustomerId, expiresAt, publishableKey } } ``` Always mint sessions server-side. Never ship your `sk_…` secret key to the browser. ## 3a. Drop in the React components [#3a-drop-in-the-react-components] Wrap your UI in `<LLMGatewayProvider>` and use the components. `fetchSession` is how the client refreshes the short-lived token before it expires. ```tsx "use client"; import { LLMGatewayProvider, Chat, CreditBalance, BuyCredits, } from "@llmgateway/elements"; const fetchSession = () => fetch("/api/llmgateway/session", { method: "POST" }).then((r) => r.json()); export default function Assistant({ session }) { return ( ); } ``` Need full control over rendering? Use the hooks instead of the components: * `useBalance()` → `{ balance, currency, recentLedger, loading, error, refetch, refetchUntilChange }` * `useChat({ model })` → `{ turns, send, streaming, ... }` `useBalance().refetchUntilChange()` polls until the balance actually changes; use it after a purchase, since the wallet is credited asynchronously once the Stripe webhook lands. ## 3b.
Or go headless (any framework) [#3b-or-go-headless-any-framework] ```ts import { LLMGatewayClient } from "@llmgateway/client"; const client = new LLMGatewayClient({ session: { token: session.sessionToken, expiresAt: session.expiresAt }, refresh: fetchSession, // auto-refreshes ~60s before expiry }); // stream a completion (billed to the user's wallet) for await (const delta of client.stream({ model: "openai/gpt-4o-mini", messages: [{ role: "user", content: "Hello!" }], })) { process.stdout.write(delta); } const { balance } = await client.getBalance(); ``` The headless client also exposes `chat()`, `image()`, `embeddings()`, `getBalance()`, `createTopUp(amount)`, and `getConfig()`. ## Buying credits [#buying-credits] `<BuyCredits>` creates a Stripe PaymentIntent scoped to the user's wallet, renders Stripe's `PaymentElement`, and confirms the payment. Once LLM Gateway's webhook processes it, the wallet is credited the **net** amount (after your markup) and your margin accrues to your organization. `@llmgateway/elements` bundles LLM Gateway's browser-safe Stripe publishable keys. Pass `mode="test"` to `` while developing to use Stripe test mode; omit it or pass `mode="prod"` for live payments (`"prod"` is the default). You never need to provide LLM Gateway's Stripe publishable key yourself, and the end-user never sees your `sk_…` secret key. The frontend `mode` prop and the backend secret key must match. A `sk_test_…` key creates the top-up PaymentIntent in the Stripe sandbox, which only the `mode="test"` publishable key can confirm; mixing a test key with `mode="prod"` (or vice versa) makes `<BuyCredits>` fail to confirm. ## Managing wallets & customers (server-side) [#managing-wallets--customers-server-side] ```ts // grant credits directly (e.g.
free trial) await lg.wallets.credit({ walletId, amount: 5, reason: "Signup bonus" }); const wallet = await lg.wallets.retrieve(walletId); // analytics: customers with balances + lifetime spend const { customers } = await lg.customers.list(); const detail = await lg.customers.retrieve(endCustomerId); ``` ## Webhooks [#webhooks] Register an endpoint to react to wallet events. Events are signed (`X-LLMGateway-Signature`); verify them like Stripe. ```ts await lg.webhookEndpoints.create({ url: "https://yourapp.com/webhooks/llmgateway", enabledEvents: ["wallet.credited", "wallet.low_balance"], }); // in your handler const event = lg.webhooks.constructEvent( rawBody, signatureHeader, endpointSecret, ); ``` Webhook URLs must be **https** and public — requests to private/internal addresses are rejected (SSRF protection), both at registration and at delivery time. ## Margin payouts (Stripe Connect) [#margin-payouts-stripe-connect] Your accrued markup is held as a margin balance. Onboard a connected account and pay it out: ```ts const { url } = await lg.connect.createOnboardingLink({ refreshUrl: "https://yourapp.com/settings/payouts", returnUrl: "https://yourapp.com/settings/payouts?done=1", }); // redirect the developer to `url`, then later: const status = await lg.connect.status(); // { onboarded, payoutsEnabled, marginBalance } const payout = await lg.connect.payout(); // transfer the accrued margin out ``` ## Security model [#security-model] * **Ephemeral tokens** (`es_…`) are short-lived and revocable; mint them per-user from your backend. * **Model scopes** restrict each session to an allow-list of models. * **Origin allowlist** (configured on the project) blocks browser calls from unexpected origins. * **Per-session spend caps** (`scope.maxSpend`) bound how much a single session can spend. 
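As a sketch of how the model allow-list and the per-session spend cap might be combined when minting a session, the helper below builds the `scope` object for `lg.sessions.create`. `buildSessionScope`, the `Plan` tiers, and the cap values are hypothetical, and this page does not state the unit of `maxSpend`, so the numbers are placeholders. Only the `scope.models` and `scope.maxSpend` field names come from the text above.

```ts
// Hypothetical helper: derives a session scope from a user's plan.
// Only the `models` and `maxSpend` field names come from this page;
// the plan tiers, model lists, and cap values are illustrative.
type Plan = "free" | "pro";

interface SessionScope {
  models: string[]; // allow-list of models the session may call
  maxSpend: number; // per-session spend cap (unit assumed, not documented here)
}

const PLAN_SCOPES: Record<Plan, SessionScope> = {
  free: { models: ["openai/gpt-4o-mini"], maxSpend: 1 },
  pro: { models: ["openai/gpt-4o-mini", "openai/gpt-4o"], maxSpend: 10 },
};

function buildSessionScope(plan: Plan, requestedModels?: string[]): SessionScope {
  const base = PLAN_SCOPES[plan];
  if (!requestedModels) {
    // Copy the array so callers cannot mutate the shared plan table.
    return { models: base.models.slice(), maxSpend: base.maxSpend };
  }
  // Never widen the allow-list: keep only models the plan already permits.
  const models = requestedModels.filter((m) => base.models.indexOf(m) !== -1);
  if (models.length === 0) {
    throw new Error("none of the requested models are allowed for this plan");
  }
  return { models, maxSpend: base.maxSpend };
}

console.log(buildSessionScope("free", ["openai/gpt-4o", "openai/gpt-4o-mini"]));
```

On the backend, the result would be passed as `scope` in the `lg.sessions.create` call, so the limits are decided server-side and the browser never chooses its own.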
## Full example [#full-example] The end-to-end Next.js app — backend session route, provider, chat, and buy-credits — is in the templates repo: ➡️ [**LLM SDK credits template**](https://github.com/theopenco/llmgateway-templates/tree/main/templates/embeddable-credits) # Master Keys URL: https://docs.llmgateway.io/features/master-keys # Master Keys [#master-keys] Master keys are org-scoped bearer tokens that let you create projects and gateway API keys programmatically — without going through the dashboard. They are intended for server-to-server provisioning (e.g. multi-tenant onboarding from your own backend). Master keys are available on the **Enterprise** plan only. Contact us at [contact@llmgateway.io](mailto:contact@llmgateway.io) to enable them for your organization. ## Security [#security] * Master keys are stored as **HMAC-SHA256 hashes** in the database (using the `GATEWAY_API_KEY_HASH_SECRET` secret). The plain token is shown to you **only once** at creation time. * Each master key is scoped to a single organization and cannot access resources in other organizations. * Deleting or deactivating a master key revokes all programmatic access immediately. * All creates/deletes/status changes are recorded in your organization audit log. ## Limits [#limits] * Maximum **10 active master keys per organization**. * Programmatic project and API-key creation enforces the same per-org and per-project limits as the dashboard flow. ## Managing master keys [#managing-master-keys] In the dashboard, go to **Organization → Master Keys**. From there you can: * Create a new master key (the plain token is shown once — copy it immediately). * View the masked token, status, creator, and last-used timestamp for each existing key. * Activate / deactivate or delete keys. ## Authentication [#authentication] All programmatic endpoints live under `/v1/master/*` and require a master key in the `Authorization` header: ``` Authorization: Bearer llmgmk_... 
``` A request with a missing, invalid, inactive, or non-enterprise master key receives a 401 / 403 response. ## Endpoints [#endpoints] ### List projects [#list-projects] `GET /v1/master/projects` Returns all non-deleted projects in the master key's organization. ```bash curl https://internal.llmgateway.io/v1/master/projects \ -H "Authorization: Bearer $MASTER_KEY" ``` Response (200): ```json { "projects": [ { "id": "proj_...", "name": "Customer ACME", "organizationId": "org_...", "cachingEnabled": false, "cacheDurationSeconds": 60, "mode": "hybrid", "status": "active", "createdAt": "...", "updatedAt": "..." } ] } ``` ### Create a project [#create-a-project] `POST /v1/master/projects` ```bash curl -X POST https://internal.llmgateway.io/v1/master/projects \ -H "Authorization: Bearer $MASTER_KEY" \ -H "Content-Type: application/json" \ -d '{ "name": "Customer ACME", "cachingEnabled": false, "mode": "hybrid" }' ``` Body parameters: | Field | Type | Description | | ---------------------- | ------------------------------------------------ | -------------------------- | | `name` | string | Project name (1–255 chars) | | `cachingEnabled` | boolean (optional) | Default `false` | | `cacheDurationSeconds` | number (optional) | 10–31536000, default 60 | | `mode` | `"api-keys" \| "credits" \| "hybrid"` (optional) | Default `"hybrid"` | Response (201): the created project. ### Update a project [#update-a-project] `PATCH /v1/master/projects/{id}` Updates a project owned by the master key's organization. All body fields are optional; provide only the ones you want to change. ```bash curl -X PATCH https://internal.llmgateway.io/v1/master/projects/proj_... 
\ -H "Authorization: Bearer $MASTER_KEY" \ -H "Content-Type: application/json" \ -d '{ "name": "Customer ACME (renamed)", "cachingEnabled": true, "status": "inactive" }' ``` Body parameters (all optional, at least one required): | Field | Type | Description | | ---------------------- | ------------------------------------- | ----------------------------------- | | `name` | string | 1–255 chars | | `cachingEnabled` | boolean | | | `cacheDurationSeconds` | number | 10–31536000 | | `mode` | `"api-keys" \| "credits" \| "hybrid"` | | | `status` | `"active" \| "inactive"` | Toggle the project without deleting | Response (200): the updated project. ### Delete a project [#delete-a-project] `DELETE /v1/master/projects/{id}` Soft-deletes a project (sets `status` to `"deleted"`). Cascades to its API keys. ```bash curl -X DELETE https://internal.llmgateway.io/v1/master/projects/proj_... \ -H "Authorization: Bearer $MASTER_KEY" ``` Response (200): ```json { "message": "Project deleted successfully" } ``` ### Create a gateway API key [#create-a-gateway-api-key] `POST /v1/master/keys` ```bash curl -X POST https://internal.llmgateway.io/v1/master/keys \ -H "Authorization: Bearer $MASTER_KEY" \ -H "Content-Type: application/json" \ -d '{ "projectId": "proj_...", "description": "Customer ACME — production key" }' ``` Body parameters: | Field | Type | Description | | -------------------------- | ------------------------------------------------- | -------------------------------------------- | | `projectId` | string | Must belong to the master key's organization | | `description` | string | API key description (1–255 chars) | | `usageLimit` | string (optional) | Lifetime usage limit | | `periodUsageLimit` | string (optional) | Recurring period usage limit | | `periodUsageDurationValue` | number (optional) | Required if `periodUsageLimit` is set | | `periodUsageDurationUnit` | `"hour" \| "day" \| "week" \| "month"` (optional) | Required if `periodUsageLimit` is set | The created 
gateway API key's plain token is returned in the response **only once**. Persist it immediately on your side. Response (201): ```json { "apiKey": { "id": "ak_...", "token": "llmgtwy_...", "description": "Customer ACME — production key", "status": "active", "projectId": "proj_...", "createdBy": "usr_...", "createdAt": "...", "updatedAt": "..." } } ``` ### Update a gateway API key [#update-a-gateway-api-key] `PATCH /v1/master/keys/{id}` Updates an API key in a project owned by the master key's organization. All body fields are optional; provide only the ones you want to change. ```bash curl -X PATCH https://internal.llmgateway.io/v1/master/keys/ak_... \ -H "Authorization: Bearer $MASTER_KEY" \ -H "Content-Type: application/json" \ -d '{ "status": "inactive", "usageLimit": "100.00" }' ``` Body parameters (all optional, at least one required): | Field | Type | Description | | -------------------------- | -------------------------------------- | -------------------------------------- | | `description` | string | 1–255 chars | | `status` | `"active" \| "inactive"` | | | `usageLimit` | string \| null | Lifetime usage limit (null to clear) | | `periodUsageLimit` | string \| null | Recurring period limit (null to clear) | | `periodUsageDurationValue` | number \| null | Required if `periodUsageLimit` is set | | `periodUsageDurationUnit` | `"hour" \| "day" \| "week" \| "month"` | Required if `periodUsageLimit` is set | Response (200): the updated API key (the plain token is **not** included — it is only returned at creation). ### Delete a gateway API key [#delete-a-gateway-api-key] `DELETE /v1/master/keys/{id}` Soft-deletes the API key (sets `status` to `"deleted"`). Any in-flight requests using the key will be rejected immediately on next auth check. ```bash curl -X DELETE https://internal.llmgateway.io/v1/master/keys/ak_... 
\ -H "Authorization: Bearer $MASTER_KEY" ``` Response (200): ```json { "message": "API key deleted successfully" } ``` The auto-generated playground API key cannot be deleted via the master API. ## IAM rules [#iam-rules] Each gateway API key can have one or more IAM rules that restrict which models, providers, or pricing tiers it is allowed to use. Rules are evaluated at request time by the gateway. A key with no active rules has no IAM restrictions. Rule types: | `ruleType` | Description | | ----------------- | ----------------------------------------------------------- | | `allow_models` | Only the listed models are permitted | | `deny_models` | The listed models are blocked | | `allow_providers` | Only the listed providers are permitted | | `deny_providers` | The listed providers are blocked | | `allow_pricing` | Only models matching the pricing constraint are permitted | | `deny_pricing` | Models matching the pricing constraint are blocked | | `allow_ip_cidrs` | Only requests from the listed IPv4/IPv6 CIDRs are permitted | | `deny_ip_cidrs` | Requests from the listed IPv4/IPv6 CIDRs are blocked | The `ruleValue` JSON object holds the rule's parameters. The fields it accepts depend on the `ruleType`: | Field | Type | Used by | | ---------------- | ------------------ | ----------------------------------- | | `models` | string\[] | `allow_models`, `deny_models` | | `providers` | string\[] | `allow_providers`, `deny_providers` | | `pricingType` | `"free" \| "paid"` | `allow_pricing`, `deny_pricing` | | `maxInputPrice` | number | `allow_pricing`, `deny_pricing` | | `maxOutputPrice` | number | `allow_pricing`, `deny_pricing` | | `ipCidrs` | string\[] | `allow_ip_cidrs`, `deny_ip_cidrs` | ### IP CIDR rules [#ip-cidr-rules] IP CIDR rules restrict gateway requests by source IP. Both IPv4 (e.g. `192.0.2.0/24`) and IPv6 (e.g. `2001:db8::/32`) ranges are supported, and you can mix both in a single rule. 
To restrict to a single address, use a `/32` (IPv4) or `/128` (IPv6) prefix. The gateway reads the client IP from the first entry in the `X-Forwarded-For` header, which is set by the GCP load balancer. IPv4-mapped IPv6 addresses (`::ffff:1.2.3.4`) are normalized to IPv4 so a single `1.2.3.0/24` rule still matches when the upstream connection happens to be IPv6. When an `allow_ip_cidrs` rule is configured and the gateway cannot determine the client IP, the request is denied. Invalid CIDR syntax is rejected at rule-creation time with a `400` error. All endpoints scope by the master key's organization: a `404` is returned if the API key (or rule) is not part of the authenticated master key's organization. ### List IAM rules [#list-iam-rules] `GET /v1/master/keys/{id}/iam` ```bash curl https://internal.llmgateway.io/v1/master/keys/ak_.../iam \ -H "Authorization: Bearer $MASTER_KEY" ``` Response (200): ```json { "rules": [ { "id": "iam_...", "apiKeyId": "ak_...", "ruleType": "allow_models", "ruleValue": { "models": ["openai/gpt-4o", "anthropic/claude-3-5-sonnet"] }, "status": "active", "createdAt": "...", "updatedAt": "..." 
} ] } ``` ### Create an IAM rule [#create-an-iam-rule] `POST /v1/master/keys/{id}/iam` ```bash curl -X POST https://internal.llmgateway.io/v1/master/keys/ak_.../iam \ -H "Authorization: Bearer $MASTER_KEY" \ -H "Content-Type: application/json" \ -d '{ "ruleType": "allow_models", "ruleValue": { "models": ["openai/gpt-4o", "anthropic/claude-3-5-sonnet"] } }' ``` Body parameters: | Field | Type | Description | | ----------- | ------------------------ | ------------------------------------------------------- | | `ruleType` | rule type enum (above) | Required | | `ruleValue` | object (see table above) | Must include the fields appropriate for the chosen type | | `status` | `"active" \| "inactive"` | Optional, defaults to `"active"` | Restricting by source IP: ```bash curl -X POST https://internal.llmgateway.io/v1/master/keys/ak_.../iam \ -H "Authorization: Bearer $MASTER_KEY" \ -H "Content-Type: application/json" \ -d '{ "ruleType": "allow_ip_cidrs", "ruleValue": { "ipCidrs": ["192.0.2.0/24", "2001:db8::/32"] } }' ``` Response (201): the created IAM rule. ### Update an IAM rule [#update-an-iam-rule] `PATCH /v1/master/keys/{id}/iam/{ruleId}` All body fields are optional; provide only the ones you want to change. ```bash curl -X PATCH https://internal.llmgateway.io/v1/master/keys/ak_.../iam/iam_... \ -H "Authorization: Bearer $MASTER_KEY" \ -H "Content-Type: application/json" \ -d '{ "status": "inactive" }' ``` Body parameters (all optional, at least one required): | Field | Type | Description | | ----------- | ------------------------ | --------------------------------------- | | `ruleType` | rule type enum (above) | Change the rule type | | `ruleValue` | object (see table above) | Replace the rule value | | `status` | `"active" \| "inactive"` | Activate or deactivate without deleting | Response (200): the updated IAM rule. ### Delete an IAM rule [#delete-an-iam-rule] `DELETE /v1/master/keys/{id}/iam/{ruleId}` Permanently removes an IAM rule from the API key. 
```bash curl -X DELETE https://internal.llmgateway.io/v1/master/keys/ak_.../iam/iam_... \ -H "Authorization: Bearer $MASTER_KEY" ``` Response (200): ```json { "message": "IAM rule deleted successfully" } ``` # Metadata URL: https://docs.llmgateway.io/features/metadata # Metadata [#metadata] LLM Gateway supports sending additional metadata with your requests using custom headers. This allows you to include information like user sessions, application versions, tenant IDs, or other contextual data that can be useful for analytics and monitoring. Later, you can filter requests by these values, for example to return only those from a specific user or session. A future release will also let you segment your analytics and monitoring by this metadata, for example with cost and latency breakdowns per user, application, country, feature, or any other dimension you want to track. ## Custom Headers [#custom-headers] You can include custom headers with the `X-LLMGateway-` prefix to send metadata alongside your LLM requests: ```bash curl -X POST https://api.llmgateway.io/v1/chat/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "X-LLMGateway-Country: US" \ -H "X-LLMGateway-User-ID: 9403f741-a524-4b18-b1b2-dbb71cdff2a4" \ -d '{ "model": "gpt-4o", "messages": [ { "role": "user", "content": "Hello, how are you?"
} ] }' ``` ## Best Practices [#best-practices] ### Header Naming [#header-naming] * Use the `X-LLMGateway-` prefix for all custom metadata * Use descriptive, consistent naming conventions * Avoid special characters; use hyphens to separate words ### Data Privacy [#data-privacy] * Be mindful of sensitive data in headers * Consider hashing or anonymizing user identifiers * Follow your organization's data privacy policies ### Performance [#performance] * Keep header values reasonably short * Avoid sending unnecessary metadata that won't be used for analytics * Consider the impact on request size, especially for high-volume applications ## Example: Multi-tenant Application [#example-multi-tenant-application] For a multi-tenant application, you might use metadata headers like this: ```bash curl -X POST https://api.llmgateway.io/v1/chat/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "X-LLMGateway-Tenant-ID: acme-corp" \ -H "X-LLMGateway-User-ID: user-12345" \ -H "X-LLMGateway-App-Version: 2.1.4" \ -H "X-LLMGateway-Feature: chat-assistant" \ -d '{ "model": "gpt-4o", "messages": [ { "role": "user", "content": "Summarize this document..." } ] }' ``` This allows you to track usage and costs per tenant, user, application version, and feature, providing detailed insights into how your LLM integration is being used across your platform. # Moderations URL: https://docs.llmgateway.io/features/moderations # Moderations [#moderations] LLMGateway supports the OpenAI-compatible `/v1/moderations` endpoint for text and multimodal safety classification. Use it when you want to: * Screen user prompts before they reach a model * Review generated output before displaying it * Apply the same moderation API shape you already use with OpenAI clients For the full request and response schema, see the [API reference](/v1/moderations). 
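The first two use cases above amount to a gate in front of your model call. The sketch below is a minimal, hypothetical helper (`shouldBlock` and its optional `threshold` parameter are illustrative names, not part of LLMGateway or the OpenAI SDK). It inspects an already parsed `/v1/moderations` response and assumes the OpenAI-style `results[].flagged` and `results[].category_scores` fields:

```typescript
// Minimal shape of a moderation response, limited to the fields used below.
interface ModerationResult {
  flagged: boolean;
  category_scores: Record<string, number>;
}

interface ModerationResponse {
  results: ModerationResult[];
}

// Hypothetical helper: block when any result is flagged by the provider, or
// when any category score reaches an application-chosen threshold.
// With the default threshold (Infinity) only the `flagged` field matters.
function shouldBlock(
  response: ModerationResponse,
  threshold: number = Number.POSITIVE_INFINITY,
): boolean {
  return response.results.some(
    (result) =>
      result.flagged ||
      Object.values(result.category_scores).some((score) => score >= threshold),
  );
}
```

You would call it on the JSON body returned by `POST /v1/moderations` and skip the chat completion request when it returns `true`. The stricter `threshold` is an application-level choice layered on top of the provider's own `flagged` decision.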
## Endpoint [#endpoint] `POST https://api.llmgateway.io/v1/moderations` Authenticate with your LLMGateway API key: ```bash -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" ``` ## Supported Inputs [#supported-inputs] The `input` field accepts: * A single string * An array of strings * An array of multimodal content items with `text` and `image_url` The default model is `omni-moderation-latest`. ## curl [#curl] ### Single text input [#single-text-input] ```bash curl -X POST "https://api.llmgateway.io/v1/moderations" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "input": "I want to harm someone." }' ``` ### Multiple text inputs [#multiple-text-inputs] ```bash curl -X POST "https://api.llmgateway.io/v1/moderations" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "omni-moderation-latest", "input": [ "This is a harmless sentence.", "I want to attack somebody." ] }' ``` ### Multimodal input [#multimodal-input] ```bash curl -X POST "https://api.llmgateway.io/v1/moderations" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "input": [ { "type": "text", "text": "Check this image for violent content." 
}, { "type": "image_url", "image_url": { "url": "https://example.com/image.png" } } ] }' ``` ## OpenAI SDK [#openai-sdk] ```ts import OpenAI from "openai"; const client = new OpenAI({ baseURL: "https://api.llmgateway.io/v1", apiKey: process.env.LLM_GATEWAY_API_KEY, }); const response = await client.moderations.create({ model: "omni-moderation-latest", input: "I want to harm someone.", }); console.log(response.results[0]?.flagged); ``` ## Response Shape [#response-shape] The response follows the standard OpenAI moderation format: ```json { "id": "modr-123", "model": "omni-moderation-latest", "results": [ { "flagged": true, "categories": { "violence": true, "self-harm": false }, "category_scores": { "violence": 0.98, "self-harm": 0.01 } } ] } ``` ## When To Use This Instead Of Chat Content Filtering [#when-to-use-this-instead-of-chat-content-filtering] Use `/v1/moderations` when you want an explicit moderation decision in your own application flow. If you want moderation to happen automatically as part of model requests, use LLMGateway content filtering on `/v1/chat/completions` instead. # Reasoning URL: https://docs.llmgateway.io/features/reasoning # Reasoning [#reasoning] LLMGateway supports reasoning-capable models that can show their step-by-step thought process before providing a final answer. This feature is particularly useful for complex problem-solving tasks, mathematical calculations, and logical reasoning. ## Reasoning-Enabled Models [#reasoning-enabled-models] You can find all reasoning-enabled models on our [models page with reasoning filter](https://llmgateway.io/models?filters=1\&reasoning=true). These models include: * OpenAI's GPT-5 series (e.g., `gpt-5`, `gpt-5-mini`) * Note: GPT-5 models use reasoning but currently do not return the reasoning content in the response.
* Anthropic's Claude 3.7 Sonnet * Google's Gemini 2.0 Flash Thinking and Gemini 2.5 Pro * GPT OSS models such as `gpt-oss-120b` and `gpt-oss-20b` * Z.AI's reasoning models Some models may reason internally even if the `reasoning_effort` parameter is not specified. ## Using the Reasoning Parameter [#using-the-reasoning-parameter] There are two ways to control reasoning effort: ### Option 1: Top-level `reasoning_effort` [#option-1-top-level-reasoning_effort] Add the `reasoning_effort` parameter directly to your request: * `none` - Disable reasoning. Supported by OpenAI's newer reasoning models (e.g. `gpt-5.4-mini` and later, which accept `none` instead of `minimal`). For other providers this turns reasoning off. * `minimal` - Fastest reasoning with minimal thought process (only for GPT-5 models) * `low` - Light reasoning for simpler tasks * `medium` - Balanced reasoning for most tasks * `high` - Deep reasoning for complex problems * `xhigh` - Maximum reasoning depth for the most complex problems OpenAI's reasoning models do not all accept the same effort values. The original GPT-5 models support `minimal`, while newer models (e.g. `gpt-5.4-mini` and later) replace it with `none`. If you send an effort value the target model doesn't support, OpenAI returns an `unsupported_value` error. ```bash curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "gpt-oss-120b", "messages": [ { "role": "user", "content": "What is 2/3 + 1/4 + 5/6?" 
} ], "reasoning_effort": "medium" }' ``` ### Option 2: Using the `reasoning` object [#option-2-using-the-reasoning-object] Use the unified `reasoning` configuration object with an `effort` field: * `none` - Disable reasoning * `minimal` - Fastest reasoning with minimal thought process * `low` - Light reasoning for simpler tasks * `medium` - Balanced reasoning for most tasks * `high` - Deep reasoning for complex problems * `xhigh` - Maximum reasoning depth for the most complex problems ```bash curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "gpt-5", "messages": [ { "role": "user", "content": "What is 2/3 + 1/4 + 5/6?" } ], "reasoning": { "effort": "medium" } }' ``` You cannot use both `reasoning_effort` and `reasoning.effort` in the same request. Choose one approach. However, you can combine `reasoning_effort` or `reasoning.effort` with `reasoning.max_tokens` — when `max_tokens` is specified, it takes priority over the effort level. ### Example Response [#example-response] The response will include a `reasoning` field in the message object containing the model's step-by-step thought process: ```json { "id": "chatcmpl-abc123", "object": "chat.completion", "created": 1234567890, "model": "gpt-oss-120b", "choices": [ { "index": 0, "message": { "role": "assistant", "content": "The answer is 1.75 or 7/4.", "reasoning": "First, I need to find a common denominator for 2/3, 1/4, and 5/6. The LCD is 12. Converting: 2/3 = 8/12, 1/4 = 3/12, 5/6 = 10/12. Adding: 8/12 + 3/12 + 10/12 = 21/12 = 1.75 or 7/4." }, "finish_reason": "completed" } ], "usage": { "prompt_tokens": 20, "completion_tokens": 45, "reasoning_tokens": 35, "total_tokens": 65 } } ``` ## Specifying Reasoning Token Budget [#specifying-reasoning-token-budget] For models that support it, you can specify an exact token budget for reasoning using the `reasoning` object with `max_tokens`. 
This gives you precise control over how many tokens the model allocates to its thinking process. When `reasoning.max_tokens` is specified, it overrides `reasoning.effort` and `reasoning_effort`. Supported by Anthropic Claude and Google Gemini thinking models. ### Example Request [#example-request] ```bash curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "anthropic/claude-sonnet-4-20250514", "messages": [ { "role": "user", "content": "Explain the P vs NP problem and why it matters." } ], "reasoning": { "max_tokens": 8000 } }' ``` ### Supported Models [#supported-models] The `reasoning.max_tokens` parameter is supported by: * **Anthropic Claude**: Claude 3.7 Sonnet, Claude Sonnet 4, Claude Opus 4, Claude Opus 4.5 * **Google Gemini**: Gemini 2.5 Pro, Gemini 2.5 Flash, Gemini 3 Pro Preview When using auto-routing or root models with `reasoning.max_tokens`, only providers that support this feature will be considered. ### Provider-Specific Constraints [#provider-specific-constraints] * **Anthropic**: Reasoning budget must be between 1,024 and 128,000 tokens. Values outside this range are automatically clamped. * **Google**: No specific constraints on the reasoning budget. ### Error Handling [#error-handling] If you specify `reasoning.max_tokens` for a model that doesn't support it, you'll receive an error: ```json { "error": { "message": "Model gpt-4o does not support reasoning.max_tokens. 
Remove the reasoning parameter or use a model that supports explicit reasoning token budgets.", "type": "invalid_request_error", "code": "model_not_supported" } } ``` ## Streaming Reasoning Content [#streaming-reasoning-content] When streaming is enabled, reasoning content will be streamed as part of the response chunks: ```bash curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "gpt-oss-120b", "messages": [ { "role": "user", "content": "Solve this logic puzzle: If all roses are flowers and some flowers fade quickly, can we conclude that some roses fade quickly?" } ], "reasoning_effort": "high", "stream": true }' ``` The reasoning content will appear in the stream chunks before the final answer, allowing you to display the model's thought process in real-time. Example: ``` data: { "id": "chatcmpl-fb266880-1016-4797-9a70-f21a538edaf6", "object": "chat.completion.chunk", "created": 1761048126, "model": "openai/gpt-oss-20b", "choices": [ { "index": 0, "delta": { "reasoning": "It's ", "role": "assistant" }, "finish_reason": null } ] } ``` ## Usage Tracking [#usage-tracking] ### Response Payload [#response-payload] The `usage` object in the response includes reasoning-specific token counts: * `reasoning_tokens` - Number of tokens used for the reasoning process * `completion_tokens` - Number of tokens in the final answer * `prompt_tokens` - Number of tokens in the input * `total_tokens` - Sum of all token counts ### Logs and Analytics [#logs-and-analytics] All requests using the `reasoning_effort` parameter are tracked in your dashboard logs with: * The `reasoningContent` field containing the full reasoning text * Separate token counts for reasoning vs. 
completion * Performance metrics for reasoning-enabled requests You can view detailed logs for each request in the [dashboard](https://llmgateway.io/dashboard) to analyze how models are reasoning through problems. ## Auto-Routing with Reasoning [#auto-routing-with-reasoning] When using auto-routing (specifying a model like `gpt-5` without a specific version), LLMGateway will: 1. Automatically set `reasoning_effort` to `minimal` for GPT-5 models 2. Set `reasoning_effort` to `low` for other auto-routed reasoning models 3. Only route to providers that support reasoning when `reasoning_effort` is specified This ensures optimal performance and cost when using auto-routing with reasoning-capable models. ## Model-Specific Behavior [#model-specific-behavior] Not all reasoning models return reasoning content in the same way. Some models (like OpenAI models) may reason internally but not expose the reasoning content in the response. LLMGateway makes sure the response is unified across different providers, but the depth and format of reasoning may vary. ## Best Practices [#best-practices] 1. **Choose appropriate reasoning effort**: Use `low` or `minimal` for simple tasks, `medium` for most tasks, and `high` only for complex problems that require deep reasoning 2. **Monitor token usage**: Reasoning can significantly increase token consumption - monitor your `reasoning_tokens` in the usage object 3. **Stream for better UX**: When building user-facing applications, enable streaming to show the reasoning process in real-time 4. **Check logs**: Review the `reasoningContent` in your dashboard logs to understand how models are solving problems ## Error Handling [#error-handling-1] If you specify `reasoning_effort` for a model that doesn't support reasoning, you'll receive an error: ```json { "error": { "message": "Model gpt-4o does not support reasoning. 
Remove the reasoning_effort parameter or use a reasoning-capable model.", "type": "invalid_request_error", "code": "model_not_supported" } } ``` To avoid this error, only use the `reasoning_effort` parameter with [reasoning-enabled models](https://llmgateway.io/models?filters=1\&reasoning=true). # Response Healing URL: https://docs.llmgateway.io/features/response-healing # Response Healing [#response-healing] Response Healing is a plugin that automatically validates and repairs malformed JSON responses from AI models. When enabled, LLM Gateway ensures that API responses conform to your specified schemas even when the model's formatting is imperfect. ## Why Response Healing? [#why-response-healing] Large language models occasionally produce invalid JSON, especially in complex scenarios: * **Markdown wrapping**: Models often wrap JSON in code blocks like \`\`\`json...\`\`\` * **Mixed content**: JSON may be preceded or followed by explanatory text * **Syntax errors**: Trailing commas, unquoted keys, or single quotes instead of double quotes * **Truncated output**: Token limits may cut off responses mid-JSON Response Healing automatically detects and fixes these issues, saving you from implementing error handling for every possible malformed response. ## Enabling Response Healing [#enabling-response-healing] To enable Response Healing, add `response-healing` to the `plugins` array in your request: ```bash curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "gpt-4o", "messages": [{"role": "user", "content": "Return a JSON object with name and age"}], "response_format": {"type": "json_object"}, "plugins": [{"id": "response-healing"}] }' ``` Response Healing only activates when `response_format` is set to `json_object` or `json_schema`. For regular text responses, the plugin has no effect. 
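To make the failure modes listed under "Why Response Healing?" concrete, here is a minimal sketch of the client-side cleanup you would otherwise write yourself. It is illustrative only and is not LLM Gateway's implementation; the function names are hypothetical, and it handles only fenced or surrounded JSON and trailing commas:

```typescript
// Three backticks, built at runtime so this example can live inside a fence.
const FENCE = "`".repeat(3);

// Pull the most likely JSON span out of a raw model reply.
function extractJsonCandidate(raw: string): string {
  // Prefer the contents of a fenced code block if one is present.
  const fenced = raw.match(new RegExp(FENCE + "(?:json)?\\s*([\\s\\S]*?)" + FENCE));
  if (fenced) return fenced[1].trim();
  // Otherwise take the span from the first "{" or "[" to the last "}" or "]".
  const start = raw.search(/[{\[]/);
  const end = Math.max(raw.lastIndexOf("}"), raw.lastIndexOf("]"));
  return start >= 0 && end > start ? raw.slice(start, end + 1) : raw.trim();
}

// Naive: this would also rewrite a ",}" sequence inside a string value.
function stripTrailingCommas(json: string): string {
  return json.replace(/,\s*([}\]])/g, "$1");
}

// Throws (via JSON.parse) when the reply cannot be repaired by these two steps.
function naiveHeal(raw: string): unknown {
  return JSON.parse(stripTrailingCommas(extractJsonCandidate(raw)));
}
```

Even this small version has edge cases (commas inside strings, unquoted keys, truncated output) that it does not cover, which is the motivation for letting the gateway handle repair.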
## How It Works [#how-it-works] When Response Healing is enabled, LLM Gateway applies a series of repair strategies to malformed JSON responses: ### 1. Markdown Extraction [#1-markdown-extraction] Extracts JSON from markdown code blocks: ```text Here's the data: \`\`\`json {"name": "Alice", "age": 30} \`\`\` ``` Becomes: ```json { "name": "Alice", "age": 30 } ``` ### 2. Mixed Content Extraction [#2-mixed-content-extraction] Separates JSON from surrounding text: ```text Sure! Here is the JSON you requested: {"name": "Alice", "age": 30} Let me know if you need anything else. ``` Becomes: ```json { "name": "Alice", "age": 30 } ``` ### 3. Syntax Fixes [#3-syntax-fixes] Repairs common JSON syntax violations: | Issue | Before | After | | --------------- | ------------------- | ------------------- | | Trailing commas | `{"a": 1,}` | `{"a": 1}` | | Unquoted keys | `{name: "Alice"}` | `{"name": "Alice"}` | | Single quotes | `{'name': 'Alice'}` | `{"name": "Alice"}` | ### 4. Truncation Completion [#4-truncation-completion] Adds missing closing brackets for truncated responses: ```text {"name": "Alice", "data": {"nested": true ``` Becomes: ```json { "name": "Alice", "data": { "nested": true } } ``` ## Usage Examples [#usage-examples] ### With JSON Object Format [#with-json-object-format] Request a structured response with automatic healing: ```typescript const response = await fetch("https://api.llmgateway.io/v1/chat/completions", { method: "POST", headers: { Authorization: `Bearer ${process.env.LLM_GATEWAY_API_KEY}`, "Content-Type": "application/json", }, body: JSON.stringify({ model: "gpt-4o", messages: [ { role: "user", content: "Return a JSON object with fields: name (string) and age (number)", }, ], response_format: { type: "json_object" }, plugins: [{ id: "response-healing" }], }), }); const result = await response.json(); // Response is guaranteed to be valid JSON const data = JSON.parse(result.choices[0].message.content); ``` ### With JSON Schema [#with-json-schema] 
For stricter validation, combine with `json_schema`: ```typescript const response = await fetch("https://api.llmgateway.io/v1/chat/completions", { method: "POST", headers: { Authorization: `Bearer ${process.env.LLM_GATEWAY_API_KEY}`, "Content-Type": "application/json", }, body: JSON.stringify({ model: "gpt-4o", messages: [ { role: "user", content: "Generate a user profile", }, ], response_format: { type: "json_schema", json_schema: { name: "user_profile", schema: { type: "object", required: ["name", "email"], properties: { name: { type: "string" }, email: { type: "string" }, age: { type: "number" }, }, }, }, }, plugins: [{ id: "response-healing" }], }), }); const result = await response.json(); ``` ## Healing Metadata [#healing-metadata] When a response is healed, the healing method is logged for debugging. The following healing methods may be applied: | Method | Description | | -------------------------- | ------------------------------------------- | | `markdown_extraction` | JSON extracted from markdown code blocks | | `mixed_content_extraction` | JSON extracted from surrounding text | | `syntax_fix` | Trailing commas, quotes, or keys were fixed | | `truncation_completion` | Missing closing brackets were added | | `combined_strategies` | Multiple strategies were applied | ## Limitations [#limitations] Response Healing is only available for non-streaming requests. Streaming responses are returned as-is without healing. 
Response Healing works best for: * Simple to moderately complex JSON structures * Common formatting issues from LLMs It may not be able to repair: * Severely corrupted or nonsensical output * Complex nested structures with multiple issues * Responses that don't contain any recognizable JSON ## Best Practices [#best-practices] ### Use with Structured Prompts [#use-with-structured-prompts] Combine Response Healing with clear instructions for best results: ```typescript const response = await fetch("https://api.llmgateway.io/v1/chat/completions", { method: "POST", headers: { Authorization: `Bearer ${process.env.LLM_GATEWAY_API_KEY}`, "Content-Type": "application/json", }, body: JSON.stringify({ model: "gpt-4o", messages: [ { role: "system", content: "Always respond with valid JSON. No explanations.", }, { role: "user", content: "List three colors as a JSON array", }, ], response_format: { type: "json_object" }, plugins: [{ id: "response-healing" }], }), }); const result = await response.json(); ``` ### Validate Critical Data [#validate-critical-data] For critical applications, validate the healed JSON in your code: ```typescript const result = await response.json(); const content = result.choices[0].message.content; const data = JSON.parse(content); // Add your own validation if (!data.name || typeof data.name !== "string") { throw new Error("Invalid response: missing name"); } ``` ### Monitor Healing Rates [#monitor-healing-rates] If you notice frequent healing in your logs, consider: * Improving your prompts to request cleaner JSON * Using models with better JSON output (e.g., GPT-4o, Claude 3.5) * Adding explicit JSON examples in your prompts # Routing URL: https://docs.llmgateway.io/features/routing # Routing [#routing] LLMGateway provides flexible and intelligent routing options to help you get the best performance and cost efficiency from your AI applications. 
You can pin a specific model or provider, or let the gateway choose the best provider for each request automatically. LLMGateway also includes **automatic retry and fallback**: if a provider fails, your request is retried on the next best provider within the same API call. ## Model Selection [#model-selection] ### Any Model Name [#any-model-name] You can use any model name from our [models page](https://llmgateway.io/models) or discover available models programmatically through the [/v1/models endpoint](/v1_models). ```bash curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}] }' ``` ### Model ID Routing [#model-id-routing] Choose a specific model ID to route to the **best available provider** for that model. LLMGateway's smart routing algorithm considers multiple factors to find the optimal provider across all configured options. #### Smart Routing Algorithm [#smart-routing-algorithm] When you use a model ID without a provider prefix, LLMGateway's intelligent routing system analyzes multiple factors to select the best provider. **Weighted Scoring System**: Each factor has a **relative weight**. The factors are scored as ratios against the best provider in the candidate set (e.g. a provider that is twice as expensive as the cheapest scores `1.0` on price), and each ratio is multiplied by its weight divided by the sum of all active weights. The weighted ratios are summed, and the provider with the lowest (best) total score wins.
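As a rough illustration of that arithmetic, the sketch below normalizes the weights over the active factors and sums the weighted ratios. It is not the gateway's source code: the type and function names are hypothetical, and only the price ratio follows a formula stated here (twice the cheapest price scores `1.0`). How the other factors are turned into ratios is an assumption left to the caller. The weight constants are the documented defaults for price, uptime, throughput, and latency.

```typescript
type Factor = "price" | "uptime" | "throughput" | "latency";

// Hypothetical input: one "how much worse than the best candidate" ratio per
// active factor (0 = best in the candidate set). Omitted factors are inactive.
type FactorRatios = Partial<Record<Factor, number>>;

// Price ratio as described in the text: twice the cheapest price scores 1.0.
function priceRatio(price: number, cheapestPrice: number): number {
  return price / cheapestPrice - 1;
}

// Sum of ratio * (weight / sum of active weights); lower is better.
function weightedScore(
  ratios: FactorRatios,
  weights: Record<Factor, number>,
): number {
  const active = Object.keys(ratios) as Factor[];
  const weightSum = active.reduce((sum, factor) => sum + weights[factor], 0);
  if (weightSum === 0) return 0;
  return active.reduce(
    (total, factor) =>
      total + (ratios[factor] ?? 0) * (weights[factor] / weightSum),
    0,
  );
}

// Documented default weights for the four general-purpose factors.
const DEFAULT_WEIGHTS: Record<Factor, number> = {
  price: 0.6,
  uptime: 0.5,
  throughput: 0.05,
  latency: 0.025,
};
```

Leaving `latency` out of the ratios for a non-streaming request removes its weight from the normalization, so the remaining factors share the full score proportionally.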
The default weights are: | Factor | Default weight | Notes | | --------------- | -------------- | -------------------------------------------------------------------------- | | **Price** | `0.6` | Cost efficiency (average of input and output price) | | **Uptime** | `0.5` | Provider reliability / low error rate | | **Throughput** | `0.05` | Tokens per second generation speed | | **Latency** | `0.025` | Time to first token — **only applied for streaming requests** | | **Cache** | `0.2` | Prompt-cache support — **only applied for large prompts** (≥ 5,000 tokens) | | **Image price** | `1.0` | Replaces the price weight for image-generation models | Because the weights are relative and normalized by the sum of the active weights, price and uptime dominate routing decisions in practice, while throughput and latency act as tie-breakers between otherwise comparable providers. **Latency Weight for Non-Streaming Requests**: The latency weight only applies to streaming requests (time-to-first-token is only measured there). For non-streaming requests the latency weight is dropped and its share is redistributed proportionally across the remaining factors. **Time-Decayed Metrics Window**: Provider metrics (uptime, throughput, latency) are not a flat "last N minutes" snapshot. They are aggregated over a rolling **60-minute window** with a time-decay weighting so very recent behavior dominates while older data still contributes: * The most recent **1 minute** is weighted **10×** * The most recent **5 minutes** are weighted **3×** * The remainder of the 60-minute window is weighted **1×** This makes routing react quickly to a provider that just started failing or slowing down, without overreacting to a single noisy data point. **Cache Support for Large Prompts**: When the estimated prompt is at least 5,000 tokens, the **cache weight** (default `0.2`) is factored into the score based on whether each provider supports prompt caching (advertised via a cached input price). 
Providers that support caching score better than ones that do not, since caching can substantially reduce the cost of large or repeated prompts. Below the 5,000-token threshold, this weight is dropped entirely — caching has little impact on small prompts, so cache support is ignored. The selected provider's cache support is exposed as `cacheSupported` on the routing metadata. **Exponential Uptime Penalty**: Providers with uptime below 95% receive an additional exponential penalty that increases rapidly as uptime drops: * 95-100% uptime: No penalty * 90% uptime: \~0.07 penalty * 80% uptime: \~0.62 penalty * 70% uptime: \~1.73 penalty * 50% uptime: \~5.61 penalty This ensures providers experiencing significant issues are strongly deprioritized while minor fluctuations have minimal impact. The penalty threshold (default `95%`) is configurable. **Provider Priority**: Each provider has a **priority** value (default `1`) that nudges routing toward or away from it independently of live metrics: * A provider's priority is applied as a `(1 - priority)` adjustment to its score — higher priority lowers the score (more preferred), lower priority raises it (less preferred). * A priority of **0** disables the provider entirely, removing it from routing for that model. Provider priorities are surfaced in the routing metadata so you can see how they influenced a decision. **Epsilon-Greedy Exploration** (1% of requests by default): To solve the "cold start problem" where new or unused providers never get traffic to build up metrics, the system randomly explores different providers a small fraction of the time (default 1%, configurable). 
This ensures: * All providers periodically receive traffic * New providers can prove their reliability * The system adapts to changing provider performance * You benefit from improved routing decisions over time The exploration rate is configurable per project through the routing configuration (`thresholds.explorationRate`), and self-hosted deployments can override it globally with the `EXPLORATION_RATE` environment variable (a number between `0` and `1`). **Stable Provider Preference**: To avoid unnecessary churn between providers that score similarly, LLMGateway remembers the best provider chosen for each model and sticks with it across requests — even if another provider edges ahead slightly on the next score calculation. On every routing decision, the system checks whether the previously selected provider is still acceptable: * **Uptime hard switch**: if the preferred provider's uptime drops below **85%**, routing switches to the current best-scoring provider immediately. * **Score margin soft switch**: the preferred provider is replaced only when a better option's score is more than **0.15** ahead. Small fluctuations caused by metric noise or minor price differences do not trigger a switch. * **Periodic re-evaluation**: the preference expires after **1 hour**, at which point the next request picks the best-scoring provider fresh and stores it as the new preferred. Requests that are part of the epsilon-greedy exploration bypass this preference entirely so that all providers continue to receive periodic traffic and build up metrics. The selection reason in routing metadata will show `stable-preferred` when a request was served by the stored preference rather than the top-scored provider at that moment. 
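The three checks above can be sketched as a single decision function. This is a minimal sketch under stated assumptions, not LLMGateway's implementation: the type and function names are hypothetical, uptime is expressed in percent, lower scores are better, and the defaults mirror the 1 hour, 85%, and 0.15 values given above.

```typescript
interface Candidate {
  provider: string;
  score: number; // lower is better
  uptime: number; // percent, 0-100
}

interface StoredPreference {
  provider: string;
  storedAtMs: number; // when the preference was stored
}

interface PreferenceConfig {
  ttlSeconds: number;
  uptimeThreshold: number;
  scoreMargin: number;
}

const DEFAULT_CONFIG: PreferenceConfig = {
  ttlSeconds: 3600,
  uptimeThreshold: 85,
  scoreMargin: 0.15,
};

// Returns the provider to use and whether the stored preference was kept.
function selectProvider(
  candidates: Candidate[],
  preference: StoredPreference | undefined,
  nowMs: number,
  config: PreferenceConfig = DEFAULT_CONFIG,
): { provider: string; keptPreference: boolean } {
  if (candidates.length === 0) throw new Error("no candidate providers");
  const best = candidates.reduce((a, b) => (b.score < a.score ? b : a));
  const fresh = { provider: best.provider, keptPreference: false };

  // No preference yet, or periodic re-evaluation: the preference has expired.
  if (!preference) return fresh;
  if (nowMs - preference.storedAtMs >= config.ttlSeconds * 1000) return fresh;

  const preferred = candidates.find((c) => c.provider === preference.provider);
  if (!preferred) return fresh;

  // Uptime hard switch.
  if (preferred.uptime < config.uptimeThreshold) return fresh;
  // Score margin soft switch: only a clearly better option replaces it.
  if (preferred.score - best.score > config.scoreMargin) return fresh;

  return { provider: preferred.provider, keptPreference: true };
}
```

A caller would store a new preference whenever `keptPreference` is `false`, and would skip this function entirely for requests selected for exploration.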
Self-hosted deployments can tune this behavior with three environment variables: `PREFERRED_PROVIDER_TTL` (preference lifetime in seconds, default `3600`), `PREFERRED_PROVIDER_UPTIME_THRESHOLD` (hard-switch uptime floor, default `85`), and `PREFERRED_PROVIDER_SCORE_MARGIN` (soft-switch score gap, default `0.15`). On the **Enterprise plan**, these same values can be customized per project from the dashboard — see [Per-Project Routing Configuration](#per-project-routing-configuration-enterprise). **Routing Metadata**: Every request includes detailed routing metadata in the logs, showing: * Available providers that were considered * Selected provider and selection reason * Scores for each provider (including uptime, throughput, latency, price, priority, and cache support) This transparency allows you to understand and debug routing decisions. Using model IDs without a provider prefix automatically routes to the optimal provider based on reliability, speed, and cost. The system continuously learns and adapts based on real-time performance metrics. Smart routing prioritizes reliability over cost, ensuring your requests are routed to providers with proven uptime and performance, while still considering cost efficiency. ### Sticky Session Routing [#sticky-session-routing] When a model is served by multiple providers, every request is normally scored independently — so a multi-turn conversation can bounce between providers. That defeats provider-side **prompt caching**, which only pays off when consecutive requests with a shared prefix hit the **same** provider. Sticky session routing solves this: attach a session identifier and LLMGateway pins all requests for that session to a single provider (and region), keeping the upstream prompt cache warm across the whole conversation. #### Setting the session id [#setting-the-session-id] For chat completions, the session key is resolved in priority order: 1. The `x-session-id` header 2. 
The `prompt_cache_key` body field (OpenAI-compatible) 3. The `user` body field (OpenAI-compatible) ```bash curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -H "x-session-id: conversation-9f8e7d6c" \ -d '{ "model": "claude-sonnet-4-6", "messages": [{"role": "user", "content": "Hello!"}] }' ``` For the Anthropic Messages endpoint (`/v1/messages`), the session key is derived automatically from `metadata.user_id` — coding agents such as Claude Code embed the session id there — and forwarded internally. An explicit `x-session-id` header still takes precedence. #### How pinning works [#how-pinning-works] On a session's **first** request the provider is chosen by the normal weighted smart-routing score — the same price-, priority-, uptime-, and throughput-aware algorithm used for non-sticky requests. That choice is then **persisted for the session** and reused on every subsequent request, so the upstream prompt cache stays warm without bouncing the conversation between providers. Because the pinned provider is replayed directly, sticky requests **skip the epsilon-greedy exploration** — a session is never randomly bounced to a different provider mid-conversation. #### Falling back when a provider is down [#falling-back-when-a-provider-is-down] An established pin yields only when its provider can no longer serve the session well. A session is re-scored and re-pinned to the current weighted-best provider when its provider: * Drops below the session uptime threshold (default 85%), * Is filtered out by health checks (e.g. excluded for low uptime), or * Fails the request and is dropped by the [automatic retry & fallback](#automatic-retry--fallback) loop. Re-pinning runs the same weighted algorithm again, so the replacement is the best currently available provider — not an arbitrary one. 
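The session-key priority order and the pin-then-fall-back behavior described above can be sketched as follows. This is an illustration under assumptions, not the gateway's code: the function names, the in-memory `Map`, and the `isHealthy` callback are hypothetical, header names are assumed to be lowercased already, and empty strings are treated as absent.

```typescript
interface ChatRequestBody {
  prompt_cache_key?: string;
  user?: string;
}

// Resolve the session key in the documented priority order:
// x-session-id header, then prompt_cache_key, then user.
function resolveSessionKey(
  headers: Record<string, string | undefined>,
  body: ChatRequestBody,
): string | undefined {
  return headers["x-session-id"] || body.prompt_cache_key || body.user || undefined;
}

// Hypothetical in-memory pin store: session key -> pinned provider.
const pins = new Map<string, string>();

// Reuse the pinned provider while it is healthy; otherwise re-score and re-pin.
function routeWithPin(
  sessionKey: string | undefined,
  pickBest: () => string,
  isHealthy: (provider: string) => boolean,
): string {
  // Requests without a session id use normal scoring and are never pinned.
  if (sessionKey === undefined) return pickBest();

  const pinned = pins.get(sessionKey);
  if (pinned !== undefined && isHealthy(pinned)) return pinned;

  const chosen = pickBest();
  pins.set(sessionKey, chosen);
  return chosen;
}
```

Here `pickBest` stands in for the weighted smart-routing score, and `isHealthy` stands in for the uptime threshold, health-check, and retry conditions that cause a pin to yield.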
The selection reason in routing metadata shows `session-sticky` when a request was pinned via a session id. Sticky routing optimizes for cache locality over per-request churn. Once a session is pinned it stays on its provider even if a cheaper or faster alternative becomes momentarily available, since the prompt-cache savings typically outweigh the difference — but the initial pick still respects price and priority. Requests without a session id are unaffected and continue to use the weighted smart-routing algorithm. ### Provider-Specific Routing [#provider-specific-routing] To use a specific provider without any fallbacks, prefix the model name with the provider name followed by a slash: ```bash # Use OpenAI specifically curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "openai/gpt-4o", "messages": [{"role": "user", "content": "Hello!"}] }' # Use DeepSeek provider specifically curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "deepseek/deepseek-v3.2", "messages": [{"role": "user", "content": "Hello!"}] }' ``` #### Regions [#regions] Some providers expose the same model in multiple regions. 
In that case, LLMGateway supports two routing modes: * `provider/model` selects the best eligible region for that provider using the same routing inputs used elsewhere: recent uptime, throughput, latency, and price * `provider/model:region` pins the request to one exact region ```bash # Let LLMGateway choose the best Alibaba region for DeepSeek V3.2 curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "alibaba/deepseek-v3.2", "messages": [{"role": "user", "content": "Hello!"}] }' # Force a specific Alibaba region curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "alibaba/deepseek-v3.2:cn-beijing", "messages": [{"role": "user", "content": "Hello!"}] }' ``` If your provider key stores an explicit region, that region acts like a lock and LLMGateway will only use that region for provider-specific requests. If no explicit region is configured on the provider key, provider-specific requests can still score all eligible regions for that provider. Routing metadata reflects this: * Dynamic provider-region selection shows all eligible regional scores that were considered * Explicitly pinned regions show only the pinned region in the score list Region-aware routing only compares regions that are actually available for the current project mode and provider setup. In credits mode, that means only regions backed by configured environment keys. In API keys and hybrid mode, an explicit provider-key region restricts the request to that region. #### Low-Uptime Protection [#low-uptime-protection] When you specify a provider explicitly, LLMGateway checks the provider's recent uptime (from the time-decayed metrics window described above). 
If the uptime falls below 90%, the system automatically routes your request to the best available alternative provider to ensure reliability. This protects your application from providers experiencing temporary issues. The fallback threshold (default `90%`) is configurable. If the requested provider has low uptime but no alternative providers are available for that model, the request will still be sent to the originally requested provider. #### Disabling Fallback with X-No-Fallback Header [#disabling-fallback-with-x-no-fallback-header] If you need to bypass this protection and always use the exact provider you specified regardless of its current uptime, you can use the `X-No-Fallback` header: ```bash # Force use of a specific provider even if it has low uptime curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -H "X-No-Fallback: true" \ -d '{ "model": "openai/gpt-4o", "messages": [{"role": "user", "content": "Hello!"}] }' ``` Using `X-No-Fallback: true` disables automatic provider failover. Your requests will be sent to the specified provider even if it is experiencing issues, which may result in higher error rates. Retries may still occur against another key for the same provider when multiple keys are configured. When the `X-No-Fallback` header is used, the routing metadata in logs will include `noFallback: true` to indicate that fallback was disabled for that request. ## Automatic Retry & Fallback [#automatic-retry--fallback] When using model ID routing (without a provider prefix), LLMGateway automatically retries failed requests on alternate providers. This happens transparently within the same API call — your application receives the successful response as if nothing went wrong. ### How Retry Works [#how-retry-works] 1. Your request is routed to the best available provider using the smart routing algorithm 2. 
If that provider returns a server error (5xx), times out, or has a connection failure, the gateway marks the provider as failed 3. The next best available provider is selected and the request is retried 4. Up to **2 retries** are attempted before returning an error to the client ``` Request → Provider A (500 error) → Provider B (200 OK) → Response ``` Both streaming and non-streaming requests support automatic retry. ### What Triggers a Retry [#what-triggers-a-retry] Retries are triggered by **server-side failures** only: * **5xx errors** (500 Internal Server Error, 502 Bad Gateway, 503 Service Unavailable, etc.) * **Timeouts** (upstream provider took too long to respond) * **Connection failures** (network errors, DNS failures, etc.) Retries are **not** triggered by: * **4xx client errors** (400 Bad Request, 401 Unauthorized, 403 Forbidden, 422 Unprocessable Entity) * **Content filter responses** (Azure ResponsibleAI, etc.) ### When Retry Is Disabled [#when-retry-is-disabled] Automatic retry to a different provider is disabled when: * The `X-No-Fallback: true` header is set * A specific provider is requested (e.g., `openai/gpt-4o`) * No alternative providers are available for the requested model * The maximum retry count (2) has been exhausted Retries can still happen within the same provider when multiple keys are configured and the current key fails with a retryable error. ### Routing Transparency [#routing-transparency] Every provider attempt — both failed and successful — is recorded in the `routing` array in the response metadata and activity logs: ```json { "metadata": { "routing": [ { "provider": "openai", "model": "gpt-4o", "status_code": 500, "error_type": "server_error", "succeeded": false }, { "provider": "azure", "model": "gpt-4o", "status_code": 200, "error_type": "none", "succeeded": true } ] } } ``` ### Retried Log Tracking [#retried-log-tracking] Each provider attempt creates its own log entry. 
Failed attempts that were retried are marked with: * **`retried: true`** — indicates this failed request was retried on another provider * **`retriedByLogId`** — the ID of the final successful log entry This allows you to distinguish between unrecovered failures and failures that were transparently recovered via retry. In the dashboard, retried logs display a "Retried" badge with a link to the successful log. ### Impact on Provider Health [#impact-on-provider-health] Failed attempts still count against the provider's uptime score, even when the request was successfully retried on another provider. This means: * A provider that keeps failing will see its uptime score drop * The exponential uptime penalty kicks in below 95% (see [Smart Routing Algorithm](#smart-routing-algorithm)) * Future requests are automatically routed away from unreliable providers * Your application stays reliable without any code changes on your side Automatic retry and fallback works together with smart routing to provide self-healing behavior. Failing providers are automatically avoided, and your requests are transparently recovered on reliable alternatives. ## Per-Project Routing Configuration (Enterprise) [#per-project-routing-configuration-enterprise] The values described above — scoring weights, thresholds, retry behavior, the metrics window, sticky-routing, and per-provider priorities — are the **defaults** that apply to every project. On the **Enterprise plan**, you can override any of them **per project** from the dashboard under **Project Settings → Routing**. Projects on other plans always use the defaults. Overrides are merged on top of the defaults, so you only set the values you want to change. When a custom configuration is disabled, the project falls back to the defaults. 
The following groups can be customized per project: | Group | What it controls | Defaults | | ----------------------- | --------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------- | | **Weights** | Relative importance of each scoring factor | `price 0.6`, `imagePrice 1.0`, `uptime 0.5`, `throughput 0.05`, `latency 0.025`, `cache 0.2` | | **Thresholds** | Cache prompt-size threshold, uptime-penalty threshold, exploration rate, and the assumed defaults used when no metrics exist | `cachePromptTokens 5000`, `uptimePenalty 95`, `defaultUptime 100`, `defaultLatency 1000`, `defaultThroughput 50`, `explorationRate 0.01` | | **Retry** | Max cross-provider fallback attempts and the low-uptime reroute threshold | `maxRetries 2`, `lowUptimeFallbackThreshold 90` | | **Timeouts** | Per-request time limits (end-to-end, streaming, non-streaming). Capped at the infrastructure defaults — an override can only lower them | `gatewayMs 1,500,000`, `streamingMs 1,200,000`, `plainMs 600,000` | | **History** | The metrics window and the time-decay tier boundaries and weights | `windowMinutes 60` (max 120), `tier1Minutes 1`, `tier2Minutes 5`, `tier1Weight 10`, `tier2Weight 3`, `tier3Weight 1` | | **Sticky** | Stable-provider preference: on/off, TTL, hard-switch uptime floor, soft-switch score margin | `enabled true`, `ttlSeconds 3600`, `uptimeThreshold 85`, `scoreMargin 0.15` | | **Provider priorities** | Per-provider priority multipliers; set a provider to `0` to disable it for that project | `1` for every provider | Per-project routing configuration requires the Enterprise plan. If you'd like to tune routing for your workloads, contact us at [contact@llmgateway.io](mailto:contact@llmgateway.io). 
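As a rough sketch of how "overrides are merged on top of the defaults" could behave, the snippet below merges a partial per-project override into three of the groups from the table. The object shape, the `effectiveConfig` function, and the `customEnabled` flag are hypothetical names chosen for illustration; only the default values and the rule that timeout overrides can only lower the defaults come from the table above.

```ts
// Default values copied from the table; the object layout is an assumption.
const DEFAULTS = {
  weights: { price: 0.6, imagePrice: 1.0, uptime: 0.5, throughput: 0.05, latency: 0.025, cache: 0.2 },
  retry: { maxRetries: 2, lowUptimeFallbackThreshold: 90 },
  timeouts: { gatewayMs: 1_500_000, streamingMs: 1_200_000, plainMs: 600_000 },
};

type RoutingConfig = typeof DEFAULTS;
type RoutingOverrides = { [G in keyof RoutingConfig]?: Partial<RoutingConfig[G]> };

function effectiveConfig(overrides: RoutingOverrides, customEnabled: boolean): RoutingConfig {
  // A disabled custom configuration falls back to the defaults.
  if (!customEnabled) return DEFAULTS;

  const t: Partial<RoutingConfig["timeouts"]> = overrides.timeouts ?? {};
  return {
    // Only the keys present in the override replace the defaults.
    weights: { ...DEFAULTS.weights, ...overrides.weights },
    retry: { ...DEFAULTS.retry, ...overrides.retry },
    // Timeouts are capped at the defaults: an override can only lower them.
    timeouts: {
      gatewayMs: Math.min(DEFAULTS.timeouts.gatewayMs, t.gatewayMs ?? Infinity),
      streamingMs: Math.min(DEFAULTS.timeouts.streamingMs, t.streamingMs ?? Infinity),
      plainMs: Math.min(DEFAULTS.timeouts.plainMs, t.plainMs ?? Infinity),
    },
  };
}
```

For example, overriding only `weights.price` leaves every other weight, the retry settings, and the timeouts at their default values.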
## Optimized Auto Routing [#optimized-auto-routing]

Auto routing automatically selects the best model for your specific use case without you having to specify a model at all.

### Current Implementation [#current-implementation]

The auto routing system currently:

* **Chooses cost-effective models** by default for optimal price-to-performance ratio
* **Automatically scales to more powerful models** based on your request's context size
* **Handles large contexts intelligently** by selecting models with appropriate context windows

```bash
# Let LLMGateway choose the optimal model
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Your request here..."}]
  }'
```

### Free Models Only [#free-models-only]

When using auto routing, you can restrict the selection to free models (models with zero input and output pricing) by setting the `free_models_only` parameter to `true`:

```bash
# Auto route to free models only
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Hello!"}],
    "free_models_only": true
  }'
```

Adding even a small amount of credits to your account (e.g., $10) immediately raises your free model rate limits from 5 requests per 10 minutes to 20 requests per minute.

The `free_models_only` parameter only works with auto routing (`"model": "auto"`). If no free models are available that meet your request requirements, the API will return an error.

### Reasoning Models Only [#reasoning-models-only]

Set the `reasoning_effort` parameter and only models that support reasoning will be chosen. This parameter is not specific to auto routing (`"model": "auto"`).
```bash # Auto route only to reasoning models curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "auto", "messages": [{"role": "user", "content": "Hello!"}], "reasoning_effort": "medium" }' ``` ### Exclude Reasoning Models [#exclude-reasoning-models] When using auto routing, you can exclude reasoning models from selection by setting the `no_reasoning` parameter to `true`. This is useful when you want faster responses or need to avoid the additional cost and latency of reasoning models: ```bash # Auto route excluding reasoning models curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "auto", "messages": [{"role": "user", "content": "Hello!"}], "no_reasoning": true }' ``` The `no_reasoning` parameter only works with auto routing (`"model": "auto"`). If no non-reasoning models are available that meet your request requirements, the API will return an error. Auto routing analyzes your payload and automatically chooses between cost-effective models for simple requests and more powerful models for complex or large-context requests. ### Coming Soon: Advanced Optimization [#coming-soon-advanced-optimization] We're continuously improving our auto routing capabilities. Soon you'll benefit from: * **Tool call optimization**: Automatically select models that excel at function calling and structured outputs * **Content-aware routing**: Analyze message content to determine the best model for specific types of requests (coding, creative writing, analysis, etc.) * **Performance-based routing**: Route based on historical performance data for similar requests * **Multi-model orchestration**: Intelligently combine multiple models for complex workflows ### How It Works [#how-it-works] 1. 
**Request Analysis**: The system analyzes your request including message content, context size, and any special parameters 2. **Model Selection**: Based on the analysis, it selects the most appropriate model considering cost, performance, and capabilities 3. **Transparent Routing**: Your request is seamlessly routed to the chosen model and provider 4. **Optimized Response**: You receive the best possible response while maintaining cost efficiency Auto routing decisions are transparent in your usage logs, so you can always see which model was selected for each request. ## Best Practices [#best-practices] ### For Development [#for-development] * Use specific model names during development and testing * Leverage auto routing for production workloads to optimize costs ### For Production [#for-production] * Use auto routing (`"model": "auto"`) for the best balance of cost and performance * Monitor your usage patterns through the dashboard to understand routing decisions * Set up provider keys for multiple providers to maximize routing options ### For Cost Optimization [#for-cost-optimization] * Let auto routing handle model selection to automatically use the most cost-effective options * Use model IDs without provider prefixes to always get the cheapest available provider * Monitor your usage analytics to track cost savings from intelligent routing # Service Tiers URL: https://docs.llmgateway.io/features/service-tiers # Service Tiers [#service-tiers] Some OpenAI and Google models support selectable **processing tiers** that trade latency and availability against price. You pick one per request with the OpenAI-compatible `service_tier` parameter, and LLM Gateway forwards it only when the selected provider/model mapping supports that tier. | Tier | `service_tier` | Cost vs. 
standard | Latency / availability | | ------------ | ------------------------- | ----------------- | ------------------------------------------- | | Standard | `default` / `auto` / omit | baseline | Normal on-demand latency | | **Flex** | `flex` | **−50%** | Best-effort; may be preempted under load | | **Priority** | `priority` | varies by model | Prioritized above standard and flex traffic | ## Using the `service_tier` parameter [#using-the-service_tier-parameter] ```bash curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "google-vertex/gemini-2.5-pro", "service_tier": "priority", "messages": [ { "role": "user", "content": "Summarize this incident report." } ] }' ``` Accepted values are `flex`, `priority`, and `default`/`auto` (standard). If you request `flex` or `priority` for a provider/model mapping that does not support that tier, the gateway returns a 400 `unsupported_service_tier` error and logs the request as a client error. ## Supported providers [#supported-providers] Service tiers are explicit per provider/model mapping. Check the model page for the exact tiers exposed by each provider card. * **OpenAI** (`openai`) — sent as the OpenAI `service_tier` request field for supported OpenAI models. Flex is billed at 0.5x standard token prices and Priority uses the model-specific multiplier shown on the model page. * **Google Vertex AI** (`google-vertex`) — sent as the `X-Vertex-AI-LLM-Shared-Request-Type` request header. Flex and Priority are served only on the **global** endpoint, which is the gateway default. Google Flex PayGo applies a 0.5x multiplier; Google Priority PayGo applies a 1.8x multiplier. * **Google AI Studio / Gemini API** (`google-ai-studio`) — sent as a `service_tier` field in the request body for configured models that opt in. Tiers are supported on a **subset** of models, and the Flex and Priority subsets differ by provider. 
For example, Google Flex PayGo lists Gemini 3 image / Nano Banana models, but Google Priority PayGo does not; those configured image mappings are Flex-only.

## Pricing uses multipliers [#pricing-uses-multipliers]

Service tiers do not define separate model prices in LLM Gateway. They multiply the provider mapping's standard token prices:

* Standard / `default` / `auto`: 1x
* Flex: 0.5x
* Priority: model/provider-specific, shown on the model page

The multiplier scales per-token costs, including input, output, cached, and image tokens. Flat per-request and web-search fees are not tier-scaled.

## Billing follows the served tier [#billing-follows-the-served-tier]

When a provider reports the tier that was actually served, LLM Gateway bills the served tier rather than the requested one:

* A `priority` request that runs as priority is billed at the model's Priority multiplier (for example, 1.8x for Google Priority PayGo).
* A `flex` request that runs as flex is billed at 0.5x.
* A request that is served as standard is billed at the standard 1x rate.

The served tier is read back from the provider response: Vertex reports it in `usageMetadata.trafficType` (`ON_DEMAND_PRIORITY` / `ON_DEMAND_FLEX` / `ON_DEMAND`), Google AI Studio reports it in the `x-gemini-service-tier` response header, and OpenAI can return `service_tier` in response payloads or stream events.

LLM Gateway rejects unsupported tier requests before provider routing. For example, `gemini-3-pro-image-preview` currently exposes Flex for Google AI Studio and Vertex, but not Priority.

You can see per-tier pricing for each model on its [model page](https://llmgateway.io/models). Supported provider cards include a Service Tier selector in the card header and show the active multiplier next to each tier.
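The multiplier rules above can be expressed as a short calculation. This is a sketch under stated assumptions rather than the gateway's billing code: `TokenCost`, `tierMultiplier`, and `billedCost` are hypothetical names, the Priority multiplier is passed in because it is model-specific, and falling back to the requested tier when the provider reports no served tier is an assumption of this example.

```ts
type Tier = "default" | "flex" | "priority";

// Per-token costs in USD at standard (1x) rates; hypothetical shape.
interface TokenCost {
  input: number;
  output: number;
  cached: number;
  image: number;
}

function tierMultiplier(tier: Tier, priorityMultiplier: number): number {
  switch (tier) {
    case "flex":
      return 0.5;
    case "priority":
      return priorityMultiplier; // model/provider-specific
    default:
      return 1;
  }
}

function billedCost(
  tokenCost: TokenCost,
  flatFees: number, // per-request and web-search fees, never tier-scaled
  requested: Tier,
  served: Tier | undefined, // tier reported back by the provider, if any
  priorityMultiplier: number,
): number {
  // Assumption: use the requested tier only when no served tier is reported.
  const tier = served ?? requested;
  const perToken = tokenCost.input + tokenCost.output + tokenCost.cached + tokenCost.image;
  return perToken * tierMultiplier(tier, priorityMultiplier) + flatFees;
}
```

With this shape, a `priority` request that the provider actually serves as standard is charged at 1x, because the served tier wins over the requested one.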
## Sources [#sources] * [OpenAI API pricing](https://openai.com/api/pricing/) * [Google Flex PayGo](https://docs.cloud.google.com/vertex-ai/generative-ai/docs/flex-paygo) * [Google Priority PayGo](https://docs.cloud.google.com/gemini-enterprise-agent-platform/models/priority-paygo) # Sessions URL: https://docs.llmgateway.io/features/sessions # Sessions [#sessions] A **session** ties together the requests that belong to the same conversation or workflow. By attaching a stable session identifier to your requests, LLMGateway can treat them as a unit — keeping provider routing consistent across turns and letting you trace and filter the whole conversation in the dashboard. Sessions are the foundation for several features. Today they power **sticky provider routing** and **session-level observability**; more session-scoped capabilities will build on the same identifier over time. ## Setting the session id [#setting-the-session-id] For chat completions, the session key is resolved in priority order — the first present value wins: 1. The `x-session-id` header 2. The `x-session-affinity` header (sent automatically by coding agents such as opencode) 3. The `prompt_cache_key` body field (OpenAI-compatible) 4. The `user` body field (OpenAI-compatible) ```bash curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -H "x-session-id: conversation-9f8e7d6c" \ -d '{ "model": "claude-sonnet-4-6", "messages": [{"role": "user", "content": "Hello!"}] }' ``` Reuse the same session id for every request in a conversation. If you don't set any of the values above, the request simply has no session and behaves exactly as before. ### Anthropic Messages endpoint [#anthropic-messages-endpoint] For the [Anthropic Messages endpoint](/features/anthropic-endpoint) (`/v1/messages`), the session key is derived automatically from `metadata.user_id`. 
Coding agents such as Claude Code send a JSON object there (e.g. `{"session_id":"",…}`); the gateway uses its `session_id` field. An explicit `x-session-id` header still takes precedence. ## Sticky provider routing [#sticky-provider-routing] When a model is served by multiple providers, requests are normally scored independently, so a multi-turn conversation can bounce between providers. That defeats provider-side **prompt caching**, which only pays off when consecutive requests with a shared prefix reach the **same** provider. With a session id set, LLMGateway scores the session's first request with the normal weighted smart-routing algorithm (price, priority, uptime, throughput) and then **pins that provider for the session**, reusing it on every subsequent request to keep the prompt cache warm. The session stays on that provider — skipping the epsilon-greedy exploration — and only moves when its provider drops below the session uptime threshold or leaves the available pool (health filtering or a failed request dropped by retry/fallback), at which point the session is re-scored and re-pinned to the current best provider. See [Routing → Sticky Session Routing](/features/routing) for the full algorithm, fallback behavior, and the `session-sticky` routing-metadata reason. Session stickiness is **on by default**. Enterprise projects can turn it off per project under **Settings → Routing → Session Stickiness**; when disabled, every request is scored independently regardless of session id (the id is still recorded for observability). Sticky routing optimizes for cache locality over per-request price. A session stays on its provider even if a cheaper or faster alternative is momentarily available, since the prompt-cache savings typically outweigh the difference. ## Observing sessions in the activity log [#observing-sessions-in-the-activity-log] Every request is logged with its resolved session id. 
In the dashboard **Activity** view you can: * See the **Session ID** on each request's metadata, alongside the request and trace IDs. * **Filter by session id** using the search field next to the custom-metadata search, to pull up every request that belongs to a conversation in one place. This makes it easy to follow a full conversation end-to-end — inspecting how each turn was routed, what it cost, and which provider served it. The session id is distinct from freeform [metadata](/features/metadata). Use metadata custom headers for arbitrary tags (user, tenant, app version); use the session id for the one value that should keep a conversation pinned and traceable. # Source Attribution URL: https://docs.llmgateway.io/features/source # Source Attribution [#source-attribution] The `X-Source` header allows you to identify your domain when making requests to LLM Gateway. This information is used to generate public usage statistics showing how LLM Gateway is being used across different websites and applications. ## X-Source Header [#x-source-header] Include the `X-Source` header with your domain name in your requests: ```bash curl -X POST https://api.llmgateway.io/v1/chat/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "X-Source: example.com" \ -d '{ "model": "gpt-4o", "messages": [ { "role": "user", "content": "Hello, how are you?" } ] }' ``` ## Domain Format [#domain-format] The `X-Source` header accepts domain names in various formats. All of the following are valid and will be normalized to the same domain: * `example.com` * `https://example.com` * `https://www.example.com` * `www.example.com` All variations will be stripped down to the base domain (`example.com`) for aggregation purposes. 
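The normalization described under Domain Format can be sketched as a small helper. `normalizeSource` is a hypothetical name and this is not the gateway's actual code; stripping the protocol and the `www.` prefix follows the text above, while lowercasing and dropping any path, query, or fragment are extra assumptions made for this example.

```ts
// Reduce an X-Source value to its base domain for aggregation.
function normalizeSource(value: string): string {
  return value
    .trim()
    .toLowerCase() // assumption: domains are compared case-insensitively
    .replace(/^[a-z][a-z0-9+.-]*:\/\//, "") // drop a leading protocol such as https://
    .replace(/[\/?#].*$/, "") // assumption: drop any path, query, or fragment
    .replace(/^www\./, ""); // drop a leading www.
}

// All four documented variants reduce to the same value.
const variants = ["example.com", "https://example.com", "https://www.example.com", "www.example.com"];
const normalized = variants.map(normalizeSource);
```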
## Public Statistics [#public-statistics] Data from the `X-Source` header is used to generate public statistics about LLM Gateway usage, including: * **Popular Domains**: Which websites and applications are using LLM Gateway most frequently * **Model Usage**: What models are being used by different domains * **Geographic Distribution**: Where requests are coming from across different sources * **Growth Trends**: How usage is growing over time for different domains These statistics help demonstrate the adoption and impact of LLM Gateway across the ecosystem. ## Privacy Considerations [#privacy-considerations] ### What's Public [#whats-public] * Domain names (stripped of protocol and www prefixes) * Aggregated request counts and model usage * General geographic regions (country-level data) ### What's Private [#whats-private] * Individual request content or responses * User identifiers or personal information * Detailed usage patterns beyond aggregated counts * API keys or authentication details ## Benefits [#benefits] Including the `X-Source` header provides several benefits: ### For Your Project [#for-your-project] * **Recognition**: Your domain will appear in public usage statistics * **Credibility**: Demonstrates real-world usage of your application * **Community**: Contributes to the broader LLM Gateway ecosystem ### For the Community [#for-the-community] * **Transparency**: Shows real adoption and usage patterns * **Inspiration**: Other developers can see successful implementations * **Growth**: Helps demonstrate the value of open-source LLM infrastructure ## Optional but Recommended [#optional-but-recommended] While the `X-Source` header is optional, we strongly encourage its use to: * Support transparency in the LLM Gateway ecosystem * Help showcase successful integrations * Contribute to understanding of LLM usage patterns * Demonstrate the real-world impact of your application Your participation helps build a more transparent and collaborative LLM ecosystem. 
# Speech Generation URL: https://docs.llmgateway.io/features/speech-generation # Speech Generation [#speech-generation] LLMGateway supports text-to-speech (TTS) through the OpenAI-compatible **`/v1/audio/speech`** endpoint, powered by ElevenLabs, Google Gemini, and OpenAI speech models. Want to hear the voices before writing code? The [Audio Studio](https://chat.llmgateway.io/audio) in the Playground generates speech from up to three models side by side, with per-model voice, format, and speed controls. ## Available Models [#available-models] Browse all speech generation models, with up-to-date pricing, on the [models page](https://llmgateway.io/models?filters=1\&audioGeneration=true). Billing varies by model family. Some models are billed on token usage reported by the provider (input text tokens and output audio tokens), while others are billed on input character count (those return audio bytes without usage data). See the [models page](https://llmgateway.io/models?filters=1\&audioGeneration=true) for each model's exact pricing. ## Parameters [#parameters] | Parameter | Type | Default | Description | | ----------------- | ------ | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `model` | string | required | The speech model to use | | `input` | string | required | The text to synthesize into speech | | `voice` | string | model | A prebuilt voice. Defaults to `Kore` (Gemini), `alloy` (OpenAI), or `Sarah` (ElevenLabs) | | `response_format` | string | model | Audio format. OpenAI: `mp3` (default), `opus`, `aac`, `flac`, `wav`, `pcm`. ElevenLabs: `mp3` (default), `wav`, `pcm`, `opus`. Gemini: `wav` (default), `pcm` | | `instructions` | string | — | Optional style/delivery directive prepended to the input (e.g. 
`"Say cheerfully"`) |
| `speed` | number | — | Accepted for OpenAI compatibility, but not applied by Gemini speech models |

Gemini speech models return raw PCM audio. LLMGateway wraps it in a WAV container by default (`response_format: "wav"`), or returns the raw 16-bit little-endian PCM at 24 kHz when `response_format: "pcm"` is requested. Other formats such as `mp3` are only available on the OpenAI and ElevenLabs models, which return the audio already encoded in the requested format.

## curl [#curl]

```bash
curl -X POST "https://api.llmgateway.io/v1/audio/speech" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash-preview-tts",
    "input": "Hello, welcome to LLM Gateway!",
    "voice": "Kore"
  }' \
  --output speech.wav
```

## OpenAI SDK [#openai-sdk]

Works with the standard OpenAI client library; just point the base URL to LLMGateway.

```ts
import OpenAI from "openai";
import { writeFileSync } from "fs";

// Point the standard OpenAI client at LLMGateway
const openai = new OpenAI({
  apiKey: process.env.LLM_GATEWAY_API_KEY,
  baseURL: "https://api.llmgateway.io/v1",
});

const response = await openai.audio.speech.create({
  model: "gemini-2.5-flash-preview-tts",
  voice: "Kore",
  input: "Hello, welcome to LLM Gateway!",
});

// Gemini models return WAV by default, so save with a .wav extension
const buffer = Buffer.from(await response.arrayBuffer());
writeFileSync("speech.wav", buffer);
```

## Streaming [#streaming]

Streaming speech responses (chunked audio or `stream_format: "sse"`) are not supported yet. The endpoint always returns the complete audio file in a single response, so there is no low-latency, play-as-you-go output for now.

## Voices [#voices]

Gemini exposes 30 prebuilt voices. A few common ones: `Kore`, `Puck`, `Zephyr`, `Charon`, `Fenrir`, `Leda`, `Orus`, `Aoede`. When `voice` is omitted on a Gemini model, `Kore` is used.

OpenAI voices include `alloy`, `ash`, `ballad`, `coral`, `echo`, `fable`, `nova`, `onyx`, `sage`, `shimmer`, and `verse`. When `voice` is omitted on an OpenAI model, `alloy` is used.
ElevenLabs models accept 20 named voices, including `Sarah`, `Aria`, `Roger`, `Laura`, `Charlie`, `George`, `Charlotte`, `Jessica`, `Brian`, and `Lily`. When `voice` is omitted on an ElevenLabs model, `Sarah` is used. A raw ElevenLabs voice id is also accepted directly. ## ElevenLabs [#elevenlabs] The four ElevenLabs models are billed per **input character** (see the [models page](https://llmgateway.io/models?filters=1\&audioGeneration=true) for rates): * `eleven-multilingual-v2` — most lifelike, rich emotional expression, 29 languages * `eleven-v3` — most expressive and human-like, 70+ languages * `eleven-flash-v2-5` — ultra-low latency, 32 languages * `eleven-turbo-v2-5` — fast and balanced, 32 languages ```bash curl -X POST "https://api.llmgateway.io/v1/audio/speech" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "eleven-multilingual-v2", "input": "Hello, welcome to LLM Gateway!", "voice": "Sarah" }' \ --output speech.mp3 ``` # Video Generation URL: https://docs.llmgateway.io/features/video-generation # Video Generation [#video-generation] LLMGateway supports asynchronous video generation through an OpenAI-compatible `POST /v1/videos` flow. Currently available models: * **Veo 3.1** through `avalanche` (1080p, 4k) and `google-vertex` (720p, 1080p, 4k) * **Seedance 2.0**, **Seedance 2.0 Fast**, and **Seedance 1.5 Pro** through `bytedance` (720p, 1080p) You can find the current list of video-capable models on our [models page with the video filter enabled](https://llmgateway.io/models?filters=1\&videoGeneration=true) or programmatically through the [/v1/models endpoint](/v1_models). ## What Works Today [#what-works-today] * `POST /v1/videos` * `GET /v1/videos/{video_id}` * `GET /v1/videos/{video_id}/content` * Optional signed callbacks with `callback_url` and `callback_secret` ## Request Format [#request-format] LLMGateway currently supports a focused subset of the OpenAI video API. 
### Supported fields [#supported-fields] | Field | Type | Required | Description | | ------------------ | ------- | -------- | -------------------------------------------------------------------------------------------------------------------------- | | `model` | string | yes | Any video-capable model from the filtered models page | | `prompt` | string | yes | Text prompt for the video | | `seconds` | number | yes | Duration in seconds. Supported values depend on the model (see below) | | `size` | string | no | `widthxheight`, limited to the sizes supported by the selected model and provider | | `audio` | boolean | no | Whether to include audio in the output (default `true`). Only honored when the model supports both audio and silent output | | `image` | object | no | Optional first frame for image-to-video generation | | `last_frame` | object | no | Optional ending frame when `image` is provided | | `reference_images` | array | no | One to three provider-specific image inputs | | `input_reference` | object | no | Alias for one or more `reference_images` | | `reference_videos` | array | no | One to three reference video HTTPS URLs (Seedance 2.0 only, see below) | | `reference_audios` | array | no | One to three reference audio HTTPS URLs (Seedance 2.0 only, see below) | | `callback_url` | string | no | LLMGateway extension for completion webhooks | | `callback_secret` | string | no | LLMGateway extension used to sign webhook deliveries | ### Sizes and durations by model [#sizes-and-durations-by-model] | Model family | Provider | Supported sizes | Supported durations | | --------------------------------- | --------------- | -------------------------------------------------------------------------- | ------------------- | | Veo 3.1 | `google-vertex` | `1280x720`, `720x1280`, `1920x1080`, `1080x1920`, `3840x2160`, `2160x3840` | `4`, `6`, `8`, `10` | | Veo 3.1 | `avalanche` | `1920x1080`, `1080x1920`, `3840x2160`, `2160x3840` | `8` | | Seedance 2.0 / 2.0 Fast / 1.5 Pro 
| `bytedance` | `1280x720`, `720x1280`, `1920x1080`, `1080x1920` | `5`, `10` | Requests return `400` when the selected provider cannot serve the requested `size` or `seconds`. Seedance derives `aspect_ratio` from the requested `size` (16:9 for landscape, 9:16 for portrait). ### Reference-guided generation (Seedance 2.0) [#reference-guided-generation-seedance-20] Seedance 2.0 (`seedance-2-0`, `seedance-2-0-fast`) can generate a video that is guided by reference **images**, **videos**, and **audio** — sometimes called omni-reference. You attach references as top-level fields in the same `POST /v1/videos` payload; the gateway forwards each one to the provider tagged with the correct role, so you don't set roles yourself. | Reference type | Payload field | Count | Accepted input | Available on | | -------------- | -------------------------------------------- | ----- | -------------------------------- | ---------------------------------------------------- | | Image | `reference_images` (`input_reference` alias) | 1–3 | HTTPS URL **or** base64 data URL | Seedance 2.0, Veo 3.1 (`google-vertex`, `avalanche`) | | Video | `reference_videos` | 1–3 | HTTPS URL only | Seedance 2.0 | | Audio | `reference_audios` | 1–3 | HTTPS URL only | Seedance 2.0 | Each list item accepts either a bare URL string or an object form: * `reference_images`: `"https://…/subject.png"` or `{ "image_url": "https://…/subject.png" }` * `reference_videos`: `"https://…/motion.mp4"` or `{ "video_url": "https://…/motion.mp4" }` * `reference_audios`: `"https://…/track.mp3"` or `{ "audio_url": "https://…/track.mp3" }` You can mix all three reference types in one request. The `prompt` can be a light instruction (for example `"adapt this to show more detail"`) — the references drive the result. #### Rules and limits [#rules-and-limits] * **HTTPS only for video and audio.** `reference_videos` and `reference_audios` must be publicly reachable HTTPS URLs (the provider fetches them). 
base64 data URLs are rejected for video/audio; images may be HTTPS URLs or base64 data URLs. * **Reference video resolution.** Seedance requires reference video frames to be at least \~409,600 pixels (roughly 480p or larger). Low-resolution clips such as 360p are rejected with a `400`. * **Not combinable with frames.** Reference inputs (`reference_images`, `reference_videos`, `reference_audios`) cannot be combined with the first/last frame inputs (`image`, `last_frame`). * **Provider scope.** Reference videos and audio are only supported on Seedance 2.0 models; sending them to other models returns a `400`. * **Moderation still applies.** The output is subject to the provider's content moderation. Blocked generations finish as `failed` and are logged with a `content_filter` finish reason. #### Examples [#examples] Reference images only (subjects / style): ```bash curl -X POST "https://api.llmgateway.io/v1/videos" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "seedance-2-0", "prompt": "The subject walks through a neon-lit market at night", "seconds": 5, "size": "1280x720", "reference_images": [ { "image_url": "https://example.com/subject.png" }, { "image_url": "https://example.com/style.png" } ] }' ``` Reference video only (motion / scene — let the clip drive the output): ```bash curl -X POST "https://api.llmgateway.io/v1/videos" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "seedance-2-0", "prompt": "adapt this to show more detail", "seconds": 5, "size": "1280x720", "reference_videos": ["https://example.com/reference-motion.mp4"] }' ``` All three reference types combined: ```bash curl -X POST "https://api.llmgateway.io/v1/videos" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "seedance-2-0", "prompt": "The subject performs the choreography from the reference video", "seconds": 5, "size": 
"1280x720", "reference_images": [ { "image_url": "https://example.com/subject.png" } ], "reference_videos": [ "https://example.com/reference-motion.mp4" ], "reference_audios": [ "https://example.com/reference-track.mp3" ] }' ``` ### Not supported yet [#not-supported-yet] * multipart uploads * `n` values other than `1` * remix/list/delete video endpoints ## Create a Video [#create-a-video] Video generation requires at least `$1.00` in available organization credits before the job is submitted upstream. Pricing is per second of generated video. For Seedance, enabling audio can increase the per-second rate on models that price audio and video separately. Veo 3.1: | Model | Provider | Supported sizes | Price | | ------------------------------- | --------------- | ------------------------------------------------ | ---------------- | | `veo-3.1-generate-preview` | `google-vertex` | `1280x720`, `720x1280`, `1920x1080`, `1080x1920` | `$0.40 / second` | | `veo-3.1-fast-generate-preview` | `google-vertex` | `1280x720`, `720x1280`, `1920x1080`, `1080x1920` | `$0.15 / second` | | `veo-3.1-generate-preview` | `google-vertex` | `3840x2160`, `2160x3840` | `$0.60 / second` | | `veo-3.1-fast-generate-preview` | `google-vertex` | `3840x2160`, `2160x3840` | `$0.35 / second` | | `veo-3.1-generate-preview` | `avalanche` | `1920x1080`, `1080x1920` | `$0.40 / second` | | `veo-3.1-fast-generate-preview` | `avalanche` | `1920x1080`, `1080x1920` | `$0.15 / second` | | `veo-3.1-generate-preview` | `avalanche` | `3840x2160`, `2160x3840` | `$0.60 / second` | | `veo-3.1-fast-generate-preview` | `avalanche` | `3840x2160`, `2160x3840` | `$0.35 / second` | Seedance (ByteDance): | Model | Provider | Resolution | With audio | Video only | | ------------------- | ----------- | ---------- | ------------------- | ------------------- | | `seedance-2-0` | `bytedance` | 720p | `$0.1512 / second` | `$0.1512 / second` | | `seedance-2-0` | `bytedance` | 1080p | `$0.3402 / second` | `$0.3402 / second` | | 
`seedance-2-0-fast` | `bytedance` | 720p | `$0.121 / second` | `$0.121 / second` | | `seedance-2-0-fast` | `bytedance` | 1080p | `$0.2722 / second` | `$0.2722 / second` | | `seedance-1-5-pro` | `bytedance` | 720p | `$0.05184 / second` | `$0.02592 / second` | | `seedance-1-5-pro` | `bytedance` | 1080p | `$0.1166 / second` | `$0.05832 / second` | ```bash curl -X POST "https://api.llmgateway.io/v1/videos" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "veo-3.1-generate-preview", "prompt": "A cinematic aerial shot flying above a rainforest waterfall at sunrise", "seconds": 8, "size": "1920x1080" }' ``` Example response: ```json { "id": "v_123", "object": "video", "model": "veo-3.1-generate-preview", "status": "queued", "progress": 0, "created_at": 1773600000, "completed_at": null, "expires_at": null, "error": null } ``` ## Retrieve Job Status [#retrieve-job-status] ```bash curl "https://api.llmgateway.io/v1/videos/v_123" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" ``` Typical statuses: * `queued` * `in_progress` * `completed` * `failed` * `canceled` * `expired` `avalanche` requests for `1080p` and `4k` stay `in_progress` until the upgraded output is ready. The gateway keeps polling the upstream upgrade endpoints and only marks the job `completed` once the requested resolution is available. `google-vertex` follows Vertex AI's long-running operation flow. The gateway submits Veo generation with `predictLongRunning`, polls with `fetchPredictOperation`, and streams the final bytes through the gateway content endpoint once the operation is done. `bytedance` uses the ModelArk `/contents/generations/tasks` endpoint. The gateway submits the job, polls the upstream task status, and exposes the final video bytes through the gateway content endpoint once the task succeeds. 
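The status flow above lends itself to a small polling loop. The following is a minimal sketch, not an official client: `waitForVideo`, `isTerminal`, the `FetchLike` type, and the default interval and attempt limit are hypothetical names and values chosen for illustration. Treating `canceled` and `expired` as final alongside `completed` and `failed` is also an assumption.

```ts
// Minimal polling sketch; names and defaults here are illustrative assumptions.
type VideoJob = { id: string; status: string; progress?: number };

// Narrow fetch-like signature so the global fetch (Node.js 18+) or a test double can be passed in.
type FetchLike = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Assumption: these four statuses are final; anything else keeps the loop polling.
const TERMINAL_STATUSES = new Set(["completed", "failed", "canceled", "expired"]);

function isTerminal(status: string): boolean {
  return TERMINAL_STATUSES.has(status);
}

async function waitForVideo(
  fetchImpl: FetchLike,
  videoId: string,
  apiKey: string,
  intervalMs = 5000,
  maxAttempts = 120,
): Promise<VideoJob> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetchImpl(`https://api.llmgateway.io/v1/videos/${videoId}`, {
      headers: { Authorization: `Bearer ${apiKey}` },
    });
    if (!res.ok) {
      throw new Error(`Status check failed with HTTP ${res.status}`);
    }
    const job = (await res.json()) as VideoJob;
    if (isTerminal(job.status)) {
      return job;
    }
    // Wait before the next status check.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Video ${videoId} did not reach a terminal status after ${maxAttempts} checks`);
}
```

On Node.js 18+, calling `waitForVideo(fetch, "v_123", apiKey)` would resolve with the job object once it leaves `queued` or `in_progress`. For long generations, signed callbacks avoid polling altogether.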
## Download the Video [#download-the-video]

Once the job is complete, stream the resulting video bytes from the content endpoint:

```bash
curl "https://api.llmgateway.io/v1/videos/v_123/content" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  --output video.mp4
```

## Signed Callbacks [#signed-callbacks]

LLMGateway can notify your application when the job reaches a terminal state.

```bash
curl -X POST "https://api.llmgateway.io/v1/videos" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "veo-3.1-fast-generate-preview",
    "prompt": "A slow-motion close-up of waves crashing against black volcanic rock",
    "seconds": 8,
    "callback_url": "https://example.com/webhooks/video",
    "callback_secret": "whsec_your_secret_here"
  }'
```

### Delivery behavior [#delivery-behavior]

* Callbacks are sent only for terminal states in v1
* Event types are `video.completed` and `video.failed`
* Deliveries retry with exponential backoff on network errors, timeouts, and non-2xx responses
* Each attempt is recorded internally in the webhook delivery log table

### Headers [#headers]

* `webhook-id`
* `webhook-timestamp`
* `webhook-signature`

### Signature format [#signature-format]

LLMGateway signs the string:

```text
{webhook-id}.{webhook-timestamp}.{raw-request-body}
```

using HMAC-SHA256 with your `callback_secret`, then sends:

```text
webhook-signature: v1,{base64_signature}
```

### Verification example [#verification-example]

```ts
import { createHmac, timingSafeEqual } from "node:crypto";

function verifyWebhook(
  body: string,
  webhookId: string,
  webhookTimestamp: string,
  webhookSignature: string,
  secret: string,
): boolean {
  // Recompute the HMAC-SHA256 over "{webhook-id}.{webhook-timestamp}.{raw-request-body}".
  const expected = createHmac("sha256", secret)
    .update(`${webhookId}.${webhookTimestamp}.${body}`)
    .digest();

  // Strip the "v1," prefix and decode the base64 signature from the header.
  const provided = Buffer.from(webhookSignature.replace(/^v1,/, ""), "base64");

  // timingSafeEqual throws when the lengths differ, so reject mismatched lengths first.
  return (
    provided.length === expected.length && timingSafeEqual(expected, provided)
  );
}
```

## Related Docs [#related-docs]

* [Image
Generation](/features/image-generation) * [Routing](/features/routing) * [Models API](/v1_models) # Vision Support URL: https://docs.llmgateway.io/features/vision # Vision Support [#vision-support] LLMGateway supports vision-enabled models that can analyze and describe images. You can provide images via HTTPS URLs or inline base64-encoded data. ## Vision-Enabled Models [#vision-enabled-models] You can find all vision-enabled models on our [models page with vision filter](https://llmgateway.io/models?filters=1\&vision=true). These models can process both text and image content in the same request. ## Image Formats [#image-formats] ### Using HTTPS URLs [#using-https-urls] You can provide any publicly accessible HTTPS URL pointing to an image: ```bash curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "gpt-4o", "messages": [ { "role": "user", "content": [ { "type": "text", "text": "What do you see in this image?" }, { "type": "image_url", "image_url": { "url": "https://example.com/image.jpg" } } ] } ] }' ``` ### Using Base64 Inline Data [#using-base64-inline-data] You can also provide images as base64-encoded data URIs: ```bash curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "gpt-4o", "messages": [ { "role": "user", "content": [ { "type": "text", "text": "Describe this image" }, { "type": "image_url", "image_url": { "url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQEASABIAAD..." 
} } ] } ] }' ``` ## Content Array Format [#content-array-format] When using vision models, the `content` field should be an array containing both text and image content blocks: * **Text content**: `{"type": "text", "text": "Your message"}` * **Image content**: `{"type": "image_url", "image_url": {"url": "image_url_or_data_uri"}}` ## Multiple Images [#multiple-images] You can include multiple images in a single request: ```bash curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "gpt-4o", "messages": [ { "role": "user", "content": [ { "type": "text", "text": "Compare these two images" }, { "type": "image_url", "image_url": { "url": "https://example.com/image1.jpg" } }, { "type": "image_url", "image_url": { "url": "https://example.com/image2.jpg" } } ] } ] }' ``` ## Simple String Content [#simple-string-content] For vision models, you can still use simple string content for text-only messages. The array format is only required when including images. ```bash curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "gpt-4o", "messages": [ { "role": "user", "content": "Hello! How can you help me today?" } ] }' ``` ## Supported Image Types [#supported-image-types] Vision models typically support common image formats including: * JPEG (.jpg, .jpeg) * PNG (.png) * WebP (.webp) * GIF (.gif) The specific formats supported may vary by model provider. Check the individual model documentation for format limitations and file size restrictions. ## Error Handling [#error-handling] If an image URL is inaccessible or the image format is unsupported, the gateway will handle the error gracefully and may substitute a placeholder or error message in the request to the underlying model. 
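The content blocks described in this section can be assembled with a couple of small helpers. This is an illustrative sketch: `toDataUri` and `buildVisionContent` are hypothetical helper names, not part of any SDK, and the example assumes a Node.js runtime where `Buffer` is available.

```ts
// Content block shapes as described in the "Content Array Format" section.
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image_url"; image_url: { url: string } };
type ContentPart = TextPart | ImagePart;

// Encode raw image bytes as a base64 data URI (hypothetical helper).
function toDataUri(bytes: Uint8Array, mimeType: string): string {
  return `data:${mimeType};base64,${Buffer.from(bytes).toString("base64")}`;
}

// Build a content array with one text block followed by one block per image.
// Each entry in imageUrls may be an HTTPS URL or a data URI.
function buildVisionContent(text: string, imageUrls: string[]): ContentPart[] {
  const images = imageUrls.map(
    (url): ImagePart => ({ type: "image_url", image_url: { url } }),
  );
  return [{ type: "text", text }, ...images];
}
```

The returned array can be used as the `content` of a `user` message, mixing HTTPS URLs and data URIs in the same request.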
# Native Web Search URL: https://docs.llmgateway.io/features/web-search # Native Web Search [#native-web-search] LLM Gateway supports native web search capabilities that allow models to access real-time information from the internet. This feature is useful for answering questions about current events, recent news, live data, and other time-sensitive information that may not be in the model's training data. ## How It Works [#how-it-works] When you include the `web_search` tool in your request, the model can search the web to gather relevant information before generating a response: 1. You send a request with the `web_search` tool enabled 2. The model determines if web search is needed based on the query 3. If needed, the model performs web searches to gather current information 4. The model synthesizes the search results and generates a response 5. Citations are included in the response to show information sources ## Supported Providers [#supported-providers] Native web search is available on select models. See all models with native web search support on our [models page](https://llmgateway.io/models?filters=1\&webSearch=true). ## Basic Usage [#basic-usage] To enable web search, add the `web_search` tool to your request: ```bash curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "gpt-5.2", "messages": [ { "role": "user", "content": "What is the current weather in San Francisco?" 
} ], "tools": [ { "type": "web_search" } ] }' ``` ### Example Response [#example-response] ```json { "id": "chatcmpl-abc123", "object": "chat.completion", "created": 1234567890, "model": "openai/gpt-5.2", "choices": [ { "index": 0, "message": { "role": "assistant", "content": "The current weather in San Francisco is 57°F (14°C) with mostly cloudy skies...", "annotations": [ { "type": "url_citation", "url": "https://weather.com/...", "title": "San Francisco Weather" } ] }, "finish_reason": "stop" } ], "usage": { "prompt_tokens": 15, "completion_tokens": 150, "total_tokens": 165, "cost": 0.0315 } } ``` ## Web Search Options [#web-search-options] The `web_search` tool accepts optional configuration parameters: ### User Location [#user-location] Provide location context to get more relevant local search results: ```json { "type": "web_search", "user_location": { "city": "San Francisco", "region": "California", "country": "US", "timezone": "America/Los_Angeles" } } ``` ### Search Context Size [#search-context-size] Control the amount of web content retrieved (OpenAI only): ```json { "type": "web_search", "search_context_size": "medium" } ``` Available values: * `low` - Minimal search context, faster responses * `medium` - Balanced context (default) * `high` - Maximum search context, more comprehensive ### Max Uses [#max-uses] Limit the number of searches per request (provider-dependent): ```json { "type": "web_search", "max_uses": 3 } ``` ## Using with SDKs [#using-with-sdks] ### OpenAI SDK (Python) [#openai-sdk-python] ```python from openai import OpenAI client = OpenAI( base_url="https://api.llmgateway.io/v1", api_key="your-api-key" ) response = client.chat.completions.create( model="gpt-5.2", messages=[ {"role": "user", "content": "What are the latest news headlines today?"} ], tools=[{"type": "web_search"}] ) print(response.choices[0].message.content) ``` ### OpenAI SDK (TypeScript) [#openai-sdk-typescript] ```typescript import OpenAI from "openai"; const client = 
new OpenAI({ baseURL: "https://api.llmgateway.io/v1", apiKey: "your-api-key", }); const response = await client.chat.completions.create({ model: "gpt-5.2", messages: [{ role: "user", content: "What are the latest tech news?" }], tools: [{ type: "web_search" }], }); console.log(response.choices[0].message.content); ``` ## Streaming [#streaming] Web search works with streaming responses. Citations are included in the final chunks: ```bash curl -X POST "https://api.llmgateway.io/v1/chat/completions" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "gpt-5.2", "messages": [ {"role": "user", "content": "What is the current stock price of Apple?"} ], "tools": [{"type": "web_search"}], "stream": true }' ``` ## Citations and Sources [#citations-and-sources] Web search responses include citations to show where information was sourced from. These appear in the `annotations` field of the message: ```json { "annotations": [ { "type": "url_citation", "url": "https://example.com/article", "title": "Article Title", "start_index": 0, "end_index": 50 } ] } ``` Citation format may vary slightly between providers, but LLM Gateway normalizes them into a consistent structure. ## Cost Tracking [#cost-tracking] Web search costs are rolled into the total `cost` reported in the usage object: ```json { "usage": { "prompt_tokens": 15, "completion_tokens": 150, "total_tokens": 165, "cost": 0.0125, "cost_details": { "upstream_inference_cost": 0.0115, "upstream_inference_prompt_cost": 0.0015, "upstream_inference_completions_cost": 0.01, "total_cost": 0.0125, "input_cost": 0.0015, "output_cost": 0.01, "web_search_cost": 0.001 } } } ``` Web search is billed at $0.01 per search call for reasoning models (GPT-5, o-series) and $0.025 per call for non-reasoning models. The web search charge is included in the top-level `cost` value and surfaced separately as `cost_details.web_search_cost`. 
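To see how the fields in the usage object relate, the breakdown can be recomputed client-side. The sketch below is illustrative only: `summarizeCost` and its return shape are hypothetical, and it assumes the `usage` object carries the fields shown in the example above.

```ts
// Subset of the usage fields shown in the example above.
type CostDetails = {
  input_cost?: number;
  output_cost?: number;
  web_search_cost?: number;
  total_cost?: number;
};
type Usage = { cost?: number; cost_details?: CostDetails };

// Split the reported total into web search and everything else (hypothetical helper).
function summarizeCost(usage: Usage) {
  const details = usage.cost_details ?? {};
  const webSearch = details.web_search_cost ?? 0;
  // Prefer the top-level cost; fall back to cost_details.total_cost if it is absent.
  const total = usage.cost ?? details.total_cost ?? 0;
  return {
    total,
    webSearch,
    inference: total - webSearch,
    webSearchShare: total > 0 ? webSearch / total : 0,
  };
}

const summary = summarizeCost({
  cost: 0.0125,
  cost_details: { input_cost: 0.0015, output_cost: 0.01, web_search_cost: 0.001 },
});
console.log(summary);
```

With the example values, web search accounts for roughly 8% of the total cost, and the remainder matches the sum of `input_cost` and `output_cost` up to floating-point rounding.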
## Combining with Function Tools [#combining-with-function-tools] You can use web search alongside regular function tools: ```json { "tools": [ { "type": "web_search" }, { "type": "function", "function": { "name": "get_weather", "description": "Get weather for a location", "parameters": { "type": "object", "properties": { "location": { "type": "string" } } } } } ] } ``` Some dedicated search models only support web search and do not support additional function tools. Use `gpt-5.2` or other GPT-5 series models if you need both web search and function tools. ## Use Cases [#use-cases] ### Current Events and News [#current-events-and-news] ```json { "messages": [ { "role": "user", "content": "What are the major news stories today?" } ], "tools": [{ "type": "web_search" }] } ``` ### Real-Time Data [#real-time-data] ```json { "messages": [ { "role": "user", "content": "What is the current price of Bitcoin?" } ], "tools": [{ "type": "web_search" }] } ``` ### Research and Fact-Checking [#research-and-fact-checking] ```json { "messages": [ { "role": "user", "content": "What are the latest findings on climate change?" } ], "tools": [{ "type": "web_search" }] } ``` ### Local Information [#local-information] ```json { "messages": [ { "role": "user", "content": "What restaurants are open near me right now?" } ], "tools": [ { "type": "web_search", "user_location": { "city": "New York", "country": "US" } } ] } ``` ## Best Practices [#best-practices] 1. **Use GPT-5.2**: For the best web search experience with full tool support, use `gpt-5.2` 2. **Provide location context**: When queries are location-dependent, include `user_location` for more relevant results 3. **Monitor costs**: Web search incurs per-query costs in addition to token costs 4. **Check citations**: Always review the citations in responses to verify information sources 5. 
**Use streaming**: For user-facing applications, enable streaming to show responses as they're generated ## Error Handling [#error-handling] If you try to use web search with a model that doesn't support it: ```json { "error": { "message": "Model gpt-4o does not support native web search. Remove the web_search tool or use a model that supports it. See https://llmgateway.io/models?features=webSearch for supported models.", "type": "invalid_request_error" } } ``` To avoid this error, only use the `web_search` tool with [native web search enabled models](https://llmgateway.io/models?filters=1\&webSearch=true). # Agent Skills URL: https://docs.llmgateway.io/guides/agent-skills **Agent Skills** are structured guidelines for AI coding agents, optimized for use with LLM Gateway and the AI SDK. They provide best practices and reusable instructions that help AI agents generate higher-quality code. ## What Are Agent Skills? [#what-are-agent-skills] Agent Skills are packaged sets of rules and guidelines that teach AI coding agents how to implement specific features correctly. 
Each skill covers: * API integration patterns * Frontend rendering best practices * Error handling strategies * Performance optimization techniques ## Available Skills [#available-skills] ### Image Generation [#image-generation] The Image Generation skill teaches AI agents how to properly implement image generation features: * **API Integration** — correctly calling image generation APIs * **Frontend Rendering** — displaying generated images efficiently * **Error Handling** — graceful degradation and retry logic * **Performance** — caching, lazy loading, and optimization ## Installation [#installation] ### Prerequisites [#prerequisites] Ensure you have Node.js 18+ and pnpm 9+ installed: ```bash node --version # v18.0.0 or higher pnpm --version # 9.0.0 or higher ``` ### Clone the Repository [#clone-the-repository] ```bash git clone https://github.com/theopenco/agent-skills.git cd agent-skills ``` ### Install Dependencies [#install-dependencies] ```bash pnpm install ``` ### Build Skills [#build-skills] Build all skills to generate the documentation: ```bash pnpm build:all ``` Or build a specific skill: ```bash pnpm build ``` ## Using Skills in Your Project [#using-skills-in-your-project] After building, each skill generates an `AGENTS.md` file that can be used with AI coding agents like Claude, Cursor, or Copilot. ### With Claude Code [#with-claude-code] Add the generated `AGENTS.md` content to your project's `CLAUDE.md` file: ```bash cat skills/image-generation/AGENTS.md >> CLAUDE.md ``` ### With Cursor [#with-cursor] Add the skill content to your `.cursorrules` file: ```bash cat skills/image-generation/AGENTS.md >> .cursorrules ``` ### With Other AI Agents [#with-other-ai-agents] Most AI coding tools support custom instructions. Copy the skill content into your tool's configuration. 
## Project Structure [#project-structure] ``` agent-skills/ ├── packages/ │ └── skills-build/ # Build tooling ├── skills/ │ └── image-generation/ # Individual skill │ ├── rules/ # Rule files │ ├── AGENTS.md # Generated documentation │ └── metadata.json # Skill metadata └── package.json ``` ## Contributing [#contributing] ### Adding New Rules [#adding-new-rules] ### Fork and Clone [#fork-and-clone] Fork the repository and create a feature branch: ```bash git checkout -b feat/new-rule ``` ### Create a Rule File [#create-a-rule-file] Rules follow a standardized template with YAML frontmatter containing `title`, `impact` (high/medium/low), and `tags`. The body includes sections for Context, Incorrect examples, and Correct examples with TypeScript code blocks. See existing rules in `skills/image-generation/rules/` for reference. ### Validate and Build [#validate-and-build] ```bash pnpm validate pnpm build:all ``` ### Submit a Pull Request [#submit-a-pull-request] Push your changes and open a PR. ### Impact Levels [#impact-levels] When creating rules, use these impact levels: * **high** — Critical for correctness or security * **medium** — Important for quality and maintainability * **low** — Nice-to-have improvements ## Development Commands [#development-commands] | Command | Description | | ---------------- | --------------------------- | | `pnpm install` | Install dependencies | | `pnpm build:all` | Build all skills | | `pnpm build` | Build a specific skill | | `pnpm validate` | Validate rule files | | `pnpm dev` | Development mode with watch | ## More Resources [#more-resources] * [GitHub Repository](https://github.com/theopenco/agent-skills) — Source code and contributions * [LLM Gateway CLI](/guides/cli) — Project scaffolding tool * [Templates](https://llmgateway.io/templates) — Production-ready starter projects Want to contribute a new skill or rule? Check out the [contribution guidelines](https://github.com/theopenco/agent-skills#contributing) on GitHub. 
# Autohand Code Integration URL: https://docs.llmgateway.io/guides/autohand Autohand Code is an autonomous AI coding agent that works in your terminal, IDE, and Slack. With LLM Gateway, you can route all Autohand Code requests through a single gateway—use any of 180+ models from 60+ providers, with full cost tracking and smart routing. ## Setup [#setup] ### Sign Up for LLM Gateway [#sign-up-for-llm-gateway] [Sign up free](https://llmgateway.io/signup) — no credit card required. Copy your API key from the dashboard. ### Set Environment Variables [#set-environment-variables] Configure Autohand Code to use LLM Gateway: ```bash export OPENAI_BASE_URL=https://api.llmgateway.io/v1 export OPENAI_API_KEY=llmgtwy_your_api_key_here ``` ### Run Autohand Code [#run-autohand-code] ```bash autohand ``` All requests will now be routed through LLM Gateway. ## Why Use LLM Gateway with Autohand Code [#why-use-llm-gateway-with-autohand-code] * **180+ models** — GPT-5, Claude Opus, Gemini, Llama, and more from 60+ providers * **Smart routing** — Automatically selects the best provider based on uptime, throughput, price, and latency * **Cost tracking** — Monitor exactly how much each autonomous agent costs * **Single bill** — No need to manage multiple API provider accounts * **Response caching** — Repeated requests hit cache automatically * **Automatic failover** — If one provider is down, requests route to another ## Configuration File [#configuration-file] You can also configure LLM Gateway in Autohand Code's config file: ```json { "provider": { "llmgateway": { "baseUrl": "https://api.llmgateway.io/v1", "apiKey": "llmgtwy_your_api_key_here" } }, "model": "gpt-5" } ``` ## Choosing Models [#choosing-models] You can use any model from the [models page](https://llmgateway.io/models). 
| Model | Best For | | ------------------- | ------------------------------------------- | | `gpt-5` | Latest OpenAI flagship, highest quality | | `claude-opus-4-6` | Anthropic's most capable model | | `claude-sonnet-4-6` | Fast reasoning with extended thinking | | `gemini-2.5-pro` | Google's latest flagship, 1M context window | | `o3` | Advanced reasoning tasks | | `gpt-5-mini` | Cost-effective, quick responses | | `gemini-2.5-flash` | Fast responses, good for high-volume | | `deepseek-v3.1` | Open-source with vision and tools | ## Autohand Code Features with LLM Gateway [#autohand-code-features-with-llm-gateway] ### Terminal (CLI) [#terminal-cli] Autohand Code CLI works seamlessly with LLM Gateway. Set the environment variables and use all Autohand Code commands as normal—multi-file editing, agentic search, and autonomous code generation all work out of the box. ### IDE Integration [#ide-integration] Autohand Code's VS Code and Zed extensions respect the same environment variables. Set them in your shell profile and the IDE integration will automatically route through LLM Gateway. ### Slack Integration [#slack-integration] When using Autohand Code through Slack, configure the LLM Gateway base URL in your Autohand Code server settings to route all Slack-triggered coding tasks through the gateway. ## Monitoring Usage [#monitoring-usage] Once configured, all Autohand Code requests appear in your LLM Gateway dashboard: * **Request logs** — See every prompt and response * **Cost breakdown** — Track spending by model and time period * **Usage analytics** — Understand your AI usage patterns View all available models on the [models page](https://llmgateway.io/models). Need help? Join our [Discord community](https://llmgateway.io/discord) for support and troubleshooting assistance. # Claude Code Integration URL: https://docs.llmgateway.io/guides/claude-code Claude Code is locked to Anthropic's API by default. 
With LLM Gateway, you can point it at any model—GPT-5, Gemini, Llama, or 180+ others—while keeping the same Anthropic API format Claude Code expects. Three environment variables. No code changes. Full cost tracking in your dashboard. ## Setup [#setup] ### Sign Up for LLM Gateway [#sign-up-for-llm-gateway] [Sign up free](https://llmgateway.io/signup) — no credit card required. Copy your API key from the dashboard. ### Set Environment Variables [#set-environment-variables] Configure Claude Code to use LLM Gateway: ```bash export ANTHROPIC_BASE_URL=https://api.llmgateway.io export ANTHROPIC_AUTH_TOKEN=llmgtwy_your_api_key_here # optional: specify a model, otherwise it uses the default Claude model export ANTHROPIC_MODEL=gpt-5 # or any model from our catalog ``` ### Run Claude Code [#run-claude-code] ```bash claude ``` All requests will now be routed through LLM Gateway. ## Why This Works [#why-this-works] LLM Gateway's `/v1/messages` endpoint speaks Anthropic's API format natively. We handle the translation to each provider behind the scenes. This means: * **Use any model** — GPT-5, Gemini, Llama, or Claude itself * **Keep your workflow** — Claude Code doesn't know the difference * **Track costs** — Every request appears in your LLM Gateway dashboard * **Automatic caching** — Repeated requests hit cache, saving money ## Choosing Models [#choosing-models] You can use any model from the [models page](https://llmgateway.io/models). 
### Use OpenAI's Latest Models [#use-openais-latest-models] ```bash # Use the latest GPT model export ANTHROPIC_MODEL=gpt-5 # Use a cost-effective alternative export ANTHROPIC_MODEL=gpt-5-mini ``` ### Use Google's Gemini [#use-googles-gemini] ```bash export ANTHROPIC_MODEL=gemini-2.5-pro ``` ### Use Anthropic's Claude Models [#use-anthropics-claude-models] ```bash export ANTHROPIC_MODEL=anthropic/claude-3-5-sonnet-20241022 ``` ## Environment Variables [#environment-variables] ### ANTHROPIC\_MODEL [#anthropic_model] Specifies the main model to use for primary requests. ```bash export ANTHROPIC_MODEL=gpt-5 ``` ### Complete Configuration Example [#complete-configuration-example] ```bash export ANTHROPIC_BASE_URL=https://api.llmgateway.io export ANTHROPIC_AUTH_TOKEN=llmgtwy_your_api_key_here export ANTHROPIC_MODEL=gpt-5 export ANTHROPIC_SMALL_FAST_MODEL=gpt-5-nano ``` ## Making Manual API Requests [#making-manual-api-requests] If you want to test the endpoint directly, you can make manual requests: ```bash curl -X POST "https://api.llmgateway.io/v1/messages" \ -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "gpt-5", "messages": [ {"role": "user", "content": "Hello, how are you?"} ], "max_tokens": 100 }' ``` ### Response Format [#response-format] The endpoint returns responses in Anthropic's message format: ```json { "id": "msg_abc123", "type": "message", "role": "assistant", "model": "gpt-5", "content": [ { "type": "text", "text": "Hello! I'm doing well, thank you for asking. How can I help you today?" 
} ], "stop_reason": "end_turn", "stop_sequence": null, "usage": { "input_tokens": 13, "output_tokens": 20 } } ``` ## What You Get [#what-you-get] * **Any model in Claude Code** — GPT-5 for heavy lifting, GPT-4o Mini for routine tasks * **Cost visibility** — See exactly what each coding agent costs * **One bill** — Stop managing separate accounts for OpenAI, Anthropic, Google * **Response caching** — Repeated requests (like linting the same file) hit cache * **Discounts** — Check [discounted models](https://llmgateway.io/models?discounted=true) for savings up to 90% View all available models on the [models page](https://llmgateway.io/models). Need help? Join our [Discord community](https://llmgateway.io/discord) for support and troubleshooting assistance. # LLM Gateway CLI URL: https://docs.llmgateway.io/guides/cli The **LLM Gateway CLI** (`@llmgateway/cli`) is a command-line utility for scaffolding projects, managing AI applications, and discovering models. ## Installation [#installation] Run commands directly without installation: ```bash npx @llmgateway/cli init ``` Install globally for faster access: ```bash npm install -g @llmgateway/cli ``` Then run commands directly: ```bash llmgateway init ``` ## Quick Start [#quick-start] ### Initialize a Project [#initialize-a-project] Create a new project from a template: ```bash npx @llmgateway/cli init ``` Or specify the template and name directly: ```bash npx @llmgateway/cli init --template image-generation --name my-ai-app ``` ### Configure Authentication [#configure-authentication] Login to save your API key locally: ```bash npx @llmgateway/cli auth login ``` This opens a browser window to authenticate with LLM Gateway. Your credentials are stored in `~/.llmgateway/config.json`. Alternatively, set the `LLMGATEWAY_API_KEY` environment variable which takes precedence over the config file. 
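As a rough illustration of that precedence, the shell sketch below resolves a key in the order the paragraph above describes: environment variable first, config file second. The function name `resolve_api_key` is hypothetical and not part of the CLI, and the `apiKey` field name is taken from the config example in the Configuration section; the CLI's actual lookup logic may differ.

```bash
# Hypothetical helper (not part of @llmgateway/cli): prints the API key
# that would be expected to win under the documented precedence.
# Usage: resolve_api_key [config_path]
resolve_api_key() {
  local config="${1:-$HOME/.llmgateway/config.json}"

  # The environment variable wins whenever it is set and non-empty.
  if [ -n "${LLMGATEWAY_API_KEY:-}" ]; then
    printf '%s\n' "$LLMGATEWAY_API_KEY"
    return 0
  fi

  # Otherwise fall back to the "apiKey" field of the config file.
  # Assumption: the value is a plain string without escaped quotes.
  if [ -f "$config" ]; then
    sed -n 's/.*"apiKey"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$config" | head -n 1
    return 0
  fi

  echo "no API key found" >&2
  return 1
}
```

Calling `resolve_api_key` with no arguments reads the default config path; passing a path is only there to make the sketch easy to try against a scratch file.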
### Start Development [#start-development] Navigate to your project and start the development server: ```bash cd my-ai-app npx @llmgateway/cli dev ``` Or specify a custom port: ```bash npx @llmgateway/cli dev --port 3000 ``` ## Commands [#commands] ### `init` [#init] Initialize a new project from a template. ```bash npx @llmgateway/cli init [options] ``` **Options:** * `--template ` — Template to use (e.g., `image-generation`, `weather-agent`) * `--name ` — Project name **Examples:** ```bash # Interactive mode npx @llmgateway/cli init # With options npx @llmgateway/cli init --template image-generation --name my-app ``` ### `list` [#list] Display available project templates. ```bash npx @llmgateway/cli list ``` **Options:** * `--json` — Output in JSON format ### `models` [#models] Browse and filter available AI models. ```bash npx @llmgateway/cli models [options] ``` **Options:** * `--capability ` — Filter by capability (e.g., `chat`, `image`, `embedding`) * `--provider ` — Filter by provider (e.g., `openai`, `anthropic`, `google`) * `--search ` — Search models by name **Examples:** ```bash # List all models npx @llmgateway/cli models # Filter by provider npx @llmgateway/cli models --provider openai # Search models npx @llmgateway/cli models --search gpt ``` ### `add` [#add] Add tools or API routes to an existing project. ```bash npx @llmgateway/cli add ``` **Tools available:** * `weather` — Weather lookup functionality * `search` — Web search capability * `calculator` — Mathematical operations **API routes available:** * `generate` — Text generation endpoint * `chat` — Chat completion endpoint ### `auth` [#auth] Manage API authentication. ```bash # Login via browser npx @llmgateway/cli auth login # Check authentication status npx @llmgateway/cli auth status # Logout npx @llmgateway/cli auth logout ``` ### `dev` [#dev] Start the local development server. 
```bash npx @llmgateway/cli dev [options] ``` **Options:** * `--port ` — Port to run on (default: 3000) ### `upgrade` [#upgrade] Update LLM Gateway dependencies in your project. ```bash npx @llmgateway/cli upgrade [options] ``` **Options:** * `--dry-run` — Show what would be updated without making changes ### `docs` [#docs] Open the documentation in your browser. ```bash npx @llmgateway/cli docs ``` ## Available Templates [#available-templates] ### Image Generation [#image-generation] A full-stack application for AI image generation. * **Stack:** Next.js 16, React 19, TypeScript * **Features:** Multi-provider support (DALL-E, Stable Diffusion), unified API * **Use case:** Image generation apps, creative tools ```bash npx @llmgateway/cli init --template image-generation ``` ### QA Agent [#qa-agent] An AI-powered QA testing agent that uses browser automation to test your web app. * **Stack:** Next.js 16, React 19, TypeScript, Agent Browser * **Features:** Natural language testing, real-time action timeline, live browser preview * **Use case:** Automated QA testing, regression testing, user flow validation ```bash npx @llmgateway/cli init --template qa-agent ``` ### Weather Agent [#weather-agent] A CLI agent demonstrating tool calling capabilities. * **Stack:** TypeScript, AI SDK, OpenAI * **Features:** Tool calling, real-time data, natural language * **Use case:** Learning tool usage, building CLI agents ```bash npx @llmgateway/cli init --template weather-agent ``` ## Configuration [#configuration] The CLI stores configuration in `~/.llmgateway/config.json`: ```json { "apiKey": "llmgtwy_...", "defaultTemplate": "image-generation" } ``` ### Environment Variables [#environment-variables] The `LLMGATEWAY_API_KEY` environment variable takes precedence over the config file: ```bash export LLMGATEWAY_API_KEY="llmgtwy_..." 
``` ## More Resources [#more-resources] * [Agents](https://llmgateway.io/agents) — Pre-built AI agents * [Templates](https://llmgateway.io/templates) — Production-ready starter projects * [GitHub Repository](https://github.com/theopenco/llmgateway-templates) — Source code and issues Need help or want to request a feature? Open an issue on [GitHub](https://github.com/theopenco/llmgateway-templates/issues). # Cline Integration URL: https://docs.llmgateway.io/guides/cline [Cline](https://cline.bot) is an autonomous AI coding assistant that lives in your VS Code editor. It can create and edit files, run terminal commands, and help you build complex projects. You can configure Cline to use LLM Gateway for access to multiple AI providers with unified billing and cost tracking. ## Prerequisites [#prerequisites] * VS Code based IDE installed * An LLM Gateway API key ## Setup [#setup] Cline supports OpenAI-compatible API endpoints, making it straightforward to integrate with LLM Gateway. ### Install Cline Extension [#install-cline-extension] 1. Open VS Code 2. Go to the Extensions view (Cmd/Ctrl + Shift + X) 3. Search for "Cline" 4. Click **Install** on the Cline extension Install Cline Extension ### Open Cline Settings [#open-cline-settings] 1. Click on the Cline icon in the VS Code sidebar 2. Click the settings gear icon in the Cline panel Cline Settings ### Configure API Provider [#configure-api-provider] 1. In the API Provider dropdown, select **OpenAI Compatible** 2. Enter the following details: * **Base URL**: `https://api.llmgateway.io/v1` * **API Key**: Your LLM Gateway API key * **Model ID**: Choose a model (e.g., `claude-opus-4-5-20251101`, `gpt-5.2`, `gemini-3-pro-preview`, `deepseek-3.2`). See [provider-specific routing](/features/routing#provider-specific-routing) for more options. Configure API Provider ### Test the Integration [#test-the-integration] 1. Open a project in VS Code 2. Click on the Cline icon in the sidebar 3. 
Type a message like "Create a hello world function in Python" 4. Cline should respond and offer to create the file Test Cline All requests will now be routed through LLM Gateway. View all available models on the [models page](https://llmgateway.io/models). ## Features [#features] Once configured, you can use all of Cline's features with LLM Gateway: ### Autonomous Coding [#autonomous-coding] * Create new files and projects from scratch * Edit existing code based on natural language instructions * Refactor and improve code quality ### Terminal Commands [#terminal-commands] * Run build commands, tests, and scripts * Install dependencies * Execute any terminal operation ### File Management [#file-management] * Create, read, and modify files * Navigate your codebase * Search for relevant code ## Model Selection Tips [#model-selection-tips] ### Using Provider-Specific Models [#using-provider-specific-models] To use a specific provider's version of a model, prefix the model ID with the provider name. See [provider-specific routing](/features/routing#provider-specific-routing) for more options. ### Using Discounted Models [#using-discounted-models] LLM Gateway offers discounted access to some models. Find them on the [models page](https://llmgateway.io/models?view=grid\&filters=1\&discounted=true) and copy the model ID. ### Using Free Models [#using-free-models] Some models are available for free. Browse them on the [models page](https://llmgateway.io/models?view=grid\&filters=1\&free=true). Need help? Join our [Discord community](https://llmgateway.io/discord) for support and troubleshooting assistance. 
## Benefits of Using LLM Gateway with Cline [#benefits-of-using-llm-gateway-with-cline]

* **Multi-Provider Access**: Use models from OpenAI, Anthropic, Google, and more through a single API
* **Cost Control**: Track and limit your AI spending with detailed usage analytics
* **Unified Billing**: One account for all providers instead of managing multiple API keys
* **Caching**: Reduce costs with response caching for repeated requests
* **Analytics**: Monitor usage patterns and costs in the dashboard

# Codex CLI Integration

URL: https://docs.llmgateway.io/guides/codex-cli

Codex CLI is OpenAI's open-source terminal coding agent. By default it connects to OpenAI's API, but with LLM Gateway you can route it through a single gateway: use GPT-5.3 Codex, Gemini, Claude, or any of 180+ models while keeping full cost visibility.

One config file. No code changes. Full cost tracking in your dashboard.

## Setup [#setup]

### Sign Up for LLM Gateway [#sign-up-for-llm-gateway]

[Sign up free](https://llmgateway.io/signup), no credit card required. Copy your API key from the dashboard.

### Log Out of ChatGPT [#log-out-of-chatgpt]

If you're logged into ChatGPT in Codex CLI, the stored session will override your custom config. Log out first:

```bash
codex logout
```

### Create Config File [#create-config-file]

Create or edit `~/.codex/config.toml`:

```toml
model = "auto"
model_reasoning_effort = "high"
openai_base_url = "https://api.llmgateway.io/v1"
```

### Run Codex CLI [#run-codex-cli]

```bash
codex
```

On first launch, Codex will prompt you for authentication. Select **Provide your own API key**, then enter your LLM Gateway API key (starts with `llmgtwy_`).

All requests will now be routed through LLM Gateway.

## Why This Works [#why-this-works]

LLM Gateway's `/v1` endpoint is fully OpenAI-compatible. Codex CLI sends requests to our gateway instead of OpenAI directly, and we route them to the right provider behind the scenes.
This means:

* **Use any model**: GPT-5.3 Codex, Gemini, Claude, or 180+ others
* **Keep your workflow**: Codex CLI doesn't know the difference
* **Track costs**: Every request appears in your LLM Gateway dashboard
* **Automatic caching**: Repeated requests hit cache, saving money

## Configuration Explained [#configuration-explained]

### Base URL [#base-url]

The `openai_base_url` field points Codex CLI to LLM Gateway instead of OpenAI:

```toml
openai_base_url = "https://api.llmgateway.io/v1"
```

### Model Selection [#model-selection]

Use `auto` to let LLM Gateway pick the best model, or set a specific one from the [models page](https://llmgateway.io/models). Set only one `model` key at a time:

```toml
model = "auto"
# or pick a specific model instead
# model = "gpt-5.3-codex"
```

### Reasoning Effort [#reasoning-effort]

Control how much reasoning the model uses. Options are `low`, `medium`, and `high`:

```toml
model_reasoning_effort = "high"
```

## Choosing Models [#choosing-models]

Use `auto` to let LLM Gateway pick the best model automatically, or choose a specific one from the [models page](https://llmgateway.io/models):

```toml
# let LLM Gateway pick the best model
model = "auto"
# or pick a specific model instead
# model = "gpt-5.3-codex"
```

## What You Get [#what-you-get]

* **Any model in Codex CLI**: GPT-5.3 Codex for heavy lifting, lighter models for routine tasks
* **Cost visibility**: See exactly what each coding agent costs
* **One bill**: Stop managing separate accounts for OpenAI, Anthropic, Google
* **Response caching**: Repeated requests hit cache automatically
* **Discounts**: Check [discounted models](https://llmgateway.io/models?discounted=true) for savings up to 90%

## Troubleshooting [#troubleshooting]

### Data retention required [#data-retention-required]

If you see an error like:

```
The Responses API requires data retention to be enabled.
```

Codex CLI uses the OpenAI Responses API (`/v1/responses`), which requires data retention to be enabled. To fix this:

1.
Go to your [organization settings](https://llmgateway.io/dashboard) and navigate to **Settings > Policies** 2. Select **Retain All Data** and click **Save Settings** If you prefer not to enable data retention, you can configure Codex CLI to use the Chat Completions API instead by setting the `OPENAI_CHAT_COMPLETIONS_PATH` environment variable, if supported by your Codex CLI version. ### Authentication errors [#authentication-errors] If you see `401 Unauthorized` or requests going to `api.openai.com` instead of LLM Gateway: 1. Make sure you've run `codex logout` to clear any ChatGPT session 2. Verify `openai_base_url` is set in `~/.codex/config.toml` 3. When Codex prompts for authentication, select **Provide your own API key** and enter your LLM Gateway key (starts with `llmgtwy_`) ### Model not found [#model-not-found] Verify the model ID matches exactly what's listed on the [models page](https://llmgateway.io/models). Model IDs are case-sensitive. ### Connection issues [#connection-issues] Check that `openai_base_url` is set to `https://api.llmgateway.io/v1` (note the `/v1` at the end). View all available models on the [models page](https://llmgateway.io/models). Need help? Join our [Discord community](https://llmgateway.io/discord) for support and troubleshooting assistance. # Continue CLI Integration URL: https://docs.llmgateway.io/guides/continue [Continue](https://docs.continue.dev) is an open-source AI code assistant available as a CLI tool. By configuring it to use LLM Gateway, you get access to 210+ models from 60+ providers with unified cost tracking. One config file. Any model. Full cost visibility. 
## Prerequisites [#prerequisites]

* An LLM Gateway API key: [sign up free](https://llmgateway.io/signup) (no credit card required)

## Setup [#setup]

### Install Continue CLI [#install-continue-cli]

Install Continue CLI globally:

```bash
npm install -g @continuedev/cli
```

### Get Your API Key [#get-your-api-key]

[Sign up](https://llmgateway.io/signup) or log in to your LLM Gateway dashboard. Navigate to **API Keys** and create a new key. Copy it; it starts with `llmgtwy_`.

### Create a Config File [#create-a-config-file]

Create the Continue config directory and config file:

```bash
mkdir -p ~/.continue
```

Then create `~/.continue/config.yaml` with your LLM Gateway configuration:

```yaml
name: llmgateway
version: 0.0.1
models:
  - name: claude-sonnet-4-6
    provider: openai
    model: claude-sonnet-4-6
    apiBase: https://api.llmgateway.io/v1
    apiKey: llmgtwy_your-api-key-here
```

Replace `llmgtwy_your-api-key-here` with your actual API key from the dashboard.

### Add More Models (Optional) [#add-more-models-optional]

Add as many models as you want from the [models page](https://llmgateway.io/models):

```yaml
name: llmgateway
version: 0.0.1
models:
  - name: claude-sonnet-4-6
    provider: openai
    model: claude-sonnet-4-6
    apiBase: https://api.llmgateway.io/v1
    apiKey: llmgtwy_your-api-key-here
  - name: gpt-5.5
    provider: openai
    model: gpt-5.5
    apiBase: https://api.llmgateway.io/v1
    apiKey: llmgtwy_your-api-key-here
  - name: gemini-3.1-pro
    provider: openai
    model: gemini-3.1-pro
    apiBase: https://api.llmgateway.io/v1
    apiKey: llmgtwy_your-api-key-here
```

All models use `provider: openai` since LLM Gateway exposes an OpenAI-compatible API.

### Start Using Continue [#start-using-continue]

Launch Continue CLI with the `--config` flag pointing to your config file:

```bash
cn --config ~/.continue/config.yaml
```

All requests now route through LLM Gateway. You'll see usage, costs, and logs in your dashboard.
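Because every model entry in the config repeats the same `provider`, `apiBase`, and `apiKey`, the file can also be generated rather than hand-edited. The sketch below is one way to do that in shell; `render_continue_config` is a hypothetical helper, not a Continue or LLM Gateway command, and the usage line assumes you have exported your key as `LLMGATEWAY_API_KEY` yourself (Continue does not read that variable).

```bash
# Hypothetical helper: prints a Continue config with one entry per model ID,
# mirroring the structure of the config.yaml shown in this guide.
# Usage: render_continue_config <api_key> <model_id>...
render_continue_config() {
  local api_key="$1"
  shift

  # Top-level name and version fields come first, as in the guide's example.
  printf 'name: llmgateway\nversion: 0.0.1\nmodels:\n'

  local model
  for model in "$@"; do
    printf '  - name: %s\n' "$model"
    printf '    provider: openai\n'
    printf '    model: %s\n' "$model"
    printf '    apiBase: https://api.llmgateway.io/v1\n'
    printf '    apiKey: %s\n' "$api_key"
  done
}

# Example (assumes LLMGATEWAY_API_KEY is exported in your shell):
# mkdir -p ~/.continue
# render_continue_config "$LLMGATEWAY_API_KEY" claude-sonnet-4-6 gpt-5.5 > ~/.continue/config.yaml
```

The generated file still contains the key in plain text, so treat it like any other credentials file.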
## Why Use LLM Gateway with Continue [#why-use-llm-gateway-with-continue] * **210+ models** — Claude, GPT, Gemini, Llama, DeepSeek, and more * **One API key** — Stop managing separate keys for each provider * **Cost tracking** — See exactly what each session costs in your dashboard * **Response caching** — Repeated requests hit cache automatically * **Automatic fallback** — If a provider is down, requests route to an alternative * **Volume discounts** — Check [discounted models](https://llmgateway.io/models?discounted=true) for savings up to 90% ## Configuration Details [#configuration-details] ### Provider Setting [#provider-setting] Always use `provider: openai` in your Continue config. LLM Gateway exposes an OpenAI-compatible API, so Continue's OpenAI provider handles all models correctly — including Claude, Gemini, and others. ### Project-Specific Config [#project-specific-config] Place a `.continue/config.yaml` in your project root to override the global config for that project: ```yaml name: project-config version: 0.0.1 models: - name: gpt-5.5 provider: openai model: gpt-5.5 apiBase: https://api.llmgateway.io/v1 apiKey: llmgtwy_your-api-key-here ``` ### Using with the --config Flag [#using-with-the---config-flag] Point to any config file: ```bash cn --config path/to/config.yaml ``` ## Switching Models [#switching-models] Add multiple models to your config and switch between them in the Continue interface. In the CLI, you can specify a model with the `--model` flag if supported, or update your config file. ## Locking to a Specific Provider [#locking-to-a-specific-provider] By default, LLM Gateway automatically fails over to alternative providers if your chosen provider is experiencing downtime. 
To disable fallback, add a custom header:

```yaml
models:
  - name: claude-sonnet-4-6
    provider: openai
    model: claude-sonnet-4-6
    apiBase: https://api.llmgateway.io/v1
    apiKey: llmgtwy_your-api-key-here
    requestOptions:
      headers:
        X-No-Fallback: "true"
```

Disabling fallback means requests will fail if the chosen provider is down. See the [routing docs](/docs/features/routing) for details.

## Troubleshooting [#troubleshooting]

### "Failed to parse config" error [#failed-to-parse-config-error]

Make sure your config file includes `name` and `version` fields at the top level:

```yaml
name: llmgateway
version: 0.0.1
models:
  - ...
```

### Onboarding wizard still appears [#onboarding-wizard-still-appears]

If running `cn` without `--config` shows an onboarding prompt, create the sentinel file to skip it:

```bash
touch ~/.continue/.onboarding_complete
```

Or always launch with the `--config` flag to bypass onboarding entirely.

### Model not found [#model-not-found]

Verify the model ID matches exactly what's listed on the [models page](https://llmgateway.io/models). Model IDs are case-sensitive.

### Connection timeout [#connection-timeout]

Check that `apiBase` is set to `https://api.llmgateway.io/v1` (note the `/v1` at the end).

### Authentication errors [#authentication-errors]

Make sure your `apiKey` starts with `llmgtwy_` and is valid. Check your [dashboard](https://llmgateway.io/dashboard) to confirm the key is active.

### Provider must be "openai" [#provider-must-be-openai]

LLM Gateway uses an OpenAI-compatible API. Even when using Claude or Gemini models, set `provider: openai` in your Continue config. The gateway handles routing to the correct upstream provider.

View all available models on the [models page](https://llmgateway.io/models).

Need help? Join our [Discord community](https://llmgateway.io/discord) for support and troubleshooting assistance.

# Cursor Integration

URL: https://docs.llmgateway.io/guides/cursor

Cursor is an AI-powered code editor built on VS Code.
You can point Cursor's custom OpenAI base URL at LLM Gateway to use any of our 210+ models for **plan mode** (the chat / planning panel).

**Plan mode only.** Cursor's coding agent (Composer, inline edit, autocomplete, apply/edit) does **not** work with external OpenAI-compatible endpoints. Those features are locked to Cursor's own backend and will not route through LLM Gateway. Only the chat / plan panel honors the custom API key and base URL. If you need a full coding agent backed by LLM Gateway, use [Claude Code](/guides/claude-code), [Codex CLI](/guides/codex-cli), [Cline](/guides/cline), [Continue CLI](/guides/continue), or [Hermes Agent](/guides/hermes-agent) instead.

## Prerequisites [#prerequisites]

* An LLM Gateway account with an API key
* Cursor IDE installed
* Basic understanding of Cursor's AI features

## Setup [#setup]

Cursor's chat / plan panel supports OpenAI-compatible API endpoints, which is what makes the LLM Gateway integration possible.

### Get Your API Key [#get-your-api-key]

1. Log in to your [LLM Gateway dashboard](https://llmgateway.io/dashboard)
2. Navigate to the **API Keys** section
3. Create a new API key and copy the key

### Configure Cursor Settings [#configure-cursor-settings]

1. Open Cursor, go to **Settings**, then click **Cursor Settings**
2. Click **Models**
3. Scroll down to the **OpenAI API Key** section
4. Click **Add OpenAI API Key**
5. Enter your LLM Gateway API key
6. In the same Models settings, find the **Override OpenAI Base URL** option
7. Enable the override option
8. Enter the LLM Gateway endpoint: `https://api.llmgateway.io/v1`

### Select Models [#select-models]

1. In the **Models** section, you can now select from available models
2.
Choose any [LLM Gateway supported model](https://llmgateway.io/models):

* For chat: Use models like `gpt-5`, `gpt-4o`, `claude-sonnet-4-5`
* For custom models: Add the provider name before the model name (e.g. `custom/my-model`)
* For discounted models: copy the IDs from the [models page](https://llmgateway.io/models?view=grid\&filters=1\&discounted=true)
* For free models: copy the IDs from the [models page](https://llmgateway.io/models?view=grid\&filters=1\&free=true)
* For reasoning models: copy the IDs from the [models page](https://llmgateway.io/models?view=grid\&filters=1\&reasoning=true)

### Test the Integration [#test-the-integration]

1. Open any code file in Cursor
2. Open the AI chat (Cmd/Ctrl + L) and send a message

Requests from the chat / plan panel will now be routed through LLM Gateway. Autocomplete and the other agent features are not a valid test, because they stay on Cursor's own backend.

## What Works (and What Doesn't) [#what-works-and-what-doesnt]

Cursor only honors the custom OpenAI base URL for **plan mode**, the chat / planning panel (Cmd/Ctrl + L). Everything else still uses Cursor's own backend, even after you save the LLM Gateway key.

### Works through LLM Gateway [#works-through-llm-gateway]

* **AI Chat / Plan mode (Cmd/Ctrl + L)**: Ask questions, plan changes, get explanations, debug. All requests route through LLM Gateway and appear in your dashboard.

### Does NOT work through LLM Gateway [#does-not-work-through-llm-gateway]

* **Composer / Coding agent**: Locked to Cursor's backend.
* **Inline Edit (Cmd/Ctrl + K)**: Locked to Cursor's backend.
* **Autocomplete / Tab completion**: Locked to Cursor's backend.
* **Apply / Edit suggestions**: Locked to Cursor's backend.

If you need a full coding agent that routes through LLM Gateway, use [Claude Code](/guides/claude-code), [Codex CLI](/guides/codex-cli), [Cline](/guides/cline), [Continue CLI](/guides/continue), or [Hermes Agent](/guides/hermes-agent).
### Model Routing [#model-routing]

With LLM Gateway's [routing features](/features/routing), the gateway:

* **Chooses cost-effective models** by default for optimal price-to-performance ratio
* **Automatically scales to more powerful models** based on your request's context size
* **Handles large contexts intelligently** by selecting models with appropriate context windows

## Troubleshooting [#troubleshooting]

### Authentication Errors [#authentication-errors]

If you see authentication errors:

* Verify your API key is correct
* Check that the base URL is set to `https://api.llmgateway.io/v1`
* Ensure your LLM Gateway account has sufficient credits

### Model Not Found [#model-not-found]

If you see "model not found" errors:

* Verify the model ID exists on the [models page](https://llmgateway.io/models)
* Check that you're using the correct model name format
* Some models may require specific provider configurations in your LLM Gateway dashboard

### Slow Responses [#slow-responses]

If responses are slow:

* Check your internet connection
* Monitor your usage in the LLM Gateway dashboard
* Switch to a faster chat model from the [models page](https://llmgateway.io/models)

### Composer / agent / autocomplete still uses Cursor's models [#composer--agent--autocomplete-still-uses-cursors-models]

This is expected. Cursor only routes the chat / plan panel through the custom API key. Composer, inline edit, and autocomplete are locked to Cursor's own backend. See [What Works (and What Doesn't)](#what-works-and-what-doesnt) above.

Need help? Join our [Discord community](https://llmgateway.io/discord) for support and troubleshooting assistance.
## Benefits of Using LLM Gateway with Cursor [#benefits-of-using-llm-gateway-with-cursor] * **Multi-Provider Access**: Use models from OpenAI, Anthropic, Google, Open-source models and more * **Cost Control**: Track and limit your AI spending with detailed usage analytics * **Caching**: Reduce costs with response caching * **Analytics**: Monitor usage patterns and costs # Hermes Agent Integration URL: https://docs.llmgateway.io/guides/hermes-agent [Hermes Agent](https://github.com/nousresearch/hermes-agent) is an open-source AI coding agent for your terminal built by Nous Research. It supports tool use, browser automation, multi-provider routing, skills, and MCP servers. By pointing it at LLM Gateway you get access to 210+ models from 60+ providers, all tracked in one dashboard. One config change. No code changes. Full cost tracking. ## Prerequisites [#prerequisites] * Hermes Agent installed — see [installation](#installation) below or visit the [Hermes Agent repo](https://github.com/nousresearch/hermes-agent) * An LLM Gateway API key — [sign up free](https://llmgateway.io/signup) (no credit card required) ## Installation [#installation] Install Hermes Agent using the official install script: ```bash curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash ``` After installation, reload your shell and verify: ```bash source ~/.bashrc hermes --version ``` The installer handles Python 3.11, Node.js, ripgrep, and other dependencies automatically. See the [repo](https://github.com/nousresearch/hermes-agent) for Windows (PowerShell) and manual install options. ## Setup [#setup] ### Run the Setup Wizard [#run-the-setup-wizard] Run `hermes setup` to launch the interactive setup wizard. 
You can choose either **Quick setup** (option 1) for provider, model, and messaging configuration, or **Full setup** (option 2) to configure everything including tools, skills, and advanced options:

```bash
hermes setup
```

In this guide we use Quick setup, but Full setup works the same way; it just includes additional configuration steps.

### Configure Inference Provider [#configure-inference-provider]

The wizard will ask you to configure your inference provider. Select **Custom OpenAI-compatible endpoint** and enter the LLM Gateway base URL:

```
API base URL: https://api.llmgateway.io/v1
```

Then paste your LLM Gateway API key (starts with `llmgtwy_`).

### Choose a Model [#choose-a-model]

The wizard presents a list of 200+ available models. Type a model name or select from the list. Popular choices include `claude-sonnet-4-6`, `gpt-5.5`, or `gemini-3.1-pro`.

### Set Context Length [#set-context-length]

Leave the context length blank to auto-detect (recommended), or specify a custom value.

### Set Display Name [#set-display-name]

Give your provider configuration a display name. This appears in the Hermes status bar when chatting.

### Select Terminal Backend [#select-terminal-backend]

Choose your terminal backend. In this guide we use **Local** (run directly on this machine), but you can pick any option based on your requirements: Docker for isolated containers, SSH for remote machines, Modal for serverless sandboxes, Daytona for cloud dev environments, and more.

### Setup Complete [#setup-complete]

Once done, Hermes shows you where your config files are stored and how to edit them. It will prompt **"Launch hermes chat now?
\[Y/n]"** — press `Y` to start an interactive agent session immediately: Setup Complete Your configuration files: * **Settings:** `~/.hermes/config.yaml` * **API Keys:** `~/.hermes/.env` * **Data:** `~/.hermes/cron/`, `sessions/`, `logs/` Once you press `Y`, Hermes launches a full agent session connected to LLM Gateway. You can start chatting right away. ## Using Hermes with LLM Gateway [#using-hermes-with-llm-gateway] Once configured, all requests route through LLM Gateway. You'll see the provider name (e.g., "LLMGATEWAY") in the Hermes status bar. ### Switching Models at Runtime [#switching-models-at-runtime] You can switch models mid-session using the `/model` slash command (similar to how Claude Code uses slash commands). Just type `/model` followed by the model name: Switching to Claude Haiku via LLM Gateway Switch to any model available through LLM Gateway — from Claude to GPT to open-source models — without leaving your session: Switching to GPT-5.4-nano via LLM Gateway Add `--global` to persist the model change across sessions. 
### CLI Model Override [#cli-model-override]

You can also override the model from the command line:

```bash
# Use a specific model for this session
hermes chat --model gpt-5.5

# Use a powerful model for complex tasks
hermes chat --model claude-opus-4-6
```

## Why Use LLM Gateway with Hermes Agent [#why-use-llm-gateway-with-hermes-agent]

* **210+ models** — Claude, GPT, Gemini, Llama, DeepSeek, and more
* **One API key** — Stop managing separate keys for each provider
* **Cost tracking** — See exactly what each session costs in your dashboard
* **Response caching** — Repeated requests hit cache automatically
* **Automatic fallback** — If a provider is down, requests route to an alternative
* **Volume discounts** — Check [discounted models](https://llmgateway.io/models?discounted=true) for savings up to 90%

## One-Shot Mode [#one-shot-mode]

For scripting or CI pipelines, use the `-q` flag for a one-shot prompt:

```bash
hermes chat -q "Explain what this function does" -Q
```

The `-Q` flag enables quiet mode, suppressing the banner and spinner for clean output.
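For CI scripts, the one-shot invocation above can be wrapped so the prompt is passed as a single argument instead of being interpolated into a shell string. The following is a minimal sketch, assuming `hermes` is on `PATH` and that the reply is written to standard output. The helper names `build_one_shot_command` and `run_one_shot` are illustrative and not part of Hermes.

```python
import subprocess


def build_one_shot_command(prompt: str, quiet: bool = True) -> list:
    """Build the argument list for `hermes chat -q "<prompt>" [-Q]`."""
    command = ["hermes", "chat", "-q", prompt]
    if quiet:
        # -Q suppresses the banner and spinner, as described above.
        command.append("-Q")
    return command


def run_one_shot(prompt: str, timeout_seconds: float = 300.0) -> str:
    """Run one prompt and return stdout (assumption: the reply is printed there)."""
    result = subprocess.run(
        build_one_shot_command(prompt),
        capture_output=True,
        text=True,
        timeout=timeout_seconds,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()
```

Passing the command as a list avoids shell quoting problems when the prompt itself contains quotes or newlines.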
For pure one-shot mode (no interactive session):

```bash
hermes chat -z "Generate a README for this project"
```

## Useful Hermes Commands [#useful-hermes-commands]

| Command                | Purpose                                 |
| ---------------------- | --------------------------------------- |
| `hermes`               | Start interactive chat (default)        |
| `hermes setup`         | Run the setup wizard                    |
| `hermes setup model`   | Change model/provider                   |
| `hermes chat -q "..."` | One-shot prompt                         |
| `hermes model`         | Choose provider and model interactively |
| `hermes config edit`   | Open config in your editor              |
| `hermes doctor`        | Diagnose connection/config issues       |
| `hermes sessions`      | Browse and manage past sessions         |
| `hermes --continue`    | Resume most recent session              |
| `hermes update`        | Update to latest version                |

## Locking to a Specific Provider [#locking-to-a-specific-provider]

By default, LLM Gateway automatically fails over to alternative providers if your chosen provider is experiencing downtime. To disable fallback and always route to one provider, add the request header that disables fallback through Hermes's request configuration. With fallback disabled, requests fail if the chosen provider is down. See the [routing docs](/docs/features/routing) for details.

## Troubleshooting [#troubleshooting]

### Model not found [#model-not-found]

If you get a "model not supported" error, check that your model ID matches exactly what's listed on the [models page](https://llmgateway.io/models). Model IDs are case-sensitive.

### Connection timeout [#connection-timeout]

Verify your `base_url` is set to `https://api.llmgateway.io/v1` (note the `/v1` at the end). If long-running requests time out, also check the `HERMES_API_TIMEOUT` environment variable.

### Authentication errors [#authentication-errors]

Make sure your `api_key` starts with `llmgtwy_` and is valid. Check your [dashboard](https://llmgateway.io/dashboard) to confirm the key is active.
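The two static checks above (the `/v1` suffix on the base URL and the `llmgtwy_` key prefix) are easy to automate before digging further. This is a minimal sketch: `check_gateway_settings` is a hypothetical helper that takes the values you have already read from your configuration. It does not parse Hermes's files and cannot tell whether a key is still active.

```python
EXPECTED_KEY_PREFIX = "llmgtwy_"


def check_gateway_settings(base_url: str, api_key: str) -> list:
    """Return a list of problems; an empty list means the basics look right."""
    problems = []
    # The base URL must include the /v1 path segment at the end.
    if not base_url.rstrip("/").endswith("/v1"):
        problems.append("base_url should end with /v1, e.g. https://api.llmgateway.io/v1")
    # LLM Gateway keys start with a fixed prefix.
    if not api_key.startswith(EXPECTED_KEY_PREFIX):
        problems.append("api_key should start with " + EXPECTED_KEY_PREFIX)
    # Stray whitespace from copy and paste is a common cause of 401 errors.
    if api_key != api_key.strip():
        problems.append("api_key has leading or trailing whitespace")
    return problems
```

An empty result only rules out typos; whether the key is active still has to be confirmed in the dashboard.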
### Diagnosing issues [#diagnosing-issues]

Run `hermes doctor` to check your configuration, connectivity, and credentials:

```bash
hermes doctor
```

### Old config overrides [#old-config-overrides]

If you previously used a different provider (e.g., OpenRouter), make sure to update both the `provider` and `base_url` fields. The `provider` must be set to `"custom"` for LLM Gateway. Also check `~/.hermes/.env` for any leftover `OPENROUTER_API_KEY` or other provider keys that might take precedence.

View all available models on the [models page](https://llmgateway.io/models).

Need help? Join our [Discord community](https://llmgateway.io/discord) for support and troubleshooting assistance.

# Kilo Code Integration

URL: https://docs.llmgateway.io/guides/kilo-code

[Kilo Code](https://kilo.ai/) is an AI coding assistant that runs as a VS Code extension. It supports autonomous coding, file editing, terminal commands, and browser automation. LLM Gateway is a built-in provider in Kilo Code, so setup takes under a minute, with no manual base URL configuration required.

## Prerequisites [#prerequisites]

* VS Code or a VS Code-based editor (Cursor, Windsurf, etc.)
* An LLM Gateway API key — [sign up free](https://llmgateway.io/signup) (no credit card required)

## Setup [#setup]

### Install Kilo Code [#install-kilo-code]

Open VS Code, go to the Extensions view (Ctrl+Shift+X / Cmd+Shift+X), search for **Kilo Code**, and click **Install**. Alternatively, install from the [VS Code Marketplace](https://marketplace.visualstudio.com/items?itemName=kilocode.kilo-code).

### Open Providers Settings [#open-providers-settings]

Click the Kilo Code icon in the VS Code sidebar, then open **Settings > Providers**. You'll see the list of popular providers.

### Find LLM Gateway [#find-llm-gateway]

Click **Show more providers** at the bottom of the list.
In the "Connect provider" dialog, type `llm` in the search box and **LLM Gateway** will appear. Click the **+** button next to LLM Gateway.

### Enter Your API Key [#enter-your-api-key]

Kilo Code will show the **Connect LLM Gateway** dialog. Paste your LLM Gateway API key (starts with `llmgtwy_`) and click **Submit**.

[Sign up](https://llmgateway.io/signup) or log in to your LLM Gateway dashboard and navigate to **API Keys** to get your key.

### Start Coding [#start-coding]

Once connected, select an LLM Gateway model from the model picker at the bottom of the chat panel. All requests now route through LLM Gateway, and you'll see usage, costs, and logs in your [dashboard](https://llmgateway.io/dashboard).

## Why Use LLM Gateway with Kilo Code [#why-use-llm-gateway-with-kilo-code]

* **210+ models** — Claude, GPT, Gemini, Llama, DeepSeek, and more from 60+ providers
* **One API key** — Stop managing separate keys for each provider
* **Cost tracking** — See exactly what each session costs in your dashboard
* **Response caching** — Repeated requests hit cache automatically
* **Automatic fallback** — If a provider is down, requests route to an alternative
* **Volume discounts** — Check [discounted models](https://llmgateway.io/models?discounted=true) for savings up to 90%

## Features [#features]

Once configured, you can use all of Kilo Code's features with LLM Gateway:

* **Autonomous coding** — Create and edit files, build features from natural language
* **Terminal commands** — Run builds, tests, and scripts directly from the chat
* **Browser automation** — Preview and interact with web apps
* **Checkpoints** — Save and restore session states
* **Multiple modes** — Switch between Code, Architect, Ask, and Debug modes

## Switching Models [#switching-models]

Click the model name at the bottom of the Kilo Code chat panel to open the model picker.
Select any LLM Gateway model. The switch takes effect with the next message.

## Troubleshooting [#troubleshooting]

### LLM Gateway not in provider list [#llm-gateway-not-in-provider-list]

Click **Show more providers** at the bottom of the Providers page. In the search dialog, type "llm" or "gateway" to find it.

### Authentication errors [#authentication-errors]

Make sure your API key starts with `llmgtwy_` and is active. Check your [dashboard](https://llmgateway.io/dashboard) to confirm the key is valid.

### Model not found [#model-not-found]

Verify the model ID matches exactly what's listed on the [models page](https://llmgateway.io/models). Model IDs are case-sensitive.

View all available models on the [models page](https://llmgateway.io/models).

Need help? Join our [Discord community](https://llmgateway.io/discord) for support and troubleshooting assistance.

# Kimi Code Integration

URL: https://docs.llmgateway.io/guides/kimi-code

[Kimi Code CLI](https://github.com/MoonshotAI/kimi-code) is an open-source, AI-powered coding agent developed by Moonshot AI that automates software development tasks directly in your terminal. It can read and edit code, execute shell commands, search files, and autonomously manage complex coding workflows.

By configuring Kimi Code CLI to use LLM Gateway, you can point it at any model (GPT-5, Gemini, Llama, Claude, or 210+ others) while keeping the same API formats Kimi Code expects, with full cost tracking in your dashboard.

## Prerequisites [#prerequisites]

* An LLM Gateway API key — [sign up free](https://llmgateway.io/signup) (no credit card required)

## Setup [#setup]

### Install Kimi Code CLI [#install-kimi-code-cli]

If you haven't already, install Kimi Code CLI.
* **macOS or Linux**:

  ```bash
  curl -fsSL https://code.kimi.com/kimi-code/install.sh | bash
  ```

* **Homebrew (macOS/Linux)**:

  ```bash
  brew install kimi-code
  ```

* **Windows (PowerShell)**:

  ```powershell
  irm https://code.kimi.com/kimi-code/install.ps1 | iex
  ```

Confirm the installation:

```bash
kimi --version
```

### Configure config.toml [#configure-configtoml]

Create or edit your Kimi Code configuration file at `~/.kimi-code/config.toml` (on Windows, this is typically under `C:\Users\<username>\.kimi-code\config.toml`). Add the `llmgateway` provider and define the models you want to use. Here is an example configuration that sets up **GPT-5.5**, **Claude Opus 4.6**, **DeepSeek V4 Pro**, **MiniMax M3**, and **Qwen3.7 Max**:

```toml
default_model = "llmgateway/gpt-5.5"

[providers.llmgateway]
type = "openai"
api_key = "llmgtwy_your_api_key_here"
base_url = "https://api.llmgateway.io/v1"

[models."llmgateway/gpt-5.5"]
provider = "llmgateway"
model = "gpt-5.5"
max_context_size = 1050000
max_output_size = 128000
capabilities = [ "image_in", "thinking", "tool_use" ]
display_name = "GPT-5.5"

[models."llmgateway/claude-opus-4-6"]
provider = "llmgateway"
model = "claude-opus-4-6"
max_context_size = 1000000
max_output_size = 128000
capabilities = [ "image_in", "thinking", "tool_use" ]
display_name = "Claude Opus 4.6"

[models."llmgateway/deepseek-v4-pro"]
provider = "llmgateway"
model = "deepseek-v4-pro"
max_context_size = 1050000
max_output_size = 393216
capabilities = [ "thinking", "tool_use" ]
display_name = "DeepSeek V4 Pro"

[models."llmgateway/minimax-m3"]
provider = "llmgateway"
model = "minimax-m3"
max_context_size = 1048576
max_output_size = 131072
capabilities = [ "image_in", "thinking", "tool_use" ]
display_name = "MiniMax M3"

[models."llmgateway/qwen3.7-max"]
provider = "llmgateway"
model = "qwen3.7-max"
max_context_size = 1000000
max_output_size = 65536
capabilities = [ "thinking", "tool_use" ]
display_name = "Qwen3.7 Max"
```

Replace
`llmgtwy_your_api_key_here` with your actual LLM Gateway API key from the dashboard.

### Run Kimi Code CLI [#run-kimi-code-cli]

Navigate to your project folder and launch the interactive terminal:

```bash
kimi
```

All requests are now routed through LLM Gateway, so you can use any configured model for local autonomous coding and see usage and cost statistics in your LLM Gateway dashboard.

## Configuration Details [#configuration-details]

### The Providers Section [#the-providers-section]

To connect to LLM Gateway, define a custom provider with `type = "openai"` and set the base URL to the LLM Gateway endpoint.

```toml
[providers.llmgateway]
type = "openai"
api_key = "llmgtwy_your_api_key_here"
base_url = "https://api.llmgateway.io/v1"
```

### Defining Custom Models [#defining-custom-models]

For each model you want to access, add a `[models."<provider>/<model>"]` block:

* **provider**: Must match the provider key under `[providers.<name>]` (e.g. `llmgateway`).
* **model**: The exact model ID from the LLM Gateway catalog.
* **capabilities**: An array of the capabilities the model supports, such as `"image_in"`, `"thinking"`, and `"tool_use"`.
* **max\_context\_size**: The maximum context window of the model.

## Why Use LLM Gateway with Kimi Code CLI [#why-use-llm-gateway-with-kimi-code-cli]

* **210+ models** — Access GPT-5, Gemini, Llama, DeepSeek, and more in a single CLI configuration.
* **Unified cost tracking** — Get a detailed breakdown of costs per prompt and session in your dashboard.
* **Response caching** — Automatically cache repeated requests (such as parsing or building commands) to save API costs.
* **Automatic fallback** — Keep coding even if a provider encounters temporary downtime.
* **Volume discounts** — Access selected models with up to 90% savings compared to standard pricing.

View all available models on the [models page](https://llmgateway.io/models).

Need help?
Join our [Discord community](https://llmgateway.io/discord) for support and troubleshooting assistance. # Model Context Protocol (MCP) URL: https://docs.llmgateway.io/guides/mcp LLM Gateway provides a Model Context Protocol (MCP) server that enables AI assistants like Claude Code to access multiple LLM providers through a unified interface. This allows you to use any model from OpenAI, Anthropic, Google, and more directly from your AI coding assistant. ## What is MCP? [#what-is-mcp] The Model Context Protocol (MCP) is an open standard that allows AI assistants to connect with external tools and data sources. LLM Gateway's MCP server exposes tools for: * **Chat completions** - Send messages to any supported LLM * **Image generation** - Generate images using models like Qwen Image * **Nano Banana image generation** - Generate images with Gemini 3 Pro Image Preview and optionally save to disk * **Model discovery** - List available models with capabilities and pricing ## Available Tools [#available-tools] ### `chat` [#chat] Send a message to any LLM and get a response. **Parameters:** * `model` (string) - The model to use (e.g., `"gpt-4o"`, `"claude-sonnet-4-20250514"`) * `messages` (array) - Array of messages with `role` and `content` * `temperature` (number, optional) - Sampling temperature (0-2) * `max_tokens` (number, optional) - Maximum tokens to generate **Example:** ```json { "model": "gpt-4o", "messages": [{ "role": "user", "content": "Explain quantum computing" }], "temperature": 0.7 } ``` ### `generate-image` [#generate-image] Generate images from text prompts using AI image models. 
**Parameters:** * `prompt` (string) - Text description of the image to generate * `model` (string, optional) - Image model (default: `"qwen-image-plus"`) * `size` (string, optional) - Image size (default: `"1024x1024"`) * `n` (number, optional) - Number of images (1-4, default: 1) **Example:** ```json { "prompt": "A serene mountain landscape at sunset", "model": "qwen-image-max", "size": "1024x1024" } ``` ### `generate-nano-banana` [#generate-nano-banana] Generate an image using Gemini 3 Pro Image Preview ("Nano Banana"). Returns an inline image preview, and optionally saves the image to disk when the server is configured with an upload directory. **Parameters:** * `prompt` (string) - Text description of the image to generate * `filename` (string, optional) - Filename for the saved image, no path separators allowed (default: `nano-banana-{timestamp}.png`) * `aspect_ratio` (string, optional) - Aspect ratio: `"1:1"`, `"16:9"`, `"4:3"`, or `"5:4"` **Example:** ```json { "prompt": "A pixel-art cat sitting on a rainbow", "filename": "hero-image.png", "aspect_ratio": "16:9" } ``` **Saving images to disk** requires the `UPLOAD_DIR` environment variable to be set on the MCP server. When set, images are saved to that directory. Without it, images are returned inline only — no files are written to disk. See [Enabling local image saving](#enabling-local-image-saving) for setup instructions. ### `list-models` [#list-models] List available LLM models with capabilities and pricing. **Parameters:** * `include_deactivated` (boolean, optional) - Include deactivated models * `exclude_deprecated` (boolean, optional) - Exclude deprecated models * `limit` (number, optional) - Maximum models to return (default: 20) * `family` (string, optional) - Filter by family (e.g., `"openai"`, `"anthropic"`) ### `list-image-models` [#list-image-models] List all available image generation models. 
**Example output:** ``` # Image Generation Models ## Qwen Image Plus - **Model ID:** `qwen-image-plus` - **Description:** Text-to-image with excellent text rendering - **Price:** $0.03 per request ## Qwen Image Max - **Model ID:** `qwen-image-max` - **Description:** Highest quality text-to-image - **Price:** $0.075 per request ``` ## Setup [#setup] ### Get Your API Key [#get-your-api-key] 1. Log in to your [LLM Gateway dashboard](https://llmgateway.io/dashboard) 2. Navigate to **API Keys** section 3. Create a new API key and copy it ### Configure Claude Code [#configure-claude-code] Run the following command in your terminal: ```bash claude mcp add --transport http --scope user llmgateway https://api.llmgateway.io/mcp \ --header "Authorization: Bearer your-api-key-here" ``` **Alternative: Manual configuration** You can also add the MCP server manually by editing `~/.claude.json` (user scope) or `.mcp.json` in your project root (project scope): ```json { "mcpServers": { "llmgateway": { "url": "https://api.llmgateway.io/mcp", "headers": { "Authorization": "Bearer your-api-key-here" } } } } ``` Restart Claude Code after manual configuration changes. ### Test the Integration [#test-the-integration] Try using the tools in Claude Code: * "Use the chat tool to ask GPT-4o about TypeScript best practices" * "Generate an image of a futuristic city using the generate-image tool" * "Use generate-nano-banana to create a hero image for my landing page" * "List all available models from Anthropic" ### Get Your API Key [#get-your-api-key-1] 1. Log in to your [LLM Gateway dashboard](https://llmgateway.io/dashboard) 2. Navigate to **API Keys** section 3. Create a new API key and copy it 4. 
Set it as an environment variable: `export LLM_GATEWAY_API_KEY="your-api-key-here"` ### Configure Codex [#configure-codex] Run the following command in your terminal: ```bash codex mcp add llmgateway --url https://api.llmgateway.io/mcp \ --bearer-token-env-var LLM_GATEWAY_API_KEY ``` **Alternative: Manual configuration** You can also add the MCP server manually by editing `~/.codex/config.toml`: ```toml [mcp_servers.llmgateway] url = "https://api.llmgateway.io/mcp" bearer_token_env_var = "LLM_GATEWAY_API_KEY" ``` ### Test the Integration [#test-the-integration-1] Run `/mcp` in the Codex TUI to confirm the `llmgateway` server is connected. Try: * "Use the chat tool to ask GPT-4o about TypeScript best practices" * "Generate an image of a futuristic city using the generate-image tool" * "Use generate-nano-banana to create a hero image for my landing page" * "List all available models from Anthropic" ### Get Your API Key [#get-your-api-key-2] 1. Log in to your [LLM Gateway dashboard](https://llmgateway.io/dashboard) 2. Navigate to **API Keys** section 3. Create a new API key and copy it ### Configure Cursor [#configure-cursor] Add the following to your Cursor MCP configuration file (`~/.cursor/mcp.json`): ```json { "mcpServers": { "llmgateway": { "url": "https://api.llmgateway.io/mcp", "headers": { "Authorization": "Bearer your-api-key-here" } } } } ``` Or open the Command Palette (`Cmd/Ctrl + Shift + P`), search for **"Cursor Settings"**, then go to **Tools & Integrations** > **Add Custom MCP** and paste the configuration above. Cursor v0.48.0+ is required for Streamable HTTP MCP support. ### Test the Integration [#test-the-integration-2] Open a chat in **Agent Mode**, click the **Select Tools** icon, and verify the LLM Gateway tools appear. 
Try: * "Use the chat tool to ask GPT-4o about TypeScript best practices" * "Generate an image of a futuristic city using the generate-image tool" * "Use generate-nano-banana to create a hero image for my landing page" * "List all available models from Anthropic" LLM Gateway's MCP server supports the standard HTTP Streamable transport. Configure your client with: * **Endpoint:** `https://api.llmgateway.io/mcp` * **Authentication:** Bearer token via `Authorization` header or `x-api-key` header * **Protocol Version:** 2024-11-05 **Direct HTTP Example:** ```bash curl -X POST https://api.llmgateway.io/mcp \ -H "Content-Type: application/json" \ -H "Authorization: Bearer your-api-key" \ -d '{ "jsonrpc": "2.0", "id": 1, "method": "tools/list" }' ``` **Server-Sent Events (SSE):** For real-time updates, connect with `Accept: text/event-stream`: ```bash curl -N https://api.llmgateway.io/mcp \ -H "Accept: text/event-stream" \ -H "Authorization: Bearer your-api-key" ``` ## Use Cases [#use-cases] ### Multi-Model Access in Claude Code [#multi-model-access-in-claude-code] Use Claude Code to interact with models it doesn't natively support: ``` Use the chat tool with model "gpt-4o" to analyze this code for security issues. ``` ### Image Generation [#image-generation] Generate images directly from your AI assistant: ``` Use generate-image to create a logo for my new startup. It should be minimalist, blue and white, representing AI and cloud computing. ``` ### Nano Banana (Gemini Image Generation) [#nano-banana-gemini-image-generation] Generate images with Gemini 3 Pro for use in your project: ``` Use generate-nano-banana to create a hero image for my landing page with a 16:9 aspect ratio. ``` ### Cost-Effective Model Selection [#cost-effective-model-selection] Query available models to find the best option for your task: ``` List models from OpenAI and Anthropic, then use the cheapest one for this simple task. 
``` ## Authentication [#authentication] The MCP server supports two authentication methods: 1. **Bearer Token** - `Authorization: Bearer your-api-key` 2. **API Key Header** - `x-api-key: your-api-key` Your API key is the same one you use for the REST API and works across all LLM Gateway services. ## OAuth Support [#oauth-support] For applications that prefer OAuth authentication, LLM Gateway's MCP server implements OAuth 2.0: * **Authorization Endpoint:** `/oauth/authorize` * **Token Endpoint:** `/oauth/token` * **Registration Endpoint:** `/oauth/register` * **Supported Flows:** Authorization Code, Client Credentials ## Enabling Local Image Saving [#enabling-local-image-saving] By default, `generate-nano-banana` returns images inline without writing to disk. To enable saving generated images to the server filesystem, the `UPLOAD_DIR` environment variable must be set on the **gateway host** at startup. This is a server-side setting — it cannot be configured from the client. This is only possible for **self-hosted** MCP deployments. Configure `UPLOAD_DIR` using your deployment method: * **Docker:** Pass `-e UPLOAD_DIR=/data/images` or add it to your `docker-compose.yml` environment section. * **systemd:** Add `Environment=UPLOAD_DIR=/data/images` to your service unit file. * **.env file:** Add `UPLOAD_DIR=/data/images` to the `.env` file loaded by your gateway process. The shared hosted endpoint (`api.llmgateway.io`) does not support configuring `UPLOAD_DIR`. On the hosted service, images are always returned inline — no files are written to disk. To enable server-side image saving, you must self-host the MCP server and set `UPLOAD_DIR` at startup. ## Troubleshooting [#troubleshooting] ### Connection Errors [#connection-errors] If you're having trouble connecting: 1. Verify your API key is valid 2. Check the endpoint URL is correct: `https://api.llmgateway.io/mcp` 3. 
Ensure your firewall allows outbound HTTPS connections ### Tool Not Found [#tool-not-found] If tools aren't appearing: 1. Restart your MCP client 2. Check the configuration syntax 3. Verify the MCP server is responding: `GET https://api.llmgateway.io/mcp` ### Rate Limiting [#rate-limiting] The MCP server respects your account's rate limits. If you're hitting limits: 1. Check your usage in the dashboard 2. Consider upgrading your plan 3. Implement request queuing in your application Need help? Join our [Discord community](https://llmgateway.io/discord) for support. ## Benefits [#benefits] * **Unified Access** - Use 200+ models from 20+ providers through one interface * **Cost Tracking** - Monitor usage and costs in the LLM Gateway dashboard * **Caching** - Automatic response caching reduces costs and latency * **Fallback** - Automatic provider failover ensures reliability * **Image Generation** - Generate images directly from your AI assistant # MiMo Code Integration URL: https://docs.llmgateway.io/guides/mimocode [MiMo Code](https://mimo.xiaomi.com/mimocode) is an AI-powered coding agent command-line tool developed by Xiaomi. It can understand your code repository, plan changes, safely execute shell commands, edit files, and autonomously manage complex software development tasks in your terminal. By configuring MiMo Code to route through LLM Gateway, you can point it at any model—GPT-5.5, Gemini, Llama, Claude, or 210+ others—while keeping the same API format MiMo Code expects, with full cost tracking in your dashboard. 
## Prerequisites [#prerequisites]

* An LLM Gateway API key — [sign up free](https://llmgateway.io/signup) (no credit card required)

## Setup [#setup]

### Install MiMo Code [#install-mimo-code]

If you haven't already, install MiMo Code by running the official installation command in your terminal:

```bash
curl -fsSL https://mimo.xiaomi.com/install | bash
```

Confirm the installation by checking the help command:

```bash
mimo --help
```

### Configure mimocode.json [#configure-mimocodejson]

Create or edit your MiMo Code configuration file at `~/.config/mimocode/mimocode.json` (on Linux/macOS) or `~/.mimocode/mimocode.json`. Specify the default models you want to use and route the `anthropic` provider to your LLM Gateway endpoint. Here is an example configuration that sets up **Claude Opus 4.8**, **GPT-5.5**, **DeepSeek V4 Pro**, **MiniMax M3**, and **Qwen3.7 Max**:

```json
{
  "model": "anthropic/claude-opus-4-8",
  "small_model": "anthropic/claude-4-5-haiku-latest",
  "provider": {
    "anthropic": {
      "options": {
        "apiKey": "llmgtwy_your_api_key_here",
        "baseURL": "https://api.llmgateway.io/v1"
      },
      "models": {
        "gpt-5.5": { "name": "gpt-5.5" },
        "claude-opus-4-8": { "name": "claude-opus-4-8" },
        "deepseek-v4-pro": { "name": "deepseek-v4-pro" },
        "minimax-m3": { "name": "minimax-m3" },
        "qwen3.7-max": { "name": "qwen3.7-max" }
      }
    }
  }
}
```

Replace `llmgtwy_your_api_key_here` with your actual LLM Gateway API key from the dashboard.
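If you maintain a longer model list, the configuration above can be generated instead of edited by hand. The sketch below mirrors the JSON structure shown in this guide. `build_mimocode_config` is a hypothetical helper, not something MiMo Code ships. It requires the default model to be in the registered list, which may be stricter than MiMo Code needs for its built-in models.

```python
import json

GATEWAY_BASE_URL = "https://api.llmgateway.io/v1"


def build_mimocode_config(api_key: str, model_ids: list, default_model: str) -> dict:
    """Build a mimocode.json-style dict that routes the anthropic provider to LLM Gateway."""
    if default_model not in model_ids:
        # Choice made by this sketch: every model it references is also registered.
        raise ValueError(f"{default_model!r} is not in the registered model list")
    return {
        "model": f"anthropic/{default_model}",
        "provider": {
            "anthropic": {
                "options": {"apiKey": api_key, "baseURL": GATEWAY_BASE_URL},
                # Each custom model is registered under its own ID.
                "models": {model_id: {"name": model_id} for model_id in model_ids},
            }
        },
    }


def render_config(config: dict) -> str:
    """Serialize the config with indentation, ready to write to mimocode.json."""
    return json.dumps(config, indent=2)
```

Write the output of `render_config` to the configuration path described above. The `small_model` key is left out here and can be added by hand.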
### Alternatively: Use Environment Variables [#alternatively-use-environment-variables]

If you prefer to configure the provider dynamically, you can export the standard Anthropic environment variables before starting MiMo Code:

```bash
export ANTHROPIC_API_KEY=llmgtwy_your_api_key_here
export ANTHROPIC_BASE_URL=https://api.llmgateway.io/v1
```

### Run MiMo Code [#run-mimo-code]

Navigate to your project folder and launch the TUI or run a prompt directly:

```bash
mimo
```

Or run it with a message:

```bash
mimo run "Your coding prompt here"
```

All requests are now routed through LLM Gateway, so you can use any configured model for local autonomous coding and see usage and cost statistics in your LLM Gateway dashboard.

## Configuration Details [#configuration-details]

### The Provider Options [#the-provider-options]

To point MiMo Code to LLM Gateway, define the `baseURL` and `apiKey` inside the `options` of the `anthropic` provider block.

```json
"provider": {
  "anthropic": {
    "options": {
      "apiKey": "llmgtwy_your_api_key_here",
      "baseURL": "https://api.llmgateway.io/v1"
    }
  }
}
```

### Defining Custom Models [#defining-custom-models]

Because MiMo Code CLI restricts requests to built-in models by default, any custom model you wish to target (such as `gpt-5.5` or `deepseek-v4-pro`) must be registered in the `models` dictionary within the `anthropic` provider config:

```json
"models": {
  "gpt-5.5": { "name": "gpt-5.5" }
}
```

Once registered, you can set them as your default model or small model using the `anthropic/` prefix (e.g. `"model": "anthropic/gpt-5.5"`).

## Why Use LLM Gateway with MiMo Code [#why-use-llm-gateway-with-mimo-code]

* **210+ models** — Access GPT-5.5, Gemini, Llama, DeepSeek, and more in a single CLI configuration.
* **Unified cost tracking** — Get a detailed breakdown of costs per prompt and session in your dashboard.
* **Response caching** — Automatically cache repeated requests (such as parsing or building commands) to save API costs.
* **Automatic fallback** — Keep coding even if a provider encounters temporary downtime.
* **Volume discounts** — Access selected models with up to 90% savings compared to standard pricing.

View all available models on the [models page](https://llmgateway.io/models).

Need help? Join our [Discord community](https://llmgateway.io/discord) for support and troubleshooting assistance.

# n8n Integration

URL: https://docs.llmgateway.io/guides/n8n

n8n is a workflow automation tool that can be extended with AI capabilities through LLM Gateway. This guide shows how to integrate LLM Gateway into your n8n workflows.

## Prerequisites [#prerequisites]

* An LLM Gateway account with an API key
* An n8n instance (self-hosted or cloud)
* Basic understanding of n8n workflows

## Setup [#setup]

The easiest way to use LLM Gateway with n8n is through the OpenAI node with custom configuration.

### Add OpenAI Credentials [#add-openai-credentials]

1. In n8n, go to **Settings** → **Credentials**
2. Click **Add Credential** → **OpenAI**
3. Configure as follows:
   * **API Key**: Your LLM Gateway API key
   * **Base URL**: `https://api.llmgateway.io/v1`
   * **Organization ID**: Leave blank

### Configure OpenAI Node [#configure-openai-node]

1. Add an **AI Agent** node to your workflow
2. Add a **Chat Model** edge to the node
3. Configure the node to use the LLM Gateway provider

   Note: You must toggle off the Responses API. LLM Gateway does not support it.
4. Select your desired options:
   * **Model**: Use any [LLM Gateway model](https://llmgateway.io/models) ID (e.g., `gpt-5`)
   * **Options**: Optionally, configure LLM parameters

### Test Workflow [#test-workflow]

Finally, try running your workflow with a test prompt.
# OpenClaw Integration

URL: https://docs.llmgateway.io/guides/openclaw

[OpenClaw](https://docs.openclaw.ai/) is a self-hosted gateway that connects your favorite chat apps (WhatsApp, Telegram, Discord, iMessage, and more) to AI coding agents. With LLM Gateway as a custom provider, you can route all your OpenClaw traffic through a single API, use any of 180+ models, and keep full visibility into usage and costs.

## Setup [#setup]

### Sign Up for LLM Gateway [#sign-up-for-llm-gateway]

[Sign up free](https://llmgateway.io/signup), no credit card required. Copy your API key from the dashboard.

### Set Your API Key [#set-your-api-key]

```bash
export LLMGATEWAY_API_KEY=llmgtwy_your_api_key_here
```

### Configure OpenClaw [#configure-openclaw]

Add LLM Gateway as a custom provider in your `~/.openclaw/openclaw.json`:

```json
{
  "models": {
    "mode": "merge",
    "providers": {
      "llmgateway": {
        "baseUrl": "https://api.llmgateway.io/v1",
        "apiKey": "${LLMGATEWAY_API_KEY}",
        "api": "openai-completions",
        "models": [
          {
            "id": "gpt-5.4",
            "name": "GPT-5.4",
            "contextWindow": 128000,
            "maxTokens": 32000
          },
          {
            "id": "claude-opus-4-6",
            "name": "Claude Opus 4.6",
            "contextWindow": 200000,
            "maxTokens": 8192
          },
          {
            "id": "gemini-3-1-pro-preview",
            "name": "Gemini 3.1 Pro",
            "contextWindow": 1000000,
            "maxTokens": 8192
          }
        ]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "llmgateway/gpt-5.4"
      }
    }
  }
}
```

### Start Chatting [#start-chatting]

Launch OpenClaw and start chatting across your connected channels. All requests will be routed through LLM Gateway.
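Before debugging OpenClaw itself, it can help to confirm that the exported key and a configured model ID work against the gateway directly. The sketch below uses only the Python standard library and the chat completions endpoint from this documentation. It assumes the response follows the OpenAI-compatible shape (`choices[0].message.content`). The function names are illustrative.

```python
import json
import os
import urllib.request

GATEWAY_BASE_URL = "https://api.llmgateway.io/v1"


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{GATEWAY_BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def smoke_test(model: str = "gpt-5.4") -> str:
    """Send one short prompt using the key from LLMGATEWAY_API_KEY and return the reply."""
    api_key = os.environ["LLMGATEWAY_API_KEY"]
    request = build_chat_request(api_key, model, "Reply with the single word: ok")
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    # Assumption: OpenAI-compatible response shape.
    return body["choices"][0]["message"]["content"]
```

If `smoke_test()` raises an HTTP error, the problem lies with the key or model ID rather than with the OpenClaw configuration.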
## Why Use LLM Gateway with OpenClaw [#why-use-llm-gateway-with-openclaw]

* **Model flexibility** — Switch between GPT-5.4, Claude Opus, Gemini, or any of 180+ models
* **Cost tracking** — Monitor exactly how much your chat agents cost to run
* **Single bill** — No need to manage multiple API provider accounts
* **Response caching** — Repeated queries hit cache, reducing costs
* **Rate limit handling** — Automatic fallback between providers

## Switching Models [#switching-models]

Change the primary model in your config to switch between any model:

```json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "llmgateway/claude-opus-4-6"
      }
    }
  }
}
```

## Model Fallback Chain [#model-fallback-chain]

OpenClaw supports fallback models. If the primary model is unavailable, it automatically falls back:

```json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "llmgateway/gpt-5.4",
        "fallbacks": ["llmgateway/claude-opus-4-6"]
      }
    }
  }
}
```

## Available Models [#available-models]

LLM Gateway uses root model IDs with smart routing, automatically selecting the best provider based on uptime, throughput, price, and latency. You can use any model from the [models page](https://llmgateway.io/models). Flagship models include:

| Model                    | Best For                                    |
| ------------------------ | ------------------------------------------- |
| `gpt-5.4`                | Latest OpenAI flagship, highest quality     |
| `claude-opus-4-6`        | Anthropic's most capable model              |
| `claude-sonnet-4-6`      | Fast reasoning with extended thinking       |
| `gemini-3-1-pro-preview` | Google's latest flagship, 1M context window |
| `o3`                     | Advanced reasoning tasks                    |
| `gpt-5.4-pro`            | Premium tier with extended reasoning        |
| `gemini-2.5-flash`       | Fast responses, good for high-volume        |
| `claude-haiku-4-5`       | Cost-effective, quick responses             |
| `grok-3`                 | xAI flagship                                |
| `deepseek-v3.1`          | Open-source with vision and tools           |

For more details on routing behavior, see [routing](/features/routing).
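If you switch models often, the `primary` and `fallbacks` fields from the sections above can be updated programmatically. This is a minimal sketch that works on an already-loaded config dictionary. `set_agent_models` is a hypothetical helper and not part of OpenClaw. Loading the file with Python's `json` module assumes your `openclaw.json` is plain JSON without comments.

```python
import copy
from typing import Optional

PROVIDER_PREFIX = "llmgateway/"


def set_agent_models(config: dict, primary: str, fallbacks: Optional[list] = None) -> dict:
    """Return a copy of the config with the default agent's primary and fallback models set."""
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    model = (
        updated.setdefault("agents", {})
        .setdefault("defaults", {})
        .setdefault("model", {})
    )
    model["primary"] = PROVIDER_PREFIX + primary
    if fallbacks:
        model["fallbacks"] = [PROVIDER_PREFIX + model_id for model_id in fallbacks]
    else:
        # No fallbacks requested: drop any existing chain.
        model.pop("fallbacks", None)
    return updated
```

The model IDs passed in should also be declared under the provider's `models` list, as in the configuration shown during setup.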
View all available models on the [models page](https://llmgateway.io/models). ## Tips for Chat Agents [#tips-for-chat-agents] ### Optimize Costs [#optimize-costs] 1. **Use smaller models for simple tasks** — Claude Haiku or Gemini Flash handle basic Q\&A well 2. **Enable caching** — LLM Gateway caches identical requests automatically 3. **Set token limits** — Configure max tokens to prevent runaway costs ### Improve Response Quality [#improve-response-quality] 1. **Choose the right model** — Claude Opus excels at nuanced conversation, GPT-5.4 at general tasks 2. **Use system prompts** — Configure your agent's personality and capabilities 3. **Test multiple models** — LLM Gateway makes it easy to A/B test different providers Need help? Join our [Discord community](https://llmgateway.io/discord) for support and troubleshooting assistance. # OpenCode Desktop Integration URL: https://docs.llmgateway.io/guides/opencode-desktop [OpenCode Desktop](https://opencode.ai/download) is the GUI desktop app version of OpenCode — an open-source AI coding agent with a full visual interface for managing providers, models, and sessions. LLM Gateway is a built-in provider, so setup takes under a minute with no config files required. Looking for the CLI version? See the [OpenCode CLI guide](/guides/opencode). ## Prerequisites [#prerequisites] * OpenCode Desktop installed — [download for Windows or macOS](https://opencode.ai/download) * An LLM Gateway API key — [sign up free](https://llmgateway.io/signup) (no credit card required) ## Installation [#installation] Download OpenCode Desktop from [opencode.ai/download](https://opencode.ai/download) and install it for your platform: * **macOS (Apple Silicon)** — `.dmg` installer * **macOS (Intel)** — `.dmg` installer * **Windows** — `.exe` installer You can also install on macOS via Homebrew: ```bash brew install --cask opencode-desktop ``` ## Setup [#setup] ### Open Providers Settings [#open-providers-settings] Launch OpenCode Desktop. 
Click the **Providers** section in the left sidebar under **Server**. You'll see the list of built-in providers. [Screenshot: OpenCode Desktop Providers screen] ### Find LLM Gateway [#find-llm-gateway] Click **Show more providers** at the bottom of the list, or click **+ Connect** on any entry to open the provider search. Type `LLM` in the search box — **LLM Gateway** will appear under "Other". [Screenshot: Searching for LLM Gateway] Select **LLM Gateway** from the list. ### Enter Your API Key [#enter-your-api-key] OpenCode will show the **Connect LLM Gateway** dialog. Paste your LLM Gateway API key (starts with `llmgtwy_`) and click **Continue**. [Screenshot: Connect LLM Gateway — enter API key] [Sign up](https://llmgateway.io/signup) or log in to your LLM Gateway dashboard and navigate to **API Keys** to get your key. ### Select a Model [#select-a-model] Once connected, open the model picker from the chat input bar. Type `llm` to filter LLM Gateway models — you'll see all available models including Claude Opus 4.7, Claude Sonnet 4.6, DeepSeek, Gemini, and more. [Screenshot: LLM Gateway model selection] ### Start Building [#start-building] Select a model and start chatting.
All requests route through LLM Gateway — you'll see usage, costs, and logs in your [dashboard](https://llmgateway.io/dashboard). [Screenshot: OpenCode Desktop chat active with LLM Gateway] ## Why Use LLM Gateway with OpenCode Desktop [#why-use-llm-gateway-with-opencode-desktop] * **210+ models** — Claude, GPT, Gemini, Llama, DeepSeek, and more from 60+ providers * **One API key** — Stop managing separate keys for each provider * **Cost tracking** — See exactly what each session costs in your dashboard * **Response caching** — Repeated requests hit cache automatically * **Automatic fallback** — If a provider is down, requests route to an alternative * **Volume discounts** — Check [discounted models](https://llmgateway.io/models?discounted=true) for savings up to 90% ## Switching Models [#switching-models] You can switch models at any time from the model picker in the chat input bar. Click the current model name, type `llm` to filter to LLM Gateway models, and select a new one. The switch takes effect immediately for the next message. ## Locking to a Specific Provider [#locking-to-a-specific-provider] By default, LLM Gateway automatically fails over to alternative providers if your chosen provider is experiencing downtime. To disable fallback for a specific model, you can pass the `X-No-Fallback` header via a custom `opencode.json` in your project root: ```json { "provider": { "llmgateway": { "options": { "headers": { "X-No-Fallback": "true" } } } } } ``` Disabling fallback means requests will fail if the chosen provider is down. See the [routing docs](/docs/features/routing) for details. ## Troubleshooting [#troubleshooting] ### LLM Gateway doesn't appear in provider list [#llm-gateway-doesnt-appear-in-provider-list] Click **Show more providers** at the bottom of the Providers page to expand the full list, then search for "LLM". ### Authentication errors [#authentication-errors] Make sure your API key starts with `llmgtwy_` and is active.
Check your [dashboard](https://llmgateway.io/dashboard) to confirm the key is valid. ### Models not loading after connect [#models-not-loading-after-connect] Try disconnecting and reconnecting the provider from Settings > Providers. If models still don't load, check your internet connection and verify the key is valid. View all available models on the [models page](https://llmgateway.io/models). Need help? Join our [Discord community](https://llmgateway.io/discord) for support and troubleshooting assistance. # OpenCode Integration URL: https://docs.llmgateway.io/guides/opencode [OpenCode](https://opencode.ai) is an open-source AI coding agent for your terminal, IDE, or desktop. LLM Gateway is a built-in provider in OpenCode, so setup takes under a minute — no config files or npm adapters required. You get access to 210+ models from 60+ providers, all tracked in one dashboard. ## Prerequisites [#prerequisites] * OpenCode installed — visit the [OpenCode download page](https://opencode.ai/download) for your platform * An LLM Gateway API key ## Setup [#setup] ### Launch OpenCode [#launch-opencode] Start OpenCode from your terminal: ```bash opencode ``` **In VS Code/Cursor:** 1. Install the OpenCode extension from the marketplace 2. Open Command Palette (Ctrl+Shift+P or Cmd+Shift+P) 3. Type "OpenCode" and select "Open opencode" ### Open the Provider List [#open-the-provider-list] Once OpenCode launches, run the `/providers` or `/connect` command to open the provider selection screen. ### Select LLM Gateway [#select-llm-gateway] LLM Gateway is listed as a built-in provider. Select "LLM Gateway" from the provider list. ### Enter Your API Key [#enter-your-api-key] OpenCode will prompt you for your API key. Enter your LLM Gateway API key and press Enter. OpenCode will automatically save your credentials securely. [Sign up for LLM Gateway](https://llmgateway.io/signup) and create an API key from your dashboard. ### Start Using OpenCode [#start-using-opencode] You're all set! 
OpenCode is now connected to LLM Gateway. You can start asking questions and building with AI. ## Why Use LLM Gateway with OpenCode [#why-use-llm-gateway-with-opencode] * **210+ models** — GPT-5, Claude, Gemini, Llama, and more from 60+ providers * **One API key** — Stop juggling credentials for every provider * **Cost tracking** — See what each coding agent costs in your dashboard * **Response caching** — Repeated requests hit cache automatically * **Volume discounts** — The more you use, the more you save ## Adding Custom Models [#adding-custom-models] The built-in provider gives you access to all standard LLM Gateway models. If you want to add custom model aliases or configure models not yet listed in the built-in provider, you can create a `config.json` in your OpenCode configuration directory: **macOS/Linux:** `~/.config/opencode/config.json` **Windows:** `C:\Users\YourUsername\.config\opencode\config.json` ```json { "provider": { "llmgateway": { "npm": "@ai-sdk/openai-compatible", "name": "LLM Gateway", "options": { "baseURL": "https://api.llmgateway.io/v1" }, "models": { "deepseek/deepseek-chat": { "name": "DeepSeek Chat" }, "meta/llama-3.3-70b": { "name": "Llama 3.3 70B" } } } } } ``` After updating `config.json`, restart OpenCode to see the new models. ## Locking to a Specific Provider [#locking-to-a-specific-provider] By default, LLM Gateway automatically fails over to alternative providers if your chosen provider is experiencing downtime. If you want to lock into a specific provider/model mapping — for example to guarantee a fixed price or to always use a single provider — pass the `X-No-Fallback` header. Requests will then be sent only to the provider you specified, with no automatic fallback. 
```json { "provider": { "llmgateway": { "npm": "@ai-sdk/openai-compatible", "name": "LLM Gateway", "options": { "baseURL": "https://api.llmgateway.io/v1", "headers": { "X-No-Fallback": "true" } } } } } ``` Disabling fallback means requests will fail if the chosen provider is down. See the [routing docs](/docs/features/routing) for details. ## Switching Models [#switching-models] Select a different model directly in the OpenCode interface, or update the `model` field in your configuration: ```json { "model": "llmgateway/gpt-5-mini" } ``` View all available models on the [models page](https://llmgateway.io/models). ## Troubleshooting [#troubleshooting] ### Connection timeout [#connection-timeout] Check that you have an active internet connection and that your API key is valid from the [dashboard](https://llmgateway.io/dashboard). ### Custom models not showing up [#custom-models-not-showing-up] After editing `config.json`, restart OpenCode completely for changes to take effect. ### 404 Not Found errors with custom config [#404-not-found-errors-with-custom-config] If you are using a custom `config.json`, verify your `baseURL` is set to `https://api.llmgateway.io/v1` (note the `/v1` at the end). ## Configuration Tips [#configuration-tips] * **Global configuration**: Use `~/.config/opencode/config.json` to apply settings across all projects * **Project-specific**: Place `opencode.json` in your project root to override global settings for that project * **Model selection**: You can specify different models for different types of tasks using OpenCode's agent configuration Need help? Join our [Discord community](https://llmgateway.io/discord) for support and troubleshooting assistance. # Pi Integration URL: https://docs.llmgateway.io/guides/pi [Pi](https://pi.dev) is a minimal terminal-based coding agent that gives an AI full access to read, write, edit, and run shell commands in your project. 
By pointing Pi at LLM Gateway, you can use any of our 200+ models — GPT-5.5, Gemini 3.1 Pro, Claude Opus 4.7, DeepSeek V4, and more — with full cost tracking and caching. ## Prerequisites [#prerequisites] * An LLM Gateway account with an API key * Pi installed (`curl -fsSL https://pi.dev/install.sh | bash`) * Basic terminal familiarity ## Setup [#setup] Pi uses a `models.json` configuration file to define providers and models. We'll add LLM Gateway as a custom provider. ### Get Your API Key [#get-your-api-key] 1. Log in to your [LLM Gateway dashboard](https://llmgateway.io/dashboard) 2. Navigate to the **API Keys** section 3. Create a new API key and copy it ### Configure Pi [#configure-pi] Open (or create) the Pi models configuration file at `~/.pi/agent/models.json` and add LLM Gateway as a provider: ```json { "providers": { "llmgateway": { "baseUrl": "https://api.llmgateway.io/v1", "api": "openai-completions", "apiKey": "llmgtwy_your_api_key_here", "models": [ { "id": "gpt-5.5", "name": "GPT-5.5" }, { "id": "claude-opus-4-7", "name": "Claude Opus 4.7" }, { "id": "gemini-3.1-pro", "name": "Gemini 3.1 Pro" }, { "id": "deepseek-v4", "name": "DeepSeek V4", "reasoning": true } ] } } } ``` Replace `llmgtwy_your_api_key_here` with the API key you created in the previous step. [Screenshot: Pi models.json Configuration] Pi reloads `models.json` when you open the `/model` menu — no restart needed after editing. ### Select Your Model [#select-your-model] 1. Run `pi` in any project directory 2. Type `/model` to open the model selector 3. Select your LLM Gateway model from the list All requests now route through LLM Gateway with full cost tracking. ### Test the Integration [#test-the-integration] Ask Pi to do something in your project to verify everything works: ``` > hello ``` [Screenshot: Pi Test with LLM Gateway] You should see the response streaming from your chosen model. Check your [LLM Gateway dashboard](https://llmgateway.io/dashboard) to confirm the request appears in your usage logs.
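If the test prompt fails, the cause is often a small mistake in `models.json`. As a rough aid, the Python sketch below checks a provider entry against the values this guide uses. The function name `check_pi_provider` is hypothetical, and the checks reflect only the fields shown in the configuration above, not a full Pi schema.

```python
EXPECTED_BASE_URL = "https://api.llmgateway.io/v1"


def check_pi_provider(config: dict, provider: str = "llmgateway") -> list:
    """Return human-readable problems found in a Pi models.json provider entry."""
    entry = config.get("providers", {}).get(provider)
    if entry is None:
        return [f"provider '{provider}' is missing"]
    problems = []
    # A trailing slash is tolerated; a missing /v1 suffix is not.
    if entry.get("baseUrl", "").rstrip("/") != EXPECTED_BASE_URL:
        problems.append(f"baseUrl should be {EXPECTED_BASE_URL}")
    if entry.get("api") != "openai-completions":
        problems.append('api should be "openai-completions"')
    if not entry.get("apiKey"):
        problems.append("apiKey is empty")
    if not entry.get("models"):
        problems.append("no models are listed")
    return problems
```

To run it against your own file, load `~/.pi/agent/models.json` with `json.loads(...)` and pass the resulting dict to the function. An empty list means none of these particular problems were found.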
## Adding More Models [#adding-more-models] You can add any model from the [LLM Gateway models page](https://llmgateway.io/models) to your `models.json`. Just add entries to the `models` array: ```json { "providers": { "llmgateway": { "baseUrl": "https://api.llmgateway.io/v1", "api": "openai-completions", "apiKey": "llmgtwy_your_api_key_here", "models": [ { "id": "gpt-5.5", "name": "GPT-5.5" }, { "id": "gpt-5.5-mini", "name": "GPT-5.5 Mini" }, { "id": "claude-opus-4-7", "name": "Claude Opus 4.7" }, { "id": "claude-sonnet-4-6", "name": "Claude Sonnet 4.6" }, { "id": "gemini-3.1-pro", "name": "Gemini 3.1 Pro" }, { "id": "gemini-3.1-flash", "name": "Gemini 3.1 Flash" }, { "id": "deepseek-v4", "name": "DeepSeek V4", "reasoning": true }, { "id": "deepseek-v4-mini", "name": "DeepSeek V4 Mini", "reasoning": true } ] } } } ``` ## Using Environment Variables for the API Key [#using-environment-variables-for-the-api-key] Instead of hardcoding your key, you can reference an environment variable: ```json { "providers": { "llmgateway": { "baseUrl": "https://api.llmgateway.io/v1", "api": "openai-completions", "apiKey": "LLM_GATEWAY_API_KEY", "models": [{ "id": "gpt-5.5", "name": "GPT-5.5" }] } } } ``` Then set the variable in your shell profile: ```bash export LLM_GATEWAY_API_KEY=llmgtwy_your_api_key_here ``` ## Troubleshooting [#troubleshooting] ### Authentication Errors [#authentication-errors] * Verify your API key is correct in `~/.pi/agent/models.json` * Check that the base URL is set to `https://api.llmgateway.io/v1` * Ensure your LLM Gateway account has sufficient credits ### Model Not Found [#model-not-found] * Verify the model ID exists on the [models page](https://llmgateway.io/models) * Model IDs are case-sensitive — copy them exactly as shown ### Connection Issues [#connection-issues] * Check your internet connection * Ensure `api` is set to `"openai-completions"` (not `"openai-responses"`) * Monitor your usage in the LLM Gateway dashboard Need help? 
Join our [Discord community](https://llmgateway.io/discord) for support and troubleshooting assistance. ## Benefits of Using LLM Gateway with Pi [#benefits-of-using-llm-gateway-with-pi] * **Any Model**: Use GPT-5.5, Claude Opus 4.7, Gemini 3.1 Pro, DeepSeek V4, or 200+ others * **Cost Tracking**: Every Pi request appears in your dashboard with token counts and costs * **Caching**: Repeated requests hit cache automatically, saving money * **One Key**: Manage all providers through a single API key * **No Vendor Lock-in**: Switch models by changing one line in your config # AWS Bedrock Integration URL: https://docs.llmgateway.io/integrations/aws-bedrock AWS Bedrock is Amazon's fully managed service that provides access to foundation models from leading AI companies. This guide shows how to create AWS Bedrock Long-Term API Keys and integrate them with LLM Gateway. ## Prerequisites [#prerequisites] * An AWS account with Bedrock access enabled * LLM Gateway account or self-hosted instance ## Overview [#overview] AWS Bedrock supports **Long-Term API Keys** for simplified authentication. These keys provide direct API access without requiring IAM credentials or complex authentication flows. ## Create AWS Bedrock Long-Term API Key [#create-aws-bedrock-long-term-api-key] ### Enable Model Access in Bedrock [#enable-model-access-in-bedrock] 1. Log into the **AWS Console** 2. Navigate to **AWS Bedrock** service 3. Go to **Model access** in the left sidebar 4. Click **Manage model access** 5. Enable the models you want to use (e.g., Claude 3.5, Llama 3) 6. Wait for access to be granted (usually instant for most models) ### Create Long-Term API Key [#create-long-term-api-key] 1. In AWS Bedrock console, navigate to **API Keys** in the left sidebar 2. Click **Create Long-Term API Key** 3. Set expiry date ("Never expires" is recommended) 4. Click **Generate** 5. **Important**: Copy the API key immediately - it's only shown once! 
## Add to LLM Gateway [#add-to-llm-gateway] ### Navigate to Provider Keys [#navigate-to-provider-keys] 1. Log into [LLM Gateway Dashboard](https://llmgateway.io/dashboard) 2. Select your organization and project 3. Go to **Provider Keys** in the sidebar ### Add AWS Bedrock Provider Key [#add-aws-bedrock-provider-key] 1. Click **Add** for **AWS Bedrock** 2. Paste your Long-Term API Key 3. **Select Region Prefix** based on where you want to use your models: * **us.** - For US regions (`us-east-1`, `us-west-2`) * **eu.** - For European regions (`eu-central-1`, `eu-west-1`) * **global.** - For global/cross-region endpoints 4. Click **Add Key** The system will validate your key and confirm the connection. ### Test the Integration [#test-the-integration] Test your integration with a simple API call: ```bash curl -X POST https://api.llmgateway.io/v1/chat/completions \ -H "Authorization: Bearer YOUR_LLMGATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "aws-bedrock/claude-3-5-sonnet", "messages": [ { "role": "user", "content": "Hello from AWS Bedrock!" } ] }' ``` Replace `YOUR_LLMGATEWAY_API_KEY` with your LLM Gateway API key. 
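The same test can be run from Python. The sketch below uses only the standard library and builds the same POST request as the curl command above. The function names are illustrative, and the response parsing assumes an OpenAI-style body, which is an inference from the gateway being OpenAI-compatible and not a documented guarantee.

```python
import json
import urllib.request

GATEWAY_URL = "https://api.llmgateway.io/v1/chat/completions"


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl command above."""
    body = json.dumps({"model": model, "messages": [{"role": "user", "content": prompt}]})
    return urllib.request.Request(
        GATEWAY_URL,
        data=body.encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=30) as response:
        reply = json.load(response)
    # Assumes an OpenAI-style response body.
    return reply["choices"][0]["message"]["content"]
```

For example, `send(build_chat_request(key, "aws-bedrock/claude-3-5-sonnet", "Hello from AWS Bedrock!"))` performs the call, where `key` holds your LLM Gateway API key.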
## Available Models [#available-models] Once configured, you can access all AWS Bedrock models through LLM Gateway: * **Anthropic Claude**: `aws-bedrock/claude-3-5-sonnet`, `aws-bedrock/claude-3-5-haiku` * **Meta Llama**: `aws-bedrock/llama-3-2-90b`, `aws-bedrock/llama-3-2-11b` * **Amazon Titan**: `aws-bedrock/amazon.titan-text-express-v1` * **And more...** Browse all available models at [llmgateway.io/models](https://llmgateway.io/models?provider=aws-bedrock) ## Troubleshooting [#troubleshooting] ### "Model not available" error [#model-not-available-error] * Verify you've enabled model access in AWS Bedrock console * Check that the region where you created your key has access to the model * Some models are only available in specific regions ### Rate limiting [#rate-limiting] * AWS Bedrock has request quotas per model and region * Monitor usage in AWS Bedrock console * Consider requesting quota increases for high-volume workloads # Azure Integration URL: https://docs.llmgateway.io/integrations/azure Azure provides access to OpenAI's powerful language models through Microsoft's enterprise cloud infrastructure. This guide shows how to create an Azure resource, deploy models, and integrate them with LLM Gateway. Only OpenAI models are supported via Azure at this time. [Open an issue](https://github.com/theopenco/llmgateway/issues/new) to request support for other model types. ## Prerequisites [#prerequisites] * An Azure account with an active subscription * LLM Gateway account or self-hosted instance ## Overview [#overview] Azure provides enterprise-grade access to OpenAI models with enhanced security, compliance, and regional availability. LLM Gateway integrates seamlessly with Azure deployments. ## Create Azure Resource [#create-azure-resource] ### Create an Azure OpenAI Resource [#create-an-azure-openai-resource] 1. Log into the **Azure Portal** ([https://portal.azure.com](https://portal.azure.com)) 2. Click **Create a resource** 3. 
Search for **Azure OpenAI** and select it 4. Click **Create** 5. Configure the resource: * **Subscription**: Select your Azure subscription * **Resource group**: Create new or select existing * **Region**: Choose a region (e.g., East US, West Europe) * **Name**: Enter a unique resource name (this will be your `<resource-name>`) * **Pricing tier**: Select Standard S0 6. Click **Review + create**, then **Create** 7. Wait for deployment to complete **Important**: Note your resource name; it will be used in the base URL: `https://<resource-name>.openai.azure.com` ### Deploy Models [#deploy-models] 1. Navigate to your Azure resource in the Azure Portal 2. Click **Go to Azure OpenAI Studio** or visit [https://oai.azure.com](https://oai.azure.com) 3. In Azure Studio, select **Deployments** from the left sidebar 4. Click **Create new deployment** 5. Configure your deployment: * **Model**: Select a model (e.g., gpt-4o, gpt-4o-mini, gpt-4-turbo) * **Deployment name**: Enter a name (this must match the model identifier you'll use, so keep the pre-filled name) * **Model version**: Select the latest version * **Deployment type**: Global Standard 6. Click **Create** 7. Repeat for additional models you want to use **Note**: The deployment name must match the expected model name: * For `gpt-4o-mini` → deployment name should be `gpt-4o-mini` * For `gpt-35-turbo` → deployment name should be `gpt-35-turbo`, and so on. ### Get API Key [#get-api-key] 1. In the Azure Portal, go to your Azure resource 2. Click **Keys and Endpoint** in the left sidebar 3. Copy **Key 1** or **Key 2** 4. Note your **Endpoint** URL (should be `https://<resource-name>.openai.azure.com`) **Important**: Keep your API key secure; it provides access to your Azure deployments. ## Add to LLM Gateway [#add-to-llm-gateway] ### Navigate to Provider Keys [#navigate-to-provider-keys] 1. Log into [LLM Gateway Dashboard](https://llmgateway.io/dashboard) 2. Select your organization and project 3.
Go to **Provider Keys** in the sidebar ### Add Azure Provider Key [#add-azure-provider-key] 1. Click **Add** for **Azure** 2. Enter your **API Key** from Azure Portal 3. Enter your **Resource Name** (the name from your Azure endpoint URL) * Example: If your endpoint is `https://my-openai-resource.openai.azure.com`, enter `my-openai-resource` 4. Select your preferred **type** (Azure OpenAI or AI Foundry) 5. Set the **Validation Model** to a model that you have already deployed and that is available. This is a one-time check to ensure the API key is valid and the model can be accessed. 6. Click **Add Key** The system will validate your key and confirm the connection. ### Test the Integration [#test-the-integration] Test your integration with a simple API call: ```bash curl -X POST https://api.llmgateway.io/v1/chat/completions \ -H "Authorization: Bearer YOUR_LLMGATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "azure/gpt-4o-mini", "messages": [ { "role": "user", "content": "Hello from Azure!" } ] }' ``` Replace `YOUR_LLMGATEWAY_API_KEY` with your LLM Gateway API key. ## Available Models [#available-models] Once configured, you can access your Azure deployments through LLM Gateway: * **GPT-4o**: `azure/gpt-4o` * **GPT-4o Mini**: `azure/gpt-4o-mini` * **GPT-3.5 Turbo**: `azure/gpt-3.5-turbo` (note: use `gpt-3.5-turbo` as the LLM Gateway model name instead of `gpt-35-turbo`) **Note**: Only models you have deployed in Azure Studio will be available. Ensure your deployment names match the expected model identifiers.
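The naming rule above can be sketched as a small lookup. This is only an illustration: the mapping contains the single exception this guide mentions (`gpt-35-turbo`), and both the table name and the function name are hypothetical.

```python
# Azure deployment names that differ from the LLM Gateway model name.
# Only the gpt-35-turbo case is stated in this guide.
AZURE_TO_GATEWAY = {"gpt-35-turbo": "gpt-3.5-turbo"}


def gateway_model_id(deployment_name: str) -> str:
    """Map an Azure deployment name to the `azure/...` model ID used in requests."""
    return "azure/" + AZURE_TO_GATEWAY.get(deployment_name, deployment_name)


for name in ("gpt-4o", "gpt-4o-mini", "gpt-35-turbo"):
    print(name, "->", gateway_model_id(name))
```

Deployment names without an entry in the table pass through unchanged and only gain the `azure/` prefix.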
Browse all available models at [llmgateway.io/models](https://llmgateway.io/models?provider=azure) ## Troubleshooting [#troubleshooting] ### "Deployment not found" error [#deployment-not-found-error] * Verify you've created a deployment in Azure Studio * Ensure the deployment name exactly matches the model name you're requesting * Check that the deployment is in the same resource as your API key ### "Resource not found" error [#resource-not-found-error] * Verify the resource name is correct (check your Azure Portal endpoint URL) * Ensure your API key belongs to the correct Azure resource * Confirm the resource is in an active state in Azure Portal ### Rate limiting [#rate-limiting] * Azure has Tokens Per Minute (TPM) quotas per deployment * Monitor usage in Azure Studio under **Quotas** * Request quota increases through Azure Portal if needed for high-volume workloads ### Region availability [#region-availability] * Not all models are available in all Azure regions * Check [Azure model availability](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models#model-summary-table-and-region-availability) for your region * Consider creating resources in multiple regions for better availability # Vertex AI Anthropic Integration URL: https://docs.llmgateway.io/integrations/vertex-anthropic Run Claude models (Sonnet, Opus, Haiku) on Google Cloud Vertex AI through LLM Gateway. This guide shows how to set up a GCP service account and integrate it with LLM Gateway using automatic OAuth2 token management — no manual token rotation required. ## Prerequisites [#prerequisites] * A Google Cloud project with billing enabled * LLM Gateway account or self-hosted instance ## Set up Google Cloud [#set-up-google-cloud] ### Enable the Vertex AI API [#enable-the-vertex-ai-api] In the [Google Cloud Console](https://console.cloud.google.com/apis/library/aiplatform.googleapis.com), enable the **Vertex AI API** for your project. 
### Enable Claude Models in Model Garden [#enable-claude-models-in-model-garden] Navigate to **Vertex AI > Model Garden** in the Cloud Console. Search for the Claude models you want to use and click **Enable** on each one. Available models: * `claude-sonnet-4-6` * `claude-sonnet-4-5` * `claude-haiku-4-5` * `claude-opus-4-5` * `claude-opus-4-6` * `claude-opus-4-7` ### Create a Service Account [#create-a-service-account] Create a service account with the required permissions: ```bash # Create the service account gcloud iam service-accounts create vertex-ai-caller \ --display-name="Vertex AI Caller" \ --project=YOUR_PROJECT_ID # Grant the Vertex AI User role gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \ --member="serviceAccount:vertex-ai-caller@YOUR_PROJECT_ID.iam.gserviceaccount.com" \ --role="roles/aiplatform.user" ``` ### Download the Service Account Key [#download-the-service-account-key] ```bash gcloud iam service-accounts keys create service-account.json \ --iam-account=vertex-ai-caller@YOUR_PROJECT_ID.iam.gserviceaccount.com ``` Then convert it to a single-line string: ```bash cat service-account.json | tr -d '\n' ``` Keep the output handy — you'll paste it into LLM Gateway in the next steps. ## Add to LLM Gateway [#add-to-llm-gateway] ### Navigate to Provider Keys [#navigate-to-provider-keys] 1. Log into [LLM Gateway Dashboard](https://llmgateway.io/dashboard) 2. Select your organization and project 3. Go to **Provider Keys** in the sidebar ### Add Vertex Anthropic Provider Key [#add-vertex-anthropic-provider-key] 1. Click **Add** for **Vertex AI (Anthropic)** 2. Paste the single-line service account JSON as the **API Key** 3. Leave **Region** empty to use the recommended `global` endpoint, or set a specific region (e.g. `us-east5`) if you need data residency 4. Click **Add Key** The project ID is extracted automatically from the service account JSON — no separate project field is needed. 
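As an alternative to `tr -d '\n'`, the key file can be compacted with a JSON parser, which also catches a malformed file before you paste it. The sketch below is one way to do that in Python. The function name is hypothetical, and the three fields it checks are standard fields of a Google service account key; which fields the gateway actually requires is an assumption.

```python
import json


def prepare_service_account(raw_json: str) -> tuple:
    """Return (single-line JSON, project ID) for a service account key file."""
    key = json.loads(raw_json)  # raises ValueError if the file is not valid JSON
    missing = [f for f in ("project_id", "private_key", "client_email") if f not in key]
    if missing:
        raise ValueError(f"service account JSON is missing: {', '.join(missing)}")
    # Compact separators keep the output on one line; newlines inside the
    # private key stay escaped as "\n" in the serialized string.
    return json.dumps(key, separators=(",", ":")), key["project_id"]
```

Pass it the contents of `service-account.json`. The first return value is the string to paste as the **API Key**, and the second is the project ID that should match your GCP project.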
### Test the Integration [#test-the-integration] ```bash curl -X POST https://api.llmgateway.io/v1/chat/completions \ -H "Authorization: Bearer YOUR_LLMGATEWAY_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "vertex-anthropic/claude-sonnet-4-6", "messages": [ { "role": "user", "content": "Hello from Vertex Anthropic!" } ] }' ``` Replace `YOUR_LLMGATEWAY_API_KEY` with your LLM Gateway API key. ## Self-Host Configuration [#self-host-configuration] If you're self-hosting LLM Gateway, configure the provider via environment variables instead of the dashboard: ```bash LLM_VERTEX_ANTHROPIC_SERVICE_ACCOUNT_JSON={"type":"service_account","project_id":"YOUR_PROJECT_ID","private_key":"-----BEGIN RSA PRIVATE KEY-----\n...\n-----END RSA PRIVATE KEY-----\n","client_email":"vertex-ai-caller@YOUR_PROJECT_ID.iam.gserviceaccount.com","token_uri":"https://oauth2.googleapis.com/token"} LLM_VERTEX_ANTHROPIC_REGION=global ``` The project ID is extracted automatically from the service account JSON — no separate `LLM_VERTEX_ANTHROPIC_PROJECT` variable is needed. ## How Token Refresh Works [#how-token-refresh-works] LLM Gateway handles the OAuth2 token lifecycle automatically: 1. On first request, the service account JSON is parsed and used to sign a JWT 2. The JWT is exchanged for an OAuth2 access token via Google's token endpoint 3. The token is cached in Redis with a **50-minute TTL** (Google tokens expire after 60 minutes) 4. An in-memory cache avoids Redis round-trips on subsequent requests 5. 
When the cached token expires, a new one is generated transparently This means: * No manual `gcloud auth print-access-token` commands * No cron jobs to refresh tokens * Works at any request rate (token generation happens at most once per 50 minutes) * Multi-instance deployments share the cached token via Redis ## Available Regions [#available-regions] LLM Gateway defaults to the **`global`** endpoint, which Anthropic recommends: requests are routed dynamically to whichever region has capacity, and there is no pricing premium. | Region | Notes | | ----------------- | --------------------------------------------- | | `global` | Default — dynamic routing, no pricing premium | | `us` | Multi-region (US only); 10% premium | | `eu` | Multi-region (EU only); 10% premium | | `us-east5` | Columbus, Ohio; 10% premium | | `us-central1` | Iowa; 10% premium | | `europe-west1` | Belgium; 10% premium | | `europe-west4` | Netherlands; 10% premium | | `asia-southeast1` | Singapore; 10% premium | Regional and multi-region endpoints add a 10% pricing premium on Claude Sonnet 4.5 and newer models. They are also required if you need single-region data residency or provisioned throughput. See [Anthropic's Vertex docs](https://platform.claude.com/docs/en/api/claude-on-vertex-ai#global-multi-region-and-regional-endpoints) for details. ## Available Models [#available-models] Once configured, you can access Claude models on Vertex AI through LLM Gateway: * **Sonnet**: `vertex-anthropic/claude-sonnet-4-6`, `vertex-anthropic/claude-sonnet-4-5` * **Opus**: `vertex-anthropic/claude-opus-4-7`, `vertex-anthropic/claude-opus-4-6`, `vertex-anthropic/claude-opus-4-5` * **Haiku**: `vertex-anthropic/claude-haiku-4-5` Browse all available models at [llmgateway.io/models](https://llmgateway.io/models?provider=vertex-anthropic). 
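The caching behaviour described under How Token Refresh Works can be illustrated with a small in-memory cache. This is a simplified sketch and not the gateway's implementation: it leaves out the Redis layer, and `fetch_token` is a hypothetical callable standing in for the JWT signing and token exchange.

```python
import time
from typing import Callable, Optional

TOKEN_TTL_SECONDS = 50 * 60  # refresh 10 minutes before the 60-minute expiry


class TokenCache:
    """In-memory cache that fetches a new token only after the TTL has elapsed."""

    def __init__(self, fetch_token: Callable[[], str], clock: Callable[[], float] = time.monotonic):
        self._fetch_token = fetch_token  # hypothetical: performs the JWT-for-token exchange
        self._clock = clock  # injectable so the expiry logic can be tested
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            self._token = self._fetch_token()
            self._expires_at = now + TOKEN_TTL_SECONDS
        return self._token
```

Because `get()` only calls `fetch_token` when the cached value is missing or older than 50 minutes, the number of token exchanges stays bounded regardless of request rate.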
## Troubleshooting [#troubleshooting] ### 401 UNAUTHENTICATED / ACCESS\_TOKEN\_TYPE\_UNSUPPORTED [#401-unauthenticated--access_token_type_unsupported] The gateway is sending an invalid token. Check: * The service account JSON is valid and complete * The service account has `roles/aiplatform.user` on the project ### 403 Permission Denied [#403-permission-denied] The service account lacks permissions. Grant the `Vertex AI User` role: ```bash gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \ --member="serviceAccount:vertex-ai-caller@YOUR_PROJECT_ID.iam.gserviceaccount.com" \ --role="roles/aiplatform.user" ``` ### Model Not Found [#model-not-found] The Claude model may not be enabled in your project's Model Garden, or may not be available in the selected region. Check the [Model Garden](https://console.cloud.google.com/vertex-ai/model-garden) in Cloud Console. # Activity URL: https://docs.llmgateway.io/learn/activity The Activity page shows a real-time log of every API request routed through LLM Gateway. Use it to debug requests, monitor performance, and track costs per call. 
## Filters [#filters] Filter the activity log using the controls at the top: | Filter | Description | | --------------------------- | ------------------------------------------------------- | | **Time range** | Filter by a specific time period | | **Unified reasons** | Filter by completion reason (e.g., stop, length, error) | | **Providers** | Show requests for specific providers only | | **Models** | Show requests for specific models only | | **Custom header key/value** | Filter by custom metadata headers attached to requests | ## Activity List [#activity-list] Each activity entry shows: * **Status icon** — Green checkmark for completed, red circle for errors * **Response preview** — First line of the model's response (when available) * **Model** — The provider and model used (e.g., `google-vertex/gemini-3-pro-image-preview`) * **Cache status** — Whether the response was served from cache * **Tokens** — Total tokens consumed (input + output) * **Duration** — How long the request took * **Cost** — Inference cost for the request * **Source** — Where the request originated from * **Discount** — Any discount applied (e.g., "20% off") * **Status badge** — `completed`, `upstream_error`, `gateway_error`, etc. * **Timestamp** — Relative time (e.g., "about 4 hours ago") ### Actions per Entry [#actions-per-entry] * **Open in new tab** — View the full request detail in a new browser tab * **Expand** — Expand inline to see more details ## Activity Detail [#activity-detail] Click on any activity entry to view its full detail page. 
### Summary Cards [#summary-cards] Five cards at the top provide a quick overview: | Card | Description | | ------------------ | ------------------------------- | | **Duration** | Total request time in seconds | | **Tokens** | Total tokens consumed | | **Throughput** | Tokens per second | | **Inference Cost** | Cost charged for this request | | **Cache** | Whether the response was cached | ### Request Section [#request-section] Details about the original request: * **Requested Model** — The model ID sent in the API call * **Used Model** — The actual model that served the request * **Model Mapping** — The underlying model identifier * **Provider** — The provider that handled the request * **Requested Provider** — The provider specified in the request * **Streamed** — Whether the response was streamed * **Canceled** — Whether the request was canceled * **Source** — The application or service that made the request ### Tokens Section [#tokens-section] A detailed token breakdown: * Prompt Tokens, Completion Tokens, Total Tokens * Reasoning Tokens (for reasoning models) * Image Input/Output Tokens (for vision/image models) * Response Size ### Routing Section [#routing-section] How LLM Gateway routed the request: * **Selection** — The routing strategy used (e.g., `direct-provider-specified`) * **Available** — Providers that were available for this model * **Provider Scores** — Scoring breakdown showing availability, uptime, and latency for each provider ### Parameters Section [#parameters-section] The model parameters sent with the request: * Temperature, Max Tokens, Top P * Frequency Penalty, Reasoning Effort * Response Format # Agents URL: https://docs.llmgateway.io/learn/agents The Agents page lets you monitor your AI coding agents — such as Claude Code, SoulForge, OpenCode, and others — and track their activity, costs, and token usage across sessions. 
## Agent Cards [#agent-cards] Each agent is displayed as a card showing: * **Name** — The agent's identifier (e.g., SoulForge, Claude Code) * **Total cost** — Cumulative spend for this agent * **Requests** — Total number of API requests made * **Tokens** — Total tokens consumed * **Last Active** — When the agent was last used Click on any agent card to view its detailed activity. ## Agent Detail [#agent-detail] The detail view shows all sessions for a specific agent. Each session row displays: * **Time range** — When the session started and ended * **Requests** — Number of API calls in the session * **Tokens** — Total tokens consumed * **Duration** — How long the session lasted * **Cost** — Total cost for the session Expand a session to see individual requests with their response previews, model used, cache status, token counts, cost, and source. # API Keys URL: https://docs.llmgateway.io/learn/api-keys The API Keys page is the main place to create, secure, and operate the keys your apps use to authenticate with LLM Gateway. Use this page to: * Create project-specific API keys * Set all-time and recurring spend limits per key * Set an expiration (TTL) so a key disables itself automatically * Track usage for each key, including the active recurring window * Enable or disable keys without deleting them * Configure IAM rules for model, provider, and pricing access API keys are shown in full only once, immediately after creation. Copy and store them securely before closing the dialog. 
## Creating an API Key [#creating-an-api-key]

Click **Create API Key** and configure:

* **Name**: A label such as `production`, `staging`, or `ci`
* **Expiration (TTL)**: An optional time-to-live after which the key disables itself
* **All-time usage limit**: An optional lifetime spend cap for the key
* **Recurring usage limit**: An optional spend cap that resets on a schedule

Recurring limits support:

* Minimum window: **1 hour**
* Maximum window: **12 months**
* Units: **hour**, **day**, **week**, or **month**

This is useful when you want a key to stay below a fixed budget per hour, day, week, or month, while still keeping a separate lifetime cap if needed.

## Expiration (TTL) [#expiration-ttl]

Turn on **Set expiration (TTL)** when creating a key to give it a limited lifetime. Choose a value and a unit (**minutes**, **hours**, or **days**), and the key is disabled automatically once that time passes. Leave it off for a key that never expires.

Expired keys show an **Expired** indicator in the list and move to the **Inactive** tab. To use one again, reactivate it and pick a **new future expiration**:

* **Activate** an expired key and you'll be prompted to set a fresh TTL before it comes back online
* Keys with no TTL, or whose TTL is still in the future, can be enabled and disabled without setting a new expiration

This makes TTL keys ideal for temporary access, such as short-lived demos, CI runs, or contractor keys that should not linger.
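
The expiration behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `ttl_expiry` and `needs_new_expiration` are hypothetical, and the sketch assumes a key counts as expired as soon as the current time reaches its expiration time. It does not reflect how LLM Gateway implements TTLs internally.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Units offered in the dashboard's TTL picker, per the text above.
TTL_UNITS = ("minutes", "hours", "days")


def ttl_expiry(created_at: datetime, value: int, unit: str) -> datetime:
    """Return the moment a key with the given TTL would disable itself.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if unit not in TTL_UNITS:
        raise ValueError(f"unsupported TTL unit: {unit}")
    if value <= 0:
        raise ValueError("TTL value must be positive")
    # timedelta accepts minutes=, hours=, and days= keyword arguments.
    return created_at + timedelta(**{unit: value})


def needs_new_expiration(expires_at: Optional[datetime], now: datetime) -> bool:
    """True when reactivating the key should prompt for a fresh TTL.

    Keys without a TTL (None) or with a TTL still in the future can be
    toggled freely. Treating `expires_at == now` as expired is an assumption.
    """
    return expires_at is not None and expires_at <= now


if __name__ == "__main__":
    created = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
    expires = ttl_expiry(created, 90, "minutes")
    print(expires.isoformat())
    print(needs_new_expiration(expires, created + timedelta(hours=2)))
```

A key created for a CI run with a 90-minute TTL would, under these assumptions, need a new expiration if someone tried to reactivate it two hours later.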
## Usage Limits [#usage-limits]

Each API key can enforce two independent limit types:

| Limit Type                | What it does                                                    |
| ------------------------- | --------------------------------------------------------------- |
| **All-time usage limit**  | Stops the key after it reaches a lifetime spend threshold       |
| **Recurring usage limit** | Stops the key after it reaches the budget for the active window |

Examples:

* `$50` all-time for a temporary integration key
* `$10 / 1 day` for a development key
* `$500 / 1 month` for a production service key

If a key hits either limit, requests using that key are rejected until the key is updated or, for recurring limits, the next window begins.

### How recurring windows work [#how-recurring-windows-work]

Recurring usage is tracked separately from total lifetime usage.

* The dashboard shows the key's **Current Period** usage
* The active window also shows when it **resets**
* When the configured window expires, usage for that window resets automatically
* Updating the recurring limit configuration resets the current window and starts a new one

Usage includes both LLM Gateway credits and requests routed through your own provider keys when applicable.
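
To make the interaction between the two limit types concrete, here is a minimal Python sketch. It is not the gateway's implementation: the `KeyUsage` class and both function names are hypothetical, amounts are assumed to be in US dollars, and the sketch assumes a key is stopped once spend is equal to or greater than a limit (the docs only say the key stops after it "reaches" the threshold).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeyUsage:
    """Hypothetical snapshot of one API key's spend and limits (USD)."""
    lifetime_spend: float                    # total tracked usage for the key
    period_spend: float                      # spend in the active recurring window
    all_time_limit: Optional[float] = None   # None means no lifetime cap
    recurring_limit: Optional[float] = None  # None means no recurring cap


def rejection_reason(usage: KeyUsage) -> Optional[str]:
    """Return why a request would be rejected, or None if it may proceed.

    The two limits are checked independently: hitting either one blocks the key.
    """
    if usage.all_time_limit is not None and usage.lifetime_spend >= usage.all_time_limit:
        return "all-time limit reached"
    if usage.recurring_limit is not None and usage.period_spend >= usage.recurring_limit:
        return "recurring limit reached"
    return None


def start_new_window(usage: KeyUsage) -> KeyUsage:
    """Reset only the recurring counter; lifetime spend keeps accumulating."""
    return KeyUsage(
        lifetime_spend=usage.lifetime_spend,
        period_spend=0.0,
        all_time_limit=usage.all_time_limit,
        recurring_limit=usage.recurring_limit,
    )


if __name__ == "__main__":
    # A development key with "$10 / 1 day" plus a $50 lifetime cap.
    dev_key = KeyUsage(lifetime_spend=42.0, period_spend=10.0,
                       all_time_limit=50.0, recurring_limit=10.0)
    print(rejection_reason(dev_key))
    print(rejection_reason(start_new_window(dev_key)))
```

In this sketch the development key is blocked by its daily budget, becomes usable again when the next window starts, and would be blocked permanently once its lifetime spend reaches the all-time cap, unless the limit is updated.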
## API Keys List [#api-keys-list]

Each key in the list shows:

| Field              | Description                                                   |
| ------------------ | ------------------------------------------------------------- |
| **Name**           | The label you assigned to the key                             |
| **API Key**        | A masked preview of the key                                   |
| **Status**         | Whether the key is active or inactive, plus its expiry if set |
| **Created**        | When the key was created                                      |
| **Usage**          | Total tracked usage for the key                               |
| **Current Period** | Spend in the active recurring window, if configured           |
| **Limits**         | All-time and recurring limit summary                          |
| **IAM Rules**      | Whether model/provider/pricing access controls are configured |

## Actions [#actions]

For each API key you can:

* **Update limits**: Change all-time or recurring limits
* **Disable or enable**: Pause usage without deleting the key (reactivating an expired key prompts for a new expiration)
* **Configure IAM rules**: Restrict which models, providers, or pricing tiers the key can use
* **Open usage details**: Inspect requests and usage tied to that key
* **Delete**: Permanently remove the key

## IAM Rules [#iam-rules]

IAM rules let you narrow what an API key is allowed to access. Supported rule types include:

* **Allow/Deny models**
* **Allow/Deny providers**
* **Allow/Deny pricing**

Use IAM rules when you want a key to be valid, but only for a specific subset of models or providers.

For a deeper explanation, see the [API Keys & IAM Rules feature page](/features/api-keys).

## Plan Limits [#plan-limits]

The page also shows how many API keys your current project is using relative to your plan allowance.

* **Free**: Standard API key count limit
* **Enterprise**: Custom limits

If you reach the project key limit, the **Create API Key** button is disabled until you delete unused keys or upgrade.
# Audit Logs URL: https://docs.llmgateway.io/learn/audit-logs The Audit Logs page provides a complete history of all actions performed within your organization, essential for compliance and security monitoring. Audit Logs are available on the [**Enterprise plan**](https://llmgateway.io/enterprise). Owner or Admin role is required. ## Filters [#filters] Narrow down the log entries: * **Action** — Filter by action type (create, delete, update, etc.) * **Resource type** — Filter by resource (API, IAM, API Keys, etc.) Both filters are populated dynamically based on the actions recorded in your organization. ## Audit Log Entries [#audit-log-entries] Each log entry shows: | Field | Description | | ----------------- | ------------------------------------------------------------ | | **Timestamp** | Exact time of the action (formatted as MMM d, yyyy HH:mm:ss) | | **User** | Name and email of the person who performed the action | | **Action** | What was done (e.g., "API Keys → create") | | **Resource type** | The type of resource affected (shown as a badge) | | **Resource ID** | Identifier of the affected resource (with copy button) | | **Details** | Additional metadata about the action | ## Pagination [#pagination] The log supports infinite scrolling with a **Load More** button to view older entries. Entries are sorted newest first. # Billing URL: https://docs.llmgateway.io/learn/billing The Billing page is your central hub for managing credits, plans, and payment methods. ## Credits [#credits] Displays your current credit balance. Credits are consumed as you make API requests through the gateway. Click **Top Up Credits** to add more credits to your account. ## Fees [#fees] Top-ups are charged the credit amount plus the following fees: * **Platform fee** — A flat 5% fee applied to every credit purchase. * **International card fee** — An additional 1.5% fee applied when paying with a non-US issued card. 
This covers the higher processing cost charged by the card network for international transactions. Cards issued in the United States are not subject to this fee. The full breakdown (credits, platform fee, and — when applicable — the international card fee) is shown in the top-up dialog before you confirm payment, so the total charge is always transparent. ## Plan Management [#plan-management] View and manage your subscription: * See your current plan (Free or Enterprise) * Billing cycle information * Click **Manage Subscription** to upgrade, downgrade, or cancel ## Payment Methods [#payment-methods] Manage your saved payment methods: * Add a new credit card or payment method * View existing payment methods * Update billing information ## Auto Top-up Settings [#auto-top-up-settings] Configure automatic credit top-ups so you never run out: * **Enable/disable** auto top-up * **Threshold** — The credit balance that triggers a top-up * **Amount** — How many credits to add when the threshold is reached This ensures uninterrupted service by automatically replenishing your credits when they run low. # Chat Plans URL: https://docs.llmgateway.io/learn/chat-plans Chat Plans are optional monthly subscriptions for the chat playground. Instead of paying per request from your pay-as-you-go balance, a Chat Plan gives you a pool of monthly credits worth more than you pay — so heavy chat usage costs less. 
## Plans [#plans] There are three tiers, billed monthly: | Plan | Price | Monthly value | Models | | ----------- | ------ | ---------------- | ---------------------------------------------------------------------------- | | **Starter** | $9/mo | \~2× the value | Most chat models — Claude Haiku & Sonnet, GPT-5-mini, Gemini Flash, and more | | **Plus** | $19/mo | \~2.5× the value | Everything in Starter **plus** frontier models | | **Pro** | $49/mo | \~3× the value | All models, highest monthly allowance | The credit multiplier is tapered: the larger the plan, the more usage value each dollar buys at provider rates. **Frontier models** — flagship models such as Claude Opus, GPT-5, Gemini 2.5 Pro, and Grok 4 are included on **Plus** and **Pro**. The Starter plan covers the broad catalog of everyday chat models but does not include these frontier models. ## How credits work [#how-credits-work] * **Monthly reset** — Your plan credits refresh at the start of each billing cycle. Unused credits do **not** roll over to the next month. * **Plan credits drain first** — Requests made from the chat app draw down your plan's monthly credits before anything else. * **Pay-as-you-go fallback** — Once your monthly credits are used up, the chat app falls back to your regular pay-as-you-go balance, which never expires. You can keep chatting without interruption. ## Managing your plan [#managing-your-plan] * Open the **Pricing** page from the chat playground sidebar to compare tiers and subscribe. * Your active plan appears in the playground sidebar with a badge, alongside how many credits remain for the cycle. * You can upgrade, downgrade, or cancel at any time. Cancelling takes effect at the end of the period you've already paid for — you keep access until then. # Dashboard URL: https://docs.llmgateway.io/learn/dashboard The Dashboard is the first page you see after logging in. It provides a high-level overview of your project's LLM usage, costs, and performance at a glance. 
## Date Range [#date-range] At the top of the page, you can toggle the date range for all dashboard metrics: * **7 days** — Last 7 days of data (default) * **30 days** — Last 30 days of data * **Custom** — Pick a custom start and end date ## Stat Cards [#stat-cards] The dashboard displays eight metric cards in two rows: ### Top Row [#top-row] | Card | Description | | ------------------------ | ------------------------------------------------------------------------ | | **Organization Credits** | Your current available credit balance | | **Total Requests** | Number of API requests in the selected period, with cache hit percentage | | **Total Cost** | Total inference cost for the period, including storage costs | | **Total Savings** | Savings from discounts during the selected period | ### Bottom Row [#bottom-row] | Card | Description | | ------------------------ | ------------------------------------------------------------------- | | **Input Tokens & Cost** | Total prompt tokens sent and their associated cost | | **Output Tokens & Cost** | Total completion tokens received and their associated cost | | **Cached Tokens & Cost** | Tokens served from cache (if caching is enabled) and the cost saved | | **Most Used Model** | The model with the highest request count, along with its provider | ## Usage Overview Chart [#usage-overview-chart] Below the stat cards, a chart visualizes your usage over time. You can toggle between two views using the dropdown: * **Costs** — Shows input, output, and cached input costs as a stacked area chart * **Requests** — Shows request volume over time The chart is filtered by the currently selected project. 
## Quick Actions [#quick-actions] A sidebar panel provides shortcuts to common tasks: * **Manage API Keys** — Go to the API Keys page * **Provider Keys** — Configure your own provider keys * **View Activity** — See detailed request logs * **Usage & Metrics** — Dive into usage analytics * **Model Usage** — View per-model usage breakdown ## Cost Breakdown [#cost-breakdown] A donut chart showing how your costs are distributed across different models and providers. Each segment is color-coded and labeled with the model name and cost, making it easy to identify your biggest cost drivers. ## Errors & Reliability [#errors--reliability] Displays two key reliability metrics: * **Error Rate** — Percentage of failed requests over the selected period * **Uptime** — Gateway availability percentage ## Recent Activity [#recent-activity] A table showing your most recent API requests with key details like model, status, tokens, duration, and cost. Click any entry to view the full request detail. ## Header Actions [#header-actions] Two buttons in the top-right corner: * **Create API Key** — Quickly create a new API key for your project * **Top Up Credits** — Add credits to your organization balance # Guardrails URL: https://docs.llmgateway.io/learn/guardrails The Guardrails page lets you configure content safety rules that automatically scan and filter API requests before they reach the LLM provider. Guardrails are available on the [**Enterprise plan**](https://llmgateway.io/enterprise). Owner or Admin role is required. ## Main Toggle [#main-toggle] A global toggle at the top enables or disables all guardrails for your organization. Click **Save Changes** to apply. 
## System Rules [#system-rules] Six built-in rules with individual enable/disable toggles: | Rule | Description | | ------------------------------- | -------------------------------------------------------------------- | | **Prompt Injection Detection** | Detects attempts to override or manipulate system instructions | | **Jailbreak Prevention** | Identifies attempts to bypass safety measures | | **PII Detection** | Identifies personal information like emails, phone numbers, and SSNs | | **Secrets Detection** | Detects API keys, passwords, and credentials | | **File Type Restrictions** | Controls which file types can be uploaded | | **Document Leakage Prevention** | Detects attempts to extract confidential documents | Each rule has an action dropdown to configure the response: * **Block** — Reject the request entirely * **Redact** — Remove or mask sensitive content, then continue * **Warn** — Log the violation but allow the request ## File Restrictions [#file-restrictions] Configure file upload limits: * **Max file size** — Set the maximum file size in MB * **Allowed file types** — Add or remove permitted MIME types ## Custom Rules [#custom-rules] Create organization-specific rules by clicking **Add Rule**: * **Blocked Terms** — Block specific words or phrases * **Custom Regex** — Match patterns with regular expressions * **Topic Restriction** — Restrict content related to specific topics Each custom rule can be individually enabled/disabled or deleted. Learn more about guardrails in the [Guardrails feature docs](/features/guardrails). # Introduction URL: https://docs.llmgateway.io/learn The LLM Gateway dashboard gives you full control over your LLM API usage, costs, and configuration. This section walks you through every page in the dashboard so you can get the most out of the platform. 
## Project Pages [#project-pages] These pages are scoped to a specific project within your organization: * [**Dashboard**](/learn/dashboard) — Overview of your usage, costs, and performance * [**Activity**](/learn/activity) — Detailed logs of every API request * [**Agents**](/learn/agents) — Monitor your AI coding agents and their activity * [**Model Usage**](/learn/model-usage) — Usage breakdown by model * [**Model Categories & Fair Use**](/learn/model-categories) — How models are categorized and premium fair-use caps * [**Usage & Metrics**](/learn/usage-metrics) — Requests, errors, cache rates, and cost trends * [**API Keys**](/learn/api-keys) — Create and manage your API keys * [**Preferences**](/learn/preferences) — Project-level settings like caching and mode * [**LLM SDK**](/learn/sdk-settings) — Embed AI and credit purchases into your own app ## Organization Pages [#organization-pages] These pages apply to your entire organization: * [**Provider Keys**](/learn/provider-keys) — Bring your own provider API keys * [**Guardrails**](/learn/guardrails) — Content safety rules and filters * [**Security Events**](/learn/security-events) — Monitor guardrail violations * [**Billing**](/learn/billing) — Credits, plans, and payment methods * [**Transactions**](/learn/transactions) — Payment and credit history * [**Referrals**](/learn/referrals) — Earn credits by referring others * [**Policies**](/learn/policies) — Data retention configuration * [**Org Preferences**](/learn/org-preferences) — Organization name and billing details * [**Team**](/learn/team) — Manage team members and roles * [**Audit Logs**](/learn/audit-logs) — Complete history of organization actions ## Playground [#playground] Interactive tools for testing and experimenting with LLM models: * [**Chat Playground**](/learn/playground) — Test models with an interactive chat interface * [**Group Chat**](/learn/playground-group) — Watch multiple models discuss and collaborate on your prompt * [**Image 
Studio**](/learn/playground-image) — Generate images using AI models * [**Video Studio**](/learn/playground-video) — Generate videos using AI models * [**Chat Plans**](/learn/chat-plans) — Monthly subscription plans for the chat playground # Model Categories & Fair Use URL: https://docs.llmgateway.io/learn/model-categories Every model in the gateway is sorted into a category. Categories power dashboard filtering, analytics, and — for DevPass coding plans — the fair-use limits that keep flagship models available to everyone. ## Categories [#categories] | Category | Description | | ------------ | ---------------------------------------------------------------------------------------------------------------------------- | | **Premium** | High-cost frontier / flagship models — priced at **$15+ per million output tokens** or **$5+ per million input tokens** | | **Standard** | Every other model — the broad catalog of fast, cost-effective everyday models | You can browse the full catalog on the [**Supported Models**](https://llmgateway.io/models) page and filter by use case, capabilities, provider, price, and context size. ## Fair-use caps on premium models (DevPass only) [#fair-use-caps-on-premium-models-devpass-only] Fair-use caps apply **only to DevPass** — the fixed-price monthly plans for coding tools (Lite, Pro, Max). They do **not** apply to the LLM Gateway API or pay-as-you-go credits: when you call the API directly, premium models are limited only by your credit balance, with no weekly cap. Premium models are the most expensive to run, so DevPass plans apply a **weekly fair-use cap** on premium usage. This is a rolling 7-day window that resets continuously — it sits on top of the plan's normal monthly credit allowance. | DevPass plan | Premium fair-use cap | | ------------ | -------------------- | | **Lite** | 10 credits / week | | **Pro** | 50 credits / week | | **Max** | 140 credits / week | Within DevPass, the weekly cap applies only to **premium** models. 
Standard models are limited only by the plan's credit balance, not by the fair-use window. Once a DevPass plan reaches its weekly premium cap, premium requests are paused until the rolling window frees up, while standard models keep working normally. Upgrading the DevPass plan raises the weekly cap. # Model Usage URL: https://docs.llmgateway.io/learn/model-usage The Model Usage page shows how your API requests are distributed across different LLM models over time. ## Filters [#filters] Two filters let you narrow down the data: * **API Key** — Select a specific API key or view usage across all keys * **Date range** — Choose a time period to analyze ## Usage Chart [#usage-chart] The main chart displays a time-series breakdown of requests per model. Each model is represented by a different color, making it easy to see: * Which models are used most frequently * How usage patterns change over time * Whether usage is concentrated on a single model or spread across many This page is useful for understanding your model distribution and identifying opportunities to optimize costs by switching to more cost-effective models for certain workloads. # Org Preferences URL: https://docs.llmgateway.io/learn/org-preferences The Org Preferences page contains settings for your organization's identity and billing information. ## Organization Name [#organization-name] Update your organization's display name. This name appears throughout the dashboard and in billing communications. ## Billing Email [#billing-email] Set or update the email address used for billing-related communications, including receipts, invoices, and payment notifications. 
## Billing Information [#billing-information] Configure your organization's billing details for invoices: | Field | Description | | ---------------------------------- | ------------------------------------------------------------------------ | | **Email Address** | Primary email for billing communications | | **Company Name** (optional) | Your company or organization name for invoices | | **Billing Address** | Street address, city, state/province, ZIP code, and country | | **Tax ID / VAT Number** (optional) | Your tax identification or VAT number for tax-compliant invoices | | **Invoice Notes** (optional) | Custom notes to include on invoices (e.g., PO numbers, department codes) | # Group Chat URL: https://docs.llmgateway.io/learn/playground-group The Group Chat page lets you add multiple AI models to a conversation where they discuss and build on each other's responses, creating a dynamic multi-model dialogue. ## How It Works [#how-it-works] 1. Add 2–5 different AI models to the conversation 2. Enter an initial prompt or question to kick off the discussion 3. Click **Start Conversation** to begin 4. Models take turns responding to each other in sequence 5. Each model builds on the previous responses, creating a dynamic conversation 6. You can stop the conversation at any time and start a new one ## Use Cases [#use-cases] * **Model evaluation** — Compare how different models approach the same topic * **Brainstorming** — Get diverse perspectives from multiple AI models * **Debate** — Watch models discuss pros and cons of a topic * **Research** — Gather multi-model analysis of complex questions # Image Studio URL: https://docs.llmgateway.io/learn/playground-image The Image Studio lets you generate images using AI models through an intuitive interface. Select a model, describe what you want, and get results instantly. ## Model Selection [#model-selection] Choose from supported image generation models in the dropdown. 
Each model has different capabilities, resolutions, and pricing.

## Generating Images [#generating-images]

1. Select an image generation model
2. Type a description of the image you want
3. Click send to generate
4. Generated images appear in the conversation

## Image Count [#image-count]

You can generate 1, 2, or 4 images at once. Multiple images are displayed in a grid layout.

## Resolution Options [#resolution-options]

Available resolutions depend on the selected model. Common options include 1K, 2K, and 4K.

# Video Studio

URL: https://docs.llmgateway.io/learn/playground-video

The Video Studio lets you generate videos using AI models. Select a model, describe what you want, and get video results.

## Model Selection [#model-selection]

Choose from supported video generation models in the dropdown. Each model has different capabilities, resolutions, and pricing.

## Generating Videos [#generating-videos]

1. Select a video generation model
2. Type a description of the video you want
3. Click send to generate
4. Generated videos appear in the conversation

## Resolution Options [#resolution-options]

Available resolutions depend on the selected model.

# Chat Playground

URL: https://docs.llmgateway.io/learn/playground

The Chat Playground is a standalone app for testing LLM models through a conversational interface. You can select any supported model, adjust parameters, and see responses in real time.

## Model Selection [#model-selection]

Use the dropdown at the top to pick a model and provider. The **Auto Route** option automatically selects the best provider based on availability and cost.
## Chat Interface [#chat-interface] * Type your message in the input field at the bottom * Click the send button or press Enter to submit * Responses stream in real time * Previous conversations appear in the sidebar ## Prompt Suggestions [#prompt-suggestions] When starting a new chat, category tabs help you pick a prompt: * **Create** — Content generation prompts * **Explore** — Research and analysis prompts * **Code** — Programming and development prompts * **Image gen** — Image generation prompts ## Sidebar [#sidebar] The left sidebar shows your chat history. Click **+ New Chat** to start a fresh conversation, or select a previous chat to continue it. ## Comparison Mode [#comparison-mode] Toggle **Comparison mode** in the top-right to send the same prompt to multiple models side by side. See the [Group Chat](/learn/playground-group) page for details. ## Image Studio [#image-studio] Click **Image Studio** in the sidebar to switch to the image generation interface. See the [Image Studio](/learn/playground-image) page for details. # Policies URL: https://docs.llmgateway.io/learn/policies The Policies page lets you configure organization-wide policies that govern how your data is handled. ## Data Retention [#data-retention] Control how long your request logs and activity data are stored. The retention period depends on your plan: | Plan | Retention Period | | -------------- | ---------------- | | **Free** | 30 days | | **Enterprise** | Custom | After the retention period expires, request logs and associated data are automatically deleted. Learn more about data retention in the [Data Retention feature docs](/features/data-retention). # Preferences URL: https://docs.llmgateway.io/learn/preferences The Preferences page contains project-level settings that control how your project behaves. ## Project Name [#project-name] Update the display name for your project. This name appears in the sidebar and throughout the dashboard. 
## Project Mode [#project-mode] Configure how your organization handles projects. This setting determines the routing and isolation behavior for API requests within the project. ## Caching [#caching] Enable or configure response caching for API requests. When enabled, identical requests will return cached responses instead of making new calls to the provider, saving both time and cost. Learn more about caching in the [Caching feature docs](/features/caching). ## Danger Zone [#danger-zone] The Danger Zone section contains irreversible actions: * **Archive Project** — Permanently archive the project. This action cannot be undone. Archived projects stop processing requests and their API keys become inactive. # Provider Keys URL: https://docs.llmgateway.io/learn/provider-keys The Provider Keys page lets you add your own API keys from LLM providers (OpenAI, Anthropic, Google, etc.) to route requests directly through your accounts without additional gateway fees. ## Adding a Provider Key [#adding-a-provider-key] Click **Add Provider Key** to configure a new key: * **Provider** — Select which provider this key belongs to * **Custom name** — An optional label to identify the key * **API key** — Your provider's API key * **Base URL** — Optional custom endpoint (useful for Azure OpenAI or custom deployments) ## Provider Keys List [#provider-keys-list] Each configured key shows: | Field | Description | | --------------- | -------------------------------------------------- | | **Provider** | The LLM provider (e.g., OpenAI, Anthropic) | | **Custom name** | Your label for the key | | **Status** | Active, inactive, or deleted | | **Base URL** | Custom endpoint if configured | | **Token** | Masked key with only the last 4 characters visible | ## Actions [#actions] For each provider key: * **Edit** — Update the key name, value, or base URL * **Deactivate** — Temporarily disable the key without deleting it * **Delete** — Permanently remove the key When you use your own provider 
keys, requests are routed directly to the provider. You are only charged the provider's standard rates with no additional gateway markup. # Referrals URL: https://docs.llmgateway.io/learn/referrals The Referrals page lets you earn credits by inviting others to use LLM Gateway. ## Eligibility [#eligibility] To unlock the referral program, your organization must have at least **$100 in total credit top-ups**. Before reaching this threshold, the page shows: * A progress bar showing your progress toward $100 * The remaining amount needed to unlock * An explanation of the 1% earnings model ## Referral Dashboard [#referral-dashboard] Once eligible, the page shows: ### Your Referral Link [#your-referral-link] A unique shareable link tied to your organization. Click the copy button to copy it to your clipboard and share it with others. ### Your Stats [#your-stats] | Stat | Description | | ------------------ | ----------------------------------------------------- | | **Users Referred** | Total number of users who signed up through your link | | **Total Earnings** | Total credit amount earned from referrals | ### How It Works [#how-it-works] 1. **Share Your Link** — Send your referral link to others 2. **They Sign Up** — They create an LLM Gateway account using your link 3. **Earn Credits** — You earn 1% of their spending as credits Credits are automatically added to your organization balance. # LLM SDK URL: https://docs.llmgateway.io/learn/sdk-settings The **LLM SDK** settings page lets you embed AI and in-app credit purchases into your own application — your end users get their own wallets, and you control markup and access. You'll find it under **Settings → SDK** for a project. ## End-user sessions [#end-user-sessions] Turn on **Enable end-user sessions** to allow this project to mint short-lived browser session tokens for your users. 
| Field | Description | | ------------------- | -------------------------------------------------------------------------------------------------- | | **Markup percent** | The percentage you add on top of provider cost for each end-user request (0–100%) | | **Allowed origins** | The browser origins permitted to use session tokens, one per line (e.g. `https://app.example.com`) | Click **Save Settings** to apply changes. ## Platform secret keys [#platform-secret-keys] Platform secret keys are **server-side** keys used to mint end-user sessions. Keep them on your backend — never expose them in the browser. * **Create Live Key** — A production key. Top-ups made with it use live billing. * **Create Test Key** — A sandbox key. Top-ups use the Stripe sandbox, so you can build and test without real charges. A secret key is shown **only once** at creation time. Copy it immediately — it won't be displayed again. If you lose a key, revoke it and create a new one. Each key in the list shows its description, a **test** badge when applicable, its status, and a masked token. Use **Revoke** to permanently disable a key. For the full SDK integration guide — server, client, and React components — see the [LLM SDK feature docs](/features/llm-sdk). # Security Events URL: https://docs.llmgateway.io/learn/security-events The Security Events page shows all guardrail violations detected across your organization, helping you monitor content safety and policy enforcement. Security Events are available on the [**Enterprise plan**](https://llmgateway.io/enterprise). Owner or Admin role is required. 
## Stats Cards [#stats-cards] Four summary cards at the top: | Card | Description | | -------------------- | --------------------------------------------- | | **Total Violations** | All-time violation count | | **Last 24 Hours** | Violations in the past day | | **Blocked** | Number of requests that were blocked | | **Redacted** | Number of requests where content was redacted | ## Filters [#filters] Narrow down the events list: * **Action** — Filter by Blocked, Redacted, Warned, or All actions * **Category** — Filter by Prompt Injection, Jailbreak, PII Detection, Secrets, Blocked Terms, Custom Regex, or Topic Restriction ## Violations List [#violations-list] Each violation entry shows: | Field | Description | | ------------------- | ---------------------------------------------------- | | **Timestamp** | When the violation occurred | | **Rule name** | Which guardrail rule was triggered | | **Category** | The type of violation (shown as a badge) | | **Action** | What action was taken (Blocked, Redacted, or Warned) | | **Matched pattern** | The content that triggered the rule | The list supports pagination with a **Load More** button for viewing older events. # Team URL: https://docs.llmgateway.io/learn/team The Team page lets you invite team members, assign roles, and control access to your organization. ## Adding Members [#adding-members] Click **Add Member** to invite someone by email. You'll need to: 1. Enter their email address 2. Select a role (Developer, Admin, or Owner) Your plan includes up to **5 team seats**. The current count is displayed, and the Add button is disabled when all seats are used. Contact sales for additional seats. 
## Team Members List [#team-members-list] Each member shows: | Field | Description | | --------- | ------------------------------------------------ | | **Name** | The member's display name | | **Email** | Their email address | | **Role** | Their current role (can be changed via dropdown) | ## Actions [#actions] * **Update role** — Change a member's role using the dropdown * **Remove** — Remove a member from the organization (requires confirmation) ## Role Permissions [#role-permissions] | Role | Permissions | | ------------- | ----------------------------------------------------------------------------------------------------- | | **Owner** | Full access to all settings, billing, team management, and all projects | | **Admin** | Can manage team members, projects, and API keys, but cannot access billing or delete the organization | | **Developer** | View and use resources only. Cannot modify settings or manage team | Developers can also be given **restricted access** at the API key level, limiting which keys they can view and use. # Transactions URL: https://docs.llmgateway.io/learn/transactions The Transactions page shows a complete history of all financial transactions in your organization. 
## Transaction History [#transaction-history] Each transaction entry includes: | Field | Description | | --------------- | ---------------------------------------- | | **Date** | When the transaction occurred | | **Type** | The transaction type (see below) | | **Credits** | Number of credits added or deducted | | **Total Paid** | The dollar amount charged | | **Status** | Current state of the transaction | | **Description** | Additional details about the transaction | ## Transaction Types [#transaction-types] | Type | Description | | ----------------------- | ----------------------------------- | | **Credit Top-up** | Manual or automatic credit purchase | | **Credit Refund** | Credits refunded to your account | | **Subscription Start** | New plan subscription started | | **Subscription Cancel** | Plan subscription canceled | | **Subscription End** | Plan subscription period ended | ## Status Badges [#status-badges] * **Completed** — Transaction processed successfully * **Pending** — Transaction is being processed * **Failed** — Transaction could not be completed # Usage & Metrics URL: https://docs.llmgateway.io/learn/usage-metrics The Usage & Metrics page provides comprehensive analytics through five tabs, giving you deep insight into your LLM API usage patterns. ## Filters [#filters] * **API Key** — Filter metrics by a specific API key or view all * **Date range** — Select the time period (defaults to last 7 days) ## Tabs [#tabs] ### Requests [#requests] A time-series chart showing request volume over the selected period. Use this to identify traffic patterns, peak usage times, and growth trends. ### Models [#models] A table showing your top-used models ranked by request count. For each model you can see: * Total requests * Token consumption * Associated costs This helps you understand which models drive the most usage and cost. ### Errors [#errors] A chart showing error rates over time. 
Track: * Error frequency and trends * Spikes that may indicate provider issues * Overall reliability of your API calls ### Cache [#cache] A chart showing your cache hit rate over time. Monitor: * How effectively caching is reducing redundant requests * Cache hit vs. miss ratios * The cost savings from cached responses ### Costs [#costs] A cost breakdown chart showing spending patterns. Analyze: * Cost trends over time * Cost distribution by provider or model * Opportunities to reduce spending # Migrate from LiteLLM URL: https://docs.llmgateway.io/migrations/litellm
Running your own LiteLLM proxy works until it doesn't: scaling, monitoring, and keeping it running become another job. LLM Gateway gives you the same unified API with built-in analytics, caching, and a dashboard, without the infrastructure overhead.

## Quick Migration [#quick-migration]

Both services use OpenAI-compatible endpoints, so migration is a two-line change:

```diff
- const baseURL = "http://localhost:4000/v1"; // LiteLLM proxy
+ const baseURL = "https://api.llmgateway.io/v1";

- const apiKey = process.env.LITELLM_API_KEY;
+ const apiKey = process.env.LLM_GATEWAY_API_KEY;
```
## Why Teams Switch to LLM Gateway [#why-teams-switch-to-llm-gateway] | What You Get | LiteLLM (Self-Hosted) | LLM Gateway | | ------------------------ | --------------------- | ---------------------- | | OpenAI-compatible API | Yes | Yes | | Infrastructure to manage | Yes (you run it) | No (we run it) | | Managed cloud option | No | Yes | | Analytics dashboard | Basic | Per-request detail | | Response caching | Manual setup | Built-in, automatic | | Cost tracking | Via callbacks | Native, real-time | | Provider key management | Config file | Web UI with rotation | | Uptime & scaling | You handle it | 99.9% SLA (Enterprise) | Still want to self-host? LLM Gateway is [open source under AGPLv3](https://llmgateway.io/blog/how-to-self-host-llm-gateway)—same features, your infrastructure. For a detailed breakdown, see [LLM Gateway vs LiteLLM](https://llmgateway.io/compare/litellm).
## Migration Steps [#migration-steps]

### Get Your LLM Gateway API Key [#get-your-llm-gateway-api-key]

Sign up at [llmgateway.io/signup](https://llmgateway.io/signup) and create an API key from your dashboard.

### Map Your Models [#map-your-models]

LLM Gateway supports two model ID formats:

**Root Model IDs** (without provider prefix) - Uses smart routing to automatically select the best provider based on uptime, throughput, price, and latency:

```
gpt-5.2
claude-opus-4-5-20251101
gemini-3-flash-preview
```

**Provider-Prefixed Model IDs** - Routes to a specific provider with automatic failover if uptime drops below 90%:

```
openai/gpt-5.2
anthropic/claude-opus-4-5-20251101
google-ai-studio/gemini-3-flash-preview
```

This means many LiteLLM model names work directly with LLM Gateway:

| LiteLLM Model                    | LLM Gateway Model                                                 |
| -------------------------------- | ----------------------------------------------------------------- |
| gpt-5.2                          | gpt-5.2 or openai/gpt-5.2                                         |
| claude-opus-4-5-20251101         | claude-opus-4-5-20251101 or anthropic/claude-opus-4-5-20251101    |
| gemini/gemini-3-flash-preview    | gemini-3-flash-preview or google-ai-studio/gemini-3-flash-preview |
| bedrock/claude-opus-4-5-20251101 | claude-opus-4-5-20251101 or aws-bedrock/claude-opus-4-5-20251101  |

For more details on routing behavior, see the [routing documentation](/features/routing).

### Update Your Code [#update-your-code]

#### Python with OpenAI SDK [#python-with-openai-sdk]

```python
import os

from openai import OpenAI

# Before (LiteLLM proxy)
client = OpenAI(
    base_url="http://localhost:4000/v1",
    api_key=os.environ["LITELLM_API_KEY"]
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}]
)

# After (LLM Gateway) - model name can stay the same!
client = OpenAI(
    base_url="https://api.llmgateway.io/v1",
    api_key=os.environ["LLM_GATEWAY_API_KEY"]
)

response = client.chat.completions.create(
    model="gpt-4",  # or "openai/gpt-4" to target a specific provider
    messages=[{"role": "user", "content": "Hello!"}]
)
```

#### Python with LiteLLM Library [#python-with-litellm-library]

If you're using the LiteLLM library directly, you can point it to LLM Gateway:

```python
import os

import litellm

# Before (direct LiteLLM)
response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}]
)

# After (via LLM Gateway) - same model name works
response = litellm.completion(
    model="gpt-4",  # or "openai/gpt-4" to target a specific provider
    messages=[{"role": "user", "content": "Hello!"}],
    api_base="https://api.llmgateway.io/v1",
    api_key=os.environ["LLM_GATEWAY_API_KEY"]
)
```

#### TypeScript/JavaScript [#typescriptjavascript]

```typescript
import OpenAI from "openai";

// Before (LiteLLM proxy)
const client = new OpenAI({
  baseURL: "http://localhost:4000/v1",
  apiKey: process.env.LITELLM_API_KEY,
});

// After (LLM Gateway) - same model name works
const client = new OpenAI({
  baseURL: "https://api.llmgateway.io/v1",
  apiKey: process.env.LLM_GATEWAY_API_KEY,
});

const completion = await client.chat.completions.create({
  model: "gpt-4", // or "openai/gpt-4" to target a specific provider
  messages: [{ role: "user", content: "Hello!" }],
});
```

#### cURL [#curl]

```bash
# Before (LiteLLM proxy)
curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer $LITELLM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# After (LLM Gateway) - same model name works
curl https://api.llmgateway.io/v1/chat/completions \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
# Use "openai/gpt-4" to target a specific provider
```

### Migrate Configuration [#migrate-configuration]

#### LiteLLM Config (Before) [#litellm-config-before]

```yaml
# litellm_config.yaml
model_list:
  - model_name: gpt-4
    litellm_params:
      model: gpt-4
      api_key: sk-...
  - model_name: claude-3
    litellm_params:
      model: claude-3-sonnet-20240229
      api_key: sk-ant-...
```

#### LLM Gateway (After) [#llm-gateway-after]

With LLM Gateway, you don't need a config file. Provider keys are managed in the web dashboard, or you can use the default LLM Gateway keys.

If you want to use your own provider keys, configure them in the dashboard under Settings > Provider Keys.
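If your codebase has many LiteLLM-style model names, the prefix differences from the mapping table under Map Your Models can be handled in one place. The following is a minimal sketch, not part of LLM Gateway or LiteLLM: `to_gateway_model` and `PREFIX_MAP` are hypothetical names, and the map covers only the two prefix renames shown in that table (`gemini/` to `google-ai-studio/` and `bedrock/` to `aws-bedrock/`). Any other name is passed through unchanged.

```python
# Hypothetical helper: translate LiteLLM-style model names to LLM Gateway IDs.
# Only the prefix renames listed in the mapping table are covered here.
PREFIX_MAP = {
    "gemini": "google-ai-studio",
    "bedrock": "aws-bedrock",
}


def to_gateway_model(litellm_model: str, pin_provider: bool = True) -> str:
    """Return an LLM Gateway model ID for a LiteLLM model name.

    pin_provider=True keeps a provider prefix (provider-prefixed ID).
    pin_provider=False strips the prefix, producing a root model ID so the
    gateway's smart routing can pick the provider.
    """
    provider, sep, model = litellm_model.partition("/")
    if not sep:
        # No prefix at all: the name is already a root model ID.
        return litellm_model
    if not pin_provider:
        return model
    return f"{PREFIX_MAP.get(provider, provider)}/{model}"


if __name__ == "__main__":
    for name in ("gpt-5.2", "gemini/gemini-3-flash-preview",
                 "bedrock/claude-opus-4-5-20251101"):
        print(name, "->", to_gateway_model(name))
```

Check the result against the [models page](https://llmgateway.io/models) before relying on it, since not every LiteLLM model is guaranteed to have a gateway equivalent.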
## Streaming Support [#streaming-support]

LLM Gateway supports streaming identically to LiteLLM:

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.llmgateway.io/v1",
    api_key=os.environ["LLM_GATEWAY_API_KEY"]
)

stream = client.chat.completions.create(
    model="openai/gpt-4",
    messages=[{"role": "user", "content": "Write a story"}],
    stream=True
)

# Print each text delta as it arrives
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```
## Function/Tool Calling [#functiontool-calling]

LLM Gateway supports function calling:

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.llmgateway.io/v1",
    api_key=os.environ["LLM_GATEWAY_API_KEY"]
)

# JSON Schema description of the function the model may call
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"}
            },
            "required": ["location"]
        }
    }
}]

response = client.chat.completions.create(
    model="openai/gpt-4",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools
)
```
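The request above only asks the model for a tool call; your code still has to run the tool and send the result back. A minimal sketch of that step follows, assuming the response uses the OpenAI Chat Completions `tool_calls` shape. `get_weather`, `LOCAL_TOOLS`, and `run_tool_calls` are hypothetical names, and the weather data is a hard-coded placeholder.

```python
import json
from types import SimpleNamespace


def get_weather(location: str) -> dict:
    """Placeholder tool implementation; replace with a real lookup."""
    return {"location": location, "temperature": 72, "condition": "sunny"}


# Maps tool names declared in `tools` to local Python callables.
LOCAL_TOOLS = {"get_weather": get_weather}


def run_tool_calls(message) -> list:
    """Execute each requested tool call and return `tool` role messages."""
    results = []
    for call in message.tool_calls or []:
        func = LOCAL_TOOLS[call.function.name]
        # Arguments arrive as a JSON-encoded string.
        args = json.loads(call.function.arguments)
        results.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(func(**args)),
        })
    return results


if __name__ == "__main__":
    # Stand-in for response.choices[0].message, so this runs offline.
    fake_message = SimpleNamespace(tool_calls=[SimpleNamespace(
        id="call_1",
        function=SimpleNamespace(name="get_weather",
                                 arguments='{"location": "Tokyo"}'),
    )])
    print(run_tool_calls(fake_message))
```

To finish the exchange, append the assistant message and the returned tool messages to `messages`, then call `client.chat.completions.create` again so the model can produce its final answer.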
## Removing LiteLLM Infrastructure [#removing-litellm-infrastructure]

After verifying LLM Gateway works for your use case, you can decommission your LiteLLM proxy:

1. Update all clients to use LLM Gateway endpoints
2. Monitor the LLM Gateway dashboard for successful requests
3. Shut down your LiteLLM proxy server
4. Remove LiteLLM configuration files

## What Changes After Migration [#what-changes-after-migration]

* **No servers to babysit**: We handle scaling, uptime, and updates
* **Real-time cost visibility**: See what every request costs, broken down by model
* **Automatic caching**: Repeated requests hit cache, reducing your spend
* **Web-based management**: No more editing YAML files for config changes
* **New models without redeploying**: New releases are available within 48 hours, with no deployment needed on your side
## Self-Hosting LLM Gateway [#self-hosting-llm-gateway] If you prefer self-hosting like LiteLLM, LLM Gateway is available under AGPLv3: ```bash git clone https://github.com/llmgateway/llmgateway cd llmgateway pnpm install pnpm setup pnpm dev ``` This gives you the same benefits as LiteLLM's self-hosted proxy with LLM Gateway's analytics and caching features. ## Full Comparison [#full-comparison] Want to see a detailed breakdown of all features? Check out our [LLM Gateway vs LiteLLM comparison page](https://llmgateway.io/compare/litellm).
## Need Help? [#need-help] * Browse available models at [llmgateway.io/models](https://llmgateway.io/models) * Read the [API documentation](https://docs.llmgateway.io) * Contact support at [contact@llmgateway.io](mailto:contact@llmgateway.io)
# Migrate from OpenRouter URL: https://docs.llmgateway.io/migrations/openrouter
LLM Gateway works just like OpenRouter—same API format, same model names—but with built-in analytics and the option to self-host. Migration takes two lines of code. ## Quick Migration [#quick-migration] Change your base URL and API key: ```diff - const baseURL = "https://openrouter.ai/api/v1"; - const apiKey = process.env.OPENROUTER_API_KEY; + const baseURL = "https://api.llmgateway.io/v1"; + const apiKey = process.env.LLM_GATEWAY_API_KEY; ```
## Migration Steps [#migration-steps] ### Get Your LLM Gateway API Key [#get-your-llm-gateway-api-key] Sign up at [llmgateway.io/signup](https://llmgateway.io/signup) and create an API key from your dashboard. ### Update Environment Variables [#update-environment-variables] ```bash # Remove OpenRouter credentials # OPENROUTER_API_KEY=sk-or-... # Add LLM Gateway credentials LLM_GATEWAY_API_KEY=llmgtwy_your_key_here ``` ### Update Your Code [#update-your-code] #### Using fetch/axios [#using-fetchaxios] ```typescript // Before (OpenRouter) const response = await fetch("https://openrouter.ai/api/v1/chat/completions", { method: "POST", headers: { Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`, "Content-Type": "application/json", }, body: JSON.stringify({ model: "openai/gpt-5.2", messages: [{ role: "user", content: "Hello!" }], }), }); // After (LLM Gateway) const response = await fetch("https://api.llmgateway.io/v1/chat/completions", { method: "POST", headers: { Authorization: `Bearer ${process.env.LLM_GATEWAY_API_KEY}`, "Content-Type": "application/json", }, body: JSON.stringify({ model: "gpt-5.2", messages: [{ role: "user", content: "Hello!" }], }), }); ``` #### Using OpenAI SDK [#using-openai-sdk] ```typescript import OpenAI from "openai"; // Before (OpenRouter) const client = new OpenAI({ baseURL: "https://openrouter.ai/api/v1", apiKey: process.env.OPENROUTER_API_KEY, }); // After (LLM Gateway) const client = new OpenAI({ baseURL: "https://api.llmgateway.io/v1", apiKey: process.env.LLM_GATEWAY_API_KEY, }); // Usage remains the same const completion = await client.chat.completions.create({ model: "anthropic/claude-3-5-sonnet-20241022", messages: [{ role: "user", content: "Hello!" 
}], }); ```

#### Using Vercel AI SDK [#using-vercel-ai-sdk]

Both OpenRouter and LLM Gateway have native AI SDK providers, making migration straightforward:

```typescript
import { generateText } from "ai";

// Before (OpenRouter AI SDK Provider)
import { createOpenRouter } from "@openrouter/ai-sdk-provider";

const openrouter = createOpenRouter({
  apiKey: process.env.OPENROUTER_API_KEY,
});

const { text } = await generateText({
  model: openrouter("gpt-5.2"),
  prompt: "Hello!",
});

// After (LLM Gateway AI SDK Provider)
import { createLLMGateway } from "@llmgateway/ai-sdk-provider";

const llmgateway = createLLMGateway({
  apiKey: process.env.LLM_GATEWAY_API_KEY,
});

const { text } = await generateText({
  model: llmgateway("gpt-5.2"),
  prompt: "Hello!",
});
```
## Model Name Mapping [#model-name-mapping] Most model names are compatible, but here are some common mappings: | OpenRouter Model | LLM Gateway Model | | -------------------------------- | ----------------------------------------------------------------- | | openai/gpt-5.2 | gpt-5.2 or openai/gpt-5.2 | | gemini/gemini-3-flash-preview | gemini-3-flash-preview or google-ai-studio/gemini-3-flash-preview | | bedrock/claude-opus-4-5-20251101 | claude-opus-4-5-20251101 or aws-bedrock/claude-opus-4-5-20251101 | Check the [models page](https://llmgateway.io/models) for the full list of available models.
## Streaming Support [#streaming-support] LLM Gateway supports streaming responses identically to OpenRouter: ```typescript const stream = await client.chat.completions.create({ model: "anthropic/claude-3-5-sonnet-20241022", messages: [{ role: "user", content: "Write a story" }], stream: true, }); for await (const chunk of stream) { process.stdout.write(chunk.choices[0]?.delta?.content || ""); } ```
## Full Comparison [#full-comparison] Want to see a detailed breakdown of all features? Check out our [LLM Gateway vs OpenRouter comparison page](https://llmgateway.io/compare/open-router).
## Need Help? [#need-help] * Browse available models at [llmgateway.io/models](https://llmgateway.io/models) * Read the [API documentation](https://docs.llmgateway.io) * Contact support at [contact@llmgateway.io](mailto:contact@llmgateway.io)
# Migrate from Vercel AI Gateway URL: https://docs.llmgateway.io/migrations/vercel-ai-gateway
## Quick Migration [#quick-migration] Swap your provider imports—your AI SDK code stays the same: ```diff - import { openai } from "@ai-sdk/openai"; - import { anthropic } from "@ai-sdk/anthropic"; + import { generateText } from "ai"; + import { createLLMGateway } from "@llmgateway/ai-sdk-provider"; + const llmgateway = createLLMGateway({ + apiKey: process.env.LLM_GATEWAY_API_KEY + }); const { text } = await generateText({ - model: openai("gpt-5.2"), + model: llmgateway("gpt-5.2"), prompt: "Hello!" }); ``` The key difference: one provider, one API key, all models—with caching and analytics built in.
## Migration Steps [#migration-steps] ### Get Your LLM Gateway API Key [#get-your-llm-gateway-api-key] Sign up at [llmgateway.io/signup](https://llmgateway.io/signup) and create an API key from your dashboard. ### Install the LLM Gateway AI SDK Provider [#install-the-llm-gateway-ai-sdk-provider] Install the native LLM Gateway provider for the Vercel AI SDK: ```bash pnpm add @llmgateway/ai-sdk-provider ``` This package provides full compatibility with the Vercel AI SDK and supports all LLM Gateway features. ### Update Your Code [#update-your-code] #### Basic Text Generation [#basic-text-generation] ```typescript // Before (Vercel AI Gateway with native providers) import { openai } from "@ai-sdk/openai"; import { anthropic } from "@ai-sdk/anthropic"; import { generateText } from "ai"; const { text: openaiText } = await generateText({ model: openai("gpt-4o"), prompt: "Hello!", }); const { text: claudeText } = await generateText({ model: anthropic("claude-3-5-sonnet-20241022"), prompt: "Hello!", }); // After (LLM Gateway - single provider for all models) import { createLLMGateway } from "@llmgateway/ai-sdk-provider"; import { generateText } from "ai"; const llmgateway = createLLMGateway({ apiKey: process.env.LLM_GATEWAY_API_KEY, }); const { text: openaiText } = await generateText({ model: llmgateway("openai/gpt-4o"), prompt: "Hello!", }); const { text: claudeText } = await generateText({ model: llmgateway("anthropic/claude-3-5-sonnet-20241022"), prompt: "Hello!", }); ``` #### Streaming Responses [#streaming-responses] ```typescript import { createLLMGateway } from "@llmgateway/ai-sdk-provider"; import { streamText } from "ai"; const llmgateway = createLLMGateway({ apiKey: process.env.LLM_GATEWAY_API_KEY, }); const { textStream } = await streamText({ model: llmgateway("anthropic/claude-3-5-sonnet-20241022"), prompt: "Write a poem about coding", }); for await (const text of textStream) { process.stdout.write(text); } ``` #### Using in Next.js API Routes 
[#using-in-nextjs-api-routes] ```typescript // app/api/chat/route.ts import { createLLMGateway } from "@llmgateway/ai-sdk-provider"; import { streamText } from "ai"; const llmgateway = createLLMGateway({ apiKey: process.env.LLM_GATEWAY_API_KEY, }); export async function POST(req: Request) { const { messages } = await req.json(); const result = await streamText({ model: llmgateway("openai/gpt-4o"), messages, }); return result.toDataStreamResponse(); } ``` #### Alternative: Using OpenAI SDK Adapter [#alternative-using-openai-sdk-adapter] If you prefer not to install a new package, you can use `@ai-sdk/openai` with a custom base URL: ```typescript import { createOpenAI } from "@ai-sdk/openai"; import { generateText } from "ai"; const llmgateway = createOpenAI({ baseURL: "https://api.llmgateway.io/v1", apiKey: process.env.LLM_GATEWAY_API_KEY, }); const { text } = await generateText({ model: llmgateway("openai/gpt-4o"), prompt: "Hello!", }); ``` ### Update Environment Variables [#update-environment-variables] ```bash # Remove individual provider keys (optional - can keep as backup) # OPENAI_API_KEY=sk-... # ANTHROPIC_API_KEY=sk-ant-... # Add LLM Gateway key export LLM_GATEWAY_API_KEY=llmgtwy_your_key_here ```
## Model Name Format [#model-name-format] LLM Gateway supports two model ID formats: **Root Model IDs** (without provider prefix) - Uses smart routing to automatically select the best provider based on uptime, throughput, price, and latency: ``` gpt-4o claude-3-5-sonnet-20241022 gemini-1.5-pro ``` **Provider-Prefixed Model IDs** - Routes to a specific provider with automatic failover if uptime drops below 90%: ``` openai/gpt-4o anthropic/claude-3-5-sonnet-20241022 google-ai-studio/gemini-1.5-pro ``` For more details on routing behavior, see the [routing documentation](/features/routing). ### Model Mapping Examples [#model-mapping-examples] | Vercel AI SDK | LLM Gateway | | ----------------------------------------- | -------------------------------------------------------------------------------------------------- | | `openai("gpt-4o")` | `llmgateway("gpt-4o")` or `llmgateway("openai/gpt-4o")` | | `anthropic("claude-3-5-sonnet-20241022")` | `llmgateway("claude-3-5-sonnet-20241022")` or `llmgateway("anthropic/claude-3-5-sonnet-20241022")` | | `google("gemini-1.5-pro")` | `llmgateway("gemini-1.5-pro")` or `llmgateway("google-ai-studio/gemini-1.5-pro")` | Check the [models page](https://llmgateway.io/models) for the full list of available models.
## Tool Calling [#tool-calling] LLM Gateway supports tool calling through the AI SDK: ```typescript import { createLLMGateway } from "@llmgateway/ai-sdk-provider"; import { generateText, tool } from "ai"; import { z } from "zod"; const llmgateway = createLLMGateway({ apiKey: process.env.LLM_GATEWAY_API_KEY, }); const { text, toolResults } = await generateText({ model: llmgateway("openai/gpt-4o"), tools: { weather: tool({ description: "Get the weather for a location", parameters: z.object({ location: z.string(), }), execute: async ({ location }) => { return { temperature: 72, condition: "sunny" }; }, }), }, prompt: "What's the weather in San Francisco?", }); ```
## Self-Hosting LLM Gateway [#self-hosting-llm-gateway]

If you prefer self-hosting, LLM Gateway is available under AGPLv3:

```bash
git clone https://github.com/llmgateway/llmgateway
cd llmgateway
pnpm install
pnpm setup
pnpm dev
```

This gives you the same features as the managed service, with full control over your infrastructure.
## Need Help? [#need-help] * Browse available models at [llmgateway.io/models](https://llmgateway.io/models) * Read the [API documentation](https://docs.llmgateway.io) * Contact support at [contact@llmgateway.io](mailto:contact@llmgateway.io)
# Error Handling URL: https://docs.llmgateway.io/resources/error-handling # Error Handling [#error-handling] On the OpenAI-compatible endpoints, LLMGateway returns errors in the same format as the OpenAI API, so existing OpenAI SDKs and tooling can parse gateway errors without changes. This applies to errors forwarded from upstream providers as well as errors raised by the gateway itself (authentication failures, usage limits, validation problems, timeouts, and so on). The Anthropic-compatible Messages endpoint (`/v1/messages`) instead returns Anthropic-native errors — see [Anthropic Endpoint](#anthropic-endpoint) below. ## Error Format [#error-format] Errors on the OpenAI-compatible endpoints (`/v1/chat/completions`, `/v1/embeddings`, `/v1/images`, `/v1/models`, `/v1/moderations`, `/v1/responses`, `/v1/videos`) use the standard OpenAI error envelope: ```json { "error": { "message": "Unauthorized: LLMGateway API key reached its usage limit.", "type": "invalid_request_error", "param": null, "code": "invalid_api_key" } } ``` | Field | Description | | --------------- | ----------------------------------------------------------------------------------- | | `error.message` | Human-readable description of what went wrong. | | `error.type` | High-level error category (see the table below). | | `error.param` | The request parameter that caused the error, or `null` when not parameter-specific. | | `error.code` | A more specific machine-readable code, or `null` when no specific code applies. | The HTTP status code on the response always matches the error and is the authoritative signal — read it from the response status line rather than the body. 
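As an illustration of the envelope described above, the sketch below turns a status code and response body into a small typed object. It is not an official client: `GatewayError` and `parse_openai_error` are hypothetical names, and the fallback for non-JSON bodies is an assumption made for robustness rather than documented gateway behavior.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class GatewayError:
    status: int            # HTTP status code, treated as the authoritative signal
    message: str
    type: Optional[str]
    param: Optional[str]
    code: Optional[str]


def parse_openai_error(status: int, body: str) -> GatewayError:
    """Parse an OpenAI-style error envelope, tolerating malformed bodies."""
    try:
        err = json.loads(body).get("error")
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object.
        err = None
    if not isinstance(err, dict):
        # Assumption: fall back to the raw body as the message.
        return GatewayError(status, body, None, None, None)
    return GatewayError(
        status=status,
        message=err.get("message", ""),
        type=err.get("type"),
        param=err.get("param"),
        code=err.get("code"),
    )


if __name__ == "__main__":
    sample = ('{"error": {"message": "Unauthorized: LLMGateway API key reached '
              'its usage limit.", "type": "invalid_request_error", '
              '"param": null, "code": "invalid_api_key"}}')
    print(parse_openai_error(401, sample))
```

Keeping the status code next to the parsed fields lets callers branch on the status first and use `type` and `code` only for finer-grained handling.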
## Status Codes [#status-codes] The gateway maps HTTP status codes to OpenAI error types and codes as follows: | Status | `type` | `code` | | ------ | ----------------------- | ------------------------ | | 400 | `invalid_request_error` | *(varies / `null`)* | | 401 | `invalid_request_error` | `invalid_api_key` | | 402 | `invalid_request_error` | `billing_error` | | 403 | `invalid_request_error` | `permission_denied` | | 404 | `invalid_request_error` | `not_found` | | 408 | `timeout_error` | `timeout` | | 413 | `invalid_request_error` | `request_too_large` | | 415 | `invalid_request_error` | `unsupported_media_type` | | 429 | `rate_limit_error` | `rate_limit_exceeded` | | 499 | `invalid_request_error` | `request_cancelled` | | 504 | `timeout_error` | `timeout` | | 5xx | `api_error` | *(`null`)* | Validation errors raised before a request reaches a provider often include a more specific `code` and a `param` pointing at the offending field — for example `invalid_json`, `model_not_found`, or `unsupported_parameter_combination`. ## Streaming Errors [#streaming-errors] For streaming requests (`"stream": true`), an error that occurs **after** the stream has started is delivered as an SSE `error` event whose payload uses the same `{ "error": { ... } }` envelope. Errors that occur **before** streaming begins (such as authentication failures) are returned as a normal JSON error response with the appropriate status code. ## Anthropic Endpoint [#anthropic-endpoint] The Anthropic-compatible Messages endpoint (`/v1/messages`) returns errors in Anthropic's native format instead, so the Anthropic SDK can parse them: ```json { "type": "error", "error": { "type": "authentication_error", "message": "Unauthorized: invalid API key." } } ``` ## Related [#related] * [Rate Limits](/resources/rate-limits) — details on `429` responses and rate limit headers. 
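The status table and the two envelopes above can be combined into a small client-side policy. The sketch below is one possible approach, not gateway-defined behavior: which statuses are worth retrying (`408`, `429`, `504`, and `5xx`) is an assumption, as are the hypothetical names `is_retryable`, `summarize_error`, and `backoff_delay`. Because both the OpenAI and Anthropic envelopes nest `type` and `message` under `error`, one function can read either.

```python
import random
from typing import Tuple

# Assumption: timeouts and rate limits are transient; 4xx request errors
# (including 499, a client-side cancellation) are not retried.
RETRYABLE_STATUSES = {408, 429, 504}


def is_retryable(status: int) -> bool:
    """Decide whether a failed request is worth retrying."""
    return status in RETRYABLE_STATUSES or 500 <= status <= 599


def summarize_error(body: dict) -> Tuple[str, str]:
    """Return (type, message) from an OpenAI- or Anthropic-style envelope."""
    err = body.get("error") if isinstance(body, dict) else None
    if not isinstance(err, dict):
        return ("unknown", "")
    return (str(err.get("type", "unknown")), str(err.get("message", "")))


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Exponential backoff with full jitter; attempt starts at 0."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


if __name__ == "__main__":
    anthropic_body = {"type": "error",
                      "error": {"type": "authentication_error",
                                "message": "Unauthorized: invalid API key."}}
    print(summarize_error(anthropic_body), is_retryable(401), is_retryable(429))
```

For streaming requests, the same `summarize_error` call can be applied to the payload of an SSE `error` event, since it uses the same envelope.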
# Rate Limits URL: https://docs.llmgateway.io/resources/rate-limits # Rate Limits [#rate-limits] LLMGateway implements rate limits to ensure fair usage and optimal performance for all users. The rate limits differ based on your account status and the type of models you're using. ## Free Models [#free-models] Free models (models with zero input and output pricing) have rate limits that depend on your account's credit status: ### Base Rate Limits [#base-rate-limits] For organizations with **zero credits**: * **5 requests per 10 minutes** * Applies to all free model requests * Resets every 10 minutes ### Elevated Rate Limits [#elevated-rate-limits] For organizations that have **purchased at least some credits**: * **20 requests per minute** * Applies to all free model requests * Resets every minute When using free models with elevated limits, your credits will **not** be deducted. The elevated rate limits are simply a benefit for users who have added credits to their account. ## Paid Models [#paid-models] **Paid AI models are not currently rate limited.** You can make as many requests as needed to paid models, subject only to your account's credit balance and any provider-specific limits. ## Rate Limit Headers [#rate-limit-headers] All API responses include rate limit information in the headers: ```http X-RateLimit-Limit: 20 X-RateLimit-Remaining: 19 X-RateLimit-Reset: 1640995200 ``` * `X-RateLimit-Limit`: Maximum number of requests allowed in the current window * `X-RateLimit-Remaining`: Number of requests remaining in the current window * `X-RateLimit-Reset`: Unix timestamp when the rate limit window resets ## Rate Limit Exceeded [#rate-limit-exceeded] When you exceed your rate limit, you'll receive a `429 Too Many Requests` response: ```json { "error": { "message": "Rate limit exceeded. 
Try again later.", "type": "rate_limit_error", "code": "rate_limit_exceeded" } } ``` This uses the standard OpenAI-compatible error envelope — see [Error Handling](/resources/error-handling) for the full format and status-code reference. ## Best Practices [#best-practices] ### Upgrading Your Limits [#upgrading-your-limits] To unlock elevated rate limits for free models: 1. Add credits to your account through the dashboard 2. Your rate limits will automatically increase to 20 requests per minute 3. Free model usage will still not deduct from your credits ### Handling Rate Limits [#handling-rate-limits] * Implement exponential backoff when you receive 429 responses * Monitor the `X-RateLimit-Remaining` header to avoid hitting limits * Consider using paid models for high-volume applications ### Cost Optimization [#cost-optimization] * Use free models for development and testing * Switch to paid models for production workloads requiring higher throughput * Monitor your usage patterns through the dashboard Adding even a small amount of credits to your account (e.g., $10) will immediately upgrade your free model rate limits from 5 requests per 10 minutes to 20 requests per minute. # Gateway Caching URL: https://docs.llmgateway.io/features/caching/gateway-caching # Gateway Caching [#gateway-caching] Gateway caching serves a previously-seen, byte-identical request entirely from LLM Gateway without forwarding it to the upstream provider. Repeated identical calls cost **$0** — there is no inference and no provider charge. It is most useful for API workloads with deterministic inputs (classification, batch jobs, FAQ lookups, retries) rather than free-form chat. If you want to reduce the cost of long, partially-shared prompts in chat apps or coding tools, you want [Provider Cache Control](/features/caching/provider-cache-control) instead. That discounts the cached portion of your prompt on every call — it does not require byte-identical requests. 
See the [Caching Overview](/features/caching) for a side-by-side comparison.

## How It Works [#how-it-works]

When you make an API request:

1. LLM Gateway generates a cache key based on the request parameters
2. If a matching cached response exists, it's returned immediately
3. If no cache exists, the request is forwarded to the provider
4. The response is cached for future identical requests

This means repeated identical requests are served instantly from cache without incurring additional provider costs.

## Cost Savings [#cost-savings]

Caching can dramatically reduce costs for applications with repetitive requests:

| Scenario                    | Without Caching | With Caching | Savings |
| --------------------------- | --------------- | ------------ | ------- |
| 1,000 identical requests    | $10.00          | $0.01        | 99.9%   |
| 50% duplicate rate          | $10.00          | $5.00        | 50%     |
| Retry after transient error | $0.02           | $0.01        | 50%     |

Cached responses are free from provider costs. You only pay for the initial request that populates the cache.

## Requirements [#requirements]

Caching is **free** and **independent** of [Data Retention](/features/data-retention). Cached responses live in a short-lived cache (TTL-bound, typically seconds to minutes) and are not stored as long-term request data, so you do not need to enable data retention to use caching.

To use caching:

1. Enable **Caching** in your project settings under Preferences
2. Configure the cache duration (TTL) as needed
3. Make requests as normal; caching is automatic

## Cache Key Generation [#cache-key-generation]

The cache key is generated from these request parameters:

* Model identifier
* Messages array (roles and content)
* Temperature
* Max tokens
* Top P
* Tools/functions
* Tool choice
* Response format
* System prompt
* Other model-specific parameters

Requests with different parameter values, even slight variations, will not share cache entries.
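
To make the effect of the parameter list above concrete, here is a minimal sketch of how a key could be derived from a request. It is an illustration only: the `ChatRequest` shape and the `cacheKey` function are hypothetical, cover just a subset of the listed parameters, and are not the gateway's actual key derivation (which may hash or normalize the request differently).

```typescript
// Hypothetical request shape covering a subset of the key-relevant parameters.
interface ChatRequest {
  model: string;
  messages: { role: string; content: string }[];
  temperature?: number;
  max_tokens?: number;
  top_p?: number;
}

// Illustrative only: serialize the key-relevant fields in a fixed order,
// so any difference in any field produces a different key.
function cacheKey(req: ChatRequest): string {
  return JSON.stringify([
    req.model,
    req.messages.map((m) => [m.role, m.content]),
    req.temperature ?? null,
    req.max_tokens ?? null,
    req.top_p ?? null,
  ]);
}

const base: ChatRequest = {
  model: "gpt-4o",
  messages: [{ role: "user", content: "Classify: apple" }],
  temperature: 0,
};

// Same parameters give the same key.
console.log(cacheKey(base) === cacheKey({ ...base })); // true
// A different temperature gives a different key.
console.log(cacheKey(base) === cacheKey({ ...base, temperature: 0.1 })); // false
// Even a trailing space in the prompt gives a different key.
console.log(
  cacheKey(base) ===
    cacheKey({ ...base, messages: [{ role: "user", content: "Classify: apple " }] }),
); // false
```

The last comparison is the practical takeaway: small, invisible differences in a prompt are enough to miss the cache, so inputs are worth normalizing on the client before sending.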

## Cache Behavior [#cache-behavior]

### Cache Hits [#cache-hits]

When a cache hit occurs:

* Response is returned immediately (sub-millisecond latency)
* No provider API call is made
* No inference costs are incurred

### Cache Misses [#cache-misses]

When a cache miss occurs:

* Request is forwarded to the LLM provider
* Response is stored in cache
* Normal inference costs apply
* Future identical requests will hit the cache

## Streaming and Caching [#streaming-and-caching]

Caching works with both streaming and non-streaming requests:

* **Non-streaming**: Full response is cached and returned
* **Streaming**: The complete response is reconstructed from cache and streamed back

## Cache TTL (Time-to-Live) [#cache-ttl-time-to-live]

Cache duration is configurable per project in your project settings. You can set the cache TTL from 10 seconds up to 1 year (31,536,000 seconds).

The default cache duration is 60 seconds. Adjust this based on your use case: longer durations work well for static content, while shorter durations are better for frequently changing data.
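
The hit, miss, and TTL behavior described above can be sketched with a small in-memory cache. This is a simplified model under stated assumptions, not the gateway's implementation: the `TtlCache` class is hypothetical, time is passed in explicitly to keep the example deterministic, and treating an entry as expired exactly at the TTL boundary is an assumption.

```typescript
// Hypothetical, simplified model of a TTL-bound response cache.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;

  constructor(ttlSeconds: number) {
    this.ttlMs = ttlSeconds * 1000;
  }

  // Returns the cached value on a hit, or undefined on a miss.
  get(key: string, nowMs: number): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined; // miss: nothing stored yet
    if (nowMs >= entry.expiresAt) {
      this.entries.delete(key); // miss: the entry has outlived its TTL
      return undefined;
    }
    return entry.value; // hit: no provider call needed
  }

  // Stores a response after a miss so later identical requests can hit.
  set(key: string, value: V, nowMs: number): void {
    this.entries.set(key, { value, expiresAt: nowMs + this.ttlMs });
  }
}

const cache = new TtlCache<string>(60); // 60 seconds is the documented default TTL
console.log(cache.get("req-1", 0)); // undefined: miss, forward to the provider
cache.set("req-1", "cached completion", 0);
console.log(cache.get("req-1", 30000)); // hit within the TTL: "cached completion"
console.log(cache.get("req-1", 60000)); // undefined: expired, treated as a miss
```

With a longer TTL the third lookup would still be a hit, which is the trade-off the TTL setting controls: more reuse versus staler responses.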

## Identifying Cached Responses [#identifying-cached-responses]

Cached responses show zero or minimal token usage since no inference occurred:

```json
{
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "total_tokens": 0,
    "cost": 0,
    "cost_details": {
      "total_cost": 0,
      "input_cost": 0,
      "output_cost": 0
    }
  }
}
```

## Use Cases [#use-cases]

### Development and Testing [#development-and-testing]

During development, you often send the same prompts repeatedly:

```typescript
import OpenAI from "openai";

// Assumes the official `openai` npm package, pointed at LLM Gateway
const client = new OpenAI({ baseURL: "https://api.llmgateway.io/v1", apiKey: process.env.LLM_GATEWAY_API_KEY });

// This prompt will only incur costs once
const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Explain quantum computing" }],
});
```

### Chatbots with Common Questions [#chatbots-with-common-questions]

FAQ-style interactions often have repeated questions:

```typescript
// Common questions are served from cache
const faqs = [
  "What are your business hours?",
  "How do I reset my password?",
  "What is your return policy?",
];
```

### Batch Processing [#batch-processing]

Processing large datasets with potentially duplicate items:

```typescript
// Duplicate items in batch are served from cache
for (const item of items) {
  const response = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: `Classify: ${item}` }],
  });
}
```

## Best Practices [#best-practices]

### Maximize Cache Hits [#maximize-cache-hits]

* Use consistent prompt formatting
* Normalize input data before sending
* Use deterministic parameters (temperature: 0)
* Avoid including timestamps or random values in prompts

### Appropriate Use Cases [#appropriate-use-cases]

Caching is most effective for:

* Static knowledge queries
* Classification tasks
* FAQ responses
* Development/testing
* Retry scenarios

### When to Avoid Caching [#when-to-avoid-caching]

Caching may not be suitable for:

* Real-time data requirements
* Highly personalized responses
* Time-sensitive information
* Creative tasks requiring variety
* Chat or coding tools where prompts overlap but are not byte-identical; use [Provider Cache Control](/features/caching/provider-cache-control) instead

## Pricing [#pricing]

Caching is **completely free**. Cached responses are held in a short-lived in-memory cache (bounded by your configured TTL) and do not incur storage charges. Storage costs only apply if you separately enable [Data Retention](/features/data-retention) for full request/response payloads.

Caching reduces both inference cost and latency at no additional charge.

# Caching

URL: https://docs.llmgateway.io/features/caching

# Caching [#caching]

LLM Gateway supports **two distinct kinds of caching**, and they solve different problems. Pick the one that matches your workload; they can also be used together.

## Provider / Model Caching [#provider--model-caching]

The provider performs the caching. When your request reuses a long prefix from a previous call (a system prompt, conversation history, tool definitions, a long document), the model serves that prefix from its prompt cache and bills it at a reduced rate. New input tokens and **all output tokens are still billed at the normal rate**; only the cached portion is discounted.

This is the type of caching that powers efficient chat-based and assistant-based interactions, including chat apps and coding tools (Cursor, Cline, Claude Code, etc.) where the same context is reused turn after turn.

You see it in your usage as `prompt_tokens_details.cached_tokens`. For most providers it works automatically; some (notably Anthropic) also let you mark blocks explicitly with `cache_control` and choose a longer TTL.

→ **[Read the Provider Cache Control docs](/features/caching/provider-cache-control)**

## Gateway Caching [#gateway-caching]

LLM Gateway performs the caching. When a request is **byte-identical** to a previous one (same model, same messages, same parameters), the response is served from the gateway's cache without any provider call. Repeated identical calls cost **$0**.
This is most useful for deterministic API workloads (classification, batch jobs, FAQ lookups, retries) rather than free-form chat, because chat prompts almost always differ on the latest turn.

→ **[Read the Gateway Caching docs](/features/caching/gateway-caching)**

## Which one do I want? [#which-one-do-i-want]

| If you…                                                         | Use                                                                                           |
| --------------------------------------------------------------- | --------------------------------------------------------------------------------------------- |
| Build a chat app, assistant, or coding tool                     | [Provider Cache Control](/features/caching/provider-cache-control)                            |
| Send long system prompts or growing conversation history        | [Provider Cache Control](/features/caching/provider-cache-control)                            |
| Want longer cache lifetimes than the provider default           | [Provider Cache Control](/features/caching/provider-cache-control) (explicit `cache_control`) |
| Send the exact same request many times (batches, retries, FAQs) | [Gateway Caching](/features/caching/gateway-caching)                                          |
| Want $0 on repeated calls instead of a discount                 | [Gateway Caching](/features/caching/gateway-caching)                                          |

The two are not mutually exclusive. A coding tool can rely on provider caching for its long system prompt **and** enable gateway caching so that deterministic tool calls (e.g., file lookups) cost nothing on retry.

# Provider Cache Control

URL: https://docs.llmgateway.io/features/caching/provider-cache-control

# Provider Cache Control [#provider-cache-control]

Most modern LLM providers offer **prompt caching**: when a request reuses a long prefix from a previous request (for example, a multi-thousand-token system prompt or a growing conversation history), the provider stores that prefix and serves it back at a steep discount on subsequent calls. Only the cached portion is discounted; new input tokens and all output tokens are still billed at the normal rate.
This is the behavior you see surfaced as `cached_tokens` in your usage payloads, and it is what makes chat apps, assistants, and coding tools (Cursor, Cline, Claude Code, etc.) economically viable on long contexts.

Looking for $0 on repeated calls instead of a discount on the cached portion? That is [Gateway Caching](/features/caching/gateway-caching), which serves byte-identical requests entirely from LLM Gateway without hitting the provider. It is a better fit for deterministic API workloads than for chat. See the [Caching Overview](/features/caching) for a side-by-side comparison.

## Automatic caching [#automatic-caching]

For most users, prompt caching just works: you do not need to change your request payloads. Providers including OpenAI, Anthropic (when prompts cross the provider's minimum size), Google, DeepSeek, xAI, and Alibaba inspect incoming requests for shared prefixes and cache them automatically. LLM Gateway forwards the provider's cache metadata back to you in the response, and bills the cached portion at the model's `cached_input` rate.

For **Anthropic** and **AWS Bedrock Claude**, prompt caching is strictly opt-in via `cache_control` / `cachePoint` markers on the request body. To get automatic cache benefits without rewriting your requests, LLM Gateway injects those markers for you on long system and user messages by default.

If you send long prompts sporadically (with gaps wider than the 5-minute TTL), you may want to disable this entirely, since you would otherwise pay the cache-write premium (1.25× input for 5m, 2× for 1h) without ever benefiting from a cache read.

To disable, open **Project Settings → Caching → Provider Cache Writes** and turn off "Allow provider cache writes". When disabled, the gateway strips **all** `cache_control` markers from outgoing requests for the project, both the ones it adds automatically and any markers your client sends. This covers callers that always emit markers regardless of the user's request cadence (e.g. Claude Code, Cursor, Cline). The change takes up to 5 minutes to take effect due to the project-settings cache.

To take advantage of automatic caching:

* Put stable content (system prompt, instructions, tool definitions, long documents) at the **start** of your messages
* Keep the variable portion (the latest user turn) at the **end**
* Reuse the same prefix across requests; even minor changes invalidate the cache

You can confirm the cache is working by inspecting `usage.prompt_tokens_details.cached_tokens` on the response. See [Cost Breakdown](/features/cost-breakdown) for the full list of usage fields.

```json
{
  "usage": {
    "prompt_tokens": 8200,
    "completion_tokens": 150,
    "prompt_tokens_details": {
      "cached_tokens": 8000
    },
    "cost_details": {
      "input_cost": 0.0006,
      "cached_input_cost": 0.0008
    }
  }
}
```

In this example, 8,000 of the 8,200 prompt tokens were served from the provider's cache and billed at the cached rate.

### Pricing and routing [#pricing-and-routing]

Cached input tokens are billed at the model's published `cached_input` price (typically 10–25% of the regular input price, depending on the provider and model). Output tokens and any non-cached input tokens are billed at the normal rate.

When the [Smart Routing](/features/routing) algorithm selects a provider for a large prompt (≥ 5,000 estimated tokens), it gives extra weight to providers that advertise cache support, since caching can substantially reduce the cost of repeated large prompts.

## Explicit caching with `cache_control` [#explicit-caching-with-cache_control]

Some providers, most notably **Anthropic**, also support *explicit* cache control, where you mark specific content blocks as cacheable using a `cache_control` field. This gives you precise control over what gets cached and lets you opt into longer cache lifetimes than the default.

Explicit caching is provider-specific.
Supported providers and TTLs at the time of writing:

| Provider             | Models                         | Supported TTLs       |
| -------------------- | ------------------------------ | -------------------- |
| Anthropic (Claude)   | All Claude models              | `5m` (default), `1h` |
| AWS Bedrock (Claude) | All Claude models              | `5m` (default), `1h` |
| Alibaba (Qwen)       | Qwen models with cache support | Provider-defined     |

To mark content as cacheable, send the message content as an array of blocks and add a `cache_control` field to the block you want to cache:

```json
{
  "model": "claude-haiku-4-5",
  "messages": [
    {
      "role": "system",
      "content": [
        {
          "type": "text",
          "text": "You are a helpful assistant. ",
          "cache_control": { "type": "ephemeral", "ttl": "1h" }
        }
      ]
    },
    { "role": "user", "content": "What is the capital of France?" }
  ]
}
```

Use `ttl: "5m"` (the default if omitted) for short-lived caches that match a single user's session, and `ttl: "1h"` when the same prefix will be reused over a longer window (for example, a coding agent that keeps the same project context warm across many requests).

### Mixing explicit markers with automatic injection [#mixing-explicit-markers-with-automatic-injection]

Anthropic requires cache breakpoints with longer TTLs to appear before shorter ones (blocks are processed in the order `tools`, `system`, `messages`). The markers LLM Gateway injects automatically use the default 5-minute TTL, so they could never legally precede an explicit `ttl: "1h"` marker in your messages. To keep both features compatible:

* When your request contains an explicit `ttl: "1h"` marker in the **messages**, LLM Gateway skips its automatic marker injection for that request entirely and forwards only your markers, which is the same behavior you would get calling the provider directly.
* A `ttl: "1h"` marker only on the **system** prompt does not disable automatic injection, since 5-minute breakpoints after it still satisfy the ordering rule.
* Explicit markers that use the default 5-minute TTL coexist with automatic injection (capped at 4 breakpoints total per Anthropic's limit).

Cache writes are billed at a premium (typically 1.25× for 5m and 2× for 1h on Anthropic) the first time a cached block is created. After that, cache reads cost roughly 10% of the regular input price. The break-even point is usually one or two reuses: at those multipliers, a `5m` block sent twice costs about 1.35× the regular input price instead of 2×, and a `1h` block sent three times costs about 2.2× instead of 3×. Explicit caching is worth it whenever a marked block will be reused at least that often within its TTL.

Anthropic returns a per-TTL breakdown of cache writes when you mix `5m` and `1h` blocks:

```json
{
  "usage": {
    "cache_creation": {
      "ephemeral_5m_input_tokens": 0,
      "ephemeral_1h_input_tokens": 8000
    },
    "cache_read_input_tokens": 0
  }
}
```

For providers that publish a separate explicit-cache read rate (for example, Alibaba Qwen charges 10% for explicit cache reads vs. 20% for automatic cache reads), LLM Gateway detects the `cache_control` markers on your request and applies the explicit rate automatically.

## Related [#related]

* [Gateway Caching](/features/caching/gateway-caching): serve byte-identical requests entirely from LLM Gateway at $0 cost
* [Caching Overview](/features/caching): side-by-side comparison of provider caching vs. gateway caching
* [Cost Breakdown](/features/cost-breakdown): full reference for the usage and cost fields on every response
* [Smart Routing](/features/routing): how cache support influences provider selection for large prompts