Cost Breakdown

Get real-time cost information for each API request directly in the response.

LLM Gateway provides real-time cost information for each API request directly in the response's usage object. This allows you to track costs programmatically without needing to query the dashboard.

Cost breakdown is available for Pro plan users on hosted LLM Gateway. Self-hosted deployments always include cost information.

Response Format

When cost breakdown is enabled, your API responses will include additional cost fields in the usage object:

{
	"id": "chatcmpl-123",
	"object": "chat.completion",
	"created": 1234567890,
	"model": "openai/gpt-4o",
	"choices": [
		{
			"index": 0,
			"message": {
				"role": "assistant",
				"content": "Hello! How can I help you today?"
			},
			"finish_reason": "stop"
		}
	],
	"usage": {
		"prompt_tokens": 10,
		"completion_tokens": 15,
		"total_tokens": 25,
		"cost_usd_total": 0.000125,
		"cost_usd_input": 0.000025,
		"cost_usd_output": 0.0001,
		"cost_usd_cached_input": 0,
		"cost_usd_request": 0,
		"cost_usd_data_storage": 0.00000025
	}
}

Cost Fields

Field                    Description
cost_usd_total           Total inference cost for the request in USD (excludes storage)
cost_usd_input           Cost for input/prompt tokens in USD
cost_usd_output          Cost for output/completion tokens in USD
cost_usd_cached_input    Cost for cached input tokens in USD (discounted rate)
cost_usd_request         Per-request cost in USD (for models with request-based pricing)
cost_usd_data_storage    LLM Gateway storage cost in USD ($0.01 per 1M tokens, only when retention enabled)

Note: cost_usd_total includes only provider/inference costs. Data storage costs (cost_usd_data_storage) are billed separately by LLM Gateway when data retention is enabled in organization policies.
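Both rules can be checked against the sample response above: the component costs sum to the total (0.000025 input + 0.0001 output + 0 cached + 0 request = 0.000125 = cost_usd_total), and the 25 billed tokens incur 25 / 1,000,000 × $0.01 = $0.00000025 of storage cost, matching cost_usd_data_storage.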

Streaming Responses

Cost information is also available in streaming responses. The cost fields are included in the final usage chunk sent before the [DONE] message:

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[...],"usage":{"prompt_tokens":10,"completion_tokens":15,"total_tokens":25,"cost_usd_total":0.000125,"cost_usd_input":0.000025,"cost_usd_output":0.0001}}

data: [DONE]
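
With the OpenAI SDK, you can read these fields from the stream as sketched below. The stream_options.include_usage flag is the OpenAI-style way to request the final usage chunk; whether the gateway requires it is an assumption here (setting it is harmless if usage is always sent).

import OpenAI from "openai";

const client = new OpenAI({
	apiKey: process.env.LLM_GATEWAY_API_KEY,
	baseURL: "https://api.llmgateway.io/v1",
});

async function streamWithCost() {
	const stream = await client.chat.completions.create({
		model: "gpt-4o",
		messages: [{ role: "user", content: "Hello!" }],
		stream: true,
		stream_options: { include_usage: true },
	});

	for await (const chunk of stream) {
		const delta = chunk.choices[0]?.delta?.content;
		if (delta) process.stdout.write(delta);

		// The final chunk carries the usage object; cost fields ride along.
		const usage = chunk.usage as any;
		if (usage?.cost_usd_total !== undefined) {
			console.log(`\nRequest cost: $${usage.cost_usd_total.toFixed(6)}`);
		}
	}
}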

Free Plan Users

If you're on the free plan (hosted LLM Gateway), the usage object will include an informational message instead of cost fields:

{
	"usage": {
		"prompt_tokens": 10,
		"completion_tokens": 15,
		"total_tokens": 25,
		"info": "upgrade to pro to include usd cost breakdown"
	}
}

Upgrade to Pro to unlock real-time cost breakdown in your API responses.
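
If your code needs to run on both plans, branch on the presence of the cost fields rather than on the plan itself. A minimal sketch (extractCost is a hypothetical helper, not part of any SDK):

// Returns the request cost in USD, or null on the free plan where the
// usage object carries an info message instead of cost fields.
function extractCost(usage: any): number | null {
	return typeof usage?.cost_usd_total === "number"
		? usage.cost_usd_total
		: null;
}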

Example: Tracking Costs in Code

Here's an example of how to track costs programmatically using the cost breakdown feature:

import OpenAI from "openai";

const client = new OpenAI({
	apiKey: process.env.LLM_GATEWAY_API_KEY,
	baseURL: "https://api.llmgateway.io/v1",
});

async function trackCosts() {
	const response = await client.chat.completions.create({
		model: "gpt-4o",
		messages: [{ role: "user", content: "Hello!" }],
	});

	// Cost fields are gateway extensions not present in the SDK's usage type.
	const usage = response.usage as any;

	if (usage.cost_usd_total !== undefined) {
		console.log(`Request cost: $${usage.cost_usd_total.toFixed(6)}`);
		console.log(`  Input: $${usage.cost_usd_input.toFixed(6)}`);
		console.log(`  Output: $${usage.cost_usd_output.toFixed(6)}`);

		if (usage.cost_usd_cached_input > 0) {
			console.log(`  Cached: $${usage.cost_usd_cached_input.toFixed(6)}`);
		}
	}

	return response;
}

Use Cases

Budget Monitoring

Track costs in real-time and implement budget limits in your application:

// Message aliases the SDK's chat message type used in these examples.
type Message = OpenAI.Chat.Completions.ChatCompletionMessageParam;

let totalSpent = 0;
const BUDGET_LIMIT = 10.0; // $10 budget

async function makeRequest(messages: Message[]) {
	const response = await client.chat.completions.create({
		model: "gpt-4o",
		messages,
	});

	const cost = (response.usage as any).cost_usd_total || 0;
	totalSpent += cost;

	if (totalSpent > BUDGET_LIMIT) {
		throw new Error(`Budget exceeded: $${totalSpent.toFixed(2)}`);
	}

	return response;
}

Per-User Cost Allocation

Track costs per user for billing or analytics:

const userCosts: Map<string, number> = new Map();

async function makeRequestForUser(userId: string, messages: Message[]) {
	const response = await client.chat.completions.create({
		model: "gpt-4o",
		messages,
	});

	const cost = (response.usage as any).cost_usd_total || 0;
	const currentCost = userCosts.get(userId) || 0;
	userCosts.set(userId, currentCost + cost);

	return response;
}

Cost Analytics

Aggregate costs by model, time period, or any other dimension:

interface CostEntry {
	timestamp: Date;
	model: string;
	inputCost: number;
	outputCost: number;
	totalCost: number;
}

const costLog: CostEntry[] = [];

async function loggedRequest(model: string, messages: Message[]) {
	const response = await client.chat.completions.create({
		model,
		messages,
	});

	const usage = response.usage as any;

	costLog.push({
		timestamp: new Date(),
		model: response.model,
		inputCost: usage.cost_usd_input || 0,
		outputCost: usage.cost_usd_output || 0,
		totalCost: usage.cost_usd_total || 0,
	});

	return response;
}
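
From costLog you can then roll spend up along any dimension; for example, this small helper (hypothetical, not part of the gateway) totals cost per model:

function costByModel(log: CostEntry[]): Map<string, number> {
	// Swap entry.model for an hour bucket of entry.timestamp (or any
	// other key) to aggregate along a different dimension.
	const totals = new Map<string, number>();
	for (const entry of log) {
		totals.set(entry.model, (totals.get(entry.model) ?? 0) + entry.totalCost);
	}
	return totals;
}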

Data Storage Costs

When data retention is enabled in organization policies, LLM Gateway stores full request and response payloads for the configured retention period. This storage incurs a small additional cost:

  • Rate: $0.01 per 1 million tokens
  • Applies to: Input, cached, output, and reasoning tokens
  • When charged: Only when retention level is set to "Retain All Data"
  • Billing mode: In API keys mode, only storage costs are deducted from credits (inference costs are billed to your provider keys)

Storage costs are displayed separately from inference costs in the dashboard and usage breakdown to maintain transparency between provider costs and LLM Gateway platform costs.

Enable auto top-up in billing settings to prevent request failures when storage costs deplete your credits.

Self-Hosted Deployments

If you're running a self-hosted LLM Gateway deployment, cost breakdown is always included in API responses regardless of plan. This allows you to track internal costs and allocate them across teams or projects.
