Group related requests into a session to keep provider routing sticky and to observe a conversation end-to-end in the activity log.

Sessions

A session ties together the requests that belong to the same conversation or workflow. By attaching a stable session identifier to your requests, LLMGateway can treat them as a unit — keeping provider routing consistent across turns and letting you trace and filter the whole conversation in the dashboard.

Sessions are the foundation for several features. Today they power sticky provider routing and session-level observability; more session-scoped capabilities will build on the same identifier over time.

Setting the session id

For chat completions, the session key is resolved in priority order — the first present value wins:

1. The x-session-id header
2. The x-session-affinity header (sent automatically by coding agents such as opencode)
3. The prompt_cache_key body field (OpenAI-compatible)
4. The user body field (OpenAI-compatible)
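
The resolution order above can be sketched as a small function. This is an illustration of the documented priority, not LLMGateway's actual implementation: the function name resolve_session_id is hypothetical, and treating empty strings as absent is an assumption.

```python
from typing import Any, Mapping, Optional


def resolve_session_id(
    headers: Mapping[str, str], body: Mapping[str, Any]
) -> Optional[str]:
    """Return the first present session key, or None if the request has no session."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    # Candidates in the documented priority order; the first present value wins.
    candidates = (
        lowered.get("x-session-id"),
        lowered.get("x-session-affinity"),
        body.get("prompt_cache_key"),
        body.get("user"),
    )
    for value in candidates:
        # Assumption: only non-empty strings count as "present".
        if isinstance(value, str) and value:
            return value
    return None
```

Under these assumptions, a request that carries both an x-session-affinity header and a user body field resolves to the header value, because headers rank above body fields.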

curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -H "x-session-id: conversation-9f8e7d6c" \
  -d '{
    "model": "claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Reuse the same session id for every request in a conversation. If you set none of the values above, the request has no session: it is scored independently of other requests, exactly as it would be without this feature.
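
As a sketch of reusing one session id across turns, the Python below builds the same request as the curl example using only the standard library. The helper names (build_headers, build_chat_request, send, run_conversation) are illustrative and not part of any LLMGateway SDK; the endpoint, header, and model name come from the curl example, and the response shape is assumed to be OpenAI-compatible.

```python
import json
import urllib.request

API_URL = "https://api.llmgateway.io/v1/chat/completions"


def build_headers(session_id: str, api_key: str) -> dict:
    """Headers for one turn; the x-session-id value is what ties turns together."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "x-session-id": session_id,
    }


def build_chat_request(session_id: str, messages: list, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a chat completions request for the given session."""
    body = {"model": "claude-sonnet-4-6", "messages": messages}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=build_headers(session_id, api_key),
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def run_conversation(session_id: str, user_turns: list, api_key: str, send_fn=send) -> list:
    """Send each user turn with the same session id and return the full history."""
    history = []
    for text in user_turns:
        history.append({"role": "user", "content": text})
        reply = send_fn(build_chat_request(session_id, history, api_key))
        # Assumes the OpenAI-compatible response shape (choices[0].message).
        history.append(reply["choices"][0]["message"])
    return history
```

Generate the session id once per conversation (for example, when the chat is created) and pass the same value to every call, such as run_conversation("conversation-9f8e7d6c", ["Hello!"], api_key).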

Anthropic Messages endpoint

For the Anthropic Messages endpoint (/v1/messages), the session key is derived automatically from metadata.user_id. Coding agents such as Claude Code send a JSON object there (e.g. {"session_id":"<uuid>",…}); the gateway uses its session_id field. An explicit x-session-id header still takes precedence.
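
The derivation for /v1/messages might look roughly like the following. This is a sketch under stated assumptions, not the gateway's code: the function name is hypothetical, and the handling of a plain (non-JSON) user_id, which the text above does not spell out, is assumed to fall back to the raw string.

```python
import json
from typing import Any, Dict, Optional


def resolve_messages_session_id(
    headers: Dict[str, str], body: Dict[str, Any]
) -> Optional[str]:
    """Resolve the session key for an Anthropic Messages request."""
    # An explicit x-session-id header takes precedence over metadata.user_id.
    lowered = {name.lower(): value for name, value in headers.items()}
    explicit = lowered.get("x-session-id")
    if explicit:
        return explicit

    metadata = body.get("metadata")
    user_id = metadata.get("user_id") if isinstance(metadata, dict) else None
    if not isinstance(user_id, str) or not user_id:
        return None

    try:
        parsed = json.loads(user_id)
    except json.JSONDecodeError:
        # Assumption: a plain, non-JSON user_id is used as the session key as-is.
        return user_id

    # Coding agents send a JSON object; use its session_id field when present.
    if isinstance(parsed, dict) and isinstance(parsed.get("session_id"), str):
        return parsed["session_id"]
    # Assumption: JSON without a usable session_id falls back to the raw string.
    return user_id
```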

Sticky provider routing

When a model is served by multiple providers, requests are normally scored independently, so a multi-turn conversation can bounce between providers. That defeats provider-side prompt caching, which only pays off when consecutive requests with a shared prefix reach the same provider.

With a session id set, LLMGateway scores the session's first request with the normal weighted smart-routing algorithm (price, priority, uptime, throughput), pins the winning provider for the session, and reuses it on every subsequent request to keep the prompt cache warm. While pinned, the session skips epsilon-greedy exploration. It moves only when its provider drops below the session uptime threshold or leaves the available pool (through health filtering, or because retry/fallback dropped it after a failed request). At that point the session is re-scored and re-pinned to the current best provider.
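
The pin, keep, and re-pin behavior can be illustrated with a toy router. Everything here is a simplified sketch: Provider, StickyRouter, the single score field standing in for the weighted smart-routing score, and the 0.9 uptime threshold are all hypothetical, and epsilon-greedy exploration for sessionless requests is left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Provider:
    name: str
    score: float   # stand-in for the weighted smart-routing score (higher is better)
    uptime: float  # recent uptime as a fraction between 0 and 1


class StickyRouter:
    def __init__(self, uptime_threshold: float = 0.9) -> None:
        # Hypothetical threshold; the real value is not stated in this page.
        self.uptime_threshold = uptime_threshold
        self.pins: Dict[str, str] = {}  # session id -> pinned provider name

    def choose(self, session_id: Optional[str], available: List[Provider]) -> str:
        """Pick a provider name from the currently available (healthy) pool."""
        by_name = {p.name: p for p in available}
        if session_id is not None:
            pinned = by_name.get(self.pins.get(session_id, ""))
            # Keep the pin while the provider is in the pool and above the threshold.
            if pinned is not None and pinned.uptime >= self.uptime_threshold:
                return pinned.name
        # First request, sessionless request, or invalidated pin: score normally.
        best = max(available, key=lambda p: p.score)
        if session_id is not None:
            self.pins[session_id] = best.name
        return best.name
```

In this sketch a pinned session keeps its provider even after another provider's score overtakes it, which mirrors the cache-locality trade-off: the pin is only dropped when the provider's uptime falls below the threshold or the provider is no longer in the pool.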

See Routing → Sticky Session Routing for the full algorithm, fallback behavior, and the session-sticky routing-metadata reason.

Session stickiness is on by default. Enterprise projects can turn it off per project under Settings → Routing → Session Stickiness; when disabled, every request is scored independently regardless of session id (the id is still recorded for observability).

Sticky routing optimizes for cache locality over per-request price. A session stays on its provider even if a cheaper or faster alternative is momentarily available, since the prompt-cache savings typically outweigh the difference.

Upstream prompt-cache routing

Some providers use an OpenAI-style prompt_cache_key to route requests to the cache shard that already holds your prompt prefix. Without it, repeat requests can land on different backends and miss the cache entirely. Meta requires the key for cache hits in practice; OpenAI and Azure use it to improve hit rates under load.

When a request has a session id and you didn't send a prompt_cache_key yourself, LLMGateway forwards a keyed hash (HMAC-SHA256 with a gateway-side secret) of the session id as the prompt_cache_key to providers that support it (currently OpenAI, Azure, and Meta). Hashing means your raw session ids are never exposed to providers; the hash is stable per session, which is all cache routing needs. A prompt_cache_key you set explicitly takes precedence and is forwarded as-is on those same surfaces.
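
A minimal sketch of that derivation, using Python's standard hmac module, is shown below. The function name, the parameter names, and the hex output encoding are assumptions for illustration; the real secret lives on the gateway side and the exact encoding is not documented here.

```python
import hashlib
import hmac
from typing import Optional


def upstream_prompt_cache_key(
    session_id: Optional[str], explicit_key: Optional[str], secret: bytes
) -> Optional[str]:
    """Return the prompt_cache_key to forward upstream, or None if there is none."""
    # A prompt_cache_key set by the caller takes precedence and is forwarded as-is.
    if explicit_key:
        return explicit_key
    if not session_id:
        return None
    # Keyed hash: stable per session, but the raw session id is never exposed.
    # Assumption: hex encoding of the HMAC-SHA256 digest.
    return hmac.new(secret, session_id.encode("utf-8"), hashlib.sha256).hexdigest()
```

Because the hash is keyed, a provider cannot recompute or reverse it without the gateway-side secret, while the same session id always yields the same key.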

On provider surfaces that don't support the field, no key is sent at all — whether derived or explicit. This currently applies to Sakana (the field is not part of its API) and to Azure chat-completions requests, which can be served by legacy deployment-based API versions that reject unknown body fields; Azure requests on the Responses API always carry the key. Providers not listed above use different caching mechanisms (for example Anthropic cache_control breakpoints or Google implicit caching), so the prompt_cache_key doesn't apply to them either.
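
The support matrix described above can be summarized as a small predicate. The provider identifiers and the "chat-completions" and "responses" surface labels are hypothetical names chosen for this sketch, and the rules reflect only what this page currently states.

```python
def sends_prompt_cache_key(provider: str, api_surface: str) -> bool:
    """Whether a prompt_cache_key (derived or explicit) is forwarded upstream."""
    provider = provider.lower()
    if provider in ("openai", "meta"):
        return True
    if provider == "azure":
        # Chat-completions requests may hit legacy deployment-based API versions
        # that reject unknown body fields, so only the Responses API gets the key.
        return api_surface == "responses"
    # Sakana and providers with other caching mechanisms never receive the key.
    return False
```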

For Meta, requests without any session id still get a cache key derived from the conversation's first messages, so multi-turn conversations hit Meta's prompt cache even when no session signal is present.
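
How the key is derived from the first messages is not specified here. One way such a fallback could work, purely as an assumption, is to hash everything up to and including the first user message, so the key stays the same as later turns are appended. The function name and the choice of SHA-256 over canonical JSON are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List


def fallback_cache_key(messages: List[Dict[str, Any]]) -> str:
    """Hypothetical cache key derived from the start of a conversation."""
    # Assumption: the stable prefix is every message up to the first user message.
    head = []
    for message in messages:
        head.append(message)
        if message.get("role") == "user":
            break
    # Canonical JSON so that key order and whitespace do not change the hash.
    canonical = json.dumps(head, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

With this approach, the first request and every follow-up in the same conversation produce an identical key, while a conversation that opens differently gets a different one.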

Observing sessions in the activity log

Every request is logged with its resolved session id. In the dashboard Activity view you can:

- See the Session ID on each request's metadata, alongside the request and trace IDs.
- Filter by session id using the search field next to the custom-metadata search, to pull up every request that belongs to a conversation in one place.

This makes it easy to follow a full conversation end-to-end — inspecting how each turn was routed, what it cost, and which provider served it.

The session id is distinct from freeform metadata. Use custom metadata headers for arbitrary tags (user, tenant, app version); use the session id for the one value that should keep a conversation pinned and traceable.
