Embed end-user payments and sessions into your own site with the Payments SDK. Your end-users get their own wallet, buy credits, and pay per request — billed through LLM Gateway, with your markup as margin.

Embeddable Payments (Payments SDK)

The Payments SDK lets you drop end-user payments and in-app credit purchases into your product, much as Stripe Elements lets you drop in a payment form. Your end-users get their own wallet, buy credits inside your app, and pay per request for any model the gateway supports. LLM Gateway is the merchant of record; you set a markup and keep the margin.

Preview — opt-in only. Embeddable Payments is in preview and enabled on an opt-in basis per project. The Settings → Payments SDK dashboard page shows a read-only preview until the feature is turned on for your project. Contact us to request access before integrating.

When to use this

Reach for the Payments SDK when you embed LLM Gateway's billing on your own site and want your end-users to pay for the AI they use, each with their own wallet and credit balance. It handles end-user payments and sessions: per-user wallets, in-app credit top-ups (Stripe), short-lived browser sessions scoped to a single wallet, and developer-margin payouts.

This is a payments product, not an AI client SDK. Do not confuse it with a normal AI SDK such as the OpenAI SDK or the Vercel AI SDK. If you just want to call models from your own backend with a single API key, use the OpenAI-compatible API instead — you do not need the Payments SDK. Use the Payments SDK only when you need to charge your own end-users and give each of them a wallet and checkout inside your app.

It ships as three packages:

Package	Runs in	Use it for
`@llmgateway/server`	Your backend (secret key)	Mint end-user sessions, manage wallets/customers, verify webhooks, trigger payouts
`@llmgateway/client`	Browser (headless)	Framework-agnostic chat/image/embeddings + balance/top-up, with auto session refresh
`@llmgateway/elements`	React	Drop-in `<Chat/>`, `<BuyCredits/>`, `<CreditBalance/>` + hooks

A complete, runnable Next.js example lives in the templates repo: Embeddable Payments template.

How it works

Your backend ──(secret key sk_)──▶ POST /v1/sessions ──▶ ephemeral session token (es_, ~15 min)
      │                                                          │
      └────────── returns es_ to your frontend ◀────────────────┘
                                  │
        Browser (es_ + pk_) ──▶ chat / images / embeddings  ──▶ debits the end-user wallet
                            └──▶ buy credits (Stripe Elements) ─▶ credits land in the wallet

Your secret key (sk_…) never leaves your backend. It mints short-lived ephemeral session tokens (es_…) scoped to one end-user wallet.
The browser only ever holds the es_… token (and a publishable Stripe key). It calls the gateway directly; usage is billed to that user's wallet.
Markup is applied at top-up time: if you set a 20% markup and a user buys $10, their wallet is credited the net amount (the spend power that remains after your markup) and your margin accrues to your organization for later payout.
Top-up bonus (optional): set a bonus percent to credit end-users more than they pay. For example, with no markup, a 50% bonus turns a $10 top-up into $15 of spend power. The extra credits are funded from your organization's credit balance at top-up time (capped at your available credits), so it is a promotional lever you can switch on or off at any time. Markup and bonus are independent: the bonus is applied on top of the net credited amount.
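
The bonus arithmetic can be sketched as follows. This is a minimal illustration rather than the gateway's implementation: the helper name `topUpBreakdown`, the use of integer cents, and rounding down are all assumptions. The net amount is taken as an input because LLM Gateway computes it from your markup; the output names mirror the `bonusCredited` and `totalCredited` fields reported for a top-up.

```typescript
// Hypothetical helper: estimates the bonus and total credit for one top-up.
// Amounts are integer cents to avoid floating-point drift (an assumption).
interface TopUpInput {
	netCents: number; // amount credited after markup, as computed by the gateway
	bonusPercent: number; // e.g. 50 for a 50% bonus, 0 to disable
	orgAvailableCents: number; // your organization's available credit balance
}

interface TopUpBreakdown {
	bonusCredited: number;
	totalCredited: number;
}

function topUpBreakdown(input: TopUpInput): TopUpBreakdown {
	// The bonus is applied on top of the net credited amount.
	const requestedBonus = Math.floor((input.netCents * input.bonusPercent) / 100);
	// The bonus is funded from the organization's balance, so it is capped there.
	const bonusCredited = Math.max(0, Math.min(requestedBonus, input.orgAvailableCents));
	return { bonusCredited, totalCredited: input.netCents + bonusCredited };
}

// A $10 net top-up with a 50% bonus and ample organization credits.
const full = topUpBreakdown({ netCents: 1000, bonusPercent: 50, orgAvailableCents: 10000 });
// The same top-up when the organization has only $2 of credits left.
const capped = topUpBreakdown({ netCents: 1000, bonusPercent: 50, orgAvailableCents: 200 });
console.log(full); // bonusCredited 500, totalCredited 1500
console.log(capped); // bonusCredited 200, totalCredited 1200
```

The second call shows the cap in action: the user still receives the net amount, but the bonus shrinks to whatever the organization can fund.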

Set up in the dashboard

Before you write any code, configure the project you want to embed:

Open the LLM Gateway dashboard and select your project.
Go to Settings → Payments SDK and turn on End-user sessions (this requires the preview to be enabled for your project).
(Optional) Set a markup percent — the margin you earn on every top-up — and/or a top-up bonus percent to gift end-users extra credit (funded from your organization's credit balance).
Add the browser origins allowed to call the gateway, one per line (e.g. https://app.example.com), then click Save Settings.
Under Platform Secret Keys, click Create Live Key (or Create Test Key) and copy the sk_… value immediately.
Store it as a server-side environment variable, for example LLMGATEWAY_SECRET_KEY.

The platform secret key (sk_…) is different from a regular gateway API key (llmgtwy_…): it mints end-user sessions and must only ever be used from your backend.

Test mode. A sk_test_… key is a sandbox key: end-user wallet top-ups go through Stripe's sandbox (use Stripe test cards, no real charges), and its wallets are fully segregated from live ones — the same end-user gets independent test and live wallets. To keep sandbox money from buying real inference, test-mode wallets can only call free models: use the auto route (it picks a free model automatically) or a free model id; paid models return a 403. Pair a test secret key on your backend with mode="test" on <LLMGatewayProvider> (see below) — the two must match.

The platform secret key is shown only once. Do not put it in frontend code, browser bundles, mobile apps, or public repos.
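
Because the key type decides both what the backend can do and which Stripe mode applies, it can help to validate it once at startup. The sketch below is one possible check, not part of the SDK: `readSecretKey` is a hypothetical helper, and treating every non-`sk_test_` platform key as live is an assumption.

```typescript
// Hypothetical startup check for the platform secret key.
type GatewayMode = "test" | "prod";

function readSecretKey(
	env: Record<string, string | undefined>,
): { secretKey: string; mode: GatewayMode } {
	const secretKey = env["LLMGATEWAY_SECRET_KEY"];
	if (!secretKey) {
		throw new Error("LLMGATEWAY_SECRET_KEY is not set");
	}
	// Regular gateway API keys (llmgtwy_...) cannot mint end-user sessions.
	if (!secretKey.startsWith("sk_")) {
		throw new Error("Expected a platform secret key starting with sk_");
	}
	// sk_test_ keys are sandbox keys; anything else is treated as live (assumption).
	const mode: GatewayMode = secretKey.startsWith("sk_test_") ? "test" : "prod";
	return { secretKey, mode };
}

const { mode } = readSecretKey({ LLMGATEWAY_SECRET_KEY: "sk_test_example" });
console.log(mode); // "test"
```

On a real backend you would pass `process.env` and hand the resulting `mode` to your frontend, so the value given to `<LLMGatewayProvider>` always matches the key in use.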

1. Install

# backend
npm install @llmgateway/server
# frontend (pick one)
npm install @llmgateway/elements   # React drop-in components
npm install @llmgateway/client     # headless / non-React

2. Mint a session on your backend

Identify your signed-in user and mint a session bound to their wallet. Scope which models they may call.

// app/api/llmgateway/session/route.ts  (Next.js Route Handler)
import { LLMGateway } from "@llmgateway/server";

const lg = new LLMGateway({ secretKey: process.env.LLMGATEWAY_SECRET_KEY! });

export async function POST() {
	const session = await lg.sessions.create({
		customer: { externalId: "user_123" }, // your stable user id
		scope: { models: ["openai/gpt-4o-mini"] }, // lock down what they can call
		ttlSeconds: 900, // optional, default 15 min
	});
	return Response.json(session); // { sessionToken, walletId, endCustomerId, expiresAt, publishableKey }
}

Always mint sessions server-side. Never ship your sk_… secret key to the browser.
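
In a real route the customer id and model scope come from your own authenticated user rather than a hard-coded value. The sketch below shows one way to derive the session parameters; the `AppUser` type, the plan tiers, and the second model id are illustrative assumptions, while the parameter shape follows the `sessions.create` call shown in this section.

```typescript
// Hypothetical mapping from your own user record to session parameters.
interface AppUser {
	id: string; // your stable user id, used as externalId
	plan: "free" | "pro"; // illustrative plan tiers, not part of the SDK
}

interface SessionParams {
	customer: { externalId: string };
	scope: { models: string[] };
	ttlSeconds: number;
}

// Illustrative allow-lists; substitute the model ids your product offers.
const MODELS_BY_PLAN: Record<AppUser["plan"], string[]> = {
	free: ["openai/gpt-4o-mini"],
	pro: ["openai/gpt-4o-mini", "openai/gpt-4o"], // second id is an assumption
};

function sessionParamsFor(user: AppUser): SessionParams {
	return {
		customer: { externalId: user.id },
		// Copy the list so callers cannot mutate the shared allow-list.
		scope: { models: [...MODELS_BY_PLAN[user.plan]] },
		ttlSeconds: 900, // the documented default of 15 minutes
	};
}

const params = sessionParamsFor({ id: "user_123", plan: "free" });
console.log(params.scope.models); // ["openai/gpt-4o-mini"]
```

The result can be passed to `lg.sessions.create(params)` inside the route handler once the request has been authenticated.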

3a. Drop in the React components

Wrap your UI in <LLMGatewayProvider> and use the components. fetchSession is how the client refreshes the short-lived token before it expires.

"use client";
import {
	LLMGatewayProvider,
	Chat,
	CreditBalance,
	BuyCredits,
} from "@llmgateway/elements";

const fetchSession = () =>
	fetch("/api/llmgateway/session", { method: "POST" }).then((r) => r.json());

export default function Assistant({ session }) {
	return (
		<LLMGatewayProvider
			session={session}
			fetchSession={fetchSession}
			mode={process.env.NODE_ENV === "production" ? "prod" : "test"}
			appearance={{ theme: "light" }}
		>
			<CreditBalance label="Your balance" />
			<BuyCredits amount={10} />
			<Chat model="openai/gpt-4o-mini" />
		</LLMGatewayProvider>
	);
}

Need full control over rendering? Use the hooks instead of the components:

useBalance() → { balance, currency, recentLedger, loading, error, refetch, refetchUntilChange }
useChat({ model }) → { turns, send, streaming, ... }

useBalance().refetchUntilChange() polls until the balance actually changes — use it after a purchase, since the wallet is credited asynchronously once the Stripe webhook lands.
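
To make the polling idea concrete, here is a generic sketch of "read until the value changes or give up". It is not the SDK's implementation: `pollUntilChange`, its options, and the simulated balance reader are all hypothetical.

```typescript
// Generic polling sketch; not the SDK's implementation.
interface PollOptions {
	intervalMs: number; // delay between reads
	maxAttempts: number; // give up after this many reads
}

async function pollUntilChange<T>(
	read: () => Promise<T>,
	initial: T,
	options: PollOptions,
): Promise<{ value: T; changed: boolean }> {
	let latest = initial;
	for (let attempt = 0; attempt < options.maxAttempts; attempt++) {
		latest = await read();
		if (latest !== initial) {
			return { value: latest, changed: true };
		}
		// Wait before the next read so the API is not hammered.
		await new Promise<void>((resolve) => setTimeout(resolve, options.intervalMs));
	}
	return { value: latest, changed: false };
}

// Simulated balance that changes on the third read, standing in for the webhook landing.
let readCount = 0;
const fakeRead = async (): Promise<string> => {
	readCount += 1;
	return readCount >= 3 ? "15.00" : "5.00";
};

pollUntilChange<string>(fakeRead, "5.00", { intervalMs: 10, maxAttempts: 5 }).then((result) => {
	console.log(result.value, result.changed); // 15.00 true
});
```

Bounding the number of attempts matters in practice: if a payment fails, the balance never changes, and the UI should stop waiting and show an error instead.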

3b. Or go headless (any framework)

import { LLMGatewayClient } from "@llmgateway/client";

const client = new LLMGatewayClient({
	session: { token: session.sessionToken, expiresAt: session.expiresAt },
	refresh: fetchSession, // auto-refreshes ~60s before expiry
});

// stream a completion (billed to the user's wallet)
let reply = "";
for await (const delta of client.stream({
	model: "openai/gpt-4o-mini",
	messages: [{ role: "user", content: "Hello!" }],
})) {
	reply += delta; // append each streamed chunk, then render `reply` in your UI
}

const { balance } = await client.getBalance();

The headless client also exposes chat(), image(), embeddings(), getBalance(), createTopUp(amount), and getConfig().
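
The "refresh about 60 seconds before expiry" behavior comes down to a small delay calculation. The sketch below illustrates that calculation only; `msUntilRefresh` is a hypothetical helper, and it assumes `expiresAt` is an ISO 8601 timestamp, which this page does not confirm.

```typescript
// Hypothetical helper showing how a refresh can be scheduled ahead of expiry.
const REFRESH_LEAD_MS = 60000; // refresh about 60 seconds before the token expires

function msUntilRefresh(
	expiresAt: string,
	nowMs: number,
	leadMs: number = REFRESH_LEAD_MS,
): number {
	const expiresMs = Date.parse(expiresAt); // assumes an ISO 8601 timestamp
	if (Number.isNaN(expiresMs)) {
		throw new Error(`Unparseable expiresAt: ${expiresAt}`);
	}
	// Never return a negative delay: an overdue refresh should run immediately.
	return Math.max(0, expiresMs - nowMs - leadMs);
}

const now = Date.parse("2025-01-01T00:00:00Z");
console.log(msUntilRefresh("2025-01-01T00:15:00Z", now)); // 840000 (14 minutes)
console.log(msUntilRefresh("2025-01-01T00:00:30Z", now)); // 0 (already inside the lead window)
```

The client handles this for you when you pass `refresh`; the calculation is mainly useful if you schedule refreshes yourself or want to warn the user before a session lapses.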

Spend limits

getBalance() (and useBalance()) also returns a limits object describing the spend limits enforced on the session, the amount consumed so far, and — when a windowed limit is configured — when it resets. This lets you show the user how much of their allowance is left before a request is rejected.

const { balance, limits } = await client.getBalance();

// limits: {
//   usageLimit,                // lifetime spend cap (null = uncapped)
//   usage,                     // consumed over the session's lifetime
//   periodUsageLimit,          // per-window cap (null = no windowed limit)
//   periodUsageDurationValue,  // e.g. 1
//   periodUsageDurationUnit,   // "hour" | "day" | "week" | "month"
//   currentPeriodUsage,        // consumed in the current window
//   currentPeriodStartedAt,    // ISO timestamp, or null
//   currentPeriodResetAt,      // ISO timestamp the window resets, or null
// }

A null limit field means that cap is not configured. currentPeriodUsage is "0" and currentPeriodResetAt is null until the first spend in a fresh window.
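
A common use of the limits object is to show how much allowance is left. The sketch below computes the tighter of the lifetime and per-window remainders. The field names follow the object above, but the helper `remainingAllowance` is hypothetical and the string-or-null value types are an assumption based on the "0" example.

```typescript
// Field names follow the limits object above; the string-or-null types are an assumption.
interface SpendLimits {
	usageLimit: string | null;
	usage: string;
	periodUsageLimit: string | null;
	currentPeriodUsage: string;
	currentPeriodResetAt: string | null;
}

// Returns the tighter of the two remaining allowances, or null when no cap is configured.
function remainingAllowance(limits: SpendLimits): number | null {
	const remaining: number[] = [];
	if (limits.usageLimit !== null) {
		remaining.push(Number(limits.usageLimit) - Number(limits.usage));
	}
	if (limits.periodUsageLimit !== null) {
		remaining.push(Number(limits.periodUsageLimit) - Number(limits.currentPeriodUsage));
	}
	if (remaining.length === 0) {
		return null;
	}
	// Clamp at zero so an overspent cap never shows as a negative allowance.
	return Math.max(0, Math.min(...remaining));
}

const sample: SpendLimits = {
	usageLimit: "50",
	usage: "12.5",
	periodUsageLimit: "5",
	currentPeriodUsage: "0",
	currentPeriodResetAt: null,
};
console.log(remainingAllowance(sample)); // 5
```

Here the lifetime cap leaves 37.5 but the window cap leaves only 5, so 5 is what the user can actually spend before the next reset.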

Buying credits

<BuyCredits amount={10} /> creates a Stripe PaymentIntent scoped to the user's wallet, renders Stripe's PaymentElement, and confirms the payment. Once LLM Gateway's webhook processes it, the wallet is credited the net amount (after your markup) and your margin accrues to your organization. If you've configured a top-up bonus, the additional developer-funded credit is applied on top of the net amount at the same time (debited from your organization's credit balance). The top-up API response and the wallet.credited webhook include bonusCredited and totalCredited so you can show the boosted total.

@llmgateway/elements bundles LLM Gateway's browser-safe Stripe publishable keys. Pass mode="test" to <LLMGatewayProvider> while developing to use Stripe test mode; omit it or pass mode="prod" for live payments ("prod" is the default). You never need to provide LLM Gateway's Stripe publishable key yourself, and the end-user never sees your sk_… secret key.

The frontend mode prop and the backend secret key must match. A sk_test_… key creates the top-up PaymentIntent in the Stripe sandbox, which only the mode="test" publishable key can confirm — mixing a test key with mode="prod" (or vice versa) makes <BuyCredits> fail to confirm.

Managing wallets & customers (server-side)

// grant credits directly (e.g. free trial)
await lg.wallets.credit({ walletId, amount: 5, reason: "Signup bonus" });

const wallet = await lg.wallets.retrieve(walletId);

// analytics: customers with balances + lifetime spend
const { customers } = await lg.customers.list();
const detail = await lg.customers.retrieve(endCustomerId);

Webhooks

await lg.webhookEndpoints.create({
	url: "https://yourapp.com/webhooks/llmgateway",
	enabledEvents: ["wallet.credited", "wallet.low_balance"],
});

// in your handler
const event = lg.webhooks.constructEvent(
	rawBody,
	signatureHeader,
	endpointSecret,
);

Webhook URLs must be https and public — requests to private/internal addresses are rejected (SSRF protection), both at registration and at delivery time.
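
Before registering an endpoint you can run a rough local pre-check that mirrors the rule above. This is only a sketch under stated assumptions: `looksLikePublicHttpsUrl` is a hypothetical helper, the gateway's own validation is authoritative, and this check does not resolve DNS, so a public-looking hostname that points at a private address would still pass here.

```typescript
// Rough local pre-check; the gateway's own validation is authoritative.
function tryParseUrl(raw: string): URL | null {
	try {
		return new URL(raw);
	} catch (_err) {
		return null; // not a parseable absolute URL
	}
}

function looksLikePublicHttpsUrl(raw: string): boolean {
	const url = tryParseUrl(raw);
	if (url === null || url.protocol !== "https:") {
		return false;
	}
	const host = url.hostname.toLowerCase();
	if (host === "localhost" || host.endsWith(".localhost")) {
		return false;
	}
	// Reject IPv6 literals outright in this sketch (hostname keeps the brackets).
	if (host.startsWith("[")) {
		return false;
	}
	// Reject literal loopback, private, and link-local IPv4 addresses.
	const ipv4 = host.match(/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/);
	if (ipv4) {
		const a = Number(ipv4[1]);
		const b = Number(ipv4[2]);
		if (a === 0 || a === 10 || a === 127) return false;
		if (a === 172 && b >= 16 && b <= 31) return false;
		if (a === 192 && b === 168) return false;
		if (a === 169 && b === 254) return false;
	}
	return true;
}

console.log(looksLikePublicHttpsUrl("https://yourapp.com/webhooks/llmgateway")); // true
console.log(looksLikePublicHttpsUrl("http://yourapp.com/webhooks/llmgateway")); // false
console.log(looksLikePublicHttpsUrl("https://192.168.1.10/hook")); // false
```

During local development this means you need a public HTTPS tunnel or a deployed preview URL, since `localhost` endpoints are rejected.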

Margin payouts (Stripe Connect)

Your accrued markup is held as a margin balance. Onboard a connected account and pay it out:

const { url } = await lg.connect.createOnboardingLink({
	refreshUrl: "https://yourapp.com/settings/payouts",
	returnUrl: "https://yourapp.com/settings/payouts?done=1",
});
// redirect the developer to `url`, then later:
const status = await lg.connect.status(); // { onboarded, payoutsEnabled, marginBalance }
const payout = await lg.connect.payout(); // transfer the accrued margin out
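
It is usually worth checking the status before calling `payout()`. The sketch below shows one possible guard: the field names follow the status shape in the comment above, while `payoutDecision`, the numeric type of `marginBalance`, and the reason strings are assumptions.

```typescript
// Field names follow the status shape shown above; a numeric marginBalance is an assumption.
interface ConnectStatus {
	onboarded: boolean;
	payoutsEnabled: boolean;
	marginBalance: number;
}

type PayoutDecision = { ok: true } | { ok: false; reason: string };

function payoutDecision(status: ConnectStatus): PayoutDecision {
	if (!status.onboarded) {
		return { ok: false, reason: "Finish Stripe Connect onboarding first" };
	}
	if (!status.payoutsEnabled) {
		return { ok: false, reason: "Payouts are not enabled on the connected account yet" };
	}
	if (status.marginBalance <= 0) {
		return { ok: false, reason: "No accrued margin to pay out" };
	}
	return { ok: true };
}

const decision = payoutDecision({ onboarded: true, payoutsEnabled: false, marginBalance: 42 });
console.log(decision.ok); // false
```

Returning a reason lets your settings page tell the developer what to do next instead of surfacing a failed payout call.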

Security model

Ephemeral tokens (es_…) are short-lived and revocable; mint them per-user from your backend.
Model scopes restrict each session to an allow-list of models.
Origin allowlist (configured on the project) blocks browser calls from unexpected origins.
Per-session spend caps (scope.maxSpend) bound how much a single session can spend.
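
To illustrate the origin allowlist, the sketch below parses a one-origin-per-line setting and checks a request origin against it. It is illustrative only: the gateway enforces its own allowlist server-side, and `parseAllowedOrigins` and `isOriginAllowed` are hypothetical helpers whose exact-match semantics are an assumption.

```typescript
// Illustrative only: the gateway enforces its own allowlist server-side.
function parseAllowedOrigins(text: string): string[] {
	return text
		.split(/\r?\n/)
		.map((line) => line.trim())
		.filter((line) => line.length > 0)
		.map((line) => new URL(line).origin); // normalizes case and strips paths
}

// Exact match on scheme, host, and port (an assumption about the matching rule).
function isOriginAllowed(origin: string, allowed: string[]): boolean {
	return allowed.includes(origin);
}

const allowed = parseAllowedOrigins("https://app.example.com\nhttps://Staging.example.com/\n");
console.log(allowed); // ["https://app.example.com", "https://staging.example.com"]
console.log(isOriginAllowed("https://app.example.com", allowed)); // true
console.log(isOriginAllowed("https://evil.example.net", allowed)); // false
```

An origin includes the scheme and port, so `http://app.example.com` and `https://app.example.com:8443` would each need their own entry under this matching rule.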

Full example

The end-to-end Next.js app — backend session route, provider, chat, and buy-credits — is in the templates repo:

➡️ Embeddable Payments template
