Speech Generation

Generate speech (text-to-speech) using ElevenLabs, Gemini, and OpenAI models through the OpenAI-compatible audio API.

LLMGateway supports text-to-speech (TTS) through the OpenAI-compatible /v1/audio/speech endpoint, powered by ElevenLabs, Google Gemini, and OpenAI speech models.

Want to hear the voices before writing code? The Audio Studio in the Playground generates speech from up to three models side by side, with per-model voice, format, and speed controls.

Available Models

Browse all speech generation models, with up-to-date pricing, on the models page.

Billing varies by model family. Some models are billed on the token usage reported by the provider (input text tokens and output audio tokens). Others return audio bytes without usage data and are billed on input character count instead. See the models page for each model's exact pricing.
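
The two billing modes can be sketched as a small cost estimator. This is only an illustration: the `Pricing` and `TokenUsage` shapes, the `estimateCost` helper, and every rate below are hypothetical placeholders, and counting characters with `String.length` is an assumption about how input is measured. Real rates come from the models page.

```typescript
// Hypothetical pricing shapes; real rates come from the models page.
type Pricing =
	| { kind: "tokens"; inputPerMillion: number; outputPerMillion: number }
	| { kind: "characters"; perMillionChars: number };

interface TokenUsage {
	inputTokens: number; // input text tokens reported by the provider
	outputTokens: number; // output audio tokens reported by the provider
}

function estimateCost(pricing: Pricing, input: string, usage?: TokenUsage): number {
	if (pricing.kind === "characters") {
		// Assumption: characters are counted as UTF-16 code units (String.length).
		return (input.length * pricing.perMillionChars) / 1e6;
	}
	if (!usage) {
		throw new Error("Token-billed models need provider-reported usage");
	}
	const total =
		usage.inputTokens * pricing.inputPerMillion +
		usage.outputTokens * pricing.outputPerMillion;
	return total / 1e6;
}

// Placeholder rates chosen only to keep the arithmetic easy to follow.
const characterCost = estimateCost(
	{ kind: "characters", perMillionChars: 100 },
	"Hello, welcome to LLM Gateway!", // 30 characters
);

const tokenCost = estimateCost(
	{ kind: "tokens", inputPerMillion: 1, outputPerMillion: 10 },
	"Hello, welcome to LLM Gateway!",
	{ inputTokens: 8, outputTokens: 200 },
);
```

With these placeholder rates, the character-billed request works out to 30 × 100 / 1,000,000 and the token-billed one to (8 × 1 + 200 × 10) / 1,000,000.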

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | string | required | The speech model to use |
| input | string | required | The text to synthesize into speech |
| voice | string | model | A prebuilt voice. Defaults to Kore (Gemini), alloy (OpenAI), or Sarah (ElevenLabs) |
| response_format | string | model | Audio format. OpenAI: mp3 (default), opus, aac, flac, wav, pcm. ElevenLabs: mp3 (default), wav, pcm, opus. Gemini: wav (default), pcm |
| instructions | string |  | Optional style/delivery directive prepended to the input (e.g. "Say cheerfully") |
| speed | number |  | Accepted for OpenAI compatibility, but not applied by Gemini speech models |

Gemini speech models return raw PCM audio. LLMGateway wraps it in a WAV container by default (response_format: "wav"), or returns the raw 16-bit little-endian PCM at 24 kHz when response_format: "pcm" is requested. Other formats such as mp3 are available only on the OpenAI and ElevenLabs models, which return the audio already encoded in the requested format.
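
If you request response_format: "pcm" and later need a playable file, the raw samples can be wrapped in a WAV header on the client. The following is a minimal sketch of the general technique, not the gateway's own implementation. The `pcmToWav` name is hypothetical, and it assumes mono audio, which the text above does not state; the 16-bit little-endian sample format and 24 kHz rate are taken from the paragraph above.

```typescript
// Wrap raw 16-bit little-endian PCM in a minimal 44-byte WAV (RIFF) header.
// Assumption: mono audio. The sample rate defaults to the 24 kHz stated above.
function pcmToWav(pcm: Uint8Array, sampleRate = 24000, channels = 1): Uint8Array {
	const bitsPerSample = 16;
	const blockAlign = (channels * bitsPerSample) / 8; // bytes per sample frame
	const byteRate = sampleRate * blockAlign; // bytes per second of audio

	const wav = new Uint8Array(44 + pcm.length);
	const view = new DataView(wav.buffer);

	// Write an ASCII tag such as "RIFF" at the given byte offset.
	const writeTag = (offset: number, tag: string): void => {
		for (let i = 0; i < tag.length; i++) {
			view.setUint8(offset + i, tag.charCodeAt(i));
		}
	};

	writeTag(0, "RIFF");
	view.setUint32(4, 36 + pcm.length, true); // file size minus the first 8 bytes
	writeTag(8, "WAVE");
	writeTag(12, "fmt ");
	view.setUint32(16, 16, true); // size of the fmt chunk for PCM
	view.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
	view.setUint16(22, channels, true);
	view.setUint32(24, sampleRate, true);
	view.setUint32(28, byteRate, true);
	view.setUint16(32, blockAlign, true);
	view.setUint16(34, bitsPerSample, true);
	writeTag(36, "data");
	view.setUint32(40, pcm.length, true); // size of the sample data in bytes

	wav.set(pcm, 44); // PCM samples follow the header unchanged
	return wav;
}

// One second of silence at 24 kHz mono is 48,000 bytes of PCM.
const oneSecondOfSilence = pcmToWav(new Uint8Array(48000));
```

The result can be written to disk as a .wav file. If the audio turns out to have more than one channel, pass the channel count as the third argument.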

curl

curl -X POST "https://api.llmgateway.io/v1/audio/speech" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash-preview-tts",
    "input": "Hello, welcome to LLM Gateway!",
    "voice": "Kore"
  }' \
  --output speech.wav

OpenAI SDK

The endpoint works with the standard OpenAI client library: point the base URL at LLMGateway.

import OpenAI from "openai";
import { writeFileSync } from "fs";

// Point the standard OpenAI client at LLMGateway.
const openai = new OpenAI({
	apiKey: process.env.LLM_GATEWAY_API_KEY,
	baseURL: "https://api.llmgateway.io/v1",
});

// No response_format is set, so the Gemini model returns WAV (its default).
const response = await openai.audio.speech.create({
	model: "gemini-2.5-flash-preview-tts",
	voice: "Kore",
	input: "Hello, welcome to LLM Gateway!",
});

// The response holds the complete audio file; write it to disk in one step.
const buffer = Buffer.from(await response.arrayBuffer());
writeFileSync("speech.wav", buffer);
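
The optional parameters from the Parameters section go in the same request body. As a rough sketch, a request can be checked against the documented per-provider formats before it is sent. The `Provider` type and the `buildSpeechRequest` helper are hypothetical, and the sketch assumes the caller already knows which provider family a model belongs to; the format lists are copied from the response_format description.

```typescript
type Provider = "openai" | "elevenlabs" | "gemini";

// Formats per provider, copied from the response_format description.
const FORMATS: Record<Provider, { fallback: string; allowed: string[] }> = {
	openai: { fallback: "mp3", allowed: ["mp3", "opus", "aac", "flac", "wav", "pcm"] },
	elevenlabs: { fallback: "mp3", allowed: ["mp3", "wav", "pcm", "opus"] },
	gemini: { fallback: "wav", allowed: ["wav", "pcm"] },
};

interface SpeechRequest {
	model: string;
	input: string;
	voice?: string;
	response_format?: string;
	instructions?: string;
	speed?: number;
}

// Hypothetical helper: fills in the documented default format and rejects
// formats the provider family does not list, before any request is sent.
function buildSpeechRequest(provider: Provider, request: SpeechRequest): SpeechRequest {
	const { fallback, allowed } = FORMATS[provider];
	const format = request.response_format ?? fallback;
	if (allowed.indexOf(format) === -1) {
		throw new Error(
			`${provider} models do not support "${format}"; use one of: ${allowed.join(", ")}`,
		);
	}
	return { ...request, response_format: format };
}

// A Gemini request with a delivery instruction; the format falls back to wav.
const speechBody = buildSpeechRequest("gemini", {
	model: "gemini-2.5-flash-preview-tts",
	input: "Hello, welcome to LLM Gateway!",
	voice: "Kore",
	instructions: "Say cheerfully",
});
```

The resulting object can be serialized with JSON.stringify and sent as the body of a POST to /v1/audio/speech. Validating on the client is a design choice that surfaces a format mismatch without a network round trip.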

Streaming

Streaming speech responses (chunked audio or stream_format: "sse") are not supported yet. The endpoint always returns the complete audio file in a single response, so low-latency, play-as-you-go output is not available.

Voices

Gemini exposes 30 prebuilt voices. A few common ones: Kore, Puck, Zephyr, Charon, Fenrir, Leda, Orus, Aoede. When voice is omitted on a Gemini model, Kore is used.

OpenAI voices include alloy, ash, ballad, coral, echo, fable, nova, onyx, sage, shimmer, and verse. When voice is omitted on an OpenAI model, alloy is used.

ElevenLabs models accept 20 named voices, including Sarah, Aria, Roger, Laura, Charlie, George, Charlotte, Jessica, Brian, and Lily. When voice is omitted on an ElevenLabs model, Sarah is used. A raw ElevenLabs voice id is also accepted directly.
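
The gateway applies these defaults on its own when voice is omitted. If client code needs to know which voice will be used, for example to label output files, the defaults can be mirrored locally. This is a small sketch; the `VoiceProvider` type and the `resolveVoice` helper are hypothetical, and the caller is assumed to know the model's provider family.

```typescript
type VoiceProvider = "gemini" | "openai" | "elevenlabs";

// Default voices per provider family, as listed above.
const DEFAULT_VOICE: Record<VoiceProvider, string> = {
	gemini: "Kore",
	openai: "alloy",
	elevenlabs: "Sarah",
};

// Hypothetical helper: returns the requested voice, or the documented default
// for the provider family when no voice was requested.
function resolveVoice(provider: VoiceProvider, voice?: string): string {
	return voice ?? DEFAULT_VOICE[provider];
}

const implicitVoice = resolveVoice("elevenlabs"); // "Sarah"
const explicitVoice = resolveVoice("gemini", "Puck"); // "Puck"
```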

ElevenLabs

The four ElevenLabs models are billed per input character (see the models page for rates):

  • eleven-multilingual-v2: most lifelike, rich emotional expression, 29 languages
  • eleven-v3: most expressive and human-like, 70+ languages
  • eleven-flash-v2-5: ultra-low latency, 32 languages
  • eleven-turbo-v2-5: fast and balanced, 32 languages

curl -X POST "https://api.llmgateway.io/v1/audio/speech" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "eleven-multilingual-v2",
    "input": "Hello, welcome to LLM Gateway!",
    "voice": "Sarah"
  }' \
  --output speech.mp3
