Speech Generation
Generate speech (text-to-speech) using ElevenLabs, Gemini, and OpenAI models through the OpenAI-compatible audio API
LLMGateway supports text-to-speech (TTS) through the OpenAI-compatible
/v1/audio/speech endpoint, powered by ElevenLabs, Google Gemini, and
OpenAI speech models.
Want to hear the voices before writing code? The Audio Studio in the Playground generates speech from up to three models side by side, with per-model voice, format, and speed controls.
Available Models
Browse all speech generation models, with up-to-date pricing, on the models page.
Billing varies by model family. Some models are billed on token usage reported by the provider (input text tokens and output audio tokens), while others are billed on input character count (those return audio bytes without usage data). See the models page for each model's exact pricing.
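For the character-billed models, a rough pre-flight estimate can be computed client-side. The following is a minimal sketch, not LLMGateway's billing logic: `estimateCharacterCost` and `usdPerMillionChars` are hypothetical names, the rate is a placeholder you would look up on the models page, and counting characters as Unicode code points is an assumption (the provider may count differently).

```typescript
// Estimate the charge for a model billed on input character count.
// usdPerMillionChars is a placeholder taken from the models page, not a real rate.
// Assumption: one character = one Unicode code point.
function estimateCharacterCost(
  input: string,
  usdPerMillionChars: number,
): number {
  const chars = Array.from(input).length; // iterates by code point
  return (chars / 1_000_000) * usdPerMillionChars;
}
```

Token-billed models cannot be estimated this way, because their charge depends on the usage the provider reports after synthesis.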
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | string | required | The speech model to use |
| input | string | required | The text to synthesize into speech |
| voice | string | model-dependent | A prebuilt voice. Defaults to Kore (Gemini), alloy (OpenAI), or Sarah (ElevenLabs) |
| response_format | string | model-dependent | Audio format. OpenAI: mp3 (default), opus, aac, flac, wav, pcm. ElevenLabs: mp3 (default), wav, pcm, opus. Gemini: wav (default), pcm |
| instructions | string | — | Optional style/delivery directive prepended to the input (e.g. "Say cheerfully") |
| speed | number | — | Accepted for OpenAI compatibility, but not applied by Gemini speech models |
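Because the accepted response_format values differ by model family, it can help to check the combination before sending a request. This is a sketch under assumptions: the `Family` type, `SpeechRequest` interface, and `checkFormat` helper are hypothetical names, and the caller supplies the family explicitly rather than having it inferred from the model name. The format lists are copied from the response_format row above.

```typescript
type Family = "openai" | "elevenlabs" | "gemini";

// Formats per family, as listed in the parameters table.
const FORMATS: Record<Family, readonly string[]> = {
  openai: ["mp3", "opus", "aac", "flac", "wav", "pcm"],
  elevenlabs: ["mp3", "wav", "pcm", "opus"],
  gemini: ["wav", "pcm"],
};

// Request body fields from the parameters table.
interface SpeechRequest {
  model: string;
  input: string;
  voice?: string;
  response_format?: string;
  instructions?: string;
  speed?: number;
}

// Hypothetical helper: throws if the requested format is not listed
// for the family; otherwise returns the request unchanged.
function checkFormat(family: Family, req: SpeechRequest): SpeechRequest {
  const fmt = req.response_format;
  if (fmt !== undefined && FORMATS[family].indexOf(fmt) === -1) {
    throw new Error(
      `${family} models do not list "${fmt}"; use one of: ${FORMATS[family].join(", ")}`,
    );
  }
  return req;
}
```

Omitting response_format passes the check, since each family then falls back to its own default.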
Gemini speech models return raw PCM audio. LLMGateway wraps it in a WAV
container by default (response_format: "wav"), or returns the raw 16-bit
little-endian PCM at 24 kHz when response_format: "pcm" is requested.
Other formats such as mp3 are not available on Gemini models. They are
offered by the OpenAI and ElevenLabs models (see response_format above);
OpenAI models return the audio already encoded in the requested format.
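If you request response_format: "pcm" and later need a playable file, the raw samples can be wrapped in a WAV container on the client. The sketch below writes a standard 44-byte RIFF/WAVE header; it is illustrative, not LLMGateway's own implementation. `pcmToWav` is a hypothetical name, and mono output is an assumption (the text above states only the 16-bit little-endian sample format and the 24 kHz rate).

```typescript
// Wrap raw 16-bit little-endian PCM in a minimal RIFF/WAVE header.
// Assumption: mono audio. Sample rate and bit depth follow the text above.
function pcmToWav(
  pcm: Uint8Array,
  sampleRate = 24000,
  channels = 1, // assumed, not stated in the docs
  bitsPerSample = 16,
): Uint8Array {
  const blockAlign = (channels * bitsPerSample) / 8; // bytes per sample frame
  const byteRate = sampleRate * blockAlign;
  const wav = new Uint8Array(44 + pcm.length);
  const view = new DataView(wav.buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + pcm.length, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for linear PCM
  view.setUint16(20, 1, true); // audio format 1 = linear PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeAscii(36, "data");
  view.setUint32(40, pcm.length, true); // size of the sample data
  wav.set(pcm, 44);
  return wav;
}
```

The returned bytes can be written to disk as a .wav file, for example with writeFileSync.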
curl
curl -X POST "https://api.llmgateway.io/v1/audio/speech" \
-H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gemini-2.5-flash-preview-tts",
"input": "Hello, welcome to LLM Gateway!",
"voice": "Kore"
}' \
--output speech.wav
OpenAI SDK
Works with the standard OpenAI client library — just point the base URL to LLMGateway.
import OpenAI from "openai";
import { writeFileSync } from "fs";
const openai = new OpenAI({
apiKey: process.env.LLM_GATEWAY_API_KEY,
baseURL: "https://api.llmgateway.io/v1",
});
const response = await openai.audio.speech.create({
model: "gemini-2.5-flash-preview-tts",
voice: "Kore",
input: "Hello, welcome to LLM Gateway!",
});
const buffer = Buffer.from(await response.arrayBuffer());
writeFileSync("speech.wav", buffer);
Streaming
Streaming speech responses (chunked audio or stream_format: "sse") are not
supported yet. The endpoint always returns the complete audio file in a single
response, so there is no low-latency, play-as-you-go output for now.
Voices
Gemini exposes 30 prebuilt voices. A few common ones:
Kore, Puck, Zephyr, Charon, Fenrir, Leda, Orus, Aoede. When
voice is omitted on a Gemini model, Kore is used.
OpenAI voices include alloy, ash, ballad, coral, echo, fable,
nova, onyx, sage, shimmer, and verse. When voice is omitted on an
OpenAI model, alloy is used.
ElevenLabs models accept 20 named voices, including Sarah, Aria, Roger,
Laura, Charlie, George, Charlotte, Jessica, Brian, and Lily. When
voice is omitted on an ElevenLabs model, Sarah is used. A raw ElevenLabs
voice id is also accepted directly.
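The documented defaults can also be applied on the client, so that the voice actually used is explicit in your own request logs. A minimal sketch, assuming the caller knows which family a model belongs to; `Family`, `DEFAULT_VOICE`, and `resolveVoice` are hypothetical names, not part of any SDK.

```typescript
type Family = "gemini" | "openai" | "elevenlabs";

// Defaults as documented above.
const DEFAULT_VOICE: Record<Family, string> = {
  gemini: "Kore",
  openai: "alloy",
  elevenlabs: "Sarah",
};

// Returns the caller's voice if it is non-empty after trimming,
// otherwise the documented default for the family.
function resolveVoice(family: Family, voice?: string): string {
  const trimmed = voice?.trim();
  return trimmed ? trimmed : DEFAULT_VOICE[family];
}
```

The result can be passed as the voice field of the request; a raw ElevenLabs voice id passes through unchanged.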
ElevenLabs
The four ElevenLabs models are billed per input character (see the models page for rates):
- eleven-multilingual-v2: most lifelike, rich emotional expression, 29 languages
- eleven-v3: most expressive and human-like, 70+ languages
- eleven-flash-v2-5: ultra-low latency, 32 languages
- eleven-turbo-v2-5: fast and balanced, 32 languages
curl -X POST "https://api.llmgateway.io/v1/audio/speech" \
-H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "eleven-multilingual-v2",
"input": "Hello, welcome to LLM Gateway!",
"voice": "Sarah"
}' \
--output speech.mp3