Text-to-Speech

Convert text to natural-sounding speech. This endpoint is a drop-in replacement for OpenAI's POST /v1/audio/speech, with automatic fallback across OpenAI and SiliconFlow TTS models.

POST /v1/audio/speech

Request body

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Required | Standard model alias. Must declare speech capability. See supported models below. |
| input | string | Required | The text to convert. Maximum 4,096 characters for OpenAI models; SiliconFlow models accept up to 1,000 bytes. |
| voice | string | Required | Voice to use: alloy, echo, fable, onyx, nova, shimmer. SiliconFlow models map unknown voices to their default. |
| response_format | string | Optional | Output format: mp3 (default), opus, aac, flac, wav, pcm. |
| speed | number | Optional | Speaking speed multiplier, 0.25 to 4.0. Default 1.0. Supported by OpenAI models only. |
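
The limits in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of any SDK: the helper name `build_speech_payload` is hypothetical, and it assumes the provider can be inferred from the `openai/` alias prefix and that a "character" corresponds to one Python code point.

```python
OPENAI_MAX_CHARS = 4096        # documented limit for OpenAI models
SILICONFLOW_MAX_BYTES = 1000   # documented limit for SiliconFlow models
FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}


def build_speech_payload(model, text, voice="alloy",
                         response_format="mp3", speed=None):
    """Return a JSON-ready request body, or raise ValueError on a documented limit."""
    # Assumption: the alias prefix identifies the provider.
    is_openai = model.startswith("openai/")

    if is_openai and len(text) > OPENAI_MAX_CHARS:
        raise ValueError("input exceeds 4,096 characters")
    if not is_openai and len(text.encode("utf-8")) > SILICONFLOW_MAX_BYTES:
        raise ValueError("input exceeds 1,000 UTF-8 bytes")
    if response_format not in FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")

    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }
    if speed is not None:
        # Design choice of this sketch: reject speed where it is not supported,
        # rather than silently dropping it.
        if not is_openai:
            raise ValueError("speed is supported by OpenAI models only")
        if not 0.25 <= speed <= 4.0:
            raise ValueError("speed must be between 0.25 and 4.0")
        payload["speed"] = speed
    return payload
```

Failing early in this way avoids spending a round trip on a request the documented limits would reject.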

Response

Returns audio bytes with the appropriate Content-Type header (e.g. audio/mpeg for mp3). Status 200 on success.
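
For callers not using an SDK, the request and response described above might be handled with the standard library as sketched below. The `Authorization: Bearer` header and the JSON request body are assumptions based on OpenAI compatibility, and the function names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://api.therouter.ai"  # host taken from the quick start


def make_speech_request(api_key, payload):
    """Build (but do not send) a POST /v1/audio/speech request."""
    return urllib.request.Request(
        url=BASE_URL + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumption: bearer-token auth
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_speech(api_key, payload, out_path):
    """Send the request, save the audio bytes, and return the Content-Type."""
    with urllib.request.urlopen(make_speech_request(api_key, payload)) as resp:
        # urlopen raises HTTPError for 4xx/5xx, so reaching here means success.
        content_type = resp.headers.get("Content-Type", "")
        with open(out_path, "wb") as f:
            f.write(resp.read())
    return content_type
```

Calling `fetch_speech("YOUR_API_KEY", {"model": "openai/tts-1", "voice": "alloy", "input": "Hello"}, "output.mp3")` would be expected to return `audio/mpeg` for the default mp3 format.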

Quick start

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.therouter.ai/v1",  # the SDK appends /audio/speech to this
    api_key="YOUR_API_KEY",
)

with client.audio.speech.with_streaming_response.create(
    model="openai/tts-1",
    voice="alloy",
    input="Hello from TheRouter.ai!",
) as response:
    response.stream_to_file("output.mp3")

CJK / Chinese text

For Chinese and other CJK text, use alibaba/cosyvoice2-0.5b (SiliconFlow). It is billed per UTF-8 byte rather than per character, so cost scales with the encoded size of the input (most CJK characters take 3 bytes). When the OpenAI provider is unavailable, requests to openai/tts-1 fall back to CosyVoice2 automatically.

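Python
# SiliconFlow CosyVoice2: optimised for Chinese text
# Billed per UTF-8 byte (most CJK characters take 3 bytes)
# Reuses the `client` created in the quick start above
response = client.audio.speech.create(
    model="alibaba/cosyvoice2-0.5b",
    voice="alloy",
    input="你好,欢迎使用 TheRouter!",
)
response.write_to_file("output_zh.mp3")  # save the returned audio bytes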
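
Because SiliconFlow models accept at most 1,000 bytes of input, longer CJK text has to be split before it is sent. The sketch below shows one way to do this without cutting a character in half; `split_by_bytes` is a hypothetical helper, and sending one request per chunk and joining the audio afterwards is left to the caller.

```python
def utf8_len(text):
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def split_by_bytes(text, max_bytes=1000):
    """Split text into pieces of at most max_bytes UTF-8 bytes each.

    Characters are never split; each one is assumed to fit in max_bytes.
    """
    chunks, current, size = [], [], 0
    for ch in text:
        n = utf8_len(ch)
        if current and size + n > max_bytes:
            # The next character would overflow the chunk, so close it.
            chunks.append("".join(current))
            current, size = [], 0
        current.append(ch)
        size += n
    if current:
        chunks.append("".join(current))
    return chunks
```

This splits purely on size. Splitting at sentence boundaries instead would likely give more natural prosody at the seams, at the cost of a slightly more involved helper.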

Supported models

| Model | Price | Description |
| --- | --- | --- |
| openai/tts-1 | $15/1M chars | OpenAI TTS-1. Low-latency, optimised for speed. Automatic fallback to CosyVoice2 (SiliconFlow) at priority 1. |
| openai/tts-1-hd | $30/1M chars | OpenAI TTS-1 HD. Higher fidelity, slower generation. No SiliconFlow fallback. |
| openai/gpt-4o-mini-tts | $12/1M tokens | GPT-4o Mini TTS. Supports natural-language instructions via the instructions parameter. |
| alibaba/cosyvoice2-0.5b | $7.15/1M bytes | SiliconFlow CosyVoice2 0.5B. CJK-optimised, byte-based billing. Also serves as fallback for openai/tts-1. |
| fishaudio/fish-speech-1.5 | $7.15/1M bytes | Fish Audio Fish-Speech 1.5. Expressive multilingual TTS with voice cloning support. |
| indexteam/indextts-2 | $7.15/1M bytes | IndexTTS-2. Cloning-grade voice fidelity. |

The sync endpoint returns audio inline. For long-form or background TTS, use POST /v1/jobs with an audio model: the job is queued, the output is stored in S3, and a presigned URL is returned when the job completes.
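
The per-character and per-byte prices listed above can be compared for a given input with a small estimator. This is an illustrative sketch only: `estimate_cost_usd` is a hypothetical helper, it assumes one "char" equals one Python code point, and it leaves out openai/gpt-4o-mini-tts because token counts cannot be derived from the text alone.

```python
# (price in USD per 1M units, billing unit), copied from the table above
PRICES = {
    "openai/tts-1": (15.00, "chars"),
    "openai/tts-1-hd": (30.00, "chars"),
    "alibaba/cosyvoice2-0.5b": (7.15, "bytes"),
    "fishaudio/fish-speech-1.5": (7.15, "bytes"),
    "indexteam/indextts-2": (7.15, "bytes"),
}


def estimate_cost_usd(model, text):
    """Estimate the cost of synthesising text with the given model."""
    price, unit = PRICES[model]
    if unit == "chars":
        units = len(text)                  # assumption: one code point per char
    else:
        units = len(text.encode("utf-8"))  # byte-billed models
    return price * units / 1_000_000
```

Since the billing units differ, the comparison depends on the script: ASCII text is one byte per character, while typical CJK text is three bytes per character, so the byte count for the same passage is about three times its character count.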