Text-to-Speech

Convert text to natural-sounding speech. The endpoint is a drop-in replacement for OpenAI's POST /v1/audio/speech, with automatic fallback across OpenAI and SiliconFlow TTS models.

POST /v1/audio/speech

Request body

Name	Type	Required	Description
model	string	Required	Standard model alias. Must declare speech capability. See supported models below.
input	string	Required	The text to convert. Maximum 4,096 characters for OpenAI models; SiliconFlow models accept up to 1,000 bytes.
voice	string	Required	Voice to use: alloy, echo, fable, onyx, nova, shimmer. SiliconFlow models map unknown voices to their default.
response_format	string	Optional	Output format: mp3 (default), opus, aac, flac, wav, pcm.
speed	number	Optional	Speaking speed multiplier, 0.25 to 4.0. Default 1.0. Supported by OpenAI models only.
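
The limits in the table above can be checked on the client before a request is sent. The following is a minimal sketch rather than part of the API: the helper name build_speech_payload is hypothetical, and it assumes that any alias starting with "openai/" is an OpenAI model, that every other alias is a SiliconFlow model, and that OpenAI models reject voices outside the six listed.

```python
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}


def build_speech_payload(model, text, voice, response_format="mp3", speed=None):
    """Return a JSON-ready dict for POST /v1/audio/speech, or raise ValueError."""
    # Assumption: aliases starting with "openai/" are OpenAI models;
    # every other alias is treated as a SiliconFlow model.
    is_openai = model.startswith("openai/")

    if is_openai:
        if len(text) > 4096:
            raise ValueError("input exceeds 4,096 characters")
        # Assumption: OpenAI models accept only the six listed voices.
        if voice not in VOICES:
            raise ValueError(f"unknown voice: {voice}")
    elif len(text.encode("utf-8")) > 1000:
        raise ValueError("input exceeds 1,000 bytes")

    if response_format not in FORMATS:
        raise ValueError(f"unknown response_format: {response_format}")

    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }

    # speed is optional and documented as OpenAI-only.
    if speed is not None:
        if not is_openai:
            raise ValueError("speed is supported by OpenAI models only")
        if not 0.25 <= speed <= 4.0:
            raise ValueError("speed must be between 0.25 and 4.0")
        payload["speed"] = speed

    return payload
```

Note that the two limits use different units: 334 Chinese characters fit comfortably in the OpenAI character limit but already exceed the 1,000-byte SiliconFlow limit, because each takes 3 bytes in UTF-8.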

Response

Returns audio bytes with the appropriate Content-Type header (e.g. audio/mpeg for mp3). Status 200 on success.
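
For callers who are not using an SDK, the request and response can be handled with the Python standard library alone. This is a sketch under stated assumptions: the helper names make_speech_request and save_speech are hypothetical, the body is sent as JSON, and the key is passed as a Bearer token, which is what the OpenAI SDK does on the caller's behalf.

```python
import json
import urllib.request

BASE_URL = "https://api.therouter.ai"


def make_speech_request(api_key, payload):
    """Build (but do not send) the HTTP request for POST /v1/audio/speech."""
    return urllib.request.Request(
        BASE_URL + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumption: Bearer auth with a JSON body, as the OpenAI SDK sends.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def save_speech(request, path):
    """Send the request, write the audio bytes to path, return the Content-Type."""
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request) as response:
        content_type = response.headers.get("Content-Type", "")
        with open(path, "wb") as f:
            f.write(response.read())
    return content_type
```

A call such as save_speech(make_speech_request("YOUR_API_KEY", {"model": "openai/tts-1", "input": "Hello", "voice": "alloy"}), "output.mp3") would be expected to return audio/mpeg for the default mp3 format.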

Quick start

Python

from openai import OpenAI

client = OpenAI(
    base_url="https://api.therouter.ai/v1",  # the SDK appends /audio/speech to this
    api_key="YOUR_API_KEY",
)

with client.audio.speech.with_streaming_response.create(
    model="openai/tts-1",
    voice="alloy",
    input="Hello from TheRouter.ai!",
) as response:
    response.stream_to_file("output.mp3")

CJK / Chinese text

For Chinese and other CJK text, use alibaba/cosyvoice2-0.5b (SiliconFlow). It is billed per UTF-8 byte rather than per character, making CJK billing proportional to actual data size. The fallback from openai/tts-1 to CosyVoice2 is automatic when the OpenAI provider is unavailable.

Python

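# SiliconFlow CosyVoice2, optimised for Chinese text.
# Billed per UTF-8 byte (most CJK characters take 3 bytes).
# Reuses the `client` created in the Quick start example.
response = client.audio.speech.create(
    model="alibaba/cosyvoice2-0.5b",
    voice="alloy",
    input="你好，欢迎使用 TheRouter！",
)
response.write_to_file("output_zh.mp3")  # save the returned audio bytes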
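
Because SiliconFlow models accept at most 1,000 bytes of input, longer Chinese text has to be split before it is sent. The sketch below shows one way to do that without cutting through a multi-byte character. The function name split_by_utf8_bytes is hypothetical, and splitting on arbitrary character boundaries is a simplification.

```python
def split_by_utf8_bytes(text, max_bytes=1000):
    """Split text into pieces of at most max_bytes UTF-8 bytes each.

    Pieces always end on a character boundary. max_bytes should be at
    least 4, the longest UTF-8 encoding of a single character.
    """
    chunks, current, size = [], [], 0
    for ch in text:
        n = len(ch.encode("utf-8"))
        # Start a new chunk if this character would overflow the budget.
        if current and size + n > max_bytes:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(ch)
        size += n
    if current:
        chunks.append("".join(current))
    return chunks
```

Each chunk can then be sent as its own request and the resulting audio concatenated. In practice, splitting at sentence or punctuation boundaries is likely to sound more natural than a hard byte cut.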

Supported models

Model	Price	Description
openai/tts-1	$15/1M chars	OpenAI TTS-1. Low-latency, optimised for speed. Automatic fallback to CosyVoice2 (SiliconFlow) at priority 1.
openai/tts-1-hd	$30/1M chars	OpenAI TTS-1 HD. Higher fidelity, slower generation. No SiliconFlow fallback.
openai/gpt-4o-mini-tts	$12/1M tokens	GPT-4o Mini TTS. Supports natural-language instructions via the instructions parameter.
alibaba/cosyvoice2-0.5b	$7.15/1M bytes	SiliconFlow CosyVoice2 0.5B. CJK-optimised, byte-based billing. Also serves as fallback for openai/tts-1.
fishaudio/fish-speech-1.5	$7.15/1M bytes	Fish Audio Fish-Speech 1.5. Expressive multilingual TTS with voice cloning support.
indexteam/indextts-2	$7.15/1M bytes	IndexTTS-2. Cloning-grade voice fidelity.
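
The prices above use different units, so comparing models for a given text takes a little arithmetic. The following sketch estimates the cost from the listed rates. The function name estimate_cost_usd is hypothetical, and it assumes that "chars" means Unicode code points and "bytes" means UTF-8 bytes. openai/gpt-4o-mini-tts is left out because it is billed per token, which cannot be computed without the tokenizer.

```python
# (USD per 1M units, unit) taken from the table above.
PRICE_PER_MILLION = {
    "openai/tts-1": (15.00, "chars"),
    "openai/tts-1-hd": (30.00, "chars"),
    "alibaba/cosyvoice2-0.5b": (7.15, "bytes"),
    "fishaudio/fish-speech-1.5": (7.15, "bytes"),
    "indexteam/indextts-2": (7.15, "bytes"),
}


def estimate_cost_usd(model, text):
    """Estimate the cost in USD of synthesising text with the given model."""
    price, unit = PRICE_PER_MILLION[model]
    # Assumption: "chars" counts code points, "bytes" counts UTF-8 bytes.
    units = len(text) if unit == "chars" else len(text.encode("utf-8"))
    return units * price / 1_000_000
```

As a worked example, 1,000 Chinese characters are 3,000 bytes in UTF-8, so the estimate is 3,000 × 7.15 / 1,000,000 = $0.02145 on alibaba/cosyvoice2-0.5b, against 1,000 × 15 / 1,000,000 = $0.015 on openai/tts-1. For ASCII text, where one character is one byte, the byte-billed models come out cheaper.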

The synchronous endpoint described above returns audio inline in the response. For long-form or background TTS, use POST /v1/jobs with an audio model: the job is queued, the output is stored in S3, and a presigned URL is returned when the job completes.
