Audio API

Drop-in replacement for the OpenAI Audio API. Route TTS and STT requests across OpenAI and SiliconFlow with automatic fallback, unified billing, and consistent response shapes.

Endpoints

| Endpoint | Mode | Description |
|---|---|---|
| POST /v1/audio/speech | sync | Text-to-speech. Returns audio bytes inline. Drop-in for OpenAI /v1/audio/speech. |
| POST /v1/audio/transcriptions | sync | Speech-to-text. Returns JSON transcript inline. Drop-in for OpenAI /v1/audio/transcriptions. |
| POST /v1/jobs (model=audio) | async | Async audio job. Returns 202 with job ID. Poll GET /v1/jobs/:id. Use for long inputs or background processing. |

Text-to-Speech models

All models accept the OpenAI voice parameter (alloy, echo, fable, onyx, nova, shimmer). SiliconFlow models map unknown voice names to their default voice.

| Model | Type | Description |
|---|---|---|
| openai/tts-1 | tts | OpenAI TTS-1. Optimised for speed. Billed per character ($15/1M chars). Falls back to SiliconFlow CosyVoice2 at priority 1. |
| openai/tts-1-hd | tts | OpenAI TTS-1 HD. Optimised for quality. Billed per character ($30/1M chars). |
| openai/gpt-4o-mini-tts | tts | GPT-4o Mini TTS. Instruction-following voice model. Billed per token + request. |
| alibaba/cosyvoice2-0.5b | tts | SiliconFlow CosyVoice2 0.5B. CJK-optimised TTS. Billed per UTF-8 byte ($7.15/1M bytes). Best choice for Chinese text. |
| fishaudio/fish-speech-1.5 | tts | Fish Audio Fish-Speech 1.5. Highly expressive multilingual TTS. $7.15/1M UTF-8 bytes. |
| indexteam/indextts-2 | tts | IndexTTS-2. Cloning-grade voice fidelity. $7.15/1M UTF-8 bytes. |

Speech-to-Text models

| Model | Type | Description |
|---|---|---|
| openai/whisper-1 | stt | OpenAI Whisper-1. General-purpose multilingual transcription. $0.006/min. Falls back to SiliconFlow SenseVoice at priority 1. |
| alibaba/sensevoice-small | stt | SiliconFlow SenseVoice Small. Fast, emotion-aware transcription optimised for Chinese and English. $0.006/min. |

Sync vs async

The sync endpoints (/v1/audio/speech and /v1/audio/transcriptions) stream the result back inline, which is best for interactive use or short inputs. For long audio files or batch workloads, use POST /v1/jobs with an audio model: the gateway queues the request, stores the output in S3, and returns a presigned URL when the job completes.
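The polling side of the async flow can be sketched as below. This is a minimal sketch, not the gateway's documented client: the `status` field and its `completed` / `failed` values are assumptions about the job response shape, and `fetch_job` is a hypothetical callable you supply that wraps GET /v1/jobs/:id and returns the parsed JSON.

```python
import time
from typing import Any, Callable, Dict

# Assumption: the job JSON carries a "status" field with these terminal values.
TERMINAL_STATUSES = ("completed", "failed")


def wait_for_job(
    fetch_job: Callable[[str], Dict[str, Any]],
    job_id: str,
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll a job until it reaches a terminal status or the timeout expires.

    fetch_job is a hypothetical wrapper around GET /v1/jobs/:id.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_job(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout_s}s")
        sleep(interval_s)  # wait before the next poll
```

Passing `sleep` as a parameter keeps the loop easy to test with a no-op; in production code a backoff schedule may be preferable to a fixed interval.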

Both sync endpoints are drop-in compatible with the OpenAI SDK. The only change required is to set the base URL to https://api.therouter.ai.
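For readers not using the OpenAI SDK, a raw text-to-speech call might look like the sketch below, using only the Python standard library. The `model`, `input`, and `voice` body fields follow the OpenAI speech request shape; the Bearer `Authorization` header is an assumption carried over from the OpenAI API, and the helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://api.therouter.ai"  # base URL given in the text above


def build_speech_request(
    api_key: str,
    text: str,
    model: str = "openai/tts-1",
    voice: str = "alloy",
) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/audio/speech request."""
    body = json.dumps({"model": model, "input": text, "voice": voice})
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/audio/speech",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            # Assumption: Bearer-token auth, as in the OpenAI API.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def synthesize_to_file(api_key: str, text: str, out_path: str) -> None:
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(api_key, text)
    with urllib.request.urlopen(request) as response:
        audio_bytes = response.read()  # sync endpoint returns audio inline
    with open(out_path, "wb") as out_file:
        out_file.write(audio_bytes)
```

Separating request construction from sending lets you inspect or log the request before any network call is made.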

Billing

| Provider | Billing unit | Rates |
|---|---|---|
| OpenAI TTS | per character | tts-1: $15/1M chars. tts-1-hd: $30/1M chars. |
| SiliconFlow TTS | per UTF-8 byte | CosyVoice2, Fish-Speech, IndexTTS-2: $7.15/1M UTF-8 bytes. Most CJK characters take 3 bytes in UTF-8, so CJK text bills at about 3× the per-character rate of ASCII text. |
| OpenAI STT | per minute | whisper-1: $0.006/min. |
| SiliconFlow STT | per minute | SenseVoice-Small: $0.006/min. |
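The difference between per-character and per-byte billing is easiest to see with a small estimate. The sketch below uses the rates listed above. It assumes a "character" means one Unicode code point (Python's `len`), which may not match the provider's exact counting, and it omits openai/gpt-4o-mini-tts because no per-token rate is given here. The function names are hypothetical.

```python
# (billing unit, USD per 1M units), taken from the billing table above.
TTS_RATES_PER_MILLION = {
    "openai/tts-1": ("char", 15.00),
    "openai/tts-1-hd": ("char", 30.00),
    "alibaba/cosyvoice2-0.5b": ("byte", 7.15),
    "fishaudio/fish-speech-1.5": ("byte", 7.15),
    "indexteam/indextts-2": ("byte", 7.15),
}


def billable_units(text: str, model: str) -> int:
    """Count characters or UTF-8 bytes, depending on the model's billing unit."""
    unit, _rate = TTS_RATES_PER_MILLION[model]
    if unit == "char":
        return len(text)  # assumption: one code point = one billed character
    return len(text.encode("utf-8"))


def estimate_tts_cost(text: str, model: str) -> float:
    """Estimated cost in USD for synthesizing text with the given model."""
    _unit, rate = TTS_RATES_PER_MILLION[model]
    return billable_units(text, model) * rate / 1_000_000


# "你好" is 2 characters but 6 UTF-8 bytes, so byte-billed models count 6 units.
print(billable_units("你好", "openai/tts-1"))             # 2
print(billable_units("你好", "alibaba/cosyvoice2-0.5b"))  # 6
```

Even at 3 bytes per character, CJK text on the SiliconFlow models costs about $21.45 per 1M characters (3 × $7.15), which sits between the tts-1 and tts-1-hd rates.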