MiniMax Speech 2.8
MiniMax Speech 2.8 is available through the OpenAI-compatible text-to-speech endpoint. The current integration supports synchronous speech generation for short narration, dubbing, and voice output.
Endpoint
```http
POST /v1/audio/speech
```
Supported Models
| Model | Description |
|---|---|
| `minimax-speech-2.8-turbo` | Low-latency speech synthesis model |
| `minimax-speech-2.8-hd` | Higher-quality speech synthesis model |
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | `minimax-speech-2.8-turbo` or `minimax-speech-2.8-hd` |
| `input` | string | Yes | Text to synthesize |
| `voice` | string | Yes | MiniMax voice ID, such as a system voice, cloned voice, or generated voice ID |
| `response_format` | string | No | Audio format. Default is `mp3`; common values include `mp3`, `wav`, `flac`, and `pcm` |
| `speed` | number | No | Speech speed, commonly 0.5 to 2; default is 1 |
| `metadata` | object | No | MiniMax extension options |
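
The parameter table can be mirrored in a small client-side helper that assembles the request body before it is sent. The following is a minimal sketch: the function name `build_speech_payload` and the constant `ALLOWED_MODELS` are hypothetical, and the checks only reflect what the table states. The server remains the authority on what is accepted.

```python
# Hypothetical helper; the names are illustrative and not part of the API.
ALLOWED_MODELS = {"minimax-speech-2.8-turbo", "minimax-speech-2.8-hd"}


def build_speech_payload(model, text, voice,
                         response_format=None, speed=None, metadata=None):
    """Assemble a JSON-serializable body for POST /v1/audio/speech."""
    if model not in ALLOWED_MODELS:
        raise ValueError(f"unsupported model: {model!r}")
    if not text:
        raise ValueError("input must be a non-empty string")
    if not voice:
        raise ValueError("voice must be a non-empty string")

    payload = {"model": model, "input": text, "voice": voice}
    # Optional fields are left out when unset so the documented defaults
    # (mp3, speed 1) apply on the server side.
    if response_format is not None:
        payload["response_format"] = response_format
    if speed is not None:
        payload["speed"] = speed
    if metadata is not None:
        payload["metadata"] = metadata
    return payload
```

Omitting unset optional fields keeps the request minimal and avoids sending values that merely restate the defaults.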
Basic Example
```bash
curl -X POST "https://cubicspaces.cloud/v1/audio/speech" \
  -H "Authorization: Bearer $NEW_API_KEY" \
  -H "Content-Type: application/json" \
  --output minimax-speech.mp3 \
  -d '{
    "model": "minimax-speech-2.8-turbo",
    "voice": "male-qn-qingse",
    "input": "Hello, this is a MiniMax Speech 2.8 synchronous speech generation test.",
    "response_format": "mp3",
    "speed": 1
  }'
```
On success, the response body is binary audio. Use `--output` or an equivalent client option to save the file.
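
For clients that do not shell out to curl, the same request can be sketched with Python's standard library. This is an illustrative equivalent under stated assumptions, not an official SDK: the helper names `make_speech_request` and `synthesize_to_file` are hypothetical, and the URL and headers are taken from the curl example above.

```python
import json
import urllib.request

API_URL = "https://cubicspaces.cloud/v1/audio/speech"  # from the curl example


def make_speech_request(api_key, payload, url=API_URL):
    """Build (but do not send) the POST request for the speech endpoint."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(api_key, payload, out_path):
    """Send the request and write the binary audio response to out_path."""
    request = make_speech_request(api_key, payload)
    # urlopen raises urllib.error.HTTPError on non-2xx responses, so an
    # error body is never written to disk as if it were audio.
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(out_path, "wb") as out:
        out.write(audio)
    return out_path


# Usage (requires a valid key and network access):
#   import os
#   synthesize_to_file(os.environ["NEW_API_KEY"], {
#       "model": "minimax-speech-2.8-turbo",
#       "voice": "male-qn-qingse",
#       "input": "Hello from Python.",
#   }, "minimax-speech.mp3")
```

Opening the output file in binary mode (`"wb"`) plays the role of curl's `--output`: the response bytes are saved unchanged.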
Native Options Example
To set sample rate, volume, emotion, language boost, pronunciation dictionaries, or other MiniMax-native options, send them in metadata:
```json
{
  "model": "minimax-speech-2.8-hd",
  "voice": "male-qn-qingse",
  "input": "Welcome to Cubicspaces MiniMax speech synthesis.",
  "response_format": "mp3",
  "metadata": {
    "audio_setting": {
      "format": "mp3",
      "sample_rate": 32000,
      "bitrate": 128000,
      "channel": 1
    },
    "voice_setting": {
      "voice_id": "male-qn-qingse",
      "speed": 1,
      "vol": 1,
      "pitch": 0,
      "emotion": "happy"
    },
    "language_boost": "English",
    "subtitle_enable": false
  }
}
```
Common native fields:
| Field | Description |
|---|---|
| `audio_setting.format` | Audio format, such as `mp3`, `pcm`, `flac`, or `wav` |
| `audio_setting.sample_rate` | Sample rate, such as 8000, 16000, 22050, 24000, 32000, or 44100 |
| `audio_setting.bitrate` | MP3 bitrate, such as 32000, 64000, 128000, or 256000 |
| `voice_setting.voice_id` | Voice ID. For mixed voices, use it with `timbre_weights` |
| `voice_setting.speed` | Speech speed, commonly 0.5 to 2 |
| `voice_setting.vol` | Volume, commonly (0, 10] |
| `voice_setting.pitch` | Pitch, commonly -12 to 12 |
| `voice_setting.emotion` | Emotion, such as `happy`, `sad`, `angry`, `fearful`, `disgusted`, `surprised`, `calm`, `fluent`, or `whisper` |
| `voice_modify` | Voice pitch, timbre, intensity, and sound effect controls |
| `pronunciation_dict` | Pronunciation dictionary |
| `subtitle_enable` | Whether to request subtitle output. Actual behavior depends on model support |
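
The ranges listed for `voice_setting` can be checked on the client before a request is sent. The sketch below is illustrative only: `build_voice_setting` is a hypothetical helper, and the bounds are the "commonly" quoted ranges from the table, which the service may not enforce exactly as written. The emotion value is passed through unchecked because the table gives examples rather than an exhaustive list.

```python
def build_voice_setting(voice_id, speed=1, vol=1, pitch=0, emotion=None):
    """Return a voice_setting dict after checking the commonly cited ranges."""
    if not 0.5 <= speed <= 2:
        raise ValueError("speed is commonly between 0.5 and 2")
    if not 0 < vol <= 10:
        raise ValueError("vol is commonly in the range (0, 10]")
    if not -12 <= pitch <= 12:
        raise ValueError("pitch is commonly between -12 and 12")

    setting = {"voice_id": voice_id, "speed": speed, "vol": vol, "pitch": pitch}
    # emotion is optional; it is included only when the caller provides one.
    if emotion is not None:
        setting["emotion"] = emotion
    return setting
```

The result can be placed under `metadata["voice_setting"]` in the request body, alongside `audio_setting` and the other native fields.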
Notes
- This page covers synchronous speech generation. The response body is the generated audio content.
- `metadata` overrides extension fields with the same name. Do not change `model` inside `metadata`, to keep request behavior predictable.
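
Because values in `metadata` take precedence, it can help to lint a payload before sending it. The following sketch flags a `model` key inside `metadata` and mismatches between top-level parameters and their apparent native counterparts. The function `find_metadata_conflicts` is hypothetical, and the pairing of top-level fields with native fields (for example `response_format` with `audio_setting.format`) is an assumption made for illustration, not documented behavior.

```python
# Assumed correspondence between top-level parameters and native fields.
ASSUMED_PAIRS = [
    ("response_format", "audio_setting", "format"),
    ("speed", "voice_setting", "speed"),
    ("voice", "voice_setting", "voice_id"),
]


def find_metadata_conflicts(payload):
    """Return a list of warning strings for a request payload dict."""
    warnings = []
    metadata = payload.get("metadata") or {}

    if "model" in metadata:
        warnings.append("metadata.model is set; change the top-level model instead")

    for top_key, group, field in ASSUMED_PAIRS:
        native = (metadata.get(group) or {}).get(field)
        # Only report when both values are present and they disagree.
        if top_key in payload and native is not None and payload[top_key] != native:
            warnings.append(
                f"{top_key}={payload[top_key]!r} differs from "
                f"metadata.{group}.{field}={native!r}"
            )
    return warnings
```

An empty list means no mismatch was found under these assumptions; it does not guarantee that the server will accept the request.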