MiniMax Speech 2.8

MiniMax Speech 2.8 is available through the OpenAI-compatible text-to-speech endpoint. The current integration supports synchronous speech generation, suited to short narration, dubbing, and other voice output.

Endpoint

```http
POST /v1/audio/speech
```

Supported Models

| Model | Description |
| --- | --- |
| `minimax-speech-2.8-turbo` | Low-latency speech synthesis model |
| `minimax-speech-2.8-hd` | Higher-quality speech synthesis model |

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `model` | string | Yes | `minimax-speech-2.8-turbo` or `minimax-speech-2.8-hd` |
| `input` | string | Yes | Text to synthesize |
| `voice` | string | Yes | MiniMax voice ID: a system voice, a cloned voice, or a generated voice |
| `response_format` | string | No | Audio format. Defaults to `mp3`; common values are `mp3`, `wav`, `flac`, and `pcm` |
| `speed` | number | No | Speech speed, commonly `0.5` to `2`. Defaults to `1` |
| `metadata` | object | No | MiniMax-native extension options, such as `audio_setting` and `voice_setting` (see Native Options Example) |
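The parameter table can be enforced on the client before a request is sent. Below is a minimal POSIX shell sketch. It assumes that building the JSON body by hand (without a tool such as `jq`) is acceptable. `json_escape` and `build_speech_body` are hypothetical helper names. The speed check mirrors the common `0.5` to `2` range rather than a documented hard limit.

```bash
# Hypothetical helper: escape a string for use inside a JSON string literal.
# Escapes backslash, double quote, tab, carriage return, and newline.
# Other control characters are not escaped, and trailing newlines are dropped.
json_escape() {
  printf '%s' "$1" |
    sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' |
    awk '{ gsub(/\t/, "\\t"); gsub(/\r/, "\\r") }
         NR > 1 { printf "%s", "\\n" }
         { printf "%s", $0 }'
}

# Hypothetical helper: print a request body for POST /v1/audio/speech.
# Usage: build_speech_body MODEL VOICE INPUT [RESPONSE_FORMAT] [SPEED]
build_speech_body() {
  local model="$1" voice="$2" input="$3" format="${4:-mp3}" speed="${5:-1}"

  # Only the two Speech 2.8 models listed on this page are accepted.
  case $model in
    minimax-speech-2.8-turbo | minimax-speech-2.8-hd) ;;
    *) echo "unsupported model: $model" >&2; return 1 ;;
  esac

  # voice and input are required parameters.
  if [ -z "$voice" ] || [ -z "$input" ]; then
    echo "voice and input are required" >&2
    return 1
  fi

  # speed must be a plain decimal number in the common 0.5 to 2 range.
  if ! awk -v s="$speed" 'BEGIN { exit !(s ~ /^[0-9]+(\.[0-9]+)?$/ && s + 0 >= 0.5 && s + 0 <= 2) }'; then
    echo "speed outside the common 0.5 to 2 range: $speed" >&2
    return 1
  fi

  printf '{"model":"%s","voice":"%s","input":"%s","response_format":"%s","speed":%s}\n' \
    "$model" "$(json_escape "$voice")" "$(json_escape "$input")" \
    "$(json_escape "$format")" "$speed"
}
```

For example, `build_speech_body minimax-speech-2.8-turbo male-qn-qingse 'Hello' > body.json` writes a body that can then be sent with `curl --data-binary @body.json`.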

Basic Example

```bash
curl -X POST "https://cubicspaces.cloud/v1/audio/speech" \
  -H "Authorization: Bearer $NEW_API_KEY" \
  -H "Content-Type: application/json" \
  --output minimax-speech.mp3 \
  -d '{
    "model": "minimax-speech-2.8-turbo",
    "voice": "male-qn-qingse",
    "input": "Hello, this is a MiniMax Speech 2.8 synchronous speech generation test.",
    "response_format": "mp3",
    "speed": 1
  }'
```

On success, the response body is binary audio. Use `--output` (or the equivalent option in your HTTP client) to save it to a file.
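The response body is audio only on success, so a script that always writes it to an `.mp3` file can silently save an error message instead. Below is a minimal sketch of a wrapper that checks the HTTP status using curl's `-w '%{http_code}'`. It assumes that any non-2xx response carries an error body rather than audio. `synthesize_speech` and the `SPEECH_URL` override are hypothetical names.

```bash
# Hypothetical wrapper: read a JSON request body from stdin, POST it, and
# keep the response in OUTPUT_FILE only if the HTTP status is 2xx.
# Assumes NEW_API_KEY is set; SPEECH_URL is a hypothetical override.
# Usage: synthesize_speech OUTPUT_FILE < body.json
synthesize_speech() {
  local out="$1"
  local url="${SPEECH_URL:-https://cubicspaces.cloud/v1/audio/speech}"
  local status

  # -w prints the status code; -o writes the body (audio or error) to $out.
  # Sending data with --data-binary makes curl use POST.
  status=$(curl -sS "$url" \
    -H "Authorization: Bearer $NEW_API_KEY" \
    -H "Content-Type: application/json" \
    --data-binary @- \
    -o "$out" -w '%{http_code}') || return 1

  case $status in
    2??) return 0 ;;
    *)
      # On failure the body is assumed to be an error message, not audio.
      echo "speech request failed with HTTP $status:" >&2
      cat "$out" >&2
      rm -f "$out"
      return 1
      ;;
  esac
}
```

A non-zero exit status then means no audio file was kept, which is easier to handle in batch scripts than inspecting saved files after the fact.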

Native Options Example

To set sample rate, volume, emotion, language boost, pronunciation dictionaries, or other MiniMax-native options, send them in the `metadata` object:

```json
{
  "model": "minimax-speech-2.8-hd",
  "voice": "male-qn-qingse",
  "input": "Welcome to Cubicspaces MiniMax speech synthesis.",
  "response_format": "mp3",
  "metadata": {
    "audio_setting": {
      "format": "mp3",
      "sample_rate": 32000,
      "bitrate": 128000,
      "channel": 1
    },
    "voice_setting": {
      "voice_id": "male-qn-qingse",
      "speed": 1,
      "vol": 1,
      "pitch": 0,
      "emotion": "happy"
    },
    "language_boost": "English",
    "subtitle_enable": false
  }
}
```

Common native fields:

| Field | Description |
| --- | --- |
| `audio_setting.format` | Audio format, such as `mp3`, `pcm`, `flac`, or `wav` |
| `audio_setting.sample_rate` | Sample rate in Hz, such as `8000`, `16000`, `22050`, `24000`, `32000`, or `44100` |
| `audio_setting.bitrate` | MP3 bitrate in bits per second, such as `32000`, `64000`, `128000`, or `256000` |
| `voice_setting.voice_id` | Voice ID. For mixed voices, use it with `timbre_weights` |
| `voice_setting.speed` | Speech speed, commonly `0.5` to `2` |
| `voice_setting.vol` | Volume, commonly in the range `(0, 10]` |
| `voice_setting.pitch` | Pitch, commonly `-12` to `12` |
| `voice_setting.emotion` | Emotion, such as `happy`, `sad`, `angry`, `fearful`, `disgusted`, `surprised`, `calm`, `fluent`, or `whisper` |
| `voice_modify` | Voice pitch, timbre, intensity, and sound effect controls |
| `pronunciation_dict` | Pronunciation dictionary |
| `subtitle_enable` | Whether to request subtitle output. Actual behavior depends on model support |
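The table lists examples and common ranges rather than a complete specification, so a client-side check is best treated as a warning, not a hard rejection. Below is a minimal POSIX shell sketch that copies its value lists from the table. It assumes `pitch` is an integer and checks `bitrate` only for MP3 output, since the table describes it as an MP3 setting. `lint_native_settings` is a hypothetical name.

```bash
# Hypothetical helper: warn about native settings outside the values listed
# in the table. Prints one warning per value to stderr and returns 1 if
# anything was flagged. The lists are examples, not an authoritative
# allow-list, so treat a non-zero status as a hint rather than an error.
# Usage: lint_native_settings FORMAT SAMPLE_RATE BITRATE VOL PITCH EMOTION
lint_native_settings() {
  local format="$1" rate="$2" bitrate="$3" vol="$4" pitch="$5" emotion="$6"
  local warned=0

  case $format in
    mp3 | pcm | flac | wav) ;;
    *) echo "warning: unlisted format '$format'" >&2; warned=1 ;;
  esac

  case $rate in
    8000 | 16000 | 22050 | 24000 | 32000 | 44100) ;;
    *) echo "warning: unlisted sample_rate '$rate'" >&2; warned=1 ;;
  esac

  # bitrate is documented as an MP3 setting, so it is only checked for mp3.
  if [ "$format" = mp3 ]; then
    case $bitrate in
      32000 | 64000 | 128000 | 256000) ;;
      *) echo "warning: unlisted MP3 bitrate '$bitrate'" >&2; warned=1 ;;
    esac
  fi

  # vol: a decimal number in (0, 10].
  if ! awk -v v="$vol" 'BEGIN { exit !(v ~ /^[0-9]+(\.[0-9]+)?$/ && v + 0 > 0 && v + 0 <= 10) }'; then
    echo "warning: vol '$vol' is outside (0, 10]" >&2; warned=1
  fi

  # pitch: assumed to be an integer from -12 to 12.
  if ! awk -v p="$pitch" 'BEGIN { exit !(p ~ /^-?[0-9]+$/ && p + 0 >= -12 && p + 0 <= 12) }'; then
    echo "warning: pitch '$pitch' is outside -12 to 12" >&2; warned=1
  fi

  case $emotion in
    happy | sad | angry | fearful | disgusted | surprised | calm | fluent | whisper) ;;
    *) echo "warning: unlisted emotion '$emotion'" >&2; warned=1 ;;
  esac

  return "$warned"
}
```

For the values in the native options example, `lint_native_settings mp3 32000 128000 1 0 happy` prints nothing and returns 0.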

Notes

This page covers synchronous speech generation only; the response body is the generated audio content.
Fields in `metadata` take precedence over extension fields with the same name. Do not set `model` inside `metadata`; keep it at the top level so request behavior stays predictable.
