MiniMax Speech 2.8

MiniMax Speech 2.8 is available through the OpenAI-compatible text-to-speech endpoint. The current integration supports synchronous speech generation for short narration, dubbing, and voice output.

Endpoint

http
POST /v1/audio/speech

Supported Models

| Model | Description |
| --- | --- |
| minimax-speech-2.8-turbo | Low-latency speech synthesis model |
| minimax-speech-2.8-hd | Higher-quality speech synthesis model |

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | minimax-speech-2.8-turbo or minimax-speech-2.8-hd |
| input | string | Yes | Text to synthesize |
| voice | string | Yes | MiniMax voice ID, such as a system voice, cloned voice, or generated voice ID |
| response_format | string | No | Audio format. Default is mp3; common values include mp3, wav, flac, and pcm |
| speed | number | No | Speech speed, commonly 0.5 to 2; default is 1 |
| metadata | object | No | MiniMax extension options |

Basic Example

bash
curl -X POST "https://cubicspaces.cloud/v1/audio/speech" \
  -H "Authorization: Bearer $NEW_API_KEY" \
  -H "Content-Type: application/json" \
  --output minimax-speech.mp3 \
  -d '{
    "model": "minimax-speech-2.8-turbo",
    "voice": "male-qn-qingse",
    "input": "Hello, this is a MiniMax Speech 2.8 synchronous speech generation test.",
    "response_format": "mp3",
    "speed": 1
  }'

On success, the response body is binary audio. Use --output or an equivalent client option to save the file.
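The same request can be sent from a script instead of curl. The following is a minimal Python sketch using only the standard library; the helper names `build_speech_request` and `synthesize_to_file` are hypothetical, and the host and field names are taken from the curl example above rather than from any official SDK.

```python
import json
import urllib.request

# Host copied from the curl example; adjust if your deployment differs.
BASE_URL = "https://cubicspaces.cloud"


def build_speech_request(api_key, text, voice,
                         model="minimax-speech-2.8-turbo",
                         response_format="mp3", speed=1):
    """Build (but do not send) a POST /v1/audio/speech request.

    Hypothetical helper: it mirrors the JSON body of the curl example.
    """
    payload = {
        "model": model,
        "voice": voice,
        "input": text,
        "response_format": response_format,
        "speed": speed,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(request, path):
    """Send the request and write the binary audio body to `path`."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

To reproduce the curl call, pass the value of the NEW_API_KEY environment variable, the input text, and the voice `male-qn-qingse` to `build_speech_request`, then hand the result to `synthesize_to_file` with a path such as `minimax-speech.mp3`. Opening the file in binary mode plays the role of curl's --output option.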

Native Options Example

To set sample rate, volume, emotion, language boost, pronunciation dictionaries, or other MiniMax-native options, send them in metadata:

json
{
  "model": "minimax-speech-2.8-hd",
  "voice": "male-qn-qingse",
  "input": "Welcome to Cubicspaces MiniMax speech synthesis.",
  "response_format": "mp3",
  "metadata": {
    "audio_setting": {
      "format": "mp3",
      "sample_rate": 32000,
      "bitrate": 128000,
      "channel": 1
    },
    "voice_setting": {
      "voice_id": "male-qn-qingse",
      "speed": 1,
      "vol": 1,
      "pitch": 0,
      "emotion": "happy"
    },
    "language_boost": "English",
    "subtitle_enable": false
  }
}

Common native fields:

| Field | Description |
| --- | --- |
| audio_setting.format | Audio format, such as mp3, pcm, flac, or wav |
| audio_setting.sample_rate | Sample rate, such as 8000, 16000, 22050, 24000, 32000, or 44100 |
| audio_setting.bitrate | MP3 bitrate, such as 32000, 64000, 128000, or 256000 |
| voice_setting.voice_id | Voice ID. For mixed voices, use it with timbre_weights |
| voice_setting.speed | Speech speed, commonly 0.5 to 2 |
| voice_setting.vol | Volume, commonly (0, 10] |
| voice_setting.pitch | Pitch, commonly -12 to 12 |
| voice_setting.emotion | Emotion, such as happy, sad, angry, fearful, disgusted, surprised, calm, fluent, or whisper |
| voice_modify | Voice pitch, timbre, intensity, and sound effect controls |
| pronunciation_dict | Pronunciation dictionary |
| subtitle_enable | Whether to request subtitle output. Actual behavior depends on model support |
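A client may want to check native values before sending them. The sketch below assembles a `metadata` object from the fields listed above and rejects values outside the commonly used ranges. The function name `build_native_metadata` is hypothetical, and treating the "commonly" ranges as hard limits is an assumption made for illustration; the service may accept other values.

```python
def build_native_metadata(voice_id, speed=1, vol=1, pitch=0, emotion=None,
                          audio_format="mp3", sample_rate=32000,
                          bitrate=128000):
    """Assemble a metadata object with audio_setting and voice_setting.

    Hypothetical helper. The range checks follow the "commonly" ranges
    from the field table and are an assumption, not documented limits.
    """
    if not 0.5 <= speed <= 2:
        raise ValueError("speed is commonly between 0.5 and 2")
    if not 0 < vol <= 10:
        raise ValueError("vol is commonly in (0, 10]")
    if not -12 <= pitch <= 12:
        raise ValueError("pitch is commonly between -12 and 12")

    voice_setting = {
        "voice_id": voice_id,
        "speed": speed,
        "vol": vol,
        "pitch": pitch,
    }
    # Only include emotion when the caller asks for one.
    if emotion is not None:
        voice_setting["emotion"] = emotion

    return {
        "audio_setting": {
            "format": audio_format,
            "sample_rate": sample_rate,
            "bitrate": bitrate,
        },
        "voice_setting": voice_setting,
    }
```

The returned dictionary can be placed under the `metadata` key of the request body, alongside `model`, `voice`, and `input`.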

Notes

  • This page covers synchronous speech generation. The response body is the generated audio content.
  • Values in metadata override extension fields with the same name. To keep request behavior predictable, do not set model inside metadata.
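One way to follow the second note on the client side is to refuse any metadata object that carries a `model` key before it is attached to the request. This is a sketch only: `attach_metadata` is a hypothetical helper, and it says nothing about how the server itself resolves conflicting fields.

```python
def attach_metadata(payload, metadata):
    """Return a copy of `payload` with `metadata` attached.

    Hypothetical client-side guard: raises if metadata tries to set
    `model`, so the model is only ever chosen at the top level.
    """
    if "model" in metadata:
        raise ValueError("set model at the top level, not inside metadata")
    merged = dict(payload)             # leave the caller's payload untouched
    merged["metadata"] = dict(metadata)
    return merged
```

For example, attaching `{"language_boost": "English"}` to a payload whose `model` is `minimax-speech-2.8-hd` returns a new payload with that metadata, while attaching `{"model": "minimax-speech-2.8-turbo"}` raises `ValueError`.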