GLM-5 / GLM-5.1

GLM-5 and GLM-5.1 are available through the OpenAI-compatible Chat Completions API. You do not need a Zhipu-specific protocol; set the model field in the request body to glm-5 or glm-5.1.

Endpoint

http
POST /v1/chat/completions

Supported Models

Model      Notes
glm-5      GLM-5 standard model
glm-5.1    GLM-5.1 standard model

Actual model availability depends on account permissions and platform configuration.

Request Example

bash
curl https://cubicspaces.cloud/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-5",
    "messages": [
      { "role": "user", "content": "Reply with one short sentence." }
    ],
    "temperature": 0.2,
    "stream": false
  }'

Response Example

GLM models may return reasoning text in message.reasoning_content. Read the final answer from message.content.

json
{
  "id": "chatcmpl_123",
  "object": "chat.completion",
  "created": 1778309051,
  "model": "glm-5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": "The user asked for a short reply, so return a concise greeting.",
        "content": "Hello!"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 11,
    "completion_tokens": 174,
    "total_tokens": 185,
    "completion_tokens_details": {
      "reasoning_tokens": 171
    }
  }
}
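
To make the split between the two fields concrete, here is a minimal Python sketch that reads both from a response shaped like the example above. The helper name split_answer is hypothetical. It assumes reasoning_content and completion_tokens_details may be absent, and that completion_tokens already includes the reasoning tokens, which the numbers in the example suggest but the source does not state.

```python
import json

# A trimmed copy of the response example above.
RAW = """
{
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": "The user asked for a short reply, so return a concise greeting.",
        "content": "Hello!"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 11,
    "completion_tokens": 174,
    "total_tokens": 185,
    "completion_tokens_details": {"reasoning_tokens": 171}
  }
}
"""


def split_answer(response: dict) -> dict:
    """Hypothetical helper: separate the final answer from the reasoning text."""
    message = response["choices"][0]["message"]
    usage = response.get("usage") or {}
    details = usage.get("completion_tokens_details") or {}
    reasoning_tokens = details.get("reasoning_tokens", 0)
    return {
        # The text meant for the end user.
        "answer": message.get("content") or "",
        # Optional; fall back to an empty string when the field is missing.
        "reasoning": message.get("reasoning_content") or "",
        "reasoning_tokens": reasoning_tokens,
        # Assumption: completion_tokens counts reasoning tokens too.
        "answer_tokens": usage.get("completion_tokens", 0) - reasoning_tokens,
    }


result = split_answer(json.loads(RAW))
print(result["answer"])
print(result["answer_tokens"])
```

Under that assumption, only a small share of the billed completion tokens in the example belongs to the visible answer, so it may be worth tracking reasoning_tokens separately when estimating cost.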

Streaming

bash
curl https://cubicspaces.cloud/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-5.1",
    "messages": [
      { "role": "user", "content": "Introduce GLM in one sentence." }
    ],
    "stream": true,
    "stream_options": {
      "include_usage": true
    }
  }'

Streaming responses use SSE. Each data: block may include:

  • choices[0].delta.reasoning_content: reasoning text chunk.
  • choices[0].delta.content: final answer chunk.
  • usage: usage details, usually at the end when stream_options.include_usage is true.
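
The fields listed above can be collected with a small line-by-line parser. The following Python sketch is one possible approach, not the platform's reference client: the sample lines are invented for illustration, the collect_stream name is hypothetical, and the `[DONE]` sentinel and the empty choices list on the usage chunk are assumptions carried over from OpenAI's streaming format.

```python
import json

# Hypothetical SSE lines, shaped like OpenAI-style streaming chunks.
SAMPLE_LINES = [
    'data: {"choices":[{"index":0,"delta":{"reasoning_content":"Keep it "}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"reasoning_content":"short."}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Hello "}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"there."}}]}',
    "",
    'data: {"choices":[],"usage":{"prompt_tokens":12,"completion_tokens":20,"total_tokens":32}}',
    "",
    "data: [DONE]",
]


def collect_stream(lines):
    """Hypothetical helper: accumulate reasoning text, answer text, and usage."""
    reasoning, answer, usage = [], [], None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(payload)
        if chunk.get("usage"):
            usage = chunk["usage"]
        # The usage chunk may carry no choices, so iterate defensively.
        for choice in chunk.get("choices") or []:
            delta = choice.get("delta") or {}
            if delta.get("reasoning_content"):
                reasoning.append(delta["reasoning_content"])
            if delta.get("content"):
                answer.append(delta["content"])
    return "".join(reasoning), "".join(answer), usage


reasoning_text, answer_text, usage = collect_stream(SAMPLE_LINES)
print(answer_text)
```

Keeping the two buffers separate lets a UI render the answer as it arrives while hiding or collapsing the reasoning text.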

Common Parameters

Parameter        Type            Description
model            string          glm-5 or glm-5.1
messages         array           OpenAI Chat message array
temperature      number          Sampling temperature
top_p            number          Nucleus sampling parameter
max_tokens       number          Maximum output tokens
stream           boolean         Whether to stream the response
stream_options   object          Streaming options such as include_usage
tools            array           Tool definitions
tool_choice      string/object   Tool selection strategy
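
As a rough illustration of how these parameters fit together in one request body, the sketch below builds a payload in Python. The build_request helper and the get_weather tool are hypothetical, and the tool definition assumes the platform accepts OpenAI's function-tool schema, which the source implies but does not show.

```python
import json

SUPPORTED_MODELS = {"glm-5", "glm-5.1"}


def build_request(model, prompt, *, stream=False, tools=None, **sampling):
    """Hypothetical helper: assemble a Chat Completions request body."""
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported model: {model}")
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }
    # Only send sampling parameters the caller explicitly set.
    for key in ("temperature", "top_p", "max_tokens"):
        if key in sampling:
            body[key] = sampling[key]
    if stream:
        # Ask for usage details in the stream.
        body["stream_options"] = {"include_usage": True}
    if tools:
        body["tools"] = tools
        body["tool_choice"] = "auto"  # let the model decide whether to call a tool
    return body


# Hypothetical tool, written in the OpenAI function-tool format (an assumption).
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

body = build_request(
    "glm-5.1",
    "What is the weather in Paris?",
    stream=True,
    tools=[weather_tool],
    temperature=0.2,
)
print(json.dumps(body, indent=2))
```

The resulting dictionary can be sent as the JSON body of POST /v1/chat/completions, or unpacked into the SDK's create call.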

SDK Example

python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://cubicspaces.cloud/v1"
)

response = client.chat.completions.create(
    model="glm-5",
    messages=[
        {"role": "user", "content": "Reply with one short sentence."}
    ],
    temperature=0.2
)

message = response.choices[0].message
print(message.content)

js
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://cubicspaces.cloud/v1"
});

const response = await client.chat.completions.create({
  model: "glm-5",
  messages: [{ role: "user", content: "Reply with one short sentence." }],
  temperature: 0.2
});

console.log(response.choices[0].message.content);

Notes

  • GLM uses the OpenAI-compatible /v1/chat/completions path.
  • For end-user answers, read choices[0].message.content.
  • If you want to show reasoning, read choices[0].message.reasoning_content; otherwise you can ignore it.
  • For streaming, parse both delta.reasoning_content and delta.content.
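
For SDK users, the streaming note above might look like the following Python sketch. The consume_stream helper is hypothetical, and the stand-in chunks are built by hand so the example runs without a network call. Because reasoning_content is not a declared field in the OpenAI SDK's types, the sketch reads it defensively with getattr; it also assumes the usage chunk arrives with an empty choices list, as in OpenAI's API.

```python
from types import SimpleNamespace as NS


def consume_stream(stream, show_reasoning=False):
    """Hypothetical helper: collect answer, optional reasoning, and usage."""
    answer, reasoning, usage = [], [], None
    for chunk in stream:
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
        # Assumption: the usage-only chunk has no choices.
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        text = getattr(delta, "reasoning_content", None)
        if text and show_reasoning:
            reasoning.append(text)
        if getattr(delta, "content", None):
            answer.append(delta.content)
    return "".join(answer), "".join(reasoning), usage


# Stand-in chunks that mimic the attributes of SDK stream objects.
fake_stream = [
    NS(choices=[NS(delta=NS(reasoning_content="Short greeting.", content=None))], usage=None),
    NS(choices=[NS(delta=NS(content="Hello"))], usage=None),
    NS(choices=[NS(delta=NS(content="!"))], usage=None),
    NS(choices=[], usage=NS(total_tokens=185)),
]

answer, reasoning, usage = consume_stream(fake_stream, show_reasoning=True)
print(answer)

# With a real client (not run here), the stream would come from:
# stream = client.chat.completions.create(
#     model="glm-5.1",
#     messages=[{"role": "user", "content": "Introduce GLM in one sentence."}],
#     stream=True,
#     stream_options={"include_usage": True},
# )
```

Passing show_reasoning=False drops the reasoning chunks entirely, which matches the case where only the final answer is shown to end users.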