GLM-5 / GLM-5.1

GLM-5 and GLM-5.1 are available through the OpenAI-compatible Chat Completions API. No Zhipu-specific protocol is required: send a standard Chat Completions request with model set to glm-5 or glm-5.1.

Endpoint

http

POST /v1/chat/completions

Supported Models

Model	Notes
`glm-5`	GLM-5 standard model
`glm-5.1`	GLM-5.1 standard model

Actual model availability depends on account permissions and platform configuration.

Request Example

bash

curl https://cubicspaces.cloud/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-5",
    "messages": [
      { "role": "user", "content": "Reply with one short sentence." }
    ],
    "temperature": 0.2,
    "stream": false
  }'

Response Example

GLM models may return reasoning text in message.reasoning_content. Read the final answer from message.content.

json

{
  "id": "chatcmpl_123",
  "object": "chat.completion",
  "created": 1778309051,
  "model": "glm-5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": "The user asked for a short reply, so return a concise greeting.",
        "content": "Hello!"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 11,
    "completion_tokens": 174,
    "total_tokens": 185,
    "completion_tokens_details": {
      "reasoning_tokens": 171
    }
  }
}
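
A minimal sketch of reading such a response in Python, assuming the body has already been parsed into a dict with the field names shown above. The helper name `summarize_response` is hypothetical. The sketch also assumes that `completion_tokens` includes `reasoning_tokens`, which matches the OpenAI convention and the numbers in the example but is not stated explicitly here. `reasoning_content` and `completion_tokens_details` are treated as optional.

```python
def summarize_response(body: dict) -> dict:
    """Extract the final answer, optional reasoning, and token counts."""
    message = body["choices"][0]["message"]
    usage = body.get("usage") or {}
    details = usage.get("completion_tokens_details") or {}
    completion = usage.get("completion_tokens", 0)
    reasoning_tokens = details.get("reasoning_tokens", 0)
    return {
        "answer": message.get("content") or "",
        # May be None when the model returns no reasoning text.
        "reasoning": message.get("reasoning_content"),
        "reasoning_tokens": reasoning_tokens,
        # Assumption: completion_tokens counts reasoning tokens too.
        "answer_tokens": completion - reasoning_tokens,
    }


# Trimmed copy of the response example above.
body = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "reasoning_content": "The user asked for a short reply, so return a concise greeting.",
                "content": "Hello!",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {
        "prompt_tokens": 11,
        "completion_tokens": 174,
        "total_tokens": 185,
        "completion_tokens_details": {"reasoning_tokens": 171},
    },
}

summary = summarize_response(body)
print(summary["answer"])
print(summary["reasoning_tokens"], summary["answer_tokens"])
```

In this example most of the completion tokens are reasoning, so a short visible answer can still carry a sizeable `completion_tokens` count.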

Streaming

bash

curl https://cubicspaces.cloud/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-5.1",
    "messages": [
      { "role": "user", "content": "Introduce GLM in one sentence." }
    ],
    "stream": true,
    "stream_options": {
      "include_usage": true
    }
  }'

Streaming responses use SSE. Each data: block may include:

choices[0].delta.reasoning_content: reasoning text chunk.
choices[0].delta.content: final answer chunk.
usage: usage details, usually at the end when stream_options.include_usage is true.
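
The fields above can be accumulated with a small parser. The following is a sketch under stated assumptions rather than a reference client: the function name `collect_stream` is hypothetical, the sample lines are invented for illustration, and the terminating `data: [DONE]` sentinel follows the usual OpenAI streaming convention, which this page does not confirm.

```python
import json
from typing import Iterable


def collect_stream(lines: Iterable[str]) -> dict:
    """Accumulate reasoning text, answer text, and usage from SSE lines."""
    reasoning, answer, usage = [], [], None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(payload)
        if chunk.get("usage"):
            usage = chunk["usage"]
        choices = chunk.get("choices") or []
        if not choices:
            continue  # a usage-only chunk may carry no choices
        delta = choices[0].get("delta") or {}
        if delta.get("reasoning_content"):
            reasoning.append(delta["reasoning_content"])
        if delta.get("content"):
            answer.append(delta["content"])
    return {
        "reasoning": "".join(reasoning),
        "answer": "".join(answer),
        "usage": usage,
    }


# Illustrative lines only; real chunks carry more fields (id, model, ...).
sample_lines = [
    'data: {"choices":[{"index":0,"delta":{"reasoning_content":"Keep it "}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"reasoning_content":"short."}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Hello"}}]}',
    'data: {"choices":[{"index":0,"delta":{"content":" there"}}]}',
    'data: {"choices":[],"usage":{"prompt_tokens":12,"completion_tokens":20,"total_tokens":32}}',
    "data: [DONE]",
]

result = collect_stream(sample_lines)
print(result["answer"])
print(result["usage"])
```

Keeping reasoning and answer text in separate buffers lets a UI render or discard the reasoning without affecting the final answer.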

Common Parameters

Parameter	Type	Description
`model`	string	`glm-5` or `glm-5.1`
`messages`	array	OpenAI Chat message array
`temperature`	number	Sampling temperature
`top_p`	number	Nucleus sampling parameter
`max_tokens`	number	Maximum output tokens
`stream`	boolean	Whether to stream the response
`stream_options`	object	Streaming options such as `include_usage`
`tools`	array	Tool definitions
`tool_choice`	string/object	Tool selection strategy
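
As a rough illustration of how these parameters combine into one request body, the sketch below assembles a payload in Python. `build_payload` and the `get_weather` tool are hypothetical names. The tool definition uses the OpenAI function-calling shape, which this page does not spell out, so treat that structure as an assumption to verify against the platform.

```python
import json

ALLOWED_MODELS = {"glm-5", "glm-5.1"}


def build_payload(model: str, messages: list, **options) -> dict:
    """Build a Chat Completions request body, dropping unset options."""
    if model not in ALLOWED_MODELS:
        raise ValueError(f"unsupported model: {model!r}")
    payload = {"model": model, "messages": messages}
    # Only send optional parameters that were explicitly set.
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload


# Hypothetical tool in the OpenAI function-calling format (assumed).
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

payload = build_payload(
    "glm-5.1",
    [{"role": "user", "content": "What is the weather in Paris?"}],
    temperature=0.2,
    top_p=None,  # left unset, so it is omitted from the body
    max_tokens=256,
    tools=[weather_tool],
    tool_choice="auto",
)
print(json.dumps(payload, indent=2))
```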

SDK Example

Python

python

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://cubicspaces.cloud/v1"
)

response = client.chat.completions.create(
    model="glm-5",
    messages=[
        {"role": "user", "content": "Reply with one short sentence."}
    ],
    temperature=0.2
)

message = response.choices[0].message
print(message.content)

Node.js

javascript

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://cubicspaces.cloud/v1"
});

const response = await client.chat.completions.create({
  model: "glm-5",
  messages: [{ role: "user", content: "Reply with one short sentence." }],
  temperature: 0.2
});

console.log(response.choices[0].message.content);

Notes

GLM uses the OpenAI-compatible /v1/chat/completions path.
For end-user answers, read choices[0].message.content.
To show reasoning, read choices[0].message.reasoning_content; otherwise you can ignore it.
For streaming, parse both delta.reasoning_content and delta.content.
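
When using an SDK, note that `reasoning_content` is not one of the standard Chat Completions message fields, so it may be safer to read it defensively. The sketch below is one way to do that; `split_message` is a hypothetical helper, and a `SimpleNamespace` stands in for `response.choices[0].message` so the example runs without a network call.

```python
from types import SimpleNamespace


def split_message(message) -> tuple:
    """Return (answer, reasoning) from an SDK-style message object."""
    answer = getattr(message, "content", None) or ""
    # The attribute may be absent, so fall back to None.
    reasoning = getattr(message, "reasoning_content", None)
    return answer, reasoning


# Stand-in for response.choices[0].message from the SDK example.
message = SimpleNamespace(
    role="assistant",
    content="Hello!",
    reasoning_content="The user asked for a short reply, so return a concise greeting.",
)

answer, reasoning = split_message(message)
print(answer)
if reasoning:
    print("[reasoning]", reasoning)
```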
