GLM-5 / GLM-5.1
GLM-5 and GLM-5.1 are available through the OpenAI-compatible Chat Completions API. You do not need a Zhipu-specific protocol; set model to glm-5 or glm-5.1.
Endpoint
```http
POST /v1/chat/completions
```

Supported Models
| Model | Notes |
|---|---|
| glm-5 | GLM-5 standard model |
| glm-5.1 | GLM-5.1 standard model |
Actual model availability depends on account permissions and platform configuration.
Request Example
```bash
curl https://cubicspaces.cloud/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-5",
    "messages": [
      { "role": "user", "content": "Reply with one short sentence." }
    ],
    "temperature": 0.2,
    "stream": false
  }'
```

Response Example
GLM models may return reasoning text in message.reasoning_content. Read the final answer from message.content.
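As a minimal sketch of how a client might separate the two fields, the helpers below work on an already-decoded response body that mirrors the response example in this section. The names `split_message` and `visible_completion_tokens` are hypothetical, and the token arithmetic assumes `reasoning_tokens` is counted inside `completion_tokens`, which is consistent with the sample numbers but is not stated by the platform.

```python
import json
from typing import Optional, Tuple

# Abbreviated copy of the response example from this section.
SAMPLE = json.loads("""
{
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": "The user asked for a short reply, so return a concise greeting.",
        "content": "Hello!"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 11,
    "completion_tokens": 174,
    "total_tokens": 185,
    "completion_tokens_details": {"reasoning_tokens": 171}
  }
}
""")


def split_message(response: dict) -> Tuple[str, Optional[str]]:
    """Return (final_answer, reasoning_or_None) for the first choice."""
    message = response["choices"][0]["message"]
    # reasoning_content may be absent, so use .get() instead of indexing.
    return message.get("content") or "", message.get("reasoning_content")


def visible_completion_tokens(response: dict) -> int:
    """Completion tokens minus reasoning tokens.

    Assumption: completion_tokens already includes reasoning_tokens.
    """
    usage = response["usage"]
    details = usage.get("completion_tokens_details") or {}
    return usage["completion_tokens"] - details.get("reasoning_tokens", 0)


answer, reasoning = split_message(SAMPLE)
print(answer)                             # Hello!
print(visible_completion_tokens(SAMPLE))  # 3
```

Returning `None` for missing reasoning lets the caller decide whether to render a reasoning panel at all, rather than showing an empty one.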
```json
{
  "id": "chatcmpl_123",
  "object": "chat.completion",
  "created": 1778309051,
  "model": "glm-5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": "The user asked for a short reply, so return a concise greeting.",
        "content": "Hello!"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 11,
    "completion_tokens": 174,
    "total_tokens": 185,
    "completion_tokens_details": {
      "reasoning_tokens": 171
    }
  }
}
```

Streaming
```bash
curl https://cubicspaces.cloud/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-5.1",
    "messages": [
      { "role": "user", "content": "Introduce GLM in one sentence." }
    ],
    "stream": true,
    "stream_options": {
      "include_usage": true
    }
  }'
```

Streaming responses use SSE. Each `data:` block may include:
- `choices[0].delta.reasoning_content`: reasoning text chunk.
- `choices[0].delta.content`: final answer chunk.
- `usage`: usage details, usually at the end when `stream_options.include_usage` is `true`.
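The three fields above can be accumulated with a small line-based parser. This is a sketch under stated assumptions: the function name `collect_sse` is hypothetical, the sample events are hand-written rather than captured from the service, and the `[DONE]` terminator is the usual OpenAI-style convention, which this platform is assumed but not confirmed to follow.

```python
import json
from typing import Iterable, Optional, Tuple


def collect_sse(lines: Iterable[str]) -> Tuple[str, str, Optional[dict]]:
    """Accumulate (reasoning, answer, usage) from decoded SSE lines."""
    reasoning, answer = [], []
    usage = None
    for line in lines:
        line = line.strip()
        # Skip blank separators, comments, and non-data fields.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        # Assumption: the stream ends with the OpenAI-style [DONE] sentinel.
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        # Usage normally arrives once, near the end of the stream.
        if chunk.get("usage"):
            usage = chunk["usage"]
        for choice in chunk.get("choices") or []:
            if choice.get("index", 0) != 0:
                continue  # this sketch only follows the first choice
            delta = choice.get("delta") or {}
            if delta.get("reasoning_content"):
                reasoning.append(delta["reasoning_content"])
            if delta.get("content"):
                answer.append(delta["content"])
    return "".join(reasoning), "".join(answer), usage


# Hand-written sample events; real chunks carry more fields (id, model, ...).
sample = [
    'data: {"choices":[{"index":0,"delta":{"reasoning_content":"Keep it "}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"reasoning_content":"short."}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Hel"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"lo!"}}]}',
    "",
    'data: {"choices":[],"usage":{"prompt_tokens":12,"completion_tokens":9,"total_tokens":21}}',
    "",
    "data: [DONE]",
]
reasoning, answer, usage = collect_sse(sample)
print(answer)  # Hello!
```

Keeping reasoning and answer in separate buffers means a UI can stream the answer immediately while deciding independently whether to display the reasoning text.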
Common Parameters
| Parameter | Type | Description |
|---|---|---|
| model | string | glm-5 or glm-5.1 |
| messages | array | OpenAI Chat message array |
| temperature | number | Sampling temperature |
| top_p | number | Nucleus sampling parameter |
| max_tokens | number | Maximum output tokens |
| stream | boolean | Whether to stream the response |
| stream_options | object | Streaming options such as include_usage |
| tools | array | Tool definitions |
| tool_choice | string/object | Tool selection strategy |
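To illustrate how the parameters in the table combine into one request body, here is a hedged sketch of a payload builder. The helper `build_payload`, the `ALLOWED_MODELS` set, and the `get_weather` tool are all hypothetical; the tool definition follows the OpenAI function-calling schema, which this endpoint is assumed to accept because it lists `tools` and `tool_choice`.

```python
import json
from typing import Any, Dict, List, Optional

ALLOWED_MODELS = {"glm-5", "glm-5.1"}  # the two model IDs named in this section


def build_payload(
    model: str,
    messages: List[Dict[str, Any]],
    *,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    max_tokens: Optional[int] = None,
    stream: bool = False,
    include_usage: bool = False,
    tools: Optional[List[Dict[str, Any]]] = None,
    tool_choice: Optional[Any] = None,
) -> Dict[str, Any]:
    """Build a Chat Completions request body from the common parameters."""
    if model not in ALLOWED_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    payload: Dict[str, Any] = {"model": model, "messages": messages, "stream": stream}
    optional = {
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
        "tools": tools,
        "tool_choice": tool_choice,
    }
    # Send only what the caller set, so server-side defaults apply otherwise.
    payload.update({k: v for k, v in optional.items() if v is not None})
    # stream_options is only meaningful for streaming requests.
    if stream and include_usage:
        payload["stream_options"] = {"include_usage": True}
    return payload


# Hypothetical tool, written in the OpenAI function-calling schema.
get_weather = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

payload = build_payload(
    "glm-5",
    [{"role": "user", "content": "What is the weather in Paris?"}],
    temperature=0.2,
    max_tokens=256,
    tools=[get_weather],
    tool_choice="auto",
)
print(json.dumps(payload, indent=2))
```

The resulting dictionary can be serialized as the `-d` body of the curl request or unpacked into `client.chat.completions.create(**payload)`.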
SDK Example
```python
from openai import OpenAI

# Point the OpenAI SDK at the OpenAI-compatible endpoint.
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://cubicspaces.cloud/v1"
)

response = client.chat.completions.create(
    model="glm-5",
    messages=[
        {"role": "user", "content": "Reply with one short sentence."}
    ],
    temperature=0.2
)

# The final answer is in message.content.
message = response.choices[0].message
print(message.content)
```

```js
import OpenAI from "openai";

// Point the OpenAI SDK at the OpenAI-compatible endpoint.
const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://cubicspaces.cloud/v1"
});

// Top-level await requires an ES module context.
const response = await client.chat.completions.create({
  model: "glm-5",
  messages: [{ role: "user", content: "Reply with one short sentence." }],
  temperature: 0.2
});

console.log(response.choices[0].message.content);
```

Notes
- GLM uses the OpenAI-compatible `/v1/chat/completions` path.
- For end-user answers, read `choices[0].message.content`.
- If you want to show reasoning, read `reasoning_content`; otherwise you can ignore it.
- For streaming, parse both `delta.reasoning_content` and `delta.content`.
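
When streaming through the Python SDK rather than raw HTTP, the same two-field rule applies to chunk objects. The sketch below is an assumption-laden illustration: `collect_chunks` and `fake_chunk` are hypothetical names, the stand-in chunks exist only so the code runs without network access, and `reasoning_content` is read with `getattr` because the SDK's typed delta model may not declare that field.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Optional, Tuple


def collect_chunks(chunks: Iterable[Any]) -> Tuple[str, str, Optional[Any]]:
    """Accumulate (reasoning, answer, usage) from SDK-style stream chunks."""
    reasoning, answer = [], []
    usage = None
    for chunk in chunks:
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
        # A usage-only chunk can arrive with an empty choices list.
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        # getattr guards against the field being absent on a given chunk.
        piece = getattr(delta, "reasoning_content", None)
        if piece:
            reasoning.append(piece)
        text = getattr(delta, "content", None)
        if text:
            answer.append(text)
    return "".join(reasoning), "".join(answer), usage


def fake_chunk(delta: Optional[dict] = None, usage: Any = None) -> SimpleNamespace:
    """Build a stand-in chunk shaped like the SDK objects (illustration only)."""
    choices = [SimpleNamespace(delta=SimpleNamespace(**delta))] if delta else []
    return SimpleNamespace(choices=choices, usage=usage)


demo = [
    fake_chunk({"reasoning_content": "Greet briefly. "}),
    fake_chunk({"content": "Hi"}),
    fake_chunk({"content": "!"}),
    fake_chunk(usage=SimpleNamespace(total_tokens=5)),
]
reasoning, answer, usage = collect_chunks(demo)
print(answer)  # Hi!
```

With the client from the SDK example, a live stream would be consumed by passing the result of `client.chat.completions.create(model="glm-5.1", messages=[...], stream=True, stream_options={"include_usage": True})` as `chunks`.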