Z.ai API Reference

Complete API documentation for Z.ai (Zhipu AI) Open Platform.

Getting Started
Authentication
API Endpoints
Models
Chat Completions API
Special Features
Error Handling
SDKs
Rate Limits

Getting Started

API Endpoint

Z.ai Platform's general API endpoint:

https://api.z.ai/api/paas/v4

For GLM Coding Plan subscribers, use the dedicated coding endpoint:

https://api.z.ai/api/coding/paas/v4
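
The two base URLs above differ only in the path prefix, so request URLs can be assembled in one place. The following is a minimal sketch; the helper name `build_url` and its `coding_plan` flag are illustrative assumptions, not part of any Z.ai SDK.

```python
# Base URLs copied from the section above.
GENERAL_BASE_URL = "https://api.z.ai/api/paas/v4"
CODING_BASE_URL = "https://api.z.ai/api/coding/paas/v4"


def build_url(path: str, coding_plan: bool = False) -> str:
    """Join a relative API path onto the general or coding base URL.

    `build_url` is a hypothetical helper used only for illustration.
    """
    base = CODING_BASE_URL if coding_plan else GENERAL_BASE_URL
    # Normalise slashes so "chat/completions" and "/chat/completions" both work.
    return base.rstrip("/") + "/" + path.lstrip("/")


print(build_url("/chat/completions"))
print(build_url("chat/completions", coding_plan=True))
```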

Quick Start

Get API Key: Visit the Z.ai Open Platform to register and create an API key
Choose Model: Select one of the available models based on your needs
Make API Call: Use the HTTP API or an SDK to make requests

Authentication

Z.ai API uses HTTP Bearer Token authentication.

API Key Authentication

Authorization: Bearer YOUR_API_KEY

JWT Token Authentication (Optional)

For scenarios that require higher security, generate a JWT with an expiry time from your API key and send it as the Bearer token:

import time

import jwt  # PyJWT (pip install pyjwt)

def generate_token(apikey: str, exp_seconds: int) -> str:
    # The API key has the form "<id>.<secret>"; the secret signs the token.
    key_id, secret = apikey.split(".")
    now_ms = int(round(time.time() * 1000))
    payload = {
        "api_key": key_id,
        # Timestamps are expressed in milliseconds.
        "exp": now_ms + exp_seconds * 1000,
        "timestamp": now_ms,
    }
    return jwt.encode(
        payload,
        secret,
        algorithm="HS256",
        headers={"alg": "HS256", "sign_type": "SIGN"}
    )

Required Headers

Content-Type: application/json
Accept-Language: en-US,en
Authorization: Bearer YOUR_API_KEY
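
As an illustration, the three headers above can be built by one small function so every request sends the same set. `build_headers` is a hypothetical helper name; the header names and values come from the list above.

```python
def build_headers(api_key: str, language: str = "en-US,en") -> dict:
    """Return the required request headers for a given API key or JWT."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    return {
        "Content-Type": "application/json",
        "Accept-Language": language,
        "Authorization": f"Bearer {api_key}",
    }


headers = build_headers("YOUR_API_KEY")
print(headers["Authorization"])
```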

API Endpoints

Endpoint	Method	Description
`/chat/completions`	POST	Generate chat completions
`/embeddings`	POST	Generate text embeddings
`/images/generations`	POST	Generate images
`/videos/generations`	POST	Generate videos
`/files`	POST	Upload files
`/files/{file_id}`	GET	Retrieve file
`/files/{file_id}`	DELETE	Delete file
`/audio/transcriptions`	POST	Audio-to-text transcription
`/moderations`	POST	Content moderation

Models

Language Models (LLM)

Model	Context Length	Description
`glm-4.7`	128K	Latest flagship model with enhanced programming and multi-step reasoning
`glm-4.6`	200K	Superior coding, long-context, and reasoning capabilities
`glm-4.5`	128K	Balanced performance across domains
`glm-4-32b-0414-128k`	128K	Large parameter model with extended context

Vision Language Models (VLM)

Model	Context Length	Description
`glm-4.6v`	128K	Flagship multimodal model with native tool use
`glm-4.6v-flashx`	128K	Lightweight, high-speed vision processing
`glm-4.6v-flash`	128K	Lightweight, free model for basic vision tasks
`glm-ocr`	-	Specialized OCR for text extraction

Image Generation Models

Model	Description
`glm-image`	General image generation
`cogview-4`	High-quality visual generation

Video Generation Models

Model	Description
`cogvideox-3`	Latest video generation
`vidu-q1`	Fast, efficient video creation
`vidu-2`	Enhanced quality and features

Audio Models

Model	Description
`glm-asr-2512`	Speech recognition / Audio-to-text

Chat Completions API

Endpoint

POST /chat/completions

Request Parameters

Parameter	Type	Required	Description
`model`	string	Yes	Model ID to use (e.g., `glm-4.7`)
`messages`	array	Yes	Array of message objects
`temperature`	number	No	Sampling temperature (0.0-2.0), default: 1.0
`max_tokens`	integer	No	Maximum tokens to generate (1-32768)
`top_p`	number	No	Nucleus sampling threshold (0.0-1.0)
`stream`	boolean	No	Enable streaming responses, default: false
`thinking`	object	No	Configure thinking mode
`tools`	array	No	Array of tool definitions
`tool_choice`	string	No	Tool use strategy: `auto`, `none`, `required`
`response_format`	object	No	Structured output format
`stop`	string/array	No	Stop sequences
`presence_penalty`	number	No	Presence penalty (-2.0 to 2.0)
`frequency_penalty`	number	No	Frequency penalty (-2.0 to 2.0)

Message Object

{
  "role": "system|user|assistant",
  "content": "string or array"
}

For vision models, content can be an array:

{
  "role": "user",
  "content": [
    {
      "type": "text",
      "text": "Describe this image"
    },
    {
      "type": "image_url",
      "image_url": {
        "url": "https://example.com/image.jpg"
      }
    }
  ]
}
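
A small builder keeps the nested array form consistent across calls. This is a minimal sketch; `build_vision_message` is a hypothetical helper that produces exactly the structure shown above for one text part and one image URL.

```python
def build_vision_message(text: str, image_url: str) -> dict:
    """Build a user message with one text part and one image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


message = build_vision_message("Describe this image", "https://example.com/image.jpg")
print(message["content"][1]["image_url"]["url"])
```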

Tool Definition

{
  "type": "function",
  "function": {
    "name": "function_name",
    "description": "Function description",
    "parameters": {
      "type": "object",
      "properties": {
        "param_name": {
          "type": "string",
          "description": "Parameter description"
        }
      },
      "required": ["param_name"]
    }
  }
}

Thinking Configuration

{
  "thinking": {
    "type": "enabled"
  }
}

Response Schema (Non-Streaming)

{
  "id": "chatcmpl-xxx",
  "object": "chat.completion",
  "created": 1629900000,
  "model": "glm-4.7",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Response text"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 100,
    "completion_tokens": 200,
    "total_tokens": 300,
    "prompt_cache_hit_tokens": 0,
    "prompt_cache_miss_tokens": 100
  }
}
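
For illustration, the fields most callers need can be pulled out of a response that has already been parsed into a dict. `summarize_completion` is a hypothetical helper; the field names are taken from the schema above.

```python
def summarize_completion(response: dict) -> dict:
    """Extract the reply text, finish reason, and token total from a response."""
    choice = response["choices"][0]
    usage = response.get("usage", {})
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        # "length" means generation stopped because max_tokens was reached.
        "truncated": choice["finish_reason"] == "length",
        "total_tokens": usage.get("total_tokens", 0),
    }


sample = {
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "Response text"}, "finish_reason": "stop"}
    ],
    "usage": {"prompt_tokens": 100, "completion_tokens": 200, "total_tokens": 300},
}
print(summarize_completion(sample))
```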

Response Schema (Streaming)

{
  "id": "chatcmpl-xxx",
  "object": "chat.completion.chunk",
  "created": 1629900000,
  "model": "glm-4.7",
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": "Partial response"
      },
      "finish_reason": null
    }
  ]
}
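
When calling the HTTP API directly, the chunks have to be reassembled by the client. The sketch below assumes the stream is delivered as server-sent events, with each chunk on a `data: ` line and a final `data: [DONE]` sentinel; that framing follows the OpenAI convention and is an assumption here, not something the schema above states. `collect_stream_text` is a hypothetical helper.

```python
import json


def collect_stream_text(lines) -> str:
    """Concatenate delta.content from an iterable of SSE lines."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            piece = choice.get("delta", {}).get("content")
            if piece:
                parts.append(piece)
    return "".join(parts)


sample_lines = [
    'data: {"choices": [{"index": 0, "delta": {"content": "Hel"}, "finish_reason": null}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "lo"}, "finish_reason": "stop"}]}',
    "data: [DONE]",
]
print(collect_stream_text(sample_lines))
```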

Finish Reasons

Value	Description
`stop`	Model completed normally
`length`	Max tokens reached
`tool_calls`	Model requested tool call
`content_filter`	Content was filtered

Special Features

Deep Thinking

Enable thinking mode to have the model return its reasoning process along with the answer:

{
  "model": "glm-4.7",
  "messages": [...],
  "thinking": {
    "type": "enabled"
  }
}

The response then includes a reasoning_content field next to content:

{
  "choices": [{
    "message": {
      "content": "Final answer",
      "reasoning_content": "Step-by-step thinking process"
    }
  }]
}
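
A client will usually want to show the final answer and keep the reasoning separate, for example in a log. The following sketch uses a hypothetical helper, `split_reasoning`, and treats a missing `reasoning_content` as an empty string.

```python
def split_reasoning(response: dict) -> tuple:
    """Return (reasoning, answer) from a parsed chat completion response."""
    message = response["choices"][0]["message"]
    reasoning = message.get("reasoning_content") or ""
    answer = message.get("content") or ""
    return reasoning, answer


sample = {
    "choices": [
        {"message": {"content": "Final answer", "reasoning_content": "Step-by-step thinking process"}}
    ]
}
reasoning, answer = split_reasoning(sample)
print(answer)
```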

Function Calling / Tools

Define tools for the model to call:

{
  "model": "glm-4.7",
  "messages": [...],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "City name, e.g. San Francisco, CA"
            },
            "unit": {
              "type": "string",
              "enum": ["celsius", "fahrenheit"]
            }
          },
          "required": ["location"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}
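
Once the model requests a tool, the client runs it and sends the result back in a follow-up message. The sketch below assumes the response carries a `tool_calls` array in the OpenAI-compatible shape (`id`, `function.name`, and `function.arguments` as a JSON string) and that results are returned with a `tool` role; neither is spelled out in this reference, so treat both as assumptions. `get_weather`, `TOOL_REGISTRY`, and `run_tool_calls` are hypothetical names.

```python
import json


def get_weather(location: str, unit: str = "celsius") -> dict:
    """Stand-in implementation; a real one would query a weather service."""
    return {"location": location, "unit": unit, "temperature": None}


# Maps tool names from the request's "tools" array to local functions.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_calls(message: dict) -> list:
    """Execute each requested tool and return one result message per call."""
    results = []
    for call in message.get("tool_calls", []):
        function = call["function"]
        handler = TOOL_REGISTRY.get(function["name"])
        if handler is None:
            output = {"error": "unknown tool: " + function["name"]}
        else:
            # Arguments are assumed to arrive as a JSON-encoded string.
            arguments = json.loads(function.get("arguments") or "{}")
            output = handler(**arguments)
        results.append(
            {"role": "tool", "tool_call_id": call["id"], "content": json.dumps(output)}
        )
    return results


assistant_message = {
    "role": "assistant",
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_weather", "arguments": '{"location": "San Francisco, CA"}'},
        }
    ],
}
print(run_tool_calls(assistant_message))
```

The returned messages would be appended to `messages`, after the assistant message that requested them, before the next request.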

Streaming with Tool Calls

{
  "model": "glm-4.7",
  "messages": [...],
  "tools": [...],
  "stream": true,
  "stream_options": {
    "include_usage": true
  }
}

Structured Output

Request JSON-formatted responses:

{
  "model": "glm-4.7",
  "messages": [...],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "analysis",
      "schema": {
        "type": "object",
        "properties": {
          "sentiment": {
            "type": "string",
            "enum": ["positive", "negative", "neutral"]
          },
          "score": {
            "type": "number",
            "minimum": 0,
            "maximum": 1
          }
        },
        "required": ["sentiment", "score"],
        "additionalProperties": false
      }
    }
  }
}
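
Even with a schema in the request, it is prudent to validate the returned text before using it. The sketch below hand-checks the `analysis` schema from the example; `parse_analysis` is a hypothetical helper, and a general-purpose JSON Schema validator could be used instead.

```python
import json

ALLOWED_SENTIMENTS = ("positive", "negative", "neutral")


def parse_analysis(content: str) -> dict:
    """Parse message content and check it against the 'analysis' schema above."""
    data = json.loads(content)
    # The schema sets additionalProperties to false and requires both keys.
    if not isinstance(data, dict) or set(data) != {"sentiment", "score"}:
        raise ValueError("expected exactly the keys 'sentiment' and 'score'")
    if data["sentiment"] not in ALLOWED_SENTIMENTS:
        raise ValueError("sentiment must be positive, negative, or neutral")
    score = data["score"]
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        raise ValueError("score must be a number")
    if not 0 <= score <= 1:
        raise ValueError("score must be between 0 and 1")
    return data


print(parse_analysis('{"sentiment": "positive", "score": 0.9}'))
```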

Context Caching

Reduce token usage for repeated content:

{
  "model": "glm-4.7",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant.",
      "cache_control": {"type": "enabled"}
    },
    ...
  ]
}
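
Whether caching is taking effect can be checked from the `prompt_cache_hit_tokens` and `prompt_cache_miss_tokens` fields of the usage object shown in the response schema. A minimal sketch, with `cache_hit_ratio` as a hypothetical helper:

```python
def cache_hit_ratio(usage: dict) -> float:
    """Return the fraction of prompt tokens served from cache (0.0 to 1.0)."""
    hit = usage.get("prompt_cache_hit_tokens", 0)
    miss = usage.get("prompt_cache_miss_tokens", 0)
    total = hit + miss
    # Avoid dividing by zero when the usage object has no cache fields.
    return hit / total if total else 0.0


print(cache_hit_ratio({"prompt_cache_hit_tokens": 75, "prompt_cache_miss_tokens": 25}))
```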

Error Handling

Error Response Format

{
  "error": {
    "message": "Invalid authentication",
    "type": "invalid_request_error",
    "param": null,
    "code": "invalid_api_key"
  }
}

HTTP Status Codes

Code	Description
200	Success
400	Bad Request - Invalid parameters
401	Unauthorized - Invalid API key
403	Forbidden - Insufficient permissions
404	Not Found
429	Rate Limit Exceeded
500	Internal Server Error
503	Service Unavailable

Error Types

Type	Description
`invalid_request_error`	Invalid request parameters
`invalid_api_key`	Invalid or expired API key
`rate_limit_error`	Rate limit exceeded
`insufficient_quota`	Insufficient account quota
`content_filter`	Content was filtered
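
The status code and error body can be combined into a single exception so calling code has one thing to inspect. This is a sketch under stated assumptions: `ZaiApiError` and `parse_error` are hypothetical names, and treating 429, 500, and 503 as retryable is a common client-side convention rather than documented Z.ai behaviour.

```python
# Assumption: throttling and server-side failures are treated as transient.
RETRYABLE_STATUS = {429, 500, 503}


class ZaiApiError(Exception):
    """Hypothetical exception carrying the fields of the error body."""

    def __init__(self, status: int, message: str, error_type: str, code: str):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.error_type = error_type
        self.code = code
        self.retryable = status in RETRYABLE_STATUS


def parse_error(status: int, body: dict) -> ZaiApiError:
    """Build a ZaiApiError from an HTTP status and a parsed error response."""
    error = body.get("error") or {}
    return ZaiApiError(
        status,
        error.get("message", "Unknown error"),
        error.get("type") or "",
        error.get("code") or "",
    )


err = parse_error(
    401,
    {"error": {"message": "Invalid authentication", "type": "invalid_request_error",
               "param": None, "code": "invalid_api_key"}},
)
print(err, err.code, err.retryable)
```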

SDKs

Python SDK (Official)

from zai import ZaiClient

client = ZaiClient(api_key="your-api-key")

# Basic chat
response = client.chat.completions.create(
    model="glm-4.7",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)

print(response.choices[0].message.content)

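# Streaming
for chunk in client.chat.completions.create(
    model="glm-4.7",
    messages=[...],
    stream=True
):
    # Some chunks carry no text, so skip empty deltas instead of printing "None".
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")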

Python (OpenAI Compatible)

from openai import OpenAI

client = OpenAI(
    api_key="your-zai-api-key",
    base_url="https://api.z.ai/api/paas/v4/"
)

completion = client.chat.completions.create(
    model="glm-4.7",
    messages=[{"role": "user", "content": "Hello!"}]
)

Java SDK

import java.util.Arrays;

import ai.z.openapi.ZaiClient;
import ai.z.openapi.service.model.*;

ZaiClient client = ZaiClient.builder()
    .ofZAI()
    .apiKey("your-api-key")
    .build();

ChatCompletionCreateParams request = ChatCompletionCreateParams.builder()
    .model("glm-4.7")
    .messages(Arrays.asList(
        ChatMessage.builder()
            .role(ChatMessageRole.USER.value())
            .content("Hello!")
            .build()
    ))
    .build();

ChatCompletionResponse response = client.chat().createChatCompletion(request);

Node.js (OpenAI Compatible)

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'your-zai-api-key',
  baseURL: 'https://api.z.ai/api/paas/v4/'
});

const completion = await client.chat.completions.create({
  model: 'glm-4.7',
  messages: [{ role: 'user', content: 'Hello!' }]
});

cURL Examples

# Basic request
curl -X POST "https://api.z.ai/api/paas/v4/chat/completions" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-4.7",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'

# Streaming
curl -X POST "https://api.z.ai/api/paas/v4/chat/completions" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-4.7",
    "messages": [...],
    "stream": true
  }'

# With vision
curl -X POST "https://api.z.ai/api/paas/v4/chat/completions" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-4.6v",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "What is in this image?"},
          {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
        ]
      }
    ]
  }'

Rate Limits

Rate limits are based on your account tier and subscription.

Plan	Requests per Minute	Concurrent Requests
Free	60	3
GLM Coding	Higher limits	Higher limits
Enterprise	Custom	Custom

Rate limit headers are included in responses:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1629900000
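
These headers let a client pause before it hits the limit instead of reacting to a 429. The sketch below assumes `X-RateLimit-Reset` is a Unix timestamp in seconds, which the example value suggests but this reference does not state. `seconds_until_reset` is a hypothetical helper.

```python
import time
from typing import Optional


def seconds_until_reset(headers: dict, now: Optional[float] = None) -> float:
    """Return how long to wait before the next request; 0.0 if quota remains."""
    now = time.time() if now is None else now
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0.0
    # Assumption: the reset value is a Unix timestamp in seconds.
    reset_at = float(headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset_at - now)


exhausted = {
    "X-RateLimit-Limit": "60",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "1629900000",
}
print(seconds_until_reset(exhausted, now=1629899990.0))
```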

Handling Rate Limits

Implement exponential backoff for 429 errors:

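import time

# Assumes the OpenAI-compatible client shown in the SDKs section.
from openai import RateLimitError

def make_request_with_retry(client, request, max_retries=3):
    """Call chat completions, retrying rate-limit errors with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**request)
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the caller
            # Wait 1s, 2s, 4s, ... before the next attempt.
            time.sleep(2 ** attempt)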

Best Practices

Token Management

Monitor token usage with the usage field in responses
Use max_tokens to control response length
Implement context caching for repeated content
Choose appropriate model for your use case

Error Handling

Implement retry logic for 429 errors with exponential backoff
Validate API keys before making requests
Handle streaming interruptions gracefully
Log errors for debugging

Performance Optimization

Use streaming for real-time applications
Use a low temperature (e.g., temperature=0) for more deterministic responses
Implement request batching where possible
Use appropriate model for the task (don't over-provision)

Security

Store API keys securely (environment variables)
Use HTTPS for all API calls
Implement proper access controls
Rotate API keys regularly
Never commit API keys to version control
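
The first and last points above can be followed by reading the key from the environment at startup rather than hard-coding it. A minimal sketch, assuming a variable named `ZAI_API_KEY`; that name and the helper `load_api_key` are illustrative choices, not an official convention.

```python
import os


def load_api_key(env_var: str = "ZAI_API_KEY") -> str:
    """Read the API key from an environment variable, failing loudly if unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before calling the API")
    return key


if os.environ.get("ZAI_API_KEY"):
    api_key = load_api_key()
    print("API key loaded,", len(api_key), "characters")
```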

Resources

Official Documentation: https://docs.z.ai/
Developer Platform: https://open.bigmodel.cn/
GitHub: https://github.com/zai-org
Python SDK: https://github.com/zai-org/z-ai-sdk-python
Java SDK: https://github.com/zai-org/z-ai-sdk-java

Changelog

2026

GLM-4.7-Flash released - Open source, free 30B parameter model
GLM-4.7 released - Latest flagship with enhanced coding

2025

GLM-4.6 released - 200K context, agent-oriented
GLM-4.6V released - Flagship multimodal
CogVideoX-3 released - Latest video generation
GLM Coding Plan launched - Specialized coding endpoint

Document generated based on Z.ai API documentation as of February 2026.
