Use GLM-5.2 in Claude Code and Cut Costs by Up to 50%

GLM-5.2 is an excellent coding model. Claude Code is an excellent coding-agent harness.

You can combine them.

This guide shows three practical ways to run GLM-5.2 inside Claude Code:

OpenRouter, which is the best starting point if you want one routing layer and cost visibility.
Z.AI direct, which is the cleanest provider-native setup.
A local or internal proxy, which is the advanced setup for teams that need custom routing.

The cost case is straightforward. GLM-5.2 can be materially cheaper for coding-agent work, especially when the task does not require the most expensive default model. Optimize for cost per accepted coding outcome, not raw token volume.

[Image: Brian Armstrong post about keeping AI spend flat while token usage grows through better defaults, routing, and caching]

Key Findings

Finding	What To Do
Start with OpenRouter if you want routing and cost visibility.	Use OpenRouter as the API layer, set the model to `z-ai/glm-5.2`, then test Claude Code tool behavior.
Use Z.AI direct if you want the cleanest GLM-native setup.	Point Claude Code at Z.AI's Anthropic-compatible endpoint and use the GLM-5.2 Claude Code defaults.
Use a proxy only if you need control.	Put a translation layer between Claude Code and OpenAI-compatible endpoints.
Tool calling is the main caveat.	Verify search, edit, bash, git, tests, and long streaming responses before team rollout.
Measure cost per finished task.	Count retries, failed tool calls, review rewrites, and abandoned sessions.

Evidence and Methodology

OpenRouter lists the GLM-5.2 model ID as:

z-ai/glm-5.2

OpenRouter also lists GLM-5.2 with a 1M-token context window and pricing of $0.95 per 1M input tokens and $3 per 1M output tokens at the time of writing. That makes it a serious candidate for cost-sensitive coding-agent work.

List price is only the starting point. In a public GLM vs Opus cost thread, Sridhar Ramaswamy described a benchmark where GLM used about twice the tokens but still cost roughly half as much.

[Image: Sridhar Ramaswamy post comparing GLM and Opus token usage and cost]

More tokens can still cost less.
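The arithmetic can be sketched in a few lines. The GLM-5.2 prices are the OpenRouter list prices quoted above. The premium-model prices and every token count are made-up placeholders for illustration, not measurements from the thread.

```python
def session_cost(input_tokens, output_tokens, input_price, output_price):
    """Return the dollar cost of one session; prices are USD per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# GLM-5.2 list prices quoted above (USD per 1M tokens).
GLM_INPUT, GLM_OUTPUT = 0.95, 3.00
# Hypothetical premium-model prices, for illustration only.
PREMIUM_INPUT, PREMIUM_OUTPUT = 5.00, 25.00

# Hypothetical session: the premium model uses 400k input / 20k output tokens,
# and GLM-5.2 uses twice as many of each.
premium = session_cost(400_000, 20_000, PREMIUM_INPUT, PREMIUM_OUTPUT)
glm = session_cost(800_000, 40_000, GLM_INPUT, GLM_OUTPUT)

print(f"premium: ${premium:.2f}  glm: ${glm:.2f}  ratio: {glm / premium:.2f}")
```

With these placeholder numbers the doubled token count is more than offset by the lower per-token price. Substitute your own prices and measured token counts before drawing conclusions.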

For coding agents, measure the full session:

Metric	Why It Matters
Cost per accepted task	The session produced work the engineer kept.
Retry-adjusted cost	Cheap failed attempts are still waste.
Tool-call success	Claude Code depends on tool reliability.
Review-adjusted cost	Low-quality diffs push cost onto reviewers.
Rework-adjusted cost	Code that gets rewritten was not cheap.
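One way to compute the first two metrics is to divide all spend, failed attempts included, by the number of tasks whose output was kept. This is a minimal sketch: the `Session` record and its fields are hypothetical, and the costs are invented sample data.

```python
from dataclasses import dataclass

@dataclass
class Session:
    task_id: str     # which task this attempt belongs to
    cost_usd: float  # spend for this single attempt
    accepted: bool   # True if the engineer kept the resulting work

def cost_per_accepted_task(sessions):
    """Total spend across all attempts divided by the number of accepted tasks."""
    total = sum(s.cost_usd for s in sessions)
    accepted_tasks = {s.task_id for s in sessions if s.accepted}
    if not accepted_tasks:
        return float("inf")  # nothing was kept, so no accepted task to divide by
    return total / len(accepted_tasks)

# Invented sample data: one retry and one abandoned task.
sessions = [
    Session("fix-login", 0.40, accepted=False),    # failed first attempt
    Session("fix-login", 0.50, accepted=True),     # retry that was kept
    Session("add-tests", 0.30, accepted=True),
    Session("refactor-db", 0.60, accepted=False),  # abandoned
]
print(round(cost_per_accepted_task(sessions), 2))
```

The failed and abandoned attempts stay in the numerator, which is what makes the figure retry-adjusted.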

Concrete Operator Scenario

An engineering team already uses Claude Code every day. Engineers like the harness and want to keep it. Finance sees AI spend rising and asks whether there is a cheaper default for some classes of work.

The team keeps Claude Code and changes the model route.

They test GLM-5.2 for:

Work Type	Fit
Repo exploration	Strong fit. Long context helps.
Routine implementation	Strong fit after verification.
Test generation	Good fit with review discipline.
Refactors	Good fit when tests are solid.
Security-sensitive code	Use carefully. Keep stricter review.
High-risk architecture	Compare against the strongest model before switching.

This is the operating model: keep Claude Code, route suitable work to GLM-5.2, and keep premium models for tasks where they earn their cost.

Measurement Approach

Run a small benchmark before changing defaults.

Use the same repository, prompt, acceptance criteria, and test commands across your current Claude Code model and GLM-5.2.

Track:

Measurement	Pass Criteria
Setup works	Claude Code starts and reports the expected model route.
Search works	It can inspect files and find relevant code.
Edit works	It can produce a clean patch.
Bash works	It can run commands and handle failures.
Tests work	It can run tests and fix failures.
Cost improves	Total session cost, including retries, is lower than the baseline.
Review quality holds	The diff does not create extra human cleanup.

Treat lower model pricing as a hypothesis. The benchmark passes when cost per accepted task drops.
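The pass criteria can be written down as a simple gate so the result is not argued after the fact. A minimal sketch, assuming you record each check by hand as true or false; the check names and function are hypothetical.

```python
# Hypothetical names for the functional checks in the table above.
REQUIRED_CHECKS = ["setup", "search", "edit", "bash", "tests", "review_quality"]

def benchmark_passes(checks, baseline_cost, candidate_cost):
    """Return (passed, reasons). Costs are cost per accepted task in USD."""
    reasons = [
        f"{name} check failed"
        for name in REQUIRED_CHECKS
        if not checks.get(name, False)  # a missing check counts as a failure
    ]
    if candidate_cost >= baseline_cost:
        reasons.append("cost per accepted task did not drop")
    return (not reasons, reasons)

# Example with invented numbers: every check passed and cost dropped.
checks = {name: True for name in REQUIRED_CHECKS}
print(benchmark_passes(checks, baseline_cost=1.00, candidate_cost=0.60))
```

A cheaper run that fails any functional check still fails the gate, which matches the order of the table: tools first, cost second.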

Path 1: OpenRouter

Use this path first if your team wants one API layer for model routing, usage visibility, and spend controls.

Create an OpenRouter API key, then run Claude Code with OpenRouter as the base URL:

export ANTHROPIC_BASE_URL="https://openrouter.ai/api"
export ANTHROPIC_AUTH_TOKEN="$OPENROUTER_API_KEY"
export ANTHROPIC_MODEL="z-ai/glm-5.2"
claude

For persistent setup, add it to ~/.claude/settings.json:

{
  "env": {
    "ANTHROPIC_BASE_URL": "https://openrouter.ai/api",
    "ANTHROPIC_AUTH_TOKEN": "YOUR_OPENROUTER_API_KEY",
    "ANTHROPIC_MODEL": "z-ai/glm-5.2"
  }
}

Then restart Claude Code and run:

/status

Confirm that Claude Code is using the expected route. Then run a real coding task.
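Before launching, a small preflight can catch a missing variable. This sketch only inspects the three variables from the export block above. Treating a leftover `ANTHROPIC_API_KEY` as a possible conflict is an assumption, not documented behavior confirmed here.

```python
import os

# The three variables set in the export block above.
REQUIRED = ("ANTHROPIC_BASE_URL", "ANTHROPIC_AUTH_TOKEN", "ANTHROPIC_MODEL")

def preflight(env):
    """Return a list of problems found in an environment mapping."""
    problems = [f"{name} is not set" for name in REQUIRED if not env.get(name)]
    # Assumption: a leftover ANTHROPIC_API_KEY may compete with the auth token.
    if env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is also set and may conflict")
    return problems

if __name__ == "__main__":
    for problem in preflight(os.environ):
        print("WARN:", problem)
```

An empty result only means the variables are present. `/status` and a real coding task remain the actual check.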

OpenRouter caveat: its Claude Code documentation says Claude Code support through OpenRouter is guaranteed only with Anthropic first-party providers. GLM-5.2 may work through OpenRouter, but you should treat it as a compatibility test until your own file-editing, bash, test, and tool-calling checks pass.

Use OpenRouter when you want:

centralized cost reporting,
one key across multiple providers,
budget controls,
model comparison,
routing and fallback.

Use it for team-wide Claude Code defaults only after tool calling is verified.

Path 2: Z.AI Direct

Use this path if you want the cleanest GLM-native setup.

Z.AI documents an Anthropic-compatible Claude Code endpoint:

https://api.z.ai/api/anthropic

Add this to ~/.claude/settings.json:

{
  "env": {
    "ANTHROPIC_AUTH_TOKEN": "YOUR_ZAI_API_KEY",
    "ANTHROPIC_BASE_URL": "https://api.z.ai/api/anthropic",
    "API_TIMEOUT_MS": "3000000",
    "CLAUDE_CODE_AUTO_COMPACT_WINDOW": "1000000",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "glm-4.5-air",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "glm-5.2[1m]",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "glm-5.2[1m]"
  }
}
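If settings.json already holds other keys, merging is safer than pasting over the file. A minimal sketch, assuming the file is valid JSON with an optional top-level "env" object; the `merge_env` helper is hypothetical and not part of Claude Code.

```python
import json
from pathlib import Path

def merge_env(settings_path, new_env):
    """Merge new_env into the "env" object of a settings file, keeping other keys."""
    path = Path(settings_path)
    settings = json.loads(path.read_text()) if path.exists() else {}
    settings.setdefault("env", {}).update(new_env)  # existing env keys are overwritten
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2) + "\n")
    return settings

# Example call (commented out so that running this file changes nothing):
# merge_env(Path.home() / ".claude" / "settings.json", {
#     "ANTHROPIC_BASE_URL": "https://api.z.ai/api/anthropic",
#     "ANTHROPIC_AUTH_TOKEN": "YOUR_ZAI_API_KEY",
# })
```

Back up the existing file first; the sketch rewrites it in place.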

Restart Claude Code:

claude

Check:

/status

This is the safest GLM-specific path because it uses an Anthropic-compatible endpoint instead of forcing Claude Code through an OpenAI-compatible API.

Path 3: Local or Internal Proxy

Use this path if your team already runs an AI gateway or needs custom routing.

Claude Code expects Anthropic-style behavior. Some GLM endpoints and gateways expose OpenAI-compatible Chat Completions instead. Put a proxy between Claude Code and an OpenAI-compatible endpoint so the request and tool-call formats are translated correctly.

The shape is:

Claude Code
  -> Anthropic-compatible proxy
  -> OpenAI-compatible GLM endpoint
  -> GLM-5.2

Use this setup when you need:

model aliases,
team-level attribution,
internal logging,
policy controls,
caching,
retries,
fallback routing.

This is the most flexible option and the easiest one to break. Test it carefully.
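The core of the translation layer is reshaping tool definitions and tool calls. The sketch below assumes the commonly documented shapes: Anthropic-style tools with `input_schema` and `tool_use` blocks, and OpenAI-style function tools whose arguments arrive as a JSON string. Verify both against the current API references. It ignores streaming and error handling, which are where real proxies tend to break.

```python
import json

def anthropic_tool_to_openai(tool):
    """Convert an Anthropic-style tool definition to an OpenAI-style function tool."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": tool["input_schema"],  # both sides use JSON Schema here
        },
    }

def openai_tool_call_to_anthropic(call):
    """Convert an OpenAI-style tool call to an Anthropic-style tool_use block."""
    return {
        "type": "tool_use",
        "id": call["id"],
        "name": call["function"]["name"],
        # OpenAI-style arguments are a JSON string; Anthropic-style input is an object.
        "input": json.loads(call["function"]["arguments"] or "{}"),
    }

# Hypothetical tool, for illustration only.
tool = {
    "name": "read_file",
    "description": "Read a file from the repo",
    "input_schema": {"type": "object", "properties": {"path": {"type": "string"}}},
}
print(json.dumps(anthropic_tool_to_openai(tool), indent=2))
```

If a model returns malformed JSON in `arguments`, `json.loads` raises. A production proxy needs a policy for that case, such as returning an error to the agent.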

Caveats and Failure Modes

Most failures come from the adapter layer.

Caveat	What Can Go Wrong	Fix
Tool schemas	Claude Code tool calls may not survive translation.	Prefer Anthropic-compatible routes and test real tool use.
Streaming	Long edits can break if the gateway buffers or reshapes output.	Test large diffs and failing-test loops.
Context	1M context is useful but can increase spend.	Use large context for repo-scale tasks, not every small change.
Retries	Failed attempts erase savings.	Track retry-adjusted cost.
Review burden	Cheap code can become expensive in review.	Track accepted diffs, not generated diffs.
Auth precedence	Existing Claude Code auth can confuse routing.	Check `/status`, clear conflicting env vars, restart Claude Code.

The practical rule: if search, edit, bash, git, tests, and long streaming responses work, the setup is worth benchmarking. If any of those fail, fix the route before judging GLM-5.2.

What To Do Next

Start with OpenRouter if you care about routing and visibility. Start with Z.AI direct if you want the cleanest GLM-specific setup.

Run this five-task test:

Ask Claude Code to inspect a real repo and explain the relevant files.
Ask it to make a small implementation change.
Ask it to update or add tests.
Ask it to run tests and fix failures.
Ask it to summarize the final diff and risks.

Compare GLM-5.2 against your current default model on:

total session cost,
retry count,
latency,
tool-call success,
test success,
review rewrites,
accepted outcome rate.

If GLM-5.2 cuts cost without increasing retries or review burden, make it the default for those task classes. Keep stronger models for work that needs them.
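That rule can be applied per task class rather than globally. A minimal sketch, assuming you have already aggregated metrics for each class; the dictionary keys and the sample numbers are hypothetical.

```python
def classes_to_switch(results):
    """Return task classes where the candidate is cheaper without more retries or rewrites.

    results maps task class -> {"baseline": metrics, "candidate": metrics},
    where metrics has "cost", "retries", and "review_rewrites".
    """
    switch = []
    for task_class, pair in results.items():
        base, cand = pair["baseline"], pair["candidate"]
        if (cand["cost"] < base["cost"]
                and cand["retries"] <= base["retries"]
                and cand["review_rewrites"] <= base["review_rewrites"]):
            switch.append(task_class)
    return sorted(switch)

# Invented sample numbers for two task classes.
results = {
    "repo-exploration": {
        "baseline": {"cost": 1.20, "retries": 1, "review_rewrites": 0},
        "candidate": {"cost": 0.55, "retries": 1, "review_rewrites": 0},
    },
    "security-sensitive": {
        "baseline": {"cost": 2.00, "retries": 0, "review_rewrites": 1},
        "candidate": {"cost": 0.90, "retries": 2, "review_rewrites": 3},
    },
}
print(classes_to_switch(results))
```

In this invented data only the first class qualifies: the second is cheaper but brings more retries and more review rewrites.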

FAQ

Can I use GLM-5.2 inside Claude Code?

Yes. Use OpenRouter, Z.AI's Anthropic-compatible endpoint, or a proxy that translates Claude Code's Anthropic-style requests.

Which setup should I try first?

Try OpenRouter first if you want routing and cost visibility. Try Z.AI direct first if you want the simplest GLM-specific route.

What is the OpenRouter model ID?

Use z-ai/glm-5.2.

What is the biggest caveat?

Tool calling. Verify file search, edits, bash commands, tests, git, and long streaming responses before using the setup for serious work.

Can this really cut costs by up to 50%?

It can, for the right task classes. The savings depend on model pricing, retries, context size, cache behavior, and review quality. Measure cost per accepted coding task on your own workload before relying on the figure.
