0xSero/GLM-5-381B API & Inference Endpoint

At a glance

Base model	zai-org/GLM-5
Format	BF16
Total params	381B
Active / token	—
Experts / layer	128
Layers	78
Hidden size	6144
Context	202,752
On-disk size	1147 GB

Which variant should I pick?

Variant	Format	Link
`GLM-5-381B` (this)	BF16	link
`GLM-5-381B-GGUF-BF16`	GGUF	link
`GLM-5-381B-GGUF-IQ2_M`	GGUF	link

This repository now hosts the BF16 GLM-5 checkpoint produced by a 50% REAP prune (compression ratio 0.50). Its contents are the BF16 safetensors files listed under Files.

Checkpoint

Base model: GLM-5-BF16
Architecture: GlmMoeDsaForCausalLM
Method: refusal_contrast_reap
Compression ratio: 0.50
Seed: 42
Router renormalization: true
Parameters: 381,464,351,232
Total safetensors size: 762,928,740,864 bytes
Shards: 17
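
As a quick consistency check on the figures above, the parameter count can be multiplied by the two bytes a BF16 value occupies and compared with the reported safetensors total. The sketch below assumes every parameter is stored at 2 bytes; the variable names are illustrative and not part of the repository.

```python
# Figures copied from the Checkpoint section.
PARAMETERS = 381_464_351_232
REPORTED_SAFETENSORS_BYTES = 762_928_740_864
SHARDS = 17

BYTES_PER_BF16 = 2  # bfloat16 is a 16-bit format

# Assumption: every parameter is stored as BF16.
expected_bytes = PARAMETERS * BYTES_PER_BF16
difference = REPORTED_SAFETENSORS_BYTES - expected_bytes

print(f"expected if all BF16: {expected_bytes:,} bytes")
print(f"reported:             {REPORTED_SAFETENSORS_BYTES:,} bytes")
print(f"difference:           {difference:,} bytes")
print(f"average shard size:   {REPORTED_SAFETENSORS_BYTES / SHARDS / 1e9:.1f} GB")
```

The two totals agree to within 38,400 bytes, which is consistent with reading the 381B figure as a BF16 parameter count. The card does not state where the small remainder comes from.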

Provenance

Observation run: glm5-grouped-22k-20260331T172330Z
Calibration dataset: combined
Prune output directory: /data0/external_research/glm5-layerwise-reap-artifacts/GLM-5-BF16/combined/pruned_models/layerwise_refusal_contrast_reap-renorm_true-seed_42-0.50
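
The prune output directory appears to encode several of the Checkpoint fields (method, router renormalization, seed, compression ratio). The sketch below splits the path to show that correspondence. The naming scheme is inferred from this single path and is an assumption, and `parse_prune_dir` is a hypothetical helper, not part of any REAP tooling.

```python
from pathlib import PurePosixPath

PRUNE_DIR = (
    "/data0/external_research/glm5-layerwise-reap-artifacts/GLM-5-BF16/"
    "combined/pruned_models/"
    "layerwise_refusal_contrast_reap-renorm_true-seed_42-0.50"
)

def parse_prune_dir(path: str) -> dict:
    """Split a prune output path into its apparent components.

    Assumed layout (inferred, not documented):
    .../<base model>/<calibration dataset>/pruned_models/<run name>
    with <run name> = <method>-renorm_<bool>-seed_<int>-<ratio>.
    """
    base_model, dataset, _, run_name = PurePosixPath(path).parts[-4:]
    method, renorm, seed, ratio = run_name.split("-")
    return {
        "base_model": base_model,
        "calibration_dataset": dataset,
        "method": method,
        "router_renormalization": renorm.split("_", 1)[1] == "true",
        "seed": int(seed.split("_", 1)[1]),
        "compression_ratio": float(ratio),
    }

fields = parse_prune_dir(PRUNE_DIR)
for key, value in fields.items():
    print(f"{key}: {value}")
```

Under this reading, the path matches the Checkpoint section: seed 42, router renormalization enabled, and a compression ratio of 0.50. The directory name carries a `layerwise_` prefix on the method that the Checkpoint section omits.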

Files

model-00001-of-00017.safetensors through model-00017-of-00017.safetensors
model.safetensors.index.json
config.json
generation_config.json
chat_template.jinja
tokenizer.json
tokenizer_config.json
reap_layerwise_args.yaml
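
To check that a download is complete, the shard names can be generated from the pattern above and compared with the files that model.safetensors.index.json refers to. The sketch below assumes the index follows the usual Hugging Face sharded-checkpoint layout, with a `weight_map` object mapping tensor names to shard filenames. The helper names, tensor names, and the tiny example index are hypothetical.

```python
import json

def expected_shard_names(total: int = 17) -> list[str]:
    """Build model-00001-of-000NN.safetensors ... for `total` shards."""
    return [f"model-{i:05d}-of-{total:05d}.safetensors" for i in range(1, total + 1)]

def shards_in_index(index: dict) -> set[str]:
    """Collect the distinct shard filenames referenced by an index dict.

    Assumes the index has a "weight_map" mapping tensor name -> shard file.
    """
    return set(index["weight_map"].values())

def missing_shards(index: dict, total: int = 17) -> list[str]:
    """Return expected shard names that the index never references."""
    referenced = shards_in_index(index)
    return [name for name in expected_shard_names(total) if name not in referenced]

# Hypothetical two-shard index, used only to illustrate the check.
example_index = json.loads("""
{
  "metadata": {"total_size": 0},
  "weight_map": {
    "a.weight": "model-00001-of-00002.safetensors",
    "b.weight": "model-00001-of-00002.safetensors"
  }
}
""")
print(missing_shards(example_index, total=2))
```

For the real checkpoint, load model.safetensors.index.json with `json.load` and call `missing_shards(index)` with the default total of 17. An empty list means every expected shard is referenced by the index.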

Notes

This upload replaces the older multi-shard checkpoint previously hosted in this repo.
The metadata above reflects the actual checkpoint contents as of 2026-04-05.

License & citation

License inherited from the base model.

bibtex
@misc{lasby2025reap,
  title  = {REAP the Experts: Why Pruning Prevails for One-Shot MoE Compression},
  author = {Mike Lasby and Ivan Lazarevich and Nish Sinnadurai and Sean Lie and Yani Ioannou and Vithursan Thangarasa},
  year   = {2025}, eprint = {2510.13999}, archivePrefix = {arXiv}
}
