At a glance
| Property | Value |
|---|---|
| Base model | zai-org/GLM-5 |
| Format | BF16 |
| Total params | 381B |
| Active / token | — |
| Experts / layer | 128 |
| Layers | 78 |
| Hidden size | 6144 |
| Context | 202,752 |
| On-disk size | ≈763 GB (762,928,740,864 bytes) |
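The size figures can be roughly cross-checked. BF16 stores each parameter in 2 bytes, so the exact parameter count reported under Checkpoint predicts the raw tensor payload. The sketch below uses only the numbers stated in this card. `bf16_size_bytes` is a hypothetical helper, and it assumes every parameter is stored as BF16.

```python
# Rough size cross-check for a BF16 checkpoint (assumes 2 bytes per parameter).
BYTES_PER_BF16_PARAM = 2

def bf16_size_bytes(num_params: int) -> int:
    """Raw tensor payload in bytes if every parameter is stored as BF16."""
    return num_params * BYTES_PER_BF16_PARAM

params = 381_464_351_232      # "Parameters" from the Checkpoint section
reported = 762_928_740_864    # "Total safetensors size" from the Checkpoint section

estimate = bf16_size_bytes(params)
print(f"estimate: {estimate:,} bytes = {estimate / 1e9:.1f} GB = {estimate / 2**30:.1f} GiB")
print(f"reported: {reported:,} bytes = {reported / 1e9:.1f} GB")
print(f"difference: {reported - estimate:,} bytes")
```

The estimate and the reported safetensors size differ by only 38,400 bytes. The payload is therefore essentially all BF16, at about 762.9 GB (≈710.5 GiB). The card does not say what accounts for the small remainder.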
Which variant should I pick?
| Variant | Format | Link |
|---|---|---|
| GLM-5-381B (this) | BF16 | link |
| GLM-5-381B-GGUF-BF16 | GGUF | link |
| GLM-5-381B-GGUF-IQ2_M | GGUF | link |
This repository now hosts the BF16 GLM-5 checkpoint produced by pruning 50% of the experts with REAP.
The checkpoint details below describe the BF16 files actually uploaded.
Checkpoint
- Base model: GLM-5-BF16
- Architecture: GlmMoeDsaForCausalLM
- Method: refusal_contrast_reap
- Compression ratio: 0.50
- Seed: 42
- Router renormalization: true
- Parameters: 381,464,351,232
- Total safetensors size: 762,928,740,864 bytes
- Shards: 17
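The card records "Router renormalization: true" but does not define it. One common reading is assumed here rather than taken from the REAP or GLM-5 code. After experts are removed, the router picks its top-k experts only among the survivors and rescales their gate weights to sum to 1, so each token's combined expert output keeps a consistent scale. `route_token`, the sigmoid scoring, and the toy sizes are all hypothetical.

```python
import math
from typing import Dict, Sequence, Set

def route_token(
    logits: Sequence[float],
    kept: Set[int],
    top_k: int,
    renormalize: bool = True,
) -> Dict[int, float]:
    """Return {expert_id: gate_weight} for one token (hypothetical sketch)."""
    # Score only experts that survived pruning; pruned experts can never be selected.
    scores = {e: 1.0 / (1.0 + math.exp(-logits[e])) for e in kept}
    # Keep the top_k highest-scoring surviving experts.
    chosen = sorted(scores, key=scores.get, reverse=True)[:top_k]
    weights = {e: scores[e] for e in chosen}
    if renormalize:
        # Rescale the selected gate weights so they sum to 1.
        total = sum(weights.values())
        weights = {e: w / total for e, w in weights.items()}
    return weights

# Toy router: 8 experts, of which the even-numbered half survived a 50% prune.
logits = [2.0, 3.0, 1.0, 0.5, 0.0, 4.0, -1.0, 1.5]
kept = {0, 2, 4, 6}
print(route_token(logits, kept, top_k=2))
print(route_token(logits, kept, top_k=2, renormalize=False))
```

Expert 5 has the highest logit but was pruned, so it is never routed to. With renormalization the two surviving gates sum to 1. Without it they sum to about 1.61.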
Provenance
- Observation run: glm5-grouped-22k-20260331T172330Z
- Calibration dataset: combined
- Prune output directory: /data0/external_research/glm5-layerwise-reap-artifacts/GLM-5-BF16/combined/pruned_models/layerwise_refusal_contrast_reap-renorm_true-seed_42-0.50
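The prune output directory name appears to encode the same settings listed under Checkpoint. The pattern is inferred from this single path, so `parse_prune_dir` and its regular expression are hypothetical and may not match other runs.

```python
import re
from pathlib import PurePosixPath

# Naming pattern inferred from the one directory shown in this card (hypothetical).
PRUNE_DIR_RE = re.compile(
    r"^(?P<method>.+)-renorm_(?P<renorm>true|false)-seed_(?P<seed>\d+)-(?P<ratio>\d*\.\d+)$"
)

def parse_prune_dir(path: str) -> dict:
    """Split a prune output directory name into the settings it encodes."""
    name = PurePosixPath(path).name
    m = PRUNE_DIR_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized prune directory name: {name!r}")
    return {
        "method": m["method"],
        "renormalize": m["renorm"] == "true",
        "seed": int(m["seed"]),
        "compression_ratio": float(m["ratio"]),
    }

settings = parse_prune_dir(
    "/data0/external_research/glm5-layerwise-reap-artifacts/GLM-5-BF16/combined/"
    "pruned_models/layerwise_refusal_contrast_reap-renorm_true-seed_42-0.50"
)
print(settings)
```

The parsed seed (42), compression ratio (0.50), and renormalization flag (true) match the Checkpoint section. The method name carries an extra layerwise_ prefix compared with refusal_contrast_reap.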
Files
- model-00001-of-00017.safetensors through model-00017-of-00017.safetensors (17 shards)
- model.safetensors.index.json
- config.json
- generation_config.json
- chat_template.jinja
- tokenizer.json
- tokenizer_config.json
- reap_layerwise_args.yaml
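The 17 weight shards are tied together by model.safetensors.index.json. In the standard Hugging Face sharded layout, that file maps every tensor name to its shard under a weight_map key. A quick completeness check after downloading might look like the sketch below. `missing_shards` is a hypothetical helper, and the two-tensor toy index (with arbitrary shard assignments) stands in for the real file.

```python
import json
from typing import List

def missing_shards(index: dict, expected_total: int = 17) -> List[str]:
    """List shard files the index should reference but does not."""
    expected = {
        f"model-{i:05d}-of-{expected_total:05d}.safetensors"
        for i in range(1, expected_total + 1)
    }
    referenced = set(index["weight_map"].values())
    # Fail loudly if the index points at files outside the expected naming scheme.
    unexpected = sorted(referenced - expected)
    if unexpected:
        raise ValueError(f"unexpected shard names in weight_map: {unexpected}")
    return sorted(expected - referenced)

# Toy index with only two tensors; for the real check, load the downloaded file:
#   with open("model.safetensors.index.json") as f:
#       index = json.load(f)
toy_index = json.loads(
    '{"metadata": {"total_size": 0},'
    ' "weight_map": {"model.embed_tokens.weight": "model-00001-of-00017.safetensors",'
    ' "lm_head.weight": "model-00017-of-00017.safetensors"}}'
)
print(missing_shards(toy_index))
```

On the real index, an empty list means every shard is referenced. It does not verify file contents or sizes.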
Notes
- This upload replaces the older multi-shard checkpoint previously hosted in this repo.
- The metadata above reflects the actual checkpoint contents as of 2026-04-05.
License & citation
The license is inherited from the base model (zai-org/GLM-5).
@misc{lasby2025reap,
title = {REAP the Experts: Why Pruning Prevails for One-Shot MoE Compression},
author = {Mike Lasby and Ivan Lazarevich and Nish Sinnadurai and Sean Lie and Yani Ioannou and Vithursan Thangarasa},
year = {2025},
eprint = {2510.13999},
archivePrefix = {arXiv}
}
Made possible by NVIDIA · TNG Technology · Lambda · Prime Intellect · Hot Aisle.