helixdouble
GLM-5.1-Abliterated-1.35-Unverified
Critical Disclaimer
This is an unverified experimental checkpoint uploaded for storage and hand-off purposes.
Do not treat this as a validated release. This checkpoint has not completed the planned refusal, coherence, capability, or safety evaluation. It has also not gone through the planned healing pass.
This model was intentionally modified to reduce refusal behavior. It may produce harmful, unsafe, illegal, offensive, low-quality, or otherwise undesirable outputs. It should not be deployed as a safety-filtered assistant, used in production, or presented as aligned.
Use only for controlled research and evaluation.
What This Is
This checkpoint is based on zai-org/GLM-5.1-FP8 and was modified with direct FP8 weight surgery on self_attn.o_proj weights.
Surgery parameters:
- Method: direct_weight_surgery_fp8_block_quant
- max_weight: 1.35
- min_weight: 0.5
- max_weight_position: 51.48
- min_weight_distance: 39.0
- Modified layers: 13..77
- Directions file: project refusal directions tensor with shape (79, 6144)
The edit projects out the refusal direction from affected output-projection weights and requantizes the modified shards back to FP8 block-scaled safetensors.
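As a rough illustration of the projection step, the sketch below removes a unit direction r from the output side of a weight matrix, W' = W - s * r (r^T W), using plain Python lists. The function names, the toy matrix, and the way the scale s is applied (for example the 1.35 maximum weight) are assumptions for illustration only. The actual script works on FP8 block-scaled shards, and its dequantize and requantize steps are not shown.

```python
import math


def normalize(vector):
    """Return the vector scaled to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def component_along(weight, direction):
    """Compute r^T W: how much each input column writes along the direction."""
    r = normalize(direction)
    rows, cols = len(weight), len(weight[0])
    return [sum(r[i] * weight[i][j] for i in range(rows)) for j in range(cols)]


def ablate_direction(weight, direction, scale=1.0):
    """Return W - scale * r (r^T W) for a weight of shape (d_out, d_in).

    The direction lives in the output space (length d_out).
    """
    r = normalize(direction)
    proj = component_along(weight, direction)
    rows, cols = len(weight), len(weight[0])
    return [
        [weight[i][j] - scale * r[i] * proj[j] for j in range(cols)]
        for i in range(rows)
    ]


# Toy example: 3 output features, 2 input features (hypothetical values).
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
r = [0.0, 3.0, 4.0]

W_full = ablate_direction(W, r, scale=1.0)   # exact projection
W_over = ablate_direction(W, r, scale=1.35)  # overshoot past zero
```

With scale 1.0 the edited matrix has no remaining component along r. With scale 1.35 the component changes sign and keeps 35% of its original magnitude, which is one reason an edit like this can over-abliterate.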
Validation Status
Validation status: not validated.
Known missing checks:
- No completed harmful refusal-rate eval for this checkpoint.
- No completed harmless coherence eval for this checkpoint.
- No KL/capability regression measurement for this checkpoint.
- No healing LoRA pass has been applied.
- No post-upload serving smoke test has been completed.
The checkpoint may be over-abliterated (degraded coherence or capability) or under-abliterated (refusals still present); without the checks above there is no way to tell which. Treat the current upload as a raw artifact, not a selected final model.
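The missing KL measurement could look roughly like the sketch below: for the same harmless prompts, compare the next-token distribution of the base checkpoint with that of the edited checkpoint. How the logits are collected is left out, and the function names are hypothetical; this is not the project's evaluation harness.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one list of logits."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_sum for x in logits]


def kl_divergence(base_logits, edited_logits):
    """KL(P_base || P_edited) in nats for one next-token distribution."""
    log_p = log_softmax(base_logits)
    log_q = log_softmax(edited_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def mean_kl(base_batch, edited_batch):
    """Average KL over paired logit vectors, one pair per prompt position."""
    pairs = list(zip(base_batch, edited_batch))
    if not pairs:
        raise ValueError("need at least one pair of logit vectors")
    return sum(kl_divergence(b, e) for b, e in pairs) / len(pairs)


# Toy check: P = [0.5, 0.5] against Q = [0.75, 0.25].
example_kl = kl_divergence([0.0, 0.0], [math.log(3.0), 0.0])
```

A mean KL near zero on harmless prompts suggests the edit left ordinary behavior largely intact; a large value is a signal to inspect outputs before any healing pass.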
Intended Next Steps
- Run refusal and coherence evaluation.
- Inspect harmful and harmless sample outputs manually.
- Run the planned healing pass.
- Re-evaluate after healing.
- Replace or supersede this artifact with a validated release if results are acceptable.
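For the first step, a crude refusal-rate estimate can be sketched with substring matching on the start of each response. The marker list, the 200-character window, and the function names are all assumptions for illustration; string matching misses many refusals and flags some non-refusals, so it complements manual inspection and does not replace it.

```python
# Hypothetical marker phrases; a real evaluation would tune or replace these.
REFUSAL_MARKERS = (
    "i can't",
    "i cannot",
    "i won't",
    "i'm sorry",
    "i am unable",
)


def is_refusal(response, markers=REFUSAL_MARKERS, window=200):
    """Flag a response whose opening text contains a refusal marker."""
    # Normalize curly apostrophes so "can’t" matches "can't".
    head = response.replace("\u2019", "'").strip().lower()[:window]
    return any(marker in head for marker in markers)


def refusal_rate(responses):
    """Fraction of responses flagged as refusals."""
    if not responses:
        raise ValueError("need at least one response")
    return sum(is_refusal(r) for r in responses) / len(responses)


samples = [
    "I'm sorry, but I can't help with that.",
    "Sure. Here is a short summary of the article.",
    "I cannot assist with this request.",
    "The capital of France is Paris.",
]
rate = refusal_rate(samples)
```

Running the same function on responses to harmful and harmless prompt sets, before and after healing, gives two numbers that can be compared across checkpoints.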
Runtime Notes
This is a very large FP8 MoE checkpoint. For vLLM on Blackwell/B200, the tested no-LoRA path uses the FlashInfer TRTLLM FP8 MoE backend. LoRA-based evaluation on B200 FP8 MoE was avoided because it triggered fused-MoE LoRA kernel failures in earlier testing.
Provenance
Generated on June 14, 2026 from the local glm-abliteration-project direct surgery path:
bash
python3 scripts/direct_weight_abliterate.py \
  --src /workspace/glm5-fp8 \
  --dst /workspace/glm5-fp8-ablit_t0_mw1.35 \
  --directions /workspace/output/refusal_directions.pt \
  --max-weight 1.35 \
  --min-weight 0.5 \
  --max-weight-position-frac 0.66 \
  --min-weight-distance-frac 0.5 \
  --use-hardlinks
The generated abliteration_meta.json is included in the repository.
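The fractional flags in the command line up with the absolute values listed under Surgery parameters if each fraction is multiplied by 78 (0.66 * 78 = 51.48, 0.5 * 78 = 39.0). The sketch below reproduces that arithmetic and, assuming a linear falloff from the maximum to the minimum weight, derives a per-layer weight. Both the factor of 78 and the linear kernel are inferences from the listed numbers, not confirmed behavior of direct_weight_abliterate.py, and the function name is hypothetical.

```python
import math

# Assumption: the factor that maps 0.66 -> 51.48 and 0.5 -> 39.0.
LAYER_SPAN = 78


def layer_weight(layer, max_weight, min_weight, position, distance):
    """Linear falloff: max_weight at `position`, min_weight at `distance` away.

    Returns None for layers farther than `distance` from `position`.
    """
    offset = abs(layer - position)
    if offset > distance:
        return None
    return max_weight + (offset / distance) * (min_weight - max_weight)


position = 0.66 * LAYER_SPAN  # about 51.48
distance = 0.5 * LAYER_SPAN   # 39.0

weights = {}
for layer in range(LAYER_SPAN):  # layer indices 0..77
    weight = layer_weight(layer, 1.35, 0.5, position, distance)
    if weight is not None:
        weights[layer] = weight

first_layer, last_layer = min(weights), max(weights)
```

Under these assumptions the weighted layers run from 13 through 77, which matches the modified range recorded for this checkpoint.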
Model provider: helixdouble
Model tree: base zai-org/GLM-5.1-FP8; quantized: this model
Modalities: text input, text output