helixdouble

GLM-5.1-Abliterated-1.35-Unverified


Critical Disclaimer

This is an unverified experimental checkpoint uploaded for storage and hand-off purposes.

Do not treat this as a validated release. This checkpoint has not completed the planned refusal, coherence, capability, or safety evaluation. It has also not gone through the planned healing pass.

This model was intentionally modified to reduce refusal behavior. It may produce harmful, unsafe, illegal, offensive, low-quality, or otherwise undesirable outputs. It should not be deployed as a safety-filtered assistant, used in production, or presented as aligned.

Use only for controlled research and evaluation.

What This Is

This checkpoint is based on zai-org/GLM-5.1-FP8 and was modified with direct FP8 weight surgery on self_attn.o_proj weights.

Surgery parameters:

  • Method: direct_weight_surgery_fp8_block_quant
  • max_weight: 1.35
  • min_weight: 0.5
  • max_weight_position: 51.48
  • min_weight_distance: 39.0
  • Modified layers: 13..77
  • Directions file: the project's refusal-direction tensor, shape (79, 6144)
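
The absolute values above line up with the fractional flags in the Provenance command if the fractions are scaled by 78: 0.66 × 78 = 51.48 and 0.5 × 78 = 39.0. The sketch below shows how such parameters could map to a per-layer ablation weight. The linear falloff, the function name `layer_weight`, and the `scale` argument are assumptions for illustration only; the actual kernel in `direct_weight_abliterate.py` is not shown in this card.

```python
def layer_weight(layer, scale=78, max_weight=1.35, min_weight=0.5,
                 position_frac=0.66, distance_frac=0.5):
    """Ablation weight for one layer under an ASSUMED linear falloff kernel."""
    peak = position_frac * scale    # 0.66 * 78 = 51.48 (max_weight_position)
    reach = distance_frac * scale   # 0.5 * 78 = 39.0 (min_weight_distance)
    distance = abs(layer - peak)
    if distance > reach:
        return 0.0                  # outside the kernel: layer left untouched
    # Interpolate from max_weight at the peak down to min_weight at the edge.
    return max_weight + (distance / reach) * (min_weight - max_weight)


# Layer indices 0..78, matching the 79 rows of the directions tensor.
modified = [l for l in range(79) if layer_weight(l) > 0.0]
```

Under this assumed kernel the first modified layer is 13, which matches the card. The kernel would also reach index 78, while the card lists 13..77, so the script presumably excludes the last index for a reason not stated here.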

The edit projects out the refusal direction from affected output-projection weights and requantizes the modified shards back to FP8 block-scaled safetensors.
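
As a rough illustration of what "projecting out" a direction means, the sketch below applies the standard directional-ablation update W' = W − w · r̂ r̂ᵀ W to a small float matrix. It is not the project's script: the function name `ablate_direction` and the toy shapes are assumptions, and the FP8 dequantize and requantize steps are omitted entirely.

```python
import numpy as np


def ablate_direction(W, r, weight=1.0):
    """Return W - weight * r_hat r_hat^T W.

    W: (hidden, in_features) output-projection weight writing into the residual stream.
    r: (hidden,) refusal direction for this layer (normalized inside).
    """
    r_hat = r / np.linalg.norm(r)
    # r_hat @ W is the component of every column of W along the direction.
    return W - weight * np.outer(r_hat, r_hat @ W)


rng = np.random.default_rng(0)
W = rng.standard_normal((8, 5))   # toy sizes; the card's directions have width 6144
r = rng.standard_normal(8)
r_hat = r / np.linalg.norm(r)

full = ablate_direction(W, r, weight=1.0)    # removes the component exactly
over = ablate_direction(W, r, weight=1.35)   # overshoots: component flips sign
```

With `weight=1.0` the output has no component along the direction. With a weight of 1.35, the value used as `max_weight` here, the component along the direction becomes −0.35 times its original value rather than zero.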

Validation Status

Validation status: not validated.

Known missing checks:

  • No completed harmful refusal-rate eval for this checkpoint.
  • No completed harmless coherence eval for this checkpoint.
  • No KL/capability regression measurement for this checkpoint.
  • No healing LoRA pass has been applied.
  • No post-upload serving smoke test has been completed.
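
For the missing KL measurement, one common approach is to compare next-token distributions of the base and edited models on the same harmless prompts. The sketch below computes the mean KL divergence from two logit arrays. How the logits are collected is left out, and the function names are hypothetical rather than part of the project.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def mean_kl(base_logits, edited_logits):
    """Mean KL(base || edited) over positions; inputs have shape (positions, vocab)."""
    log_p = log_softmax(base_logits)
    log_q = log_softmax(edited_logits)
    per_position = (np.exp(log_p) * (log_p - log_q)).sum(axis=-1)
    return float(per_position.mean())
```

A value near zero would suggest the edit left behavior on those prompts largely unchanged; what threshold counts as acceptable is a project decision this card does not state.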

The checkpoint may be over-abliterated or under-abliterated; neither has been ruled out. Treat the current upload as a raw artifact, not a selected final model.

Intended Next Steps

  1. Run refusal and coherence evaluation.
  2. Inspect harmful and harmless sample outputs manually.
  3. Run the planned healing pass.
  4. Re-evaluate after healing.
  5. Replace or supersede this artifact with a validated release if results are acceptable.
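
Step 1 could start from a crude keyword-based refusal-rate check like the one sketched below. The marker list and function names are illustrative assumptions, not the project's evaluation code, and keyword matching misses partial or disguised refusals, which is one reason step 2 calls for manual inspection.

```python
# Hypothetical marker list; a real evaluation would need a broader, vetted set.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm sorry", "i am unable")


def is_refusal(response, markers=REFUSAL_MARKERS):
    """Heuristic: does the start of the response contain a refusal phrase?"""
    head = response.strip().lower()[:200]
    head = head.replace("\u2019", "'")   # normalize curly apostrophes
    return any(marker in head for marker in markers)


def refusal_rate(responses):
    """Fraction of responses flagged as refusals by the heuristic."""
    if not responses:
        raise ValueError("responses must be non-empty")
    return sum(is_refusal(r) for r in responses) / len(responses)
```

Running the same prompts through the base and edited checkpoints and comparing the two rates gives a first signal, not a validation result.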

Runtime Notes

This is a very large FP8 MoE checkpoint. For vLLM on Blackwell/B200, the tested no-LoRA path uses the FlashInfer TRTLLM FP8 MoE backend. LoRA-based evaluation on B200 FP8 MoE was avoided because it triggered fused-MoE LoRA kernel failures in earlier testing.

Provenance

Generated on June 14, 2026 from the local glm-abliteration-project direct surgery path:

```bash
python3 scripts/direct_weight_abliterate.py \
  --src /workspace/glm5-fp8 \
  --dst /workspace/glm5-fp8-ablit_t0_mw1.35 \
  --directions /workspace/output/refusal_directions.pt \
  --max-weight 1.35 \
  --min-weight 0.5 \
  --max-weight-position-frac 0.66 \
  --min-weight-distance-frac 0.5 \
  --use-hardlinks
```

The generated abliteration_meta.json is included in the repository.

Model provider: helixdouble

Base model: zai-org/GLM-5.1-FP8 (this model is listed as a quantized derivative)

Modalities: text input, text output
