dominant-strategies

Qwen3.6-27B-heretic-pearl

License: apache-2.0

What this is based on

Original base (architecture): Qwen/Qwen3.6-27B
- a hybrid (linear-attention + full-attention) vision-language model (model_type: qwen3_5, image-text-to-text, with the 15 MTP heads intact).
Direct source (the weights we quantized): llmfan46/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved
- a Heretic-decensored (abliterated) build of Qwen3.6-27B with the native MTP heads preserved, made with Heretic v1.3.0 and a variant of the Magnitude-Preserving Orthogonal Ablation method. The reported result is 94% fewer refusals (6/100 vs 92/100) at ~0.002 KL divergence from the original. All credit for the base + decensoring work goes to llmfan46; this repo only re-quantizes their weights.

What we did to make it Pearl-quantized

The model is quantized with quant_method: "pearl", format: "int-quantized". The scheme puts int7 weights on the language model's main matmul layers (so each one is a valid Pearl mining GEMM), deliberately leaves a small set of layers in higher precision for quality, and stores the token embedding as int8 to claw back VRAM for context.

int7 (7-bit), per-channel, symmetric, with dynamic int7 input activations - applied to:

Layer	Module name pattern
Attention projections	`self_attn.{q,k,v,o}_proj`
MLP projections	`mlp.{gate,up,down}_proj`
Linear-attention projections	`linear_attn.{in_proj_qkv,in_proj_z,out_proj}`
LM head	`lm_head`
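To make "per-channel, symmetric, with dynamic int7 input activations" concrete, here is a minimal pure-Python sketch of the arithmetic. It is not the Pearl plugin's code: it assumes a symmetric int7 range of [-63, 63], one weight scale per output channel (row), and activation scales computed per input row at call time. The names `quantize_rows` and `int7_linear` are hypothetical.

```python
# Minimal sketch of per-channel symmetric int7 weights + dynamic int7 activations.
# ASSUMPTIONS (not taken from the Pearl plugin): symmetric range [-63, 63],
# one scale per weight output channel, one scale per activation row at runtime.

QMAX = 63  # 2**(7 - 1) - 1, assumed symmetric int7 limit


def quantize_rows(matrix):
    """Quantize each row symmetrically to int7; return (int rows, float scales)."""
    q_rows, scales = [], []
    for row in matrix:
        amax = max(abs(v) for v in row)
        scale = amax / QMAX if amax > 0 else 1.0
        q_rows.append([max(-QMAX, min(QMAX, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def int7_linear(x, w):
    """y = x @ w.T computed from int7 operands with integer accumulation."""
    w_q, w_scales = quantize_rows(w)  # static: would be done once, at load time
    x_q, x_scales = quantize_rows(x)  # dynamic: recomputed for every input
    y = []
    for xr, xs in zip(x_q, x_scales):
        out_row = []
        for wr, ws in zip(w_q, w_scales):
            acc = sum(a * b for a, b in zip(xr, wr))  # pure integer dot product
            out_row.append(acc * xs * ws)             # dequantize with both scales
        y.append(out_row)
    return y
```

For `x = [[0.5, -1.0, 0.25]]` and `w = [[1.0, 2.0, -0.5], [0.1, 0.2, 0.3]]` the float result is `[[-1.625, -0.075]]`; the int7 path lands within about 0.01 of each entry. Only the integer dot product touches the int7 values, which is the part a GEMM kernel would run.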

Left in bf16 (not quantized) - all *norm*, the entire vision tower (*.visual.*), and the linear-attention internals that are numerically sensitive (in_proj_a, in_proj_b, conv1d, A_log, dt_bias). The token embedding is int8, not bf16 - see the e8 note below.
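The layer table and the exclusion list can be read together as an ordered set of name rules. The hypothetical `precision_for` helper below is one way to express that, with the brace shorthand from the table written out as regular expressions. It is only an illustration: the module names in the checks are made up, the real checkpoint may use different prefixes, and treating unmatched names as bf16 is an assumption.

```python
import re

# Hypothetical name-based routing that mirrors the table and exclusions above.
# Exclusions are checked first so vision-tower MLPs etc. stay bf16.
INT8_PATTERNS = [r"embed_tokens$"]
BF16_PATTERNS = [
    r"norm",                       # every *norm* module
    r"(^|\.)visual\.",             # the entire vision tower
    r"linear_attn\.(in_proj_a|in_proj_b|conv1d|A_log|dt_bias)$",
]
INT7_PATTERNS = [
    r"self_attn\.(q|k|v|o)_proj$",
    r"mlp\.(gate|up|down)_proj$",
    r"linear_attn\.(in_proj_qkv|in_proj_z|out_proj)$",
    r"(^|\.)lm_head$",
]


def precision_for(name):
    """Return 'int8', 'bf16' or 'int7' for a module/parameter name."""
    for label, patterns in (("int8", INT8_PATTERNS),
                            ("bf16", BF16_PATTERNS),
                            ("int7", INT7_PATTERNS)):
        if any(re.search(p, name) for p in patterns):
            return label
    return "bf16"  # assumption: anything unmatched is left unquantized
```

For example, `precision_for("model.layers.0.linear_attn.in_proj_qkv")` gives `"int7"`, while `"model.layers.0.linear_attn.in_proj_a"` and `"model.visual.blocks.0.mlp.up_proj"` both give `"bf16"`.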

The e8 part - int8 token embedding. The 248,320-row embed_tokens table is stored as int8 (I8 weights with per-row bf16 dequant scales) - excluded from the int7 group and handled by the plugin's embedding patch. On a 32 GiB card this frees ~1.2 GiB, which we spend on a much larger KV-cache / context window at no measurable quality or throughput cost.
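The ~1.2 GiB figure is consistent with simple arithmetic. The sketch below shows per-row int8 quantization with one scale per row, a dequantizing lookup, and the memory estimate. It assumes a symmetric [-127, 127] range and a hidden size of 5120. The hidden size is an assumption for illustration, so check `config.json` for the real value. The plugin's actual embedding patch may differ.

```python
# Sketch of a per-row int8 embedding table and the VRAM it saves over bf16.
# ASSUMPTIONS: symmetric per-row scales over [-127, 127]; hidden size 5120
# (illustrative, check config.json); scales stored as bf16 (2 bytes each).

def quantize_embedding_row(row):
    """Quantize one embedding row to int8 with a single scale."""
    amax = max(abs(v) for v in row)
    scale = amax / 127 if amax > 0 else 1.0
    return [round(v / scale) for v in row], scale


def lookup(q_table, scales, token_id):
    """Dequantize only the row that is actually looked up."""
    return [q * scales[token_id] for q in q_table[token_id]]


def embedding_bytes(vocab, hidden):
    """Return (bf16 bytes, int8 bytes including one bf16 scale per row)."""
    bf16 = vocab * hidden * 2
    int8 = vocab * hidden * 1 + vocab * 2
    return bf16, int8


bf16_b, int8_b = embedding_bytes(248_320, 5120)
saved_gib = (bf16_b - int8_b) / 2**30
print(f"saved ~{saved_gib:.2f} GiB")
```

Under these assumptions the saving is about 1.18 GiB. The per-row scales add only ~0.5 MB, so almost all of the bf16 table's second byte per value is recovered.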

Net effect: the heavy GEMMs (attention, MLP, linear-attn, LM head) are int7, which is 7 bits per weight versus 16 for bf16 (roughly 2.3× smaller in raw bits, or 2× if each int7 value sits in an 8-bit container) and matches Pearl's Int7xInt7→Int32 mining shape. Norms, the vision encoder, and the mamba/linear-attention internals stay in bf16, and the embedding is int8.
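The size comparison comes down to bits per weight. The sketch below computes the ratios and a rough weight footprint. "Packed" assumes 7 bits per weight with no padding. The 27e9 parameter count is illustrative and treats every weight as int7, which overstates the saving because norms, the vision tower, and some linear-attention parts stay in bf16.

```python
# Bits-per-weight arithmetic behind the bf16 vs int7 size comparison.
# ASSUMPTIONS: "packed" = 7 bits/weight with no padding; 27e9 treats every
# parameter as quantized, which overstates the real saving.

BF16_BITS, INT8_BITS, INT7_BITS = 16, 8, 7


def footprint_gib(n_params, bits_per_weight):
    """Raw weight storage in GiB, ignoring scales and metadata."""
    return n_params * bits_per_weight / 8 / 2**30


packed_ratio = BF16_BITS / INT7_BITS      # about 2.29x
container_ratio = BF16_BITS / INT8_BITS   # exactly 2x

n = 27e9
print(f"bf16 vs packed int7: {packed_ratio:.2f}x")
print(f"bf16 vs int7 in int8 containers: {container_ratio:.2f}x")
print(f"bf16 ~{footprint_gib(n, BF16_BITS):.1f} GiB, "
      f"packed int7 ~{footprint_gib(n, INT7_BITS):.1f} GiB")
```

Roughly 50 GiB of bf16 weights becomes about 22 GiB when packed at 7 bits, which is what makes a 32 GiB card viable.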

Why int7 (the merge-mining point)

Pearl's Proof-of-Useful-Work is a noisy int7 × int7 → int32 GEMM whose folded transcript is hashed against the difficulty target. By quantizing the model's matmuls to that exact format, the Pearl GPU kernel (pearl_gemm) computes the clean inference output and a mining share from the same matmul. A prefill burst therefore both answers the prompt and submits real, consensus-valid pool shares - "useful work" in the literal sense.
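The "int7 × int7 → int32" shape can be shown with a plain integer GEMM. This sketch models only the arithmetic. It does not model Pearl's noise injection, transcript folding, or hashing, which are handled by `pearl_gemm`. It uses the full two's-complement int7 range [-64, 63] to bound how long the inner dimension can be before an int32 accumulator could overflow. `gemm_int7_int32` and `max_safe_k` are hypothetical names.

```python
# Integer GEMM in the int7 x int7 -> int32 shape (arithmetic only; no noise,
# transcript folding or hashing as in Pearl's actual PoUW kernel).

INT7_MIN, INT7_MAX = -64, 63
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def gemm_int7_int32(a, b):
    """C = A @ B for int7 matrices A (m x k) and B (k x n), int32 accumulation."""
    k = len(b)
    assert all(len(row) == k for row in a), "inner dimensions must match"
    for mat in (a, b):
        assert all(INT7_MIN <= v <= INT7_MAX for row in mat for v in row)
    n = len(b[0])
    c = [[0] * n for _ in a]
    for i, row in enumerate(a):
        for j in range(n):
            acc = 0
            for t in range(k):
                acc += row[t] * b[t][j]
            # Python ints never overflow, so check the int32 range explicitly.
            assert INT32_MIN <= acc <= INT32_MAX
            c[i][j] = acc
    return c


def max_safe_k():
    """Largest inner dimension whose worst-case sum still fits in int32."""
    worst_term = INT7_MIN * INT7_MIN  # (-64) * (-64) = 4096
    return INT32_MAX // worst_term
```

`max_safe_k()` is 524287, far above the inner dimensions of a 27B model's projections. An int32 accumulator therefore holds the exact integer result that both the dequantized inference output and the mining transcript are derived from.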

How to run it

This model requires the Pearl vLLM plugin (quant_method: pearl) and the pearl_gemm CUDA kernels; stock vLLM/transformers will not interpret the pearl quantization. Target GPU: sm_120 (RTX 5090 / 5080); the build is the multimodal (VL) variant.

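```python
# With the Pearl plugin installed (vllm_miner + pearl_gemm):
from vllm import LLM, SamplingParams

llm = LLM(
    model="dominant-strategies/Qwen3.6-27B-heretic-pearl",
    quantization="pearl",
    dtype="bfloat16",
    trust_remote_code=True,
)

# Quick text-only smoke test.
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Summarize what int7 quantization does."], params)
print(outputs[0].outputs[0].text)
```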

The turnkey merge-mining miner pulls this repo, serves it with the lazy-load controller, and submits shares to a Pearl pool. The controller keeps the model asleep while idle so the GPU mines at full rate, wakes it in ~1.2 s when a chat request arrives, and merge-mines during prefill.
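The sleep/wake behaviour is essentially an idle-timeout state machine. The sketch below is hypothetical and is not the miner's code. `sleep_fn` and `wake_fn` stand in for whatever the real controller does to release and restore the model, and the 60 s idle timeout is an arbitrary assumption. An injectable clock makes the logic easy to test.

```python
import time

# Hypothetical idle/wake controller illustrating the lazy-load behaviour.
# sleep_fn / wake_fn and idle_timeout_s are assumptions, not the miner's API.


class LazyLoadController:
    def __init__(self, sleep_fn, wake_fn, idle_timeout_s=60.0, clock=time.monotonic):
        self.sleep_fn = sleep_fn
        self.wake_fn = wake_fn
        self.idle_timeout_s = idle_timeout_s
        self.clock = clock
        self.awake = False
        self.last_request = None

    def on_request(self):
        """Called when a chat request arrives; wakes the model if it is asleep."""
        if not self.awake:
            self.wake_fn()  # the README reports ~1.2 s for this step
            self.awake = True
        self.last_request = self.clock()

    def tick(self):
        """Called periodically; sleeps the model once it has been idle long enough."""
        if self.awake and self.clock() - self.last_request >= self.idle_timeout_s:
            self.sleep_fn()  # GPU goes back to full-rate mining
            self.awake = False
```

A single request wakes the model once, and repeated `tick()` calls put it back to sleep only after `idle_timeout_s` seconds without traffic.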

Files

Standard transformers layout: config.json (with the pearl quantization_config), 12 sharded *.safetensors + model-auxiliary.safetensors, model.safetensors.index.json, tokenizer, chat template, and the vision/video preprocessor configs.
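`model.safetensors.index.json` uses the standard sharded-checkpoint layout, `{"metadata": {...}, "weight_map": {tensor_name: shard_file}}`. The sketch below groups tensor names by shard, which is a quick way to see where a given layer lives. The shard filenames in the check are made up. The README does not say whether `model-auxiliary.safetensors` is listed in the `weight_map`, so the sketch makes no assumption about it.

```python
import json
from collections import defaultdict

# Sketch: group tensor names by shard using a sharded-checkpoint index file.
# Expected layout: {"metadata": {...}, "weight_map": {tensor_name: shard_file}}.


def tensors_by_shard(index_json_text):
    """Return {shard_file: sorted tensor names}, with shards in sorted order."""
    index = json.loads(index_json_text)
    shards = defaultdict(list)
    for tensor_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(tensor_name)
    return {shard: sorted(names) for shard, names in sorted(shards.items())}
```

In practice you would pass it `open("model.safetensors.index.json").read()` from a local copy of the repo.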

License & attribution

Apache-2.0, following the Qwen3.6-27B license. Base architecture by Qwen; decensoring by llmfan46; Pearl quantization and packaging by dominant-strategies. This is a derivative redistributed for use on the Pearl network.

Model Details

Model Provider

dominant-strategies

Model Tree

Base

llmfan46/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved

Quantized

this model

Input Modalities

Text

Image

Video

Output Modalities