groxaxo
Code-Writer-V2-Obliterated-BF16
License: apache-2.0
The pitch, in one breath
A vision-capable, long-context (up to 200,000 tokens), free writer-and-coder in its purest, full-precision form. It writes prose that breathes and code that compiles — and here it does both with every bit intact.
That is the whole idea. Everything below is just how we kept the promise.
What it is
Code Writer V2 — Obliterated (BF16) is the merged, full-precision result of
Qwen3.5-27B-Writer-V2-uncensored-heretic joined with a purpose-trained
coding LoRA (coding_mix_8k, checkpoint-25, rank-16 / alpha-32) and saved
in BF16 — no quantization, no compromise.
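For readers wondering what "merged" means in practice: under the standard LoRA formulation, merging folds the low-rank update into each adapted weight matrix as W' = W + (alpha / r) * B·A. The sketch below walks through that arithmetic on toy matrices in plain Python. Only the rank (16) and alpha (32) come from this card; the matrix values and helper names are made up for illustration, and this is not the script that produced the checkpoint.

```python
# Toy illustration of a LoRA merge: W' = W + (alpha / r) * (B @ A).
# Only RANK and ALPHA come from the model card; everything else is illustrative.

RANK = 16
ALPHA = 32
SCALE = ALPHA / RANK  # 2.0 under the standard LoRA scaling


def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora(w, b, a, scale=SCALE):
    """Return w + scale * (b @ a) without modifying the inputs."""
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# A 2x2 base weight with a rank-1 toy adapter (the real adapter uses rank 16).
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[0.5], [0.25]]  # shape (2, 1)
A = [[1.0, 2.0]]     # shape (1, 2)

merged = merge_lora(W, B, A)
print(merged)
```

After a merge like this the adapter no longer exists as a separate artifact, which is why the result loads as an ordinary checkpoint.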
- Architecture: Qwen3.5 (qwen3_5), a hybrid mind. 64 decoder layers, of which only 16 carry full attention while the rest run GDN linear attention. This is the secret of its long memory.
- Modalities: a full vision tower rides along (served text-only by default; vision is wired but untested, so light the candle at your own pleasure).
- Character: heretic by lineage and free by intent — it does not flinch, and it does not lecture. It simply does the work.
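A back-of-envelope estimate shows why the hybrid layout helps at long context: only the full-attention layers keep a key/value cache that grows with sequence length. The sketch below sizes that cache at 200,000 tokens in BF16. The layer counts and the 4 KV heads come from this card; `HEAD_DIM = 256` is an assumed value used purely for illustration (check the model's config.json), and the fixed-size state of the linear-attention layers is ignored.

```python
BYTES_BF16 = 2
NUM_LAYERS = 64          # from the card
FULL_ATTN_LAYERS = 16    # from the card
NUM_KV_HEADS = 4         # from the card
HEAD_DIM = 256           # ASSUMPTION: illustrative only, not stated in the card
CONTEXT_TOKENS = 200_000


def kv_cache_bytes(layers, tokens, kv_heads=NUM_KV_HEADS,
                   head_dim=HEAD_DIM, bytes_per_value=BYTES_BF16):
    """Size of the K and V caches for one sequence, in bytes."""
    # Factor of 2: one tensor for keys and one for values in every cached layer.
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_value


hybrid = kv_cache_bytes(FULL_ATTN_LAYERS, CONTEXT_TOKENS)
dense = kv_cache_bytes(NUM_LAYERS, CONTEXT_TOKENS)

print(f"16 cached layers: {hybrid / 2**30:.1f} GiB")
print(f"64 cached layers: {dense / 2**30:.1f} GiB")
```

Whatever the real head dimension turns out to be, the ratio holds: caching 16 layers instead of 64 cuts the per-sequence KV cache to a quarter.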
Which one do I want?
| | This (BF16) | FP8 |
|---|---|---|
| Fidelity | Reference master, full precision | Faithful, ~half the footprint |
| Footprint | ~12 shards, BF16 | FP8 weights, fits 2 consumer GPUs |
| Use it for | golden reference, further quantization, max quality | day-to-day serving on vLLM |
If you plan to serve it now, take the FP8. If you want the untouched source of truth — or a base for your own quants — you're in the right place.
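The "~half the footprint" figure follows from bytes per weight. A rough sketch, assuming a nominal 27 billion parameters read off the "27B" in the base model's name (the true count, including the vision tower, will differ) and counting weights only:

```python
NOMINAL_PARAMS = 27e9  # ASSUMPTION: taken from the "27B" in the model name

BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def weight_gb(dtype, params=NOMINAL_PARAMS):
    """Approximate weight size in decimal gigabytes (no KV cache, no activations)."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


bf16_gb = weight_gb("bf16")
fp8_gb = weight_gb("fp8")

print(f"BF16 weights: ~{bf16_gb:.0f} GB")
print(f"FP8 weights:  ~{fp8_gb:.0f} GB")
```

If "consumer GPU" is taken to mean a 24 GB card (also an assumption), two of them offer 48 GB in total: room for the FP8 weights plus some cache, but not for the BF16 weights.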
Sampling (official Qwen3.5-27B recommendations)
| Mode | temp | top_p | notes |
|---|---|---|---|
| instruct | 1.0 | 0.95 | top_k 20, min_p 0 |
| general | 0.7 | 0.80 | top_k 20, min_p 0 |
| coding | 0.6 | 0.95 | thinking on |
| thinking | 1.0 | 0.95 | thinking on |
| roleplay | 1.0 | 0.95 | top_k 20, min_p 0 |
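The table above can be carried in code as a small preset map. This is a minimal sketch: the key names (`temperature`, `top_p`, `top_k`, `min_p`) follow the parameter names most inference servers use, the `thinking` flag and `sampling_for` helper are hypothetical, and rows the table does not mark "thinking on" are treated as thinking off, which is an assumption. How thinking is actually toggled depends on your serving stack and chat template.

```python
# Presets transcribed from the sampling table. Rows without top_k / min_p in
# the table simply omit those keys rather than guessing a value.
SAMPLING_PRESETS = {
    "instruct": {"temperature": 1.0, "top_p": 0.95, "top_k": 20, "min_p": 0.0},
    "general": {"temperature": 0.7, "top_p": 0.80, "top_k": 20, "min_p": 0.0},
    "coding": {"temperature": 0.6, "top_p": 0.95, "thinking": True},
    "thinking": {"temperature": 1.0, "top_p": 0.95, "thinking": True},
    "roleplay": {"temperature": 1.0, "top_p": 0.95, "top_k": 20, "min_p": 0.0},
}


def sampling_for(mode):
    """Return (sampling_params, thinking_enabled) for a mode in the table."""
    if mode not in SAMPLING_PRESETS:
        known = ", ".join(sorted(SAMPLING_PRESETS))
        raise ValueError(f"unknown mode {mode!r}; expected one of: {known}")
    preset = dict(SAMPLING_PRESETS[mode])  # copy so callers cannot mutate the table
    thinking = preset.pop("thinking", False)  # ASSUMPTION: unmarked rows mean off
    return preset, thinking


params, thinking = sampling_for("coding")
print(params, thinking)
```

Returning the thinking flag separately keeps the sampling dictionary safe to pass straight through to a client that only understands sampling parameters.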
Note: this is a pure decoder (layers 0–63) — no MTP head, no native tool-calling.
num_key_value_heads = 4, so when sharding across GPUs the tensor-parallel size must be 2 or 4 (never 3).
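The tensor-parallel rule can be checked before launching a server. A minimal sketch, assuming the only constraint is that the tensor-parallel size splits the 4 KV heads evenly; the helper names are hypothetical, and some serving stacks apply extra rules (for example replicating KV heads at larger sizes) that this does not model.

```python
NUM_KEY_VALUE_HEADS = 4  # from the card


def valid_tp_sizes(num_kv_heads=NUM_KEY_VALUE_HEADS, max_gpus=8):
    """Tensor-parallel sizes up to max_gpus that split the KV heads evenly."""
    return [tp for tp in range(1, max_gpus + 1) if num_kv_heads % tp == 0]


def check_tp(tp, num_kv_heads=NUM_KEY_VALUE_HEADS):
    """Return tp unchanged, or raise if it cannot split the KV heads evenly."""
    if tp < 1 or num_kv_heads % tp != 0:
        raise ValueError(
            f"tensor-parallel size {tp} does not evenly divide "
            f"{num_kv_heads} KV heads; try one of {valid_tp_sizes(num_kv_heads)}"
        )
    return tp


print(valid_tp_sizes())
```

A size of 1 passes trivially because nothing is split; 2 and 4 are the multi-GPU options, and 3 is rejected because 4 heads cannot be divided three ways.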
What it's for
- Writing — fiction, screenplay, copy, the long dark prose of the soul.
- Code — the LoRA was trained for it; the temperament was kept for it.
- Long work — 200k tokens means whole codebases, whole manuscripts, whole conversations held in a single thought.
What to know before you sail
- It is free. Freedom is a tool; you are the hand that holds it. You own what you make with it.
- Vision is present but unproven here — validate an image path before you trust it in production.
Provenance
- Base: llmfan46/Qwen3.5-27B-Writer-V2-uncensored-heretic (BF16)
- LoRA: coding_mix_8k checkpoint-25 (r16, α32), merged to BF16
- Precision: BF16, unquantized
- Built: 2026-06-22
Real artists ship. So we shipped a poet that codes.
Now go make something.
Modalities
- Input: Video, Text, Image
- Output: Text