groxaxo

Code-Writer-V2-Obliterated-BF16

Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

License: apache-2.0

The pitch, in one breath

A vision-capable, long-context (up to 200,000 tokens), free writer-and-coder in its purest, full-precision form. It writes prose that breathes and code that compiles — and here it does both with every bit intact.

That is the whole idea. Everything below is just how we kept the promise.


What it is

Code Writer V2 — Obliterated (BF16) is the merged, full-precision result of Qwen3.5-27B-Writer-V2-uncensored-heretic joined with a purpose-trained coding LoRA (coding_mix_8k, checkpoint-25, rank-16 / alpha-32) and saved in BF16 — no quantization, no compromise.

  • Architecture: Qwen3.5 (qwen3_5) — a hybrid mind. 64 decoder layers, of which only 16 carry full attention while the rest run GDN linear attention. This is the secret of its long memory.
  • Modalities: a full vision tower rides along (served text-only by default; vision is wired but untested — light the candle at your own pleasure).
  • Character: heretic by lineage and free by intent — it does not flinch, and it does not lecture. It simply does the work.

Which one do I want?

Table
This — BF16FP8
FidelityReference master, full precisionFaithful, ~half the footprint
Footprint~12 shards, BF16FP8 weights, fits 2 consumer GPUs
Use it forgolden reference, further quantization, max qualityday-to-day serving on vLLM

If you plan to serve it now, take the FP8. If you want the untouched source of truth — or a base for your own quants — you're in the right place.


Sampling (official Qwen3.5-27B recommendations)

Table
Modetemptop_pnotes
instruct1.00.95top_k 20, min_p 0
general0.70.80top_k 20, min_p 0
coding0.60.95thinking on
thinking1.00.95thinking on
roleplay1.00.95top_k 20, min_p 0

Note: this is a pure decoder (layers 0–63) — no MTP head, no native tool-calling. num_key_value_heads = 4, so tensor-parallel must be 2 or 4 (never 3).


What it's for

  • Writing — fiction, screenplay, copy, the long dark prose of the soul.
  • Code — the LoRA was trained for it; the temperament was kept for it.
  • Long work — 200k tokens means whole codebases, whole manuscripts, whole conversations held in a single thought.

What to know before you sail

  • It is free. Freedom is a tool; you are the hand that holds it. You own what you make with it.
  • Vision is present but unproven here — validate an image path before you trust it in production.

Provenance

  • Base: llmfan46/Qwen3.5-27B-Writer-V2-uncensored-heretic (BF16)
  • LoRA: coding_mix_8k checkpoint-25 (r16, α32), merged to BF16
  • Precision: BF16, unquantized
  • Built: 2026-06-22

Real artists ship. So we shipped a poet that codes.

Now go make something.

Model provider

groxaxo

Model tree

Base

llmfan46/Qwen3.5-27B-Writer-V2-uncensored-heretic

Fine-tuned

this model

Modalities

Input

Video, Text, Image

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Explore FriendliAI today