empero-ai

Qwable-9B-Claude-Fable-5

License: apache-2.0

Model details

  • Developed by: Empero
  • Base model: Qwen3.5-9B — a dense, natively multimodal model with a hybrid attention stack (3:1 Gated DeltaNet linear-attention to Gated full-attention), ~152k vocabulary, long native context.
  • Fine-tune type: full parameter (all text-backbone weights trained). The vision tower was frozen — training was text-only, so vision behavior is inherited from the base and was not tuned or tested.
  • Objective: supervised fine-tuning, assistant-only loss (the model is scored only on the assistant/completion tokens; prompts are masked out).
  • Languages: primarily English.
  • License: apache-2.0, inherited from the base weights — but see the data-provenance caveat below.

Training data

| Source | Role | Approx. examples (after holdout) |
|---|---|---|
| Glint-Research/Fable-5-traces | Claude Fable 5 reasoning + coding traces (context → completion) | ~4,585 |
| Roman1111111/gpt5.5-terminal | GPT-5.5 terminal/agent task solutions (system + prompt → solution) | ~111 |

Both sources were normalized to a single chat format (user/assistant, with an optional system turn for the terminal tasks) and concatenated. The natural mix is heavily skewed toward Fable traces (~97%); no re-weighting was applied to the training set.
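The normalization described above could look roughly like the following. This is a minimal sketch, not the authors' pipeline: the field names `context`, `completion`, `system`, `prompt`, and `solution` are assumptions inferred from the role descriptions in the table, and multi-turn traces would need a richer mapping than the single user/assistant pair shown here.

```python
from typing import Dict, List

Message = Dict[str, str]


def fable_to_chat(record: Dict[str, str]) -> List[Message]:
    """Map a context/completion pair to a user/assistant chat (assumed field names)."""
    return [
        {"role": "user", "content": record["context"]},
        {"role": "assistant", "content": record["completion"]},
    ]


def terminal_to_chat(record: Dict[str, str]) -> List[Message]:
    """Map a terminal task to a chat, keeping the system turn only when present."""
    messages: List[Message] = []
    if record.get("system"):
        messages.append({"role": "system", "content": record["system"]})
    messages.append({"role": "user", "content": record["prompt"]})
    messages.append({"role": "assistant", "content": record["solution"]})
    return messages


def build_training_set(fable: List[Dict[str, str]],
                       terminal: List[Dict[str, str]]) -> List[List[Message]]:
    """Concatenate both sources without any re-weighting."""
    return [fable_to_chat(r) for r in fable] + [terminal_to_chat(r) for r in terminal]


def fable_share(n_fable: int, n_terminal: int) -> float:
    """Fraction of training examples that come from the Fable source."""
    return n_fable / (n_fable + n_terminal)
```

With the approximate counts from the table, `fable_share(4585, 111)` evaluates to about 0.976, in line with the heavy skew toward Fable traces described above.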

Held-out eval split: 100 examples were withheld from training — deliberately composed 80% Fable / 20% terminal so the held-out loss carries signal on both task types rather than being dominated by Fable.
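One way to build a held-out split with a fixed 80/20 composition is to sample each source separately. The sketch below assumes that approach; the function name, the seed, and the use of Python's `random` module are illustrative choices, not details taken from the actual training code.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def stratified_holdout(
    fable: Sequence[T],
    terminal: Sequence[T],
    n_eval: int = 100,
    fable_frac: float = 0.8,
    seed: int = 0,
) -> Tuple[List[T], List[T]]:
    """Return (train, eval) where eval holds a fixed share of each source."""
    rng = random.Random(seed)
    n_fable_eval = round(n_eval * fable_frac)
    n_terminal_eval = n_eval - n_fable_eval

    # Shuffle copies so the caller's data is left untouched.
    fable_pool, terminal_pool = list(fable), list(terminal)
    rng.shuffle(fable_pool)
    rng.shuffle(terminal_pool)

    eval_set = fable_pool[:n_fable_eval] + terminal_pool[:n_terminal_eval]
    train_set = fable_pool[n_fable_eval:] + terminal_pool[n_terminal_eval:]
    return train_set, eval_set
```

Sampling per source keeps the small terminal set visible in the eval loss; a uniform random draw of 100 examples from a ~97% Fable pool would be expected to contain only two or three terminal tasks.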

Training procedure

Full-parameter supervised fine-tuning with TRL, using:

  • Full-length traces, zero truncation (max_length = 76,800) — even the longest multi-turn traces (~74k tokens) are trained in full.
  • Assistant-only loss — the model is scored only on assistant/completion tokens; prompt tokens are masked.
  • Chunked cross-entropy for memory-efficient long-context training.
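
The two loss-related items above can be illustrated without any framework. The sketch below is a pure-Python toy, not the TRL implementation: it assumes labels are already aligned with the positions that predict them, uses `-100` as the ignore value (a common convention), and computes logits one chunk at a time so that the full sequence-by-vocabulary logit matrix is never held in memory at once.

```python
import math
from typing import List, Sequence

IGNORE_INDEX = -100  # conventional "do not score this position" label


def mask_prompt_labels(input_ids: Sequence[int],
                       assistant_mask: Sequence[bool]) -> List[int]:
    """Keep token ids on assistant positions; mask everything else."""
    return [tok if is_asst else IGNORE_INDEX
            for tok, is_asst in zip(input_ids, assistant_mask)]


def _token_nll(logits: Sequence[float], target: int) -> float:
    """Negative log-likelihood of `target` via a stable log-sum-exp."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[target]


def chunked_nll(hidden: Sequence[Sequence[float]],
                lm_head: Sequence[Sequence[float]],
                labels: Sequence[int],
                chunk_size: int = 2) -> float:
    """Mean NLL over unmasked positions, projecting to logits chunk by chunk."""
    total, count = 0.0, 0
    for start in range(0, len(labels), chunk_size):
        stop = start + chunk_size
        # Only this chunk's logits exist at any one time.
        chunk_logits = [
            [sum(w_i * h_i for w_i, h_i in zip(w, h)) for w in lm_head]
            for h in hidden[start:stop]
        ]
        for row, target in zip(chunk_logits, labels[start:stop]):
            if target == IGNORE_INDEX:
                continue  # prompt token: contributes nothing to the loss
            total += _token_nll(row, target)
            count += 1
    return total / count if count else 0.0
```

The result does not depend on `chunk_size`; chunking only trades peak memory for a loop, which is what makes very long sequences with a ~152k vocabulary tractable.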

| Hyperparameter | Value |
|---|---|
| Epochs | 2 |
| Effective batch size | 16 |
| Max sequence length | 76,800 (no truncation) |
| Learning rate | 1e-5 (cosine, 3% warmup) |
| Optimizer | AdamW (8-bit) |
| Precision | bf16 |
| Loss | chunked NLL, assistant-only |
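
To make the schedule concrete, here is a small sketch of a cosine learning-rate curve with linear warmup, plus the step count implied by the numbers above. It assumes the curve decays to zero by the last step and that warmup steps are rounded up; neither detail is stated in this card, and the example count (4,585 + 111) uses the approximate figures from the training-data table.

```python
import math


def steps_per_epoch(n_examples: int, effective_batch_size: int) -> int:
    """Optimizer steps needed to see every example once."""
    return math.ceil(n_examples / effective_batch_size)


def lr_at(step: int, total_steps: int,
          peak_lr: float = 1e-5, warmup_ratio: float = 0.03) -> float:
    """Linear warmup to peak_lr, then cosine decay to zero (assumed floor)."""
    warmup_steps = max(1, math.ceil(total_steps * warmup_ratio))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


per_epoch = steps_per_epoch(4585 + 111, 16)  # approximate counts from the card
total = 2 * per_epoch                        # two epochs
```

Under these assumptions `per_epoch` is 294, which is consistent with the evaluation table treating step 300 as roughly the end of epoch 1.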

Evaluation

Training quality was tracked via held-out validation loss and token-accuracy on a 100-example split and supplemented with a qualitative generation review (below). A full suite of coding, agentic, and safety benchmarks is in progress and will be published here. Validation was run periodically during training:

| Step | Eval loss | Eval token-acc |
|---|---|---|
| 100 | 0.743 | 0.784 |
| 200 | 0.722 | 0.789 |
| 300 (≈ epoch 1) | 0.714 | 0.791 |
| 400 | 0.7135 | 0.791 |
| 500 | 0.713 | 0.791 |

No overfitting observed. Held-out loss decreased monotonically and then plateaued (~0.71) through the second epoch — it never rose, even as train loss fell to ~0.64. Epoch-1 and final (epoch-2) checkpoints generalize equivalently on held-out data.
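
The "never rose" claim can be checked mechanically against the reported numbers. The snippet below is only a sanity check over the five validation points listed in this section; the helper name is illustrative.

```python
from typing import Sequence

# Held-out loss by training step, as reported in this section.
eval_loss = {100: 0.743, 200: 0.722, 300: 0.714, 400: 0.7135, 500: 0.713}


def never_rises(losses: Sequence[float], tol: float = 0.0) -> bool:
    """True if no value exceeds its predecessor by more than `tol`."""
    return all(b <= a + tol for a, b in zip(losses, losses[1:]))


series = [eval_loss[step] for step in sorted(eval_loss)]
monotone = never_rises(series)

# Change in held-out loss across the second epoch (step 300 to step 500).
second_epoch_change = eval_loss[500] - eval_loss[300]
```

Here `monotone` is true and `second_epoch_change` is about -0.001, which matches the description of a plateau rather than a rebound.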

Note: token-accuracy is teacher-forced, per-token next-token accuracy over completion tokens only. It is not end-to-end correctness and tends to read high on consistent-style distillation data.
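
As an illustration of the metric in the note above, the sketch below computes per-token accuracy over completion tokens only. It assumes predictions are the argmax token at each position given the gold prefix (teacher forcing) and that masked prompt positions carry the label `-100`; the function name is hypothetical.

```python
from typing import Sequence

IGNORE_INDEX = -100  # label value assumed for masked (prompt) positions


def token_accuracy(predicted_ids: Sequence[int], labels: Sequence[int]) -> float:
    """Share of unmasked positions where the predicted token equals the label."""
    scored = [(p, y) for p, y in zip(predicted_ids, labels) if y != IGNORE_INDEX]
    if not scored:
        return 0.0
    return sum(p == y for p, y in scored) / len(scored)


# One prompt position (masked) and three completion positions, two predicted correctly.
acc = token_accuracy([5, 7, 9, 4], [IGNORE_INDEX, 7, 9, 3])
```

Because every position is conditioned on the correct prefix, a single wrong token does not derail the rest of the sequence, which is why this number says little about whether a full generated solution is correct.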

Qualitative generation review

34 prompts spanning coding, terminal/agentic tasks, reasoning, explanation, instruction-following, and honesty/calibration probes were run against the final checkpoint using Qwen3.5's recommended sampling settings. Full unedited transcripts are in sample_generations.md.

Strengths. Coding and terminal/agentic prompts were the strongest — correct, idiomatic solutions using current tooling (e.g. ss over netstat, git-filter-repo, Argon2id) with security-aware judgment (rotating a leaked key first, constant-time comparison, generic auth errors). Reasoning, instruction/format following, and calibration probes were handled well. Roughly 27 of 34 responses were clean and correct.

The model is a reasoning model: every answer begins with a <think> block followed by the final response — downstream consumers should parse out and strip the <think>...</think> span. See Limitations for usage tips.
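
A minimal sketch of the parsing step follows. It assumes the tags appear literally in the decoded text, and it also tolerates two cases that are assumptions rather than documented behavior: an output that contains only the closing tag (for example if the chat template already emitted the opening one) and an output cut off before the reasoning finished.

```python
from typing import Tuple

OPEN_TAG, CLOSE_TAG = "<think>", "</think>"


def split_reasoning(text: str) -> Tuple[str, str]:
    """Return (reasoning, answer) from a decoded model response."""
    head, sep, tail = text.partition(CLOSE_TAG)
    if sep:
        # Normal case: everything before the closing tag is reasoning.
        reasoning = head.replace(OPEN_TAG, "", 1).strip()
        return reasoning, tail.strip()
    stripped = text.lstrip()
    if stripped.startswith(OPEN_TAG):
        # Truncated mid-reasoning: there is no final answer yet.
        return stripped[len(OPEN_TAG):].strip(), ""
    # No reasoning markers at all: treat the whole text as the answer.
    return "", text.strip()
```

Returning an empty answer for truncated output lets the caller detect that `max_new_tokens` was too small instead of showing half-finished reasoning to an end user.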

How to use

The base is a multimodal (image-text-to-text) architecture, so even for text-only use the model is loaded with AutoModelForImageTextToText. Build the prompt string with apply_chat_template(..., tokenize=False) and then tokenize that string (the recommended path for this tokenizer):

```python
import torch
from transformers import AutoModelForImageTextToText, AutoTokenizer

model_id = "empero-ai/Qwable-9B-Claude-Fable-5"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Write a Python function that merges two sorted lists."}]
# Render the chat template to a string first, then tokenize that string.
text = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(text, return_tensors="pt").to(model.device)

out = model.generate(
    **inputs, max_new_tokens=2048, do_sample=True,
    temperature=0.7, top_p=0.95, top_k=20, repetition_penalty=1.05,
)
# Output begins with a <think>...</think> reasoning block, then the final answer.
# Decode only the newly generated tokens, skipping the prompt.
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

repetition_penalty=1.05 is a small deviation from Qwen's default (1.0) that prevents rare non-terminating reasoning loops; allow generous max_new_tokens since the model reasons before answering.

Requirements: a recent transformers (Qwen3.5 support) plus the Gated DeltaNet kernels (flash-linear-attention and a CUDA-matched causal_conv1d build) — without them the linear-attention layers fall back to slow, memory-hungry PyTorch ops.
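
Before loading the model it can help to confirm that the optional kernels are importable. The check below is a sketch: the import names `fla` (for flash-linear-attention) and `causal_conv1d` are assumptions about how those packages expose themselves, and the check says nothing about whether an installed build matches the local CUDA version.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(module_names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on this system."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# Assumed import names; adjust if the installed packages differ.
REQUIRED = ("transformers", "fla", "causal_conv1d")

missing = missing_modules(REQUIRED)
if missing:
    print("Not importable:", ", ".join(missing))
else:
    print("All listed modules are importable.")
```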

Limitations

Qwable-9B-Claude-Fable-5 is a focused 9B model that shines on the coding, agentic, and reasoning tasks it was trained for. A few characteristics are worth knowing to get the best out of it:

  • It's a reasoning model. Each response opens with a <think> block before the final answer, so parse and strip the <think>...</think> span before showing output to end users. On open-ended or creative prompts it may reason at length; allow generous max_new_tokens and use repetition_penalty≈1.05 (as in the snippet above) to avoid the rare non-terminating reasoning loops.
  • Strongest within its domain. Capability is concentrated in coding and agentic/tool-use tasks. For general-knowledge or long-form factual questions, treat specifics as you would any 9B model's — verify before relying on them, and don't expect knowledge of events outside the base model's training.
  • Reflects its base and teachers. As a distillation fine-tune of Qwen3.5-9B on Claude Fable 5 and GPT-5.5 traces, it carries the style and limits of those sources and received no extra safety tuning beyond the base model's. Add your own review/safety layer for production use.
  • Text-only fine-tune. The base is multimodal, but only the text path was trained (vision left untouched and not evaluated here).

These are normal considerations for a compact, domain-focused model rather than blockers — used within its wheelhouse with the sampling settings above, it's a capable and dependable coding/agentic assistant.

Provenance & licensing

The model weights are released under Apache-2.0, inherited from the Qwen3.5-9B base. The fine-tuning data comes from generated traces of Claude Fable 5 and GPT-5.5 (via the linked public datasets). Because those traces originate from third-party assistants, the providers' terms may apply to downstream training and distillation — so if you plan to build on this model commercially, it's worth confirming your use aligns with those terms. Shared with the community for research and experimentation, as-is.

Support / Donate

If this model helped you, consider supporting the project:

  • BTC: bc1qx6zepu6sfkvshgdmc4ewu6pk6rpadvpgffpp7v
  • LTC: ltc1qv2mefzps2vtjcpwfx8xxdrpplrcvltswm68r7x
  • XMR: 42Dbm5xg5Nq26fdyzfEU7KBnAJfhi7Cvz5J2ex5CzHXkfKuNEJzYCcmJ1GTbgjFZ5MBx72sdG1G9239Cd6rsZfv4QeDkYJY

Acknowledgements

Model provider: empero-ai

Model tree: base Qwen/Qwen3.5-9B; this model is a fine-tune of it.

Modalities: input video, text, image; output text.