License: apache-2.0
What it is
- Visible step-by-step reasoning in <think> blocks before answering
- Terse, here's-the-fix answers (no filler)
- Admits uncertainty on hard or obscure problems rather than hallucinating
- Stable NPC identity (does not claim to be Qwen)
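Because the reasoning arrives in a <think> block ahead of the answer, client code usually has to separate the two. A minimal sketch, assuming the raw completion text contains at most one <think>...</think> pair before the answer (some serving stacks strip or restructure it already); `split_reasoning` is a hypothetical helper, not part of any library.

```python
import re

# Matches one <think>...</think> block; DOTALL lets the reasoning span newlines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(response: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a raw model response.

    Assumes at most one <think>...</think> block, emitted before the answer.
    If no block is found, the whole response is treated as the answer.
    """
    match = THINK_RE.search(response)
    if match is None:
        if "<think>" in response:
            # Opening tag without a close: generation was likely cut off mid-reasoning.
            return response.split("<think>", 1)[1].strip(), ""
        return "", response.strip()
    reasoning = match.group(1).strip()
    # Everything outside the block (normally just what follows it) is the answer.
    answer = (response[: match.start()] + response[match.end():]).strip()
    return reasoning, answer


raw = "<think>\nOff-by-one: range should include n.\n</think>\nUse range(n + 1)."
reasoning, answer = split_reasoning(raw)
print(reasoning)  # Off-by-one: range should include n.
print(answer)     # Use range(n + 1).
```

Treating an unclosed <think> as "all reasoning, no answer" makes truncated generations easy to detect, e.g. by checking for an empty answer and retrying with a larger token limit.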
Honest capability framing
This is a 1.5B model. It handles easy-to-medium coding and debugging competently and reasons visibly about them. It is NOT an olympiad-level solver — on genuinely hard algorithmic problems the reasoning can be incomplete, and the model is trained to SAY so rather than emit confident-but-wrong solutions. Treat it as a fast local assistant for everyday coding, not a replacement for a frontier model on hard problems.
It can still be overconfident on obscure factual trivia (exact default arguments, precise version numbers) — the honest-failure training mitigates but does not eliminate this at 1.5B. Verify specifics against the docs.
Benchmark: HumanEval (instruct, pass@1, greedy): 65.9%, measured with the lm-eval-harness humaneval_instruct task. (The personality fine-tune slightly improved the extractable-code rate compared with the reasoning-only Stage 1 checkpoint, because terser answers parse more cleanly.)
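For context on the number: with greedy decoding there is one sample per problem, so pass@1 is simply the fraction of problems whose completion passes its unit tests. The sketch below uses the standard unbiased pass@k estimator from the HumanEval paper (Chen et al., 2021), which reduces to that fraction when n = k = 1. The 108-of-164 split is back-computed from the reported 65.9%, assuming the full 164-problem HumanEval set; it is not taken from the actual eval logs.

```python
def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that passed, k: budget.
    Computes 1 - C(n - c, k) / C(n, k) as a numerically stable running product.
    """
    if n - c < k:
        # Every size-k subset contains at least one passing sample.
        return 1.0
    prob_all_fail = 1.0
    for i in range(n - c + 1, n + 1):
        prob_all_fail *= 1.0 - k / i
    return 1.0 - prob_all_fail


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) for each problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Greedy decoding: one sample per problem (n = 1), so each problem scores 0 or 1.
# Hypothetical split: 108 of HumanEval's 164 problems pass.
greedy = [(1, 1)] * 108 + [(1, 0)] * (164 - 108)
score = mean_pass_at_k(greedy, k=1)
print(f"{score:.1%}")  # 65.9%
```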
Personality behavior (held-out eval, 200 prompts)
| behavior | result |
|---|---|
| Correct NPC identity when asked | 100% |
| Identity mentioned on neutral coding prompts (over-emission, lower is better) | 2.5% |
| Denies being Qwen / wrong maker | 100% |
| Flags uncertainty on unknown/obscure APIs | 100% |
Training
- Stage 1 (reasoning): SFT on open-r1/codeforces-cots (decontaminated Python subsets, fit-filtered to ≤8192 tokens so that every <think> trace is complete; the filter biases toward shorter, more laconic traces). 15k traces.
- Stage 2 (voice + identity + honest failure): SFT on a 7k-example personality set (gated identity, a large anti-over-emission cohort, an honest-failure cohort, and a 1k anti-forgetting buffer of Stage 1 reasoning data).
- Both stages: LoRA with a gentle learning rate, merged after training.
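The two data steps described above can be sketched in plain Python. This is a hypothetical illustration, not the author's pipeline: `count_tokens` stands in for the base model's tokenizer (a whitespace split here, so the demo runs without downloads), and the Stage 2 mix assumes the 1k buffer is a random sample of Stage 1 traces that counts toward the 7k total (leaving 6k personality examples).

```python
import random
from typing import Callable

MAX_TOKENS = 8192  # Stage 1 length cap stated on the card


def fit_filter(examples: list[str], count_tokens: Callable[[str], int],
               max_tokens: int = MAX_TOKENS) -> list[str]:
    """Keep only examples that fit entirely within the token budget.

    Dropping (rather than truncating) over-long examples means no
    <think> trace is cut off, at the cost of favoring shorter traces.
    """
    return [ex for ex in examples if count_tokens(ex) <= max_tokens]


def build_stage2_mix(personality: list[str], stage1: list[str],
                     buffer_size: int = 1000, seed: int = 0) -> list[str]:
    """Mix personality data with a replay buffer of Stage 1 reasoning data.

    The buffer is a random sample of earlier training data, included to
    reduce forgetting of Stage 1 reasoning behavior.
    """
    rng = random.Random(seed)
    buffer = rng.sample(stage1, min(buffer_size, len(stage1)))
    mix = personality + buffer
    rng.shuffle(mix)
    return mix


def whitespace_tokens(text: str) -> int:
    """Hypothetical stand-in tokenizer: counts whitespace-separated words."""
    return len(text.split())


traces = ["<think> short </think> fix", "<think> " + "step " * 9000 + "</think> fix"]
kept = fit_filter(traces, whitespace_tokens)
print(len(kept))  # 1: the 9000-step trace exceeds the cap and is dropped

# Hypothetical sizes: 6k personality examples + 1k replay buffer = 7k.
stage1_pool = [f"reasoning-{i}" for i in range(5000)]
personality_set = [f"persona-{i}" for i in range(6000)]
mix = build_stage2_mix(personality_set, stage1_pool)
print(len(mix))  # 7000
```

Filtering instead of truncating is what produces the length bias the card mentions: long but valid traces are excluded outright rather than shortened.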
Apache 2.0 model. Reasoning data: open-r1/codeforces-cots (CC-BY-4.0 / ODC-By, attributed).
Local use
GGUF quants: q4_k_m (~941 MB, laptop default), q5_k_m (~1.1 GB), q8_0 (~1.6 GB), f16 (~3.1 GB). At q4_k_m, expect ~7 tok/s on CPU. The model uses the standard ChatML (<|im_start|> / <|im_end|>) chat template.
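For runtimes that take a raw prompt string rather than a list of messages, the ChatML layout can be assembled by hand (many runtimes apply the template automatically). A minimal sketch; `to_chatml` is a hypothetical helper, and the system prompt text is only an illustration, since the card does not specify one.

```python
def to_chatml(messages: list[dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render chat messages as ChatML: '<|im_start|>{role}\\n{content}<|im_end|>\\n' per turn."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = to_chatml([
    {"role": "system", "content": "You are a concise coding assistant."},
    {"role": "user", "content": "Why does my loop skip the last element?"},
])
print(prompt)
```

When prompting this way, pass <|im_end|> as a stop sequence so generation ends at the close of the assistant turn.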
If q4_k_m's coherence on edge cases matters to you, q5_k_m is a cleaner default.
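The listed file sizes line up roughly with the nominal bit width of each format. A back-of-the-envelope check, assuming about 1.54B parameters (the count reported for the Qwen2.5-Coder-1.5B base) and decimal units (1 GB = 10^9 bytes). GGUF files also carry metadata, and k-quants store per-block scales and keep some tensors at higher precision, so these are effective averages rather than exact per-weight widths.

```python
PARAMS = 1.54e9  # assumed parameter count of the 1.5B base model

# Approximate file sizes from the card, in bytes (decimal units assumed).
SIZES = {
    "q4_k_m": 941e6,
    "q5_k_m": 1.1e9,
    "q8_0": 1.6e9,
    "f16": 3.1e9,
}


def bits_per_weight(size_bytes: float, params: float = PARAMS) -> float:
    """Effective storage cost per parameter, including any file overhead."""
    return size_bytes * 8 / params


for name, size in SIZES.items():
    print(f"{name:7s} ~{bits_per_weight(size):.1f} bits/weight")
```

f16 comes out near 16.1 bits/weight, which confirms that the size and parameter-count assumptions agree; q4_k_m lands near 4.9, which is why its file is under a third of the f16 size.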
Attribution & author
Reasoning data: open-r1/codeforces-cots (Hugging Face Open-R1), CC-BY-4.0.
Base model: Qwen/Qwen2.5-Coder-1.5B-Instruct, Apache 2.0.
Author: Rama Krishna Bachu / Bottensor (Independent Research). ORCID 0009-0000-1298-0681.
Model provider: ramankrishna10
Model tree: base Qwen/Qwen2.5-Coder-1.5B-Instruct, fine-tuned into this model
Modalities: text input, text output
Supported functionality: Model APIs, Dedicated Endpoints, Container