
README

License: apache-2.0

What it is

  • Visible step-by-step reasoning in <think> blocks before answering
  • Terse, here's-the-fix answers (no filler)
  • Admits uncertainty on hard or obscure problems rather than hallucinating
  • Stable NPC identity (does not claim to be Qwen)
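Because the reasoning arrives inline with the answer, a caller will often want to separate the two. The following is a minimal sketch, assuming the reasoning is wrapped in a single `<think>...</think>` pair (the closing tag is an assumption; the card only mentions `<think>` blocks). The function name `split_reasoning` is hypothetical.

```python
import re

# Assumption: one <think>...</think> pair precedes the final answer.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(text):
    """Return (reasoning, answer) from a raw completion string."""
    match = THINK_RE.search(text)
    if match is None:
        # No reasoning block found: treat the whole completion as the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the matched block is kept as the answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer

raw = "<think>\nLoop stops one short.\n</think>\nUse range(n + 1)."
reasoning, answer = split_reasoning(raw)
```

If a completion is cut off before the closing tag, this sketch returns the whole text as the answer, so a caller may want to check for an unmatched `<think>` separately.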

Honest capability framing

This is a 1.5B model. It handles easy-to-medium coding and debugging competently and reasons visibly about them. It is NOT an olympiad-level solver — on genuinely hard algorithmic problems the reasoning can be incomplete, and the model is trained to SAY so rather than emit confident-but-wrong solutions. Treat it as a fast local assistant for everyday coding, not a replacement for a frontier model on hard problems.

It can still be overconfident on obscure factual trivia (exact default arguments, precise version numbers) — the honest-failure training mitigates but does not eliminate this at 1.5B. Verify specifics against the docs.

Benchmark: HumanEval (instruct, pass@1, greedy): 65.9%. Measured with lm-eval-harness humaneval_instruct. (The personality fine-tune slightly improved the extractable-code rate vs. the reasoning-only stage, because terser answers parse more cleanly.)
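With greedy decoding there is one completion per problem, so pass@1 reduces to the fraction of problems whose single completion passes all tests. The sketch below illustrates that arithmetic only. It assumes the standard 164-problem HumanEval set, under which 65.9% would correspond to 108 solved problems; the per-problem results here are placeholders, not the actual evaluation output.

```python
def pass_at_1(results):
    """results: one boolean per problem (True if the single greedy sample passed)."""
    if not results:
        raise ValueError("no results to score")
    return sum(results) / len(results)

# Placeholder data: 108 passes out of 164 problems (an assumption used only
# to show that the reported percentage is consistent with the set size).
results = [True] * 108 + [False] * 56
score = pass_at_1(results)
print(f"{score:.1%}")
```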

Personality behavior (held-out eval, 200 prompts)

Behavior                                                      Result
Correct NPC identity when asked                               100%
No identity mention on neutral coding (over-emission rate)    2.5%
Denies being Qwen / wrong maker                               100%
Flags uncertainty on unknown/obscure APIs                     100%

Training

  • Stage 1 (reasoning): SFT on open-r1/codeforces-cots (decontaminated Python subsets, fit-filtered to ≤8192 tokens so every <think> trace is complete; the filter biases toward shorter, laconic traces). 15k traces.
  • Stage 2 (voice + identity + honest-failure): SFT on a 7k-example personality set (gated identity, a large anti-over-emission cohort, an honest-failure cohort, and a 1k anti-forgetting buffer of Stage-1 reasoning data). Trained with LoRA at a low learning rate; both stages are merged.
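The Stage 1 fit-filter can be sketched as a simple length check: drop any trace that would not fit in the 8192-token window, so that no kept trace is truncated mid-reasoning. This is an illustration, not the author's pipeline. `count_tokens` is a hypothetical stand-in that splits on whitespace; a real run would count tokens with the base model's tokenizer.

```python
MAX_TOKENS = 8192  # limit stated in the Stage 1 description

def count_tokens(text):
    # Stand-in tokenizer (assumption): whitespace split.
    # Swap in the real tokenizer's token count for actual filtering.
    return len(text.split())

def fit_filter(traces, max_tokens=MAX_TOKENS, counter=count_tokens):
    """Keep only traces whose token count fits within max_tokens."""
    return [t for t in traces if counter(t) <= max_tokens]

short_trace = "<think> check bounds </think> fix the loop"
long_trace = "step " * 9000  # 9000 whitespace tokens, over the limit
kept = fit_filter([short_trace, long_trace])
```

A filter like this only ever removes long traces, which is why the surviving data skews toward shorter reasoning.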

Apache 2.0 model. Reasoning data: open-r1/codeforces-cots (CC-BY-4.0 / ODC-By, attributed).

Local use

GGUF quants: q4_k_m (~941 MB, laptop default), q5_k_m (~1.1 GB), q8_0 (~1.6 GB), f16 (~3.1 GB). At q4_k_m, ~7 tok/s on CPU. Uses the standard ChatML (<|im_start|> / <|im_end|>) template.
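For runtimes that take a raw prompt string rather than a message list, the ChatML layout can be built by hand. The following is a minimal sketch of the standard ChatML format; the helper name `build_chatml_prompt` and the example message are hypothetical, and many runtimes apply this template automatically from the GGUF metadata.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render a list of {'role', 'content'} dicts as a ChatML prompt string."""
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = build_chatml_prompt(
    [{"role": "user", "content": "Why does my loop skip the last item?"}]
)
```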

If q4_k_m's coherence on edge cases matters to you, q5_k_m is a cleaner default.
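One way to choose among the quants is to take the largest file that fits a size budget. The sketch below uses the approximate file sizes listed above; the helper `pick_quant` is hypothetical, and file size is only a lower bound on memory use, since the runtime also needs room for the context cache.

```python
# Approximate file sizes in MB, taken from the list above.
QUANT_SIZES_MB = {
    "q4_k_m": 941,
    "q5_k_m": 1100,
    "q8_0": 1600,
    "f16": 3100,
}

def pick_quant(budget_mb):
    """Return the largest quant whose file fits in budget_mb, or None."""
    fitting = [
        (size, name) for name, size in QUANT_SIZES_MB.items() if size <= budget_mb
    ]
    if not fitting:
        return None
    # max() compares by size first, so this selects the largest fitting file.
    return max(fitting)[1]

choice = pick_quant(1200)
```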

Attribution & author

Reasoning data: open-r1/codeforces-cots (HuggingFace Open-R1), CC-BY-4.0. Base model: Qwen/Qwen2.5-Coder-1.5B-Instruct, Apache 2.0. Author: Rama Krishna Bachu / Bottensor (Independent Research). ORCID 0009-0000-1298-0681.

Model provider: ramankrishna10

Model tree: Qwen/Qwen2.5-Coder-1.5B-Instruct (base), this model (fine-tuned)

Modalities: text input, text output
