README

License: apache-2.0

Evaluation (post-fix, 3-judge panel)

Mean score (0–100) on 15 held-out prompts, graded by Claude Opus 4.7, GPT-5.5, and a local Qwen-3B (gpt-oss experts is a deliberately un-retrained stale control):

| Model | Claude | GPT-5.5 | Qwen-3B | Avg |
|---|---|---|---|---|
| gpt-5.5 (frontier ceiling) | 94.6 | 95.6 | 90.8 | 93.7 |
| gpt-oss attn (retrained teacher) | 82.0 | 66.8 | 81.4 | 76.7 |
| qwen-0.5b distilled (served) | 79.0 | 68.6 | 82.2 | 76.6 |
| qwen-0.5b direct 7k (served) | 78.6 | 64.4 | 82.0 | 75.0 |
| gpt-oss experts (stale control) | 67.6 | 68.6 | 81.8 | 72.7 |
| qwen-3b base | 62.1 | 67.1 | 80.5 | 69.9 |
| gpt-oss base | 55.4 | 53.8 | 68.2 | 59.1 |
| qwen-0.5b base | 36.5 | 44.5 | 67.9 | 49.7 |

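Both served retrained 0.5Bs beat the stale control and every untuned base on the three-judge average, and under the Claude and Qwen-3B judges individually. GPT-5.5 is the exception: there the distilled 0.5B only ties the stale control (68.6 each), and the direct 7k model (64.4) scores below both the stale control and the qwen-3b base (67.1). On average, the distilled 0.5B roughly ties its own 20B teacher (76.6 vs 76.7).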
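The comparison can be re-derived from the table with a few lines of Python. This is a minimal sketch that only uses the per-judge cells published above; the helper names (`mean`, `judges_won`) are illustrative and not part of ProofKit. Because the inputs are already rounded, a recomputed average can differ from the published Avg column by 0.1 (the last row does).

```python
# Per-judge scores copied from the evaluation table: (Claude, GPT-5.5, Qwen-3B).
JUDGES = ("Claude", "GPT-5.5", "Qwen-3B")
scores = {
    "gpt-oss attn (retrained teacher)": (82.0, 66.8, 81.4),
    "qwen-0.5b distilled (served)": (79.0, 68.6, 82.2),
    "qwen-0.5b direct 7k (served)": (78.6, 64.4, 82.0),
    "gpt-oss experts (stale control)": (67.6, 68.6, 81.8),
    "qwen-3b base": (62.1, 67.1, 80.5),
    "gpt-oss base": (55.4, 53.8, 68.2),
    "qwen-0.5b base": (36.5, 44.5, 67.9),
}
SERVED = ("qwen-0.5b distilled (served)", "qwen-0.5b direct 7k (served)")
BASELINES = (
    "gpt-oss experts (stale control)",
    "qwen-3b base",
    "gpt-oss base",
    "qwen-0.5b base",
)


def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def judges_won(model):
    """Judges under which `model` scores strictly above every baseline."""
    return [
        judge
        for i, judge in enumerate(JUDGES)
        if all(scores[model][i] > scores[b][i] for b in BASELINES)
    ]


# Three-judge average per model, rounded to one decimal like the table.
averages = {name: round(mean(vals), 1) for name, vals in scores.items()}

if __name__ == "__main__":
    for name in SERVED:
        print(f"{name}: avg={averages[name]}, strictly ahead under {judges_won(name)}")
```

Strict inequality is used deliberately, so a tie with a baseline under one judge does not count as a win for that judge.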

Limitations

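  • 0.5B parameters, so capacity is limited, and behavior depends on the frozen prompt format (see "Prompt format is a frozen contract" below). This is a purpose-built ProofKit component, not a general chat model.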

About ProofKit

ProofKit is a work-sample generator for job seekers — it turns a target role, background, and skills-to-prove into a realistic, clearly-fictional practice work sample (a role-specific challenge, a guided builder, a readiness review, and a recruiter-ready portfolio packet). Built for the Hugging Face Build Small Hackathon (Backyard AI track). Integrity rules are load-bearing: outputs never claim real employment, metrics are labeled hypothetical, and exports carry an ethical disclosure.

The ProofKit model family

| Repo | What it is |
|---|---|
| visproj/proofkit-qwen0.5b-7k | Qwen2.5-0.5B fine-tuned directly on the 7k set (Transformers) |
| visproj/proofkit-gpt-oss-20b-lora | gpt-oss-20b LoRA, the distillation teacher |
| visproj/proofkit-distilled-qwen0.5b | Qwen2.5-0.5B distilled from the teacher (merged) |
| visproj/proofkit-distilled-qwen0.5b-gguf | GGUF of the distilled student (llama.cpp, served) |
| visproj/proofkit-sft | SFT dataset (synthetic, license-safe) |
| visproj/proofkit-distill-qwen0.5b | Distillation dataset (teacher completions) |

A note on training data (the "static responses" fix)

An earlier version of these models produced repetitive, input-ignoring drafts. The root cause was synthetic-data leakage: the dataset rendered the example user answers and the target from the same template slots, so the model learned target = template instead of target = f(input). The fix — faithfulness anchors (a distinctive token shared by the answer and the target) + seeded per-example variation across every task, then a full-chain retrain — is what these current weights reflect.
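To make the leakage and the fix concrete, here is a minimal sketch of the idea, not the actual ProofKit data generator. Every name in it (`ROLES`, `VERBS`, `OPENERS`, `make_example`, `is_faithful`) and the anchor shape are assumptions made for illustration. Each example gets its own seeded random generator, and a distinctive anchor token is written into both the user answer and the target, so the target can only be reproduced by reading the input.

```python
import random
import string

# Hypothetical slot values; the real dataset's vocabulary is not shown in this card.
ROLES = ["data analyst", "support engineer", "product designer"]
VERBS = ["streamlined", "documented", "audited", "rebuilt"]
OPENERS = ["Draft:", "Summary:", "Work sample:"]


def make_example(index, base_seed=1234):
    """Build one synthetic (input, target) pair with a shared faithfulness anchor."""
    # Seeded per-example variation: each index has its own reproducible RNG.
    rng = random.Random(base_seed * 1_000_003 + index)

    # Distinctive anchor token (assumed shape: three letters, dash, four digits).
    letters = "".join(rng.choice(string.ascii_uppercase) for _ in range(3))
    anchor = f"{letters}-{rng.randrange(1000, 10000)}"

    role = rng.choice(ROLES)
    verb = rng.choice(VERBS)
    opener = rng.choice(OPENERS)

    # The anchor appears in the answer AND the target, so target = f(input).
    answer = f"As a {role}, I {verb} the {anchor} workflow."
    target = f"{opener} the candidate {verb} the {anchor} workflow (hypothetical {role} scenario)."
    return {"input": answer, "target": target, "anchor": anchor}


def is_faithful(example):
    """True when the target repeats the anchor that was placed in the input."""
    return example["anchor"] in example["input"] and example["anchor"] in example["target"]


if __name__ == "__main__":
    for i in range(3):
        ex = make_example(i)
        print(is_faithful(ex), ex["input"], "->", ex["target"])
```

A check like `is_faithful` could also be run on model outputs: a draft that omits the anchor from its prompt is likely ignoring the input.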

Prompt format is a frozen contract

These 0.5B models were trained on the exact prompt shapes from ProofKit's prompt_formats.py. They only behave well when prompted in that format; reworded or free-form prompts push them off-distribution. They are purpose-built components of the ProofKit app, not general chat models.
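One way to honor a frozen prompt contract in calling code is to route every prompt through a single builder that rejects anything outside the known shapes. The sketch below is illustrative only: the template text, the task name `challenge`, and the helpers `required_fields` and `build_prompt` are hypothetical and do not reproduce the contents of ProofKit's prompt_formats.py, which should be imported and used as-is in practice.

```python
from string import Formatter
from types import MappingProxyType

# HYPOTHETICAL template: the real prompt shapes live in ProofKit's prompt_formats.py.
# MappingProxyType gives a read-only view, so callers cannot edit the contract.
PROMPT_FORMATS = MappingProxyType({
    "challenge": "Task: challenge\nRole: {role}\nSkills: {skills}\nResponse:\n",
})


def required_fields(template):
    """Names of the replacement fields used by a str.format template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def build_prompt(task, **fields):
    """Render a prompt only if the task and its fields match the frozen shape."""
    if task not in PROMPT_FORMATS:
        raise KeyError(f"unknown task {task!r}; free-form prompts are off-distribution")
    template = PROMPT_FORMATS[task]
    expected = required_fields(template)
    given = set(fields)
    if given != expected:
        raise ValueError(
            f"fields must be exactly {sorted(expected)}, got {sorted(given)}"
        )
    return template.format(**fields)


if __name__ == "__main__":
    print(build_prompt("challenge", role="data analyst", skills="SQL, dashboards"))
```

Failing loudly on an unknown task or a missing field is the design choice here: a silently reworded prompt would still produce text, just off-distribution text.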

Model provider: build-small-hackathon

Base model: Qwen/Qwen2.5-0.5B-Instruct (this model is a fine-tune of it)

Modalities: text input, text output
