kennethp97

sft-arm-a-1p5b

Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more
Container

Run this model inference with full control and performance in your environment.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

License: apache-2.0

What it does

Given a procedure and a scenario, the model emits an EDGE CHECKS: reasoning block followed by a FINAL ANSWER: compliant|non-compliant line. The recipe targets the free-form regime; gains concentrate there.

Headline eval (frozen 233-process held-out; 128 flip + 122 anchor; greedy / T=0)

Table
regimeflip rate (base -> SFT)anchor acc (base -> SFT)plain (base -> SFT)
forced0.117 -> 0.1880.557 -> 0.5820.570 -> 0.608
free-form0.219 -> 0.469 (+25.0pp)0.467 -> 0.664 (+19.7pp)0.576 -> 0.726

The lift is free-form-only (the regime the reasoning recipe targets); the gains concentrate on exception / hierarchy / threshold handles, while step-ordering stays flat (0.200 -> 0.225) -- the known structural bottleneck.

How to use

python

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
BASE = "Qwen/Qwen2.5-1.5B-Instruct"
ADAPTER = "kennethp97/sft-arm-a-1p5b"
tok = AutoTokenizer.from_pretrained(BASE, use_fast=True)
tok.pad_token = tok.pad_token or tok.eos_token
base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16,
device_map="auto")
model = PeftModel.from_pretrained(base, ADAPTER)
model.eval()

Prompt format and a worked side-by-side eval against the base and the companion DPO adapter (kennethp97/dpo-flip-1p5b) are in the combined eval notebook.

Training summary

  • Base: Qwen/Qwen2.5-1.5B-Instruct
  • LoRA r=32 alpha=64 on q/k/v/o/gate/up/down
  • Plain SFT (cross-entropy on the chosen completion), full bf16
  • Training set: 3,734 rows (after filtering 1,226 placeholder-verifier_reason rows from the 5,020-row v0.4.0 corpus)

Limitations

  • Research checkpoint, not a production classifier. Below the pre-registered absolute GO bar.
  • Step-ordering bottleneck. Ordering flip stays nearly flat.
  • Free-form is the target regime. Forced-verdict gains are small.
  • Format sensitivity. Trained on the EDGE CHECKS ... FINAL ANSWER format above; deviation may degrade performance. Greedy (T=0) matches the reported numbers.

License

Adapter: Apache-2.0. Base model: under the Qwen2.5-1.5B-Instruct license.

Model provider

kennethp97

Model tree

Base

Qwen/Qwen2.5-1.5B-Instruct

Adapter

this model

Modalities

Input

Text

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Explore FriendliAI today