kennethp97
sft-arm-a-1p5b
Run this model inference on single tenant GPU with unmatched speed and reliability at scale.
Run this model inference with full control and performance in your environment.
Get help setting up a custom Dedicated Endpoints.
Talk with our engineer to get a quote for reserved GPU instances with discounts.
README
License: apache-2.0What it does
Given a procedure and a scenario, the model emits an EDGE CHECKS: reasoning
block followed by a FINAL ANSWER: compliant|non-compliant line. The recipe
targets the free-form regime; gains concentrate there.
Headline eval (frozen 233-process held-out; 128 flip + 122 anchor; greedy / T=0)
| regime | flip rate (base -> SFT) | anchor acc (base -> SFT) | plain (base -> SFT) |
|---|---|---|---|
| forced | 0.117 -> 0.188 | 0.557 -> 0.582 | 0.570 -> 0.608 |
| free-form | 0.219 -> 0.469 (+25.0pp) | 0.467 -> 0.664 (+19.7pp) | 0.576 -> 0.726 |
The lift is free-form-only (the regime the reasoning recipe targets); the gains concentrate on exception / hierarchy / threshold handles, while step-ordering stays flat (0.200 -> 0.225) -- the known structural bottleneck.
How to use
python
import torchfrom transformers import AutoModelForCausalLM, AutoTokenizerfrom peft import PeftModelBASE = "Qwen/Qwen2.5-1.5B-Instruct"ADAPTER = "kennethp97/sft-arm-a-1p5b"tok = AutoTokenizer.from_pretrained(BASE, use_fast=True)tok.pad_token = tok.pad_token or tok.eos_tokenbase = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16,device_map="auto")model = PeftModel.from_pretrained(base, ADAPTER)model.eval()
Prompt format and a worked side-by-side eval against the base and the companion
DPO adapter (kennethp97/dpo-flip-1p5b) are in the combined eval notebook.
Training summary
- Base:
Qwen/Qwen2.5-1.5B-Instruct - LoRA r=32 alpha=64 on q/k/v/o/gate/up/down
- Plain SFT (cross-entropy on the chosen completion), full bf16
- Training set: 3,734 rows (after filtering 1,226 placeholder-
verifier_reasonrows from the 5,020-row v0.4.0 corpus)
Limitations
- Research checkpoint, not a production classifier. Below the pre-registered absolute GO bar.
- Step-ordering bottleneck. Ordering flip stays nearly flat.
- Free-form is the target regime. Forced-verdict gains are small.
- Format sensitivity. Trained on the
EDGE CHECKS ... FINAL ANSWERformat above; deviation may degrade performance. Greedy (T=0) matches the reported numbers.
License
Adapter: Apache-2.0. Base model: under the Qwen2.5-1.5B-Instruct license.
Model provider
kennethp97
Model tree
Base
Qwen/Qwen2.5-1.5B-Instruct
Adapter
this model
Modalities
Input
Text
Output
Text
Pricing
Dedicated Endpoints
View detailsSupported Functionality
Model APIs
Dedicated Endpoints
Container
More information