License: apache-2.0
Results
| Stage | Metric | Score |
|---|---|---|
| Base model (no fine-tune) | LLM-judge avg (1-5), 10 prompts | 2.0 |
| This adapter (sft_trial_1) | LLM-judge avg (1-5), 10 prompts | 3.4 |
| Best GRPO trial on top of this | regex exact-match (0-10) | 2.0 |
Judge model: groq/llama-3.3-70b-versatile. Evaluation prompts are 10 held-out GSM8K
test problems formatted with the ChatML template and the `#### N` terminator.
How to use
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Qwen/Qwen2.5-0.5B-Instruct"
adapter_id = "tayyib-sayyid/qwen2.5-0.5b-gsm8k-lora"

# Load the tokenizer from the adapter repo and attach the LoRA adapter to the base model.
tok = AutoTokenizer.from_pretrained(adapter_id)
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

prompt = (
    "Natalia sold clips to 48 of her friends in April, and then she sold "
    "half as many clips in May. How many clips did Natalia sell altogether "
    "in April and May?"
)
messages = [{"role": "user", "content": prompt}]

# Render the chat template to input IDs, ending with an open assistant turn.
inputs = tok.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
).to(model.device)

# Greedy decoding; print only the newly generated tokens.
out = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```
The model is trained to end every answer with `#### <number>`, so a simple
regex (`r"####\s*(-?[\d,\.]+)"`) is enough to extract the final
numeric answer.
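As a minimal sketch of that extraction step, the snippet below applies the regex quoted above to a completion and compares the result with a reference answer. The helper names `extract_answer` and `exact_match` are illustrative and not part of the released code; taking the last match and stripping commas and a trailing period are assumptions about how to normalize the captured text.

```python
import re

# Pattern quoted in the model card: "####" followed by a signed number that
# may contain thousands separators or a decimal point.
ANSWER_RE = re.compile(r"####\s*(-?[\d,\.]+)")


def extract_answer(text):
    """Return the last '#### N' value as a float, or None if none can be parsed."""
    matches = ANSWER_RE.findall(text)
    if not matches:
        return None
    # Assumption: drop thousands separators and a sentence-ending period.
    raw = matches[-1].replace(",", "").rstrip(".")
    try:
        return float(raw)
    except ValueError:
        return None


def exact_match(prediction, reference):
    """True when both texts contain a final answer and the values are equal."""
    pred = extract_answer(prediction)
    ref = extract_answer(reference)
    return pred is not None and pred == ref


completion = "In May she sold 48 / 2 = 24 clips. Altogether: 48 + 24 = 72.\n#### 72"
print(extract_answer(completion))
print(exact_match(completion, "#### 72"))
```

Counting `exact_match` hits over a set of held-out problems would give a regex exact-match score of the kind reported in the Results table, although the scoring script actually used is not shown here.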
Training details
- Base model: `Qwen/Qwen2.5-0.5B-Instruct`
- Dataset: `openai/gsm8k` (main config), 5,000-row training subset (seed 42), ChatML rendered with the `#### N` final-answer terminator.
- Method: LoRA (PEFT) on top of 4-bit NF4 base via bitsandbytes.
- Framework versions: PEFT 0.13.x, TRL SFTTrainer, transformers 4.4x.
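To illustrate what "ChatML rendered" means for a single GSM8K row, here is a hand-written sketch. The function `render_chatml` is hypothetical; the actual pipeline may instead have used the tokenizer's `apply_chat_template`, which for Qwen models can also prepend a default system message, so treat the exact string layout as an assumption.

```python
def render_chatml(question, answer=None):
    """Render one question/answer pair as ChatML text.

    With an answer, the result is a complete training example. Without one,
    the assistant turn is left open so the model can generate the answer.
    """
    text = (
        f"<|im_start|>user\n{question}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
    if answer is not None:
        text += f"{answer}<|im_end|>\n"
    return text


# Illustrative row: the question from the usage snippet, with an answer
# that ends in the "#### N" terminator.
question = (
    "Natalia sold clips to 48 of her friends in April, and then she sold "
    "half as many clips in May. How many clips did Natalia sell altogether "
    "in April and May?"
)
answer = "In May she sold 48 / 2 = 24 clips.\nAltogether: 48 + 24 = 72 clips.\n#### 72"

print(render_chatml(question, answer))
```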
LoRA hyperparameters
| Field | Value |
|---|---|
| r | 8 |
| alpha | 16 |
| dropout | 0.05 |
| target_modules | q_proj, v_proj |
| task_type | CAUSAL_LM |
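The values in this table map directly onto PEFT's `LoraConfig`, and the 4-bit NF4 base described under Training details maps onto `BitsAndBytesConfig` from transformers. The following is a configuration sketch rather than the author's training script; the variable names are illustrative, and settings this card does not state (such as the compute dtype or double quantization) are left at library defaults.

```python
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization for the frozen base weights.
# Assumption: all options not named in the card keep their defaults.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
)

# LoRA hyperparameters copied from the table above.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
```

In a typical setup, `bnb_config` would be passed as `quantization_config` to `AutoModelForCausalLM.from_pretrained` and `lora_config` as `peft_config` to TRL's `SFTTrainer`.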
Optimization
| Field | Value |
|---|---|
| learning rate | 0.0002 |
| optimizer | paged_adamw_8bit |
| per-device batch size | 4 |
| gradient accumulation | 4 |
| effective batch size | 16 |
| epochs | 1 |
| warmup ratio | 0.03 |
| weight decay | 0.0 |
| max sequence length | 1024 |
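The effective batch size follows from the other rows, and together with the 5,000-row subset it gives a rough idea of how short the run is. The arithmetic below is a sketch that assumes a single GPU; the exact step count depends on how the trainer rounds the final partial accumulation step.

```python
import math

train_rows = 5_000          # training subset size from "Training details"
per_device_batch = 4
grad_accum = 4
num_devices = 1             # assumption: single-GPU run
epochs = 1
warmup_ratio = 0.03

# Sequences contributing to each optimizer update.
effective_batch = per_device_batch * grad_accum * num_devices

# Approximate number of optimizer updates and warmup updates.
steps_per_epoch = math.ceil(train_rows / effective_batch)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(effective_batch, total_steps, warmup_steps)
```

Under these assumptions the run is about 313 optimizer steps, of which roughly the first 10 are learning-rate warmup.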
Limitations
- Trained and evaluated only on GSM8K (grade-school math word problems in English). It is not expected to generalize to other math styles (geometry, calculus, proofs) and is not a general-purpose chat model.
- The 10-prompt evaluation set is small; the headline score is a directional signal, not a benchmark.
- The base model is 0.5B parameters — useful for studying the SFT/GRPO pipeline at low cost, but well below the accuracy of larger reasoning models.
Citation
This adapter was produced as part of NLP Assignment 4 at IBA. The full pipeline, hyperparameter sweep tables, and LaTeX report live in the source repository.
Framework versions
- PEFT 0.13.x
- transformers 4.4x.x
- TRL 0.13.x–0.14.x
Model provider: tayyib-sayyid
Model tree: base Qwen/Qwen2.5-0.5B-Instruct; adapter: this model
Modalities: text input, text output