License: apache-2.0

Results

| Stage | Metric | Score |
| --- | --- | --- |
| Base model (no fine-tune) | LLM-judge avg (1-5), 10 prompts | 2.0 |
| This adapter (sft_trial_1) | LLM-judge avg (1-5), 10 prompts | 3.4 |
| Best GRPO trial on top of this | Regex exact match (0-10) | 2.0 |

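Judge model: groq/llama-3.3-70b-versatile. The evaluation prompts are 10 held-out GSM8K test problems, formatted with the ChatML template and the #### N terminator. The GRPO row uses a different metric (regex exact match, scored out of 10), so its score is not directly comparable to the two judge averages.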

How to use

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Qwen/Qwen2.5-0.5B-Instruct"
adapter_id = "tayyib-sayyid/qwen2.5-0.5b-gsm8k-lora"

# Load the tokenizer saved with the adapter and the (unquantized) base model.
tok = AutoTokenizer.from_pretrained(adapter_id)
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

# Attach the LoRA weights to the base model and switch to inference mode.
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

prompt = (
    "Natalia sold clips to 48 of her friends in April, and then she sold "
    "half as many clips in May. How many clips did Natalia sell altogether "
    "in April and May?"
)

# Render the question with the ChatML template and open an assistant turn.
messages = [{"role": "user", "content": prompt}]
inputs = tok.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
).to(model.device)

# Greedy decoding; decode only the newly generated tokens.
out = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The model is trained to end every answer with #### <number>, so a simple regex, r"####\s*(-?[\d,\.]+)", is enough to extract the final numeric answer from a well-formed completion.
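A minimal sketch of that extraction step follows. It also includes an exact-match count in the spirit of the 0-10 regex score reported for the GRPO trial. The helper names (extract_answer, exact_match_count) are hypothetical. The normalization, which drops thousands separators and a sentence-final period, is an assumption and not necessarily what the original evaluation did.

```python
import re
from typing import List, Optional

# The pattern quoted above: "####", optional whitespace, then a signed number
# that may contain commas or a decimal point.
ANSWER_RE = re.compile(r"####\s*(-?[\d,\.]+)")


def extract_answer(text: str) -> Optional[str]:
    """Return the normalized number after the first '####', or None if absent."""
    match = ANSWER_RE.search(text)
    if match is None:
        return None
    # Assumed normalization: drop thousands separators and a trailing period.
    return match.group(1).replace(",", "").rstrip(".")


def exact_match_count(completions: List[str], references: List[str]) -> int:
    """Count completions whose extracted answer equals the reference's answer."""
    hits = 0
    for completion, reference in zip(completions, references):
        predicted = extract_answer(completion)
        if predicted is not None and predicted == extract_answer(reference):
            hits += 1
    return hits


if __name__ == "__main__":
    completion = "She sold 48 / 2 = 24 clips in May, so 48 + 24 = 72 in total.\n#### 72"
    print(extract_answer(completion))  # 72
    print(exact_match_count([completion, "#### 70"], ["#### 72", "#### 72"]))  # 1
```

A completion with no #### marker counts as a miss rather than raising an error. This keeps a single malformed generation from aborting a scoring run.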

Training details

  • Base model: Qwen/Qwen2.5-0.5B-Instruct
  • Dataset: openai/gsm8k (main config), a 5,000-row training subset (seed 42), rendered in ChatML with the #### N final-answer terminator.
  • Method: LoRA (PEFT) on top of a 4-bit NF4-quantized base model loaded via bitsandbytes.
  • Framework versions: PEFT 0.13.x, TRL SFTTrainer, transformers 4.4x.
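The sketch below gives a rough illustration of that data preparation. It renders one GSM8K row as ChatML text and draws a seeded 5,000-row subset. Both helpers (to_chatml, pick_subset) are hypothetical. The card does not say how the subset was drawn; a datasets shuffle with seed 42 would pick different rows than random.Random(42). The real Qwen2.5 template applied by tok.apply_chat_template may also insert a default system message, so treat the tokenizer's template as the source of truth.

```python
import random
from typing import List, Optional


def to_chatml(question: str, answer: str, system: Optional[str] = None) -> str:
    """Render one GSM8K example as a ChatML conversation (hypothetical helper)."""
    # GSM8K answers already end with a "#### <number>" line; reject rows without it.
    if "####" not in answer:
        raise ValueError("expected a '#### N' final-answer terminator")
    parts = []
    if system is not None:
        parts.append(f"<|im_start|>system\n{system}<|im_end|>\n")
    parts.append(f"<|im_start|>user\n{question}<|im_end|>\n")
    parts.append(f"<|im_start|>assistant\n{answer}<|im_end|>\n")
    return "".join(parts)


def pick_subset(n_total: int, k: int, seed: int = 42) -> List[int]:
    """Draw k distinct row indices reproducibly (assumed method, not the author's)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_total), k))


if __name__ == "__main__":
    # Illustrative answer written in GSM8K style (reasoning, then "#### N").
    answer = "In May she sold 48 / 2 = 24 clips.\nIn total she sold 48 + 24 = 72 clips.\n#### 72"
    print(to_chatml("How many clips did Natalia sell in April and May?", answer))
    # The GSM8K main config has 7,473 training rows.
    print(len(pick_subset(7473, 5000)))
```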

LoRA hyperparameters

| Field | Value |
| --- | --- |
| r | 8 |
| alpha | 16 |
| dropout | 0.05 |
| target_modules | q_proj, v_proj |
| task_type | CAUSAL_LM |
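These fields map onto PEFT's LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"). The sketch below is a back-of-the-envelope count of the adapter's trainable parameters. The layer shapes come from the public Qwen2.5-0.5B config (hidden size 896, 24 layers, 2 key/value heads of dimension 64); this card does not state them, so they are an assumption here. The count ignores biases, which LoRA does not add by default.

```python
# Assumed Qwen2.5-0.5B shapes (public model config, not stated in this card).
HIDDEN_SIZE = 896
NUM_LAYERS = 24
NUM_KV_HEADS = 2
HEAD_DIM = 64

# LoRA hyperparameters from the table above.
R = 8
ALPHA = 16


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters in one LoRA pair: A is (r x d_in) and B is (d_out x r)."""
    return r * d_in + d_out * r


q_proj = lora_param_count(HIDDEN_SIZE, HIDDEN_SIZE, R)              # 14,336
v_proj = lora_param_count(HIDDEN_SIZE, NUM_KV_HEADS * HEAD_DIM, R)  # 8,192
total = NUM_LAYERS * (q_proj + v_proj)                               # 540,672
scaling = ALPHA / R  # the low-rank update B @ A is scaled by alpha / r

print(f"trainable LoRA parameters: {total:,}")  # trainable LoRA parameters: 540,672
print(f"update scaling alpha/r: {scaling}")     # update scaling alpha/r: 2.0
```

Because v_proj projects down to the two grouped key/value heads (128 outputs), its adapter is smaller than the one on q_proj.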

Optimization

| Field | Value |
| --- | --- |
| learning rate | 0.0002 |
| optimizer | paged_adamw_8bit |
| per-device batch size | 4 |
| gradient accumulation | 4 |
| effective batch size | 16 |
| epochs | 1 |
| warmup ratio | 0.03 |
| weight decay | 0.0 |
| max sequence length | 1024 |
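The sketch below works through the batch arithmetic for the 5,000-row subset. It assumes a single GPU and no sequence packing; the card states neither, although an effective batch of 16 = 4 × 4 is consistent with one device. The warmup calculation roughly mirrors how Hugging Face's Trainer turns warmup_ratio into warmup steps (rounding up). The step count an actual run logs may differ by one, depending on how the final partial accumulation is handled.

```python
import math

# Values from the optimization table and the training-subset size.
NUM_EXAMPLES = 5_000
PER_DEVICE_BATCH = 4
GRAD_ACCUM = 4
EPOCHS = 1
WARMUP_RATIO = 0.03

NUM_DEVICES = 1  # assumption: single-GPU run, no sequence packing

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_DEVICES  # 16

# Micro-batches per epoch (the last one may be partial), then optimizer updates.
micro_batches = math.ceil(NUM_EXAMPLES / (PER_DEVICE_BATCH * NUM_DEVICES))  # 1,250
optimizer_steps = (micro_batches // GRAD_ACCUM) * EPOCHS  # 312

# Warmup steps derived from the ratio, rounded up.
warmup_steps = math.ceil(optimizer_steps * WARMUP_RATIO)  # 10

print(effective_batch, optimizer_steps, warmup_steps)  # 16 312 10
```

Under these assumptions, the learning rate reaches its 0.0002 peak after about 10 of roughly 312 updates. The card does not specify the schedule after warmup.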

Limitations

  • Trained and evaluated on GSM8K, a set of English grade-school math word problems. It is not expected to generalize to other kinds of math (geometry, calculus, proofs), and it is not a general-purpose chat model.
  • The 10-prompt evaluation set is small; the headline score is a directional signal, not a benchmark.
  • The base model has only 0.5B parameters. That makes it useful for studying the SFT/GRPO pipeline at low cost, but its accuracy is well below that of larger reasoning models.

Citation

This adapter was produced as part of NLP Assignment 4 at IBA. The full pipeline, hyperparameter sweep tables, and LaTeX report live in the source repository.

Framework versions

  • PEFT 0.13.x
  • transformers 4.4x.x
  • TRL 0.13.x–0.14.x

Model provider: tayyib-sayyid

Model tree

Base: Qwen/Qwen2.5-0.5B-Instruct
Adapter: this model

Modalities

Input: Text
Output: Text
