s-m-sharjeel/qwen2.5-0.5b-dolly-sft-lora

How to use

This is a LoRA adapter, not a standalone model. Load the base model first, then apply the adapter:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base_model_id = "Qwen/Qwen2.5-0.5B"
adapter_id    = "s-m-sharjeel/qwen2.5-0.5b-dolly-sft-lora"

# Load the tokenizer from the adapter repo and the base weights in fp16.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)
base = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype=torch.float16, device_map="auto")
# Attach the LoRA adapter on top of the base model.
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

prompt = "### Instruction:\nWrite a short definition of machine learning.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Greedy decoding; EOS is reused as the pad token.
out = model.generate(**inputs, max_new_tokens=150, do_sample=False,
                     pad_token_id=tokenizer.eos_token_id)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Prompt format

The model was trained on an Alpaca-style template (the Dolly context field was mapped to ### Input). For best results, format prompts as:

```markdown
### Instruction:
{your instruction here}

### Response:
```

If the task includes additional input/context, use:

```markdown
### Instruction:
{your instruction here}

### Input:
{context here}

### Response:
```
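
The two templates above can be produced by one small helper. This is a minimal sketch, not the authors' training code: the function name `build_prompt` is hypothetical, and the trailing newline after `### Response:` is an assumption that mirrors the prompt string in the usage example.

```python
def build_prompt(instruction: str, context: str = "") -> str:
    """Format a request in the Alpaca-style template described above.

    Hypothetical helper: exact whitespace is assumed to match the
    prompt string shown in the usage example.
    """
    instruction = instruction.strip()
    context = context.strip()
    if context:
        # Variant with additional input/context.
        return (
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{context}\n\n"
            "### Response:\n"
        )
    # Variant without context.
    return f"### Instruction:\n{instruction}\n\n### Response:\n"


if __name__ == "__main__":
    print(build_prompt("Write a short definition of machine learning."))
    print(build_prompt("Summarize the passage.",
                       "LoRA adds low-rank adapter matrices to frozen weights."))
```

The returned string can be passed to the tokenizer in place of a hand-written prompt. An empty or whitespace-only context falls back to the shorter template.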

Training details

| Setting | Value |
| --- | --- |
| Base model | Qwen/Qwen2.5-0.5B |
| Method | Supervised Fine-Tuning (SFT) with LoRA |
| Dataset | databricks/databricks-dolly-15k (5,000-sample subset, seed=42) |
| Train / Validation split | 4,500 / 500 (90/10) |
| LoRA rank (r) | 32 |
| LoRA alpha | 64 |
| Target modules | q_proj, v_proj, k_proj, o_proj |
| LoRA dropout | 0.05 |
| Learning rate | 1e-4 |
| LR scheduler | cosine (warmup ratio 0.05) |
| Epochs | 2 |
| Batch size | 4 (gradient accumulation 4 → effective 16) |
| Max sequence length | 512 |
| Precision | fp16 |
| Platform | Kaggle (NVIDIA T4 GPU) |
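
A few quantities follow from the settings above by simple arithmetic. The sketch below assumes standard LoRA scaling (alpha / r, not the rank-stabilized variant) and a single GPU. The step counts are estimates only, since the trainer's own rounding may differ by a step or two.

```python
import math

# Values copied from the training table.
lora_r = 32
lora_alpha = 64
per_device_batch = 4
grad_accum_steps = 4
train_examples = 4500
epochs = 2
warmup_ratio = 0.05

# Standard LoRA multiplies the low-rank update by alpha / r (assumption: no rsLoRA).
lora_scaling = lora_alpha / lora_r

# Examples consumed per optimizer step (assumption: single GPU).
effective_batch = per_device_batch * grad_accum_steps

# Approximate schedule length; the actual trainer may round differently.
steps_per_epoch = math.ceil(train_examples / effective_batch)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(warmup_ratio * total_steps)

print(f"LoRA scaling (alpha / r): {lora_scaling}")
print(f"Effective batch size:     {effective_batch}")
print(f"Approx. optimizer steps:  {total_steps} ({steps_per_epoch} per epoch)")
print(f"Approx. warmup steps:     {warmup_steps}")
```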

Evaluation

Evaluated on a held-out set of 10 manually written instruction prompts with reference answers, using BLEU (sacreBLEU) and BERTScore F1. This configuration was selected as the best of 5 Dolly trials by combined BLEU + BERTScore (validation loss as tie-breaker).

| Model | Mean BLEU | Mean BERTScore F1 |
| --- | --- | --- |
| Base Qwen2.5-0.5B | 6.57 | 0.8854 |
| Best Alpaca SFT (Trial 3) | 7.27 | 0.8864 |
| This model (Dolly SFT, Trial 5) | 9.86 | 0.8803 |

This is roughly a 50% relative improvement in BLEU over the base model (6.57 to 9.86), the highest BLEU across both datasets in the study. BERTScore F1, by contrast, is slightly lower than the base model's (0.8803 vs. 0.8854). The Dolly-tuned model produces more concise, human-like responses that align closely with reference answers in length and structure.
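
The relative changes can be recomputed from the table. This is a small sketch using the rounded means as printed. From those values the BLEU gain comes to about 50.1%, so a figure computed from unrounded scores may differ in the last digit. The helper name `relative_change_pct` is hypothetical.

```python
# Mean scores copied from the evaluation table (already rounded).
scores = {
    "Base Qwen2.5-0.5B": {"bleu": 6.57, "bertscore_f1": 0.8854},
    "Best Alpaca SFT (Trial 3)": {"bleu": 7.27, "bertscore_f1": 0.8864},
    "This model (Dolly SFT, Trial 5)": {"bleu": 9.86, "bertscore_f1": 0.8803},
}


def relative_change_pct(new: float, old: float) -> float:
    """Relative change of `new` with respect to `old`, in percent."""
    return (new - old) / old * 100.0


base_name = "Base Qwen2.5-0.5B"
base = scores[base_name]

# Compare every fine-tuned model against the base model on both metrics.
for name, s in scores.items():
    if name == base_name:
        continue
    bleu_delta = relative_change_pct(s["bleu"], base["bleu"])
    bert_delta = relative_change_pct(s["bertscore_f1"], base["bertscore_f1"])
    print(f"{name}: BLEU {bleu_delta:+.1f}%, BERTScore F1 {bert_delta:+.2f}%")
```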

Frameworks

PEFT
TRL (SFTTrainer)
Transformers

Authors

Developed as a course assignment for NLP with Deep Learning at the Institute of Business Administration (IBA), Karachi.

| Name | ERP |
| --- | --- |
| Shazain | 27115 |
| Shayan | 26289 |
| Sharjeel | 26932 |

License

Released under the Apache 2.0 license, matching the base model Qwen2.5-0.5B.
