License: apache-2.0

How to use
This is a LoRA adapter, not a standalone model. Load the base model first, then apply the adapter:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base_model_id = "Qwen/Qwen2.5-0.5B"
adapter_id = "s-m-sharjeel/qwen2.5-0.5b-dolly-sft-lora"

# Load the tokenizer and the fp16 base model, then attach the LoRA adapter.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)
base = AutoModelForCausalLM.from_pretrained(
    base_model_id, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

prompt = "### Instruction:\nWrite a short definition of machine learning.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding; the EOS token doubles as the padding token.
out = model.generate(
    **inputs,
    max_new_tokens=150,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
Prompt format
The model was trained on an Alpaca-style template (the Dolly context field was mapped to ### Input). For best results, format prompts as:
```markdown
### Instruction:
{your instruction here}

### Response:
```
If the task includes additional input/context, use:
```markdown
### Instruction:
{your instruction here}

### Input:
{context here}

### Response:
```
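Both templates can be produced by one small helper. The sketch below is illustrative only: `build_prompt` is a hypothetical name, not part of this repository, and the blank line between sections is inferred from the prompt string in the usage example above rather than from the training code.

```python
from typing import Optional


def build_prompt(instruction: str, context: Optional[str] = None) -> str:
    """Format an instruction (and optional context) in the Alpaca-style template.

    Hypothetical helper; section spacing is assumed from the usage example.
    """
    parts = [f"### Instruction:\n{instruction.strip()}"]
    # Add the Input section only when non-empty context is supplied.
    if context and context.strip():
        parts.append(f"### Input:\n{context.strip()}")
    # The trailing newline leaves the model to continue right after the header.
    parts.append("### Response:\n")
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_prompt("Summarize the passage.", "LoRA adds low-rank update matrices."))
```

Called with only an instruction, the helper returns the same string as the `prompt` in the usage example, so its output can be passed straight to the tokenizer.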
Training details
| Setting | Value |
|---|---|
| Base model | Qwen/Qwen2.5-0.5B |
| Method | Supervised Fine-Tuning (SFT) with LoRA |
| Dataset | databricks/databricks-dolly-15k (5,000-sample subset, seed=42) |
| Train / Validation split | 4,500 / 500 (90/10) |
| LoRA rank (r) | 32 |
| LoRA alpha | 64 |
| Target modules | q_proj, v_proj, k_proj, o_proj |
| LoRA dropout | 0.05 |
| Learning rate | 1e-4 |
| LR scheduler | cosine (warmup ratio 0.05) |
| Epochs | 2 |
| Batch size | 4 (gradient accumulation 4 → effective 16) |
| Max sequence length | 512 |
| Precision | fp16 |
| Platform | Kaggle (NVIDIA T4 GPU) |
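As a rough check on how these settings interact, the arithmetic below derives the effective batch size, an approximate number of optimizer updates, the warmup length, and the LoRA scaling factor. It is a sketch under assumptions not stated in the table: a single GPU, one optimizer update per full group of accumulated micro-batches, and the standard LoRA scaling of alpha / r. The actual trainer may count steps slightly differently.

```python
import math

# Values taken from the training table.
train_samples = 4500
per_device_batch = 4
grad_accum = 4
epochs = 2
warmup_ratio = 0.05
lora_r, lora_alpha = 32, 64

# Samples contributing to each optimizer update (assumes a single GPU).
effective_batch = per_device_batch * grad_accum

# Micro-batches per epoch, then full accumulation groups per epoch (assumption).
micro_batches_per_epoch = math.ceil(train_samples / per_device_batch)
updates_per_epoch = micro_batches_per_epoch // grad_accum
total_updates = updates_per_epoch * epochs

# Warmup length implied by the warmup ratio.
warmup_updates = math.ceil(warmup_ratio * total_updates)

# Standard LoRA scales the low-rank update by alpha / r.
lora_scale = lora_alpha / lora_r

print(f"effective batch size: {effective_batch}")
print(f"approx. optimizer updates: {total_updates} ({updates_per_epoch} per epoch)")
print(f"approx. warmup updates: {warmup_updates}")
print(f"LoRA scaling (alpha / r): {lora_scale}")
```

Under these assumptions the run is short, a few hundred updates in total, which is consistent with training on a single T4.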
Evaluation
Evaluated on a held-out set of 10 manually written instruction prompts with reference answers, using BLEU (sacreBLEU) and BERTScore F1. This configuration was selected as the best of 5 Dolly trials by combined BLEU + BERTScore (validation loss as tie-breaker).
| Model | Mean BLEU | Mean BERTScore F1 |
|---|---|---|
| Base Qwen2.5-0.5B | 6.57 | 0.8854 |
| Best Alpaca SFT (Trial 3) | 7.27 | 0.8864 |
| This model (Dolly SFT, Trial 5) | 9.86 | 0.8803 |
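This represents a +50.2% relative improvement in BLEU over the base model (6.57 to 9.86), the strongest BLEU result across both datasets in the study. BERTScore F1, by contrast, is slightly lower than the base model's (0.8803 vs. 0.8854), so the gain is specific to BLEU. The Dolly-tuned model produces more concise, human-like responses that align closely with the reference answers in length and structure.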
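The relative changes can be recomputed directly from the table. The sketch below uses only the rounded scores shown above, so it gives roughly 50.1% for BLEU; a small difference from a figure computed on unrounded scores is expected. `relative_change_pct` is a hypothetical helper name, not part of the evaluation code.

```python
# (mean BLEU, mean BERTScore F1) as reported in the evaluation table.
scores = {
    "Base Qwen2.5-0.5B": (6.57, 0.8854),
    "Best Alpaca SFT (Trial 3)": (7.27, 0.8864),
    "This model (Dolly SFT, Trial 5)": (9.86, 0.8803),
}


def relative_change_pct(new: float, base: float) -> float:
    """Relative change of `new` with respect to `base`, in percent."""
    return (new - base) / base * 100.0


base_bleu, base_f1 = scores["Base Qwen2.5-0.5B"]
for name, (bleu, f1) in scores.items():
    bleu_change = relative_change_pct(bleu, base_bleu)
    f1_change = relative_change_pct(f1, base_f1)
    print(f"{name}: BLEU {bleu_change:+.1f}%, BERTScore F1 {f1_change:+.2f}%")
```

The BLEU change is large and positive while the BERTScore F1 change is small and negative, so the two metrics do not move together on this 10-prompt set.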
Frameworks
- PEFT
- TRL (SFTTrainer)
- Transformers
Authors
Developed as a course assignment for NLP with Deep Learning, Institute of Business Administration (IBA), Karachi.
| Name | ERP |
|---|---|
| Shazain | 27115 |
| Shayan | 26289 |
| Sharjeel | 26932 |
License
Released under the Apache 2.0 license, matching the base model Qwen2.5-0.5B.