README

License: apache-2.0

How to use

This is a LoRA adapter, not a standalone model. Load the base model first, then apply the adapter:

python

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_id = "Qwen/Qwen2.5-0.5B"
adapter_id = "s-m-sharjeel/qwen2.5-0.5b-dolly-sft-lora"

# Load the base model first, then attach the LoRA adapter weights.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)
base = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

# Greedy decoding (do_sample=False) for a deterministic answer.
prompt = "### Instruction:\nWrite a short definition of machine learning.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=150, do_sample=False,
                     pad_token_id=tokenizer.eos_token_id)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Prompt format

The model was trained on an Alpaca-style template (the Dolly context field was mapped to ### Input). For best results, format prompts as:

markdown

### Instruction:
{your instruction here}
### Response:

If the task includes additional input/context, use:

markdown

### Instruction:
{your instruction here}
### Input:
{context here}
### Response:
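
Both templates can be produced by one small helper. This is a minimal sketch, not the authors' preprocessing code: `build_prompt` is a hypothetical name, and the blank line between sections is an assumption taken from the example prompt in "How to use".

```python
from typing import Optional


def build_prompt(instruction: str, context: Optional[str] = None) -> str:
    """Format an instruction (and optional context) in the Alpaca-style template."""
    parts = [f"### Instruction:\n{instruction.strip()}"]
    # Only emit the Input section when there is non-empty context.
    if context and context.strip():
        parts.append(f"### Input:\n{context.strip()}")
    # End with the response header so the model continues with the answer.
    parts.append("### Response:\n")
    return "\n\n".join(parts)


print(build_prompt("Summarize the passage.", "LoRA adds low-rank update matrices."))
```

The returned string can be passed directly to the tokenizer in place of a hand-written prompt.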

Training details

| Setting | Value |
|---|---|
| Base model | Qwen/Qwen2.5-0.5B |
| Method | Supervised Fine-Tuning (SFT) with LoRA |
| Dataset | databricks/databricks-dolly-15k (5,000-sample subset, seed=42) |
| Train / Validation split | 4,500 / 500 (90/10) |
| LoRA rank (r) | 32 |
| LoRA alpha | 64 |
| Target modules | q_proj, v_proj, k_proj, o_proj |
| LoRA dropout | 0.05 |
| Learning rate | 1e-4 |
| LR scheduler | cosine (warmup ratio 0.05) |
| Epochs | 2 |
| Batch size | 4 (gradient accumulation 4 → effective 16) |
| Max sequence length | 512 |
| Precision | fp16 |
| Platform | Kaggle (NVIDIA T4 GPU) |
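
A few quantities follow directly from these settings. The sketch below derives them with plain arithmetic; `TrainSetup` is a hypothetical container, and the step counts assume a single GPU with one optimizer step per effective batch, so the exact numbers reported by a trainer may differ slightly depending on how it rounds.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainSetup:
    # Values copied from the training details table.
    train_samples: int = 4500
    per_device_batch_size: int = 4
    grad_accum_steps: int = 4
    epochs: int = 2
    warmup_ratio: float = 0.05
    lora_r: int = 32
    lora_alpha: int = 64

    @property
    def effective_batch_size(self) -> int:
        return self.per_device_batch_size * self.grad_accum_steps

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA scales the low-rank update by alpha / r.
        return self.lora_alpha / self.lora_r

    @property
    def approx_optimizer_steps(self) -> int:
        # Assumption: single GPU, one optimizer step per effective batch.
        return math.ceil(self.train_samples / self.effective_batch_size) * self.epochs

    @property
    def approx_warmup_steps(self) -> int:
        return math.ceil(self.approx_optimizer_steps * self.warmup_ratio)


setup = TrainSetup()
print("effective batch size:", setup.effective_batch_size)
print("LoRA scaling (alpha / r):", setup.lora_scaling)
print("approx. optimizer steps:", setup.approx_optimizer_steps)
print("approx. warmup steps:", setup.approx_warmup_steps)
```

Under these assumptions the run is short: a few hundred optimizer steps, with warmup covering only the first few dozen.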

Evaluation

Evaluated on a held-out set of 10 manually written instruction prompts with reference answers, using BLEU (sacreBLEU) and BERTScore F1. This configuration was selected as the best of 5 Dolly trials by combined BLEU + BERTScore (validation loss as tie-breaker).

| Model | Mean BLEU | Mean BERTScore F1 |
|---|---|---|
| Base Qwen2.5-0.5B | 6.57 | 0.8854 |
| Best Alpaca SFT (Trial 3) | 7.27 | 0.8864 |
| This model (Dolly SFT, Trial 5) | 9.86 | 0.8803 |

This is a relative BLEU improvement of roughly 50% over the base model (6.57 to 9.86), the strongest BLEU result across both datasets in the study. BERTScore F1, by contrast, is slightly below the base model (0.8803 vs. 0.8854). The Dolly-tuned model produces more concise, human-like responses that align closely with the reference answers in length and structure.
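
The relative change can be recomputed from the mean scores in the table. This is a small sketch using only the rounded table values; `relative_change_pct` is a hypothetical helper, and because the inputs are rounded, the result can differ in the last digit from a figure computed on unrounded scores.

```python
def relative_change_pct(baseline: float, value: float) -> float:
    """Percentage change of `value` relative to `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (value - baseline) / baseline * 100.0


# Mean scores copied from the evaluation table: (BLEU, BERTScore F1).
scores = {
    "Base Qwen2.5-0.5B": (6.57, 0.8854),
    "Best Alpaca SFT (Trial 3)": (7.27, 0.8864),
    "This model (Dolly SFT, Trial 5)": (9.86, 0.8803),
}

base_bleu, base_f1 = scores["Base Qwen2.5-0.5B"]
for name, (bleu, f1) in scores.items():
    print(f"{name}: BLEU {relative_change_pct(base_bleu, bleu):+.1f}%, "
          f"BERTScore F1 {relative_change_pct(base_f1, f1):+.2f}%")
```

Printing both metrics side by side makes the trade-off visible: the BLEU gain is large, while the BERTScore F1 change is small and slightly negative.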

Frameworks

  • PEFT
  • TRL (SFTTrainer)
  • Transformers

Authors

Developed as a course assignment for NLP with Deep Learning, Institute of Business Administration (IBA), Karachi.

| Name | ERP |
|---|---|
| Shazain | 27115 |
| Shayan | 26289 |
| Sharjeel | 26932 |

License

Released under the Apache 2.0 license, matching the base model Qwen2.5-0.5B.

Model provider

s-m-sharjeel

Model tree

Base

Qwen/Qwen2.5-0.5B

Adapter

this model

Modalities

Input

Text

Output

Text
