UzairKhiiba

Qwen2.5-7B-sft-dpo-tuned

Model Selection

The final model was selected from 5 SFT trials and 5 DPO trials using BLEU and BERTScore F1. Since the task is open-ended instruction following, BERTScore was treated as the primary metric because it better captures semantic similarity than exact word overlap.

| Model             | BLEU    | BERTScore F1 |
|-------------------|---------|--------------|
| Raw baseline      | 9.3476  | 0.8052       |
| Best SFT: trial 3 | 14.6922 | 0.8313       |
| Best DPO: trial 5 | 12.4904 | 0.8315       |

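The final selected model is dpo_trial_5, which achieved the highest BERTScore F1 across all evaluated runs. Its margin over the best SFT run is narrow (0.8315 vs. 0.8313), and its BLEU score is lower (12.4904 vs. 14.6922).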
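The selection rule can be illustrated with a minimal sketch that ranks the three reported runs by each metric. The scores are copied from the table above; the dictionary keys and the `best_by` helper are hypothetical names used only for this illustration, not part of the original training code.

```python
# Scores copied from the results table: (BLEU, BERTScore F1).
# The run labels are illustrative names, not identifiers from the training code.
scores = {
    "raw_baseline": (9.3476, 0.8052),
    "sft_trial_3": (14.6922, 0.8313),
    "dpo_trial_5": (12.4904, 0.8315),
}


def best_by(metric_index: int) -> str:
    """Return the run with the highest value for the given metric column."""
    return max(scores, key=lambda name: scores[name][metric_index])


best_bleu = best_by(0)
best_bertscore = best_by(1)

print(f"Best by BLEU:         {best_bleu}")
print(f"Best by BERTScore F1: {best_bertscore}")
```

The two metrics pick different runs, which is why the choice of primary metric decides the final model.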

Training Summary

Supervised Fine-Tuning

  • Dataset: HuggingFaceH4/no_robots
  • Base model: Qwen/Qwen2.5-7B
  • Best SFT run: trial_3
  • LoRA rank: 32
  • LoRA alpha: 64
  • LoRA dropout: 0.05
  • Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Learning rate: 1e-4
  • Effective batch size: 8
  • Epochs: 2
  • Max sequence length: 512
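To give a sense of scale for these LoRA settings, the sketch below estimates the number of adapter parameters and the LoRA scaling factor. It assumes the layer shapes of the published Qwen2.5-7B configuration (hidden size 3584, MLP size 18944, 28 layers, 4 key/value heads of dimension 128) and the standard alpha / rank scaling; neither is stated in this model card, so treat the result as an estimate.

```python
# LoRA hyperparameters from the SFT settings listed above.
rank, alpha = 32, 64

# ASSUMPTION: layer shapes from the published Qwen2.5-7B config,
# not stated in this model card.
hidden, mlp, kv, layers = 3584, 18944, 4 * 128, 28

# (input features, output features) for each targeted projection.
shapes = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, kv),
    "v_proj": (hidden, kv),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, mlp),
    "up_proj": (hidden, mlp),
    "down_proj": (mlp, hidden),
}

# Each adapted projection adds two matrices: A (rank x in) and B (out x rank).
per_layer = sum(rank * (n_in + n_out) for n_in, n_out in shapes.values())
total = per_layer * layers

# ASSUMPTION: standard LoRA scaling (alpha / rank) on the low-rank update.
scaling = alpha / rank

print(f"LoRA scaling (alpha / rank): {scaling}")
print(f"Estimated adapter parameters: {total:,}")
```

Under these assumptions the adapters add roughly 81 million trainable parameters, a small fraction of the 7B base model.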

Direct Preference Optimization

  • Dataset: Anthropic/hh-rlhf
  • Best DPO run: dpo_trial_5
  • SFT base adapter: trial_3
  • DPO beta: 0.1
  • Learning rate: 5e-5
  • Effective batch size: 8
  • Epochs: 2
  • Max sequence length: 512
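The role of beta can be seen in a minimal sketch of the standard DPO loss for a single preference pair. This assumes the usual sigmoid form of the loss; the model card does not say which loss variant was used. The `dpo_loss` function and the log-probability values are hypothetical, for illustration only.

```python
import math


def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair, given summed log-probs."""
    # Implicit reward margin: how much more the policy favors the chosen
    # response over the rejected one, relative to the reference model.
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    # -log(sigmoid(beta * margin)) rewritten as log(1 + exp(-beta * margin)).
    return math.log1p(math.exp(-beta * margin))


# Hypothetical log-probabilities, not taken from the actual training run.
no_preference = dpo_loss(-50.0, -50.0, -50.0, -50.0)
prefers_chosen = dpo_loss(-45.0, -55.0, -50.0, -50.0)

print(f"Policy equals reference:  {no_preference:.4f}")
print(f"Policy favors the chosen: {prefers_chosen:.4f}")
```

A smaller beta scales the margin down, so the policy is penalized less sharply per pair and tends to stay closer to the SFT reference.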

Key Findings

  • SFT improved the raw base model substantially, raising BERTScore F1 from 0.8052 to 0.8313.
  • SFT trial_3 was the strongest supervised model by BERTScore F1.
  • DPO trial_5 gave the best overall semantic score, reaching 0.8315 BERTScore F1.
  • BLEU and BERTScore did not always rank models the same way; BERTScore was more useful for evaluating open-ended generated answers.
  • Conservative DPO settings worked best. The selected DPO run used beta=0.1, which preserved instruction-following quality while improving alignment.

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

repo_id = "UzairKhiiba/Qwen2.5-7B-sft-dpo-tuned"

# Load the tokenizer and the model weights in half precision.
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True,
)

# Build a single-turn chat prompt using the model's chat template.
messages = [
    {"role": "user", "content": "Explain supervised learning in simple terms."}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Sample a response of up to 200 new tokens.
outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)

# Note: the decoded text includes the prompt as well as the generated reply.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Intended Use

This model is intended for academic evaluation of instruction-following behavior after SFT and DPO tuning. It can be used for general response generation, explanatory prompts, reasoning-style prompts, and conversational assistant tasks.

Model provider: UzairKhiiba

Model tree: base model Qwen/Qwen2.5-7B, fine-tuned into this model

Modalities: text input, text output
