License: apache-2.0
Pipeline
```markdown
Qwen/Qwen3-0.6B (base)
    ↓ SFT with LoRA
qwen3-0.6b-sft-dolly
    ↓ DPO
qwen3-0.6b-dpo-ultrafeedback (this model)
```
DPO Configuration
| Parameter | Value |
|---|---|
| Beta | 0.3 |
| Learning Rate | 5e-7 |
| Epochs | 2 |
| Batch Size | 1 x 8 gradient accumulation steps (effective batch size 8) |
| Max Length | 256 |
| Optimizer | adamw_8bit |
| Quantization | 4-bit NF4 |
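As a rough illustration of what the Beta and Batch Size settings above control, the sketch below computes the standard DPO loss for a single preference pair and the effective batch size implied by gradient accumulation. The log-probability values are invented for the example, and `dpo_loss` and `effective_batch_size` are hypothetical helper names; this is not the training script used for this model.

```python
import math

# Values taken from the DPO Configuration table.
BETA = 0.3
BATCH_SIZE = 1
GRAD_ACCUMULATION_STEPS = 8


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=BETA):
    """Standard DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response under
    the trainable policy or the frozen reference model.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def effective_batch_size(batch_size=BATCH_SIZE, accumulation=GRAD_ACCUMULATION_STEPS):
    """Number of preference pairs contributing to each optimizer step."""
    return batch_size * accumulation


# Made-up log-probabilities: the policy prefers the chosen response
# slightly more than the reference model does.
print(dpo_loss(-9.0, -13.0, -10.0, -12.0))
print(effective_batch_size())
```

A larger beta scales the margin more steeply, so the same shift away from the reference model changes the loss more.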
Dataset
- Name: HuggingFaceH4/ultrafeedback_binarized
- Split: train_prefs
- Subset: 2,000 samples (seed=42)
- Format: prompt / chosen / rejected pairs
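One way to draw a reproducible 2,000-sample subset is sketched below. The card does not say how the subset was selected, so shuffling with seed 42 and keeping the first 2,000 rows is an assumption, and `take_subset` and `load_preference_subset` are hypothetical helper names. `load_dataset`, `Dataset.shuffle`, and `Dataset.select` come from the `datasets` library.

```python
import random

DATASET_NAME = "HuggingFaceH4/ultrafeedback_binarized"
SPLIT = "train_prefs"
SUBSET_SIZE = 2000
SEED = 42


def take_subset(rows, n=SUBSET_SIZE, seed=SEED):
    """Reproducibly pick up to n rows from a list (dependency-free stand-in)."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    return [rows[i] for i in indices[:n]]


def load_preference_subset(n=SUBSET_SIZE, seed=SEED):
    """Assumed loading recipe; needs the `datasets` package and network access."""
    # Imported lazily so take_subset() can be used without `datasets` installed.
    from datasets import load_dataset

    ds = load_dataset(DATASET_NAME, split=SPLIT)
    return ds.shuffle(seed=seed).select(range(n))


# Small offline demonstration with synthetic rows in the prompt / chosen / rejected shape.
demo_rows = [
    {"prompt": f"question {i}", "chosen": f"good {i}", "rejected": f"bad {i}"}
    for i in range(10)
]
print([row["prompt"] for row in take_subset(demo_rows, n=3)])
```

The pure-Python helper will not pick the same rows as `Dataset.shuffle`; it only shows the idea of a seeded selection.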
Results
| Stage | BLEU | BERTScore F1 |
|---|---|---|
| Baseline (no tuning) | 3.70 | 0.7675 |
| After SFT | 10.22 | 0.8149 |
| After DPO (this model) | 4.82 | 0.7841 |
Note on Metrics
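DPO scores lower than SFT on both BLEU and BERTScore F1, though it remains above the untuned baseline (4.82 vs. 3.70 BLEU, 0.7841 vs. 0.7675 BERTScore F1). This is expected: both metrics measure similarity to a reference answer, while DPO optimizes for preference alignment, not surface-level n-gram overlap. The model produces more natural, measured responses.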
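To make the point about n-gram overlap concrete, the sketch below computes clipped bigram precision, the building block of BLEU, for two invented answers against an invented reference. It is a simplified stand-in (no brevity penalty, a single n-gram order) and not the evaluation code used for the table above; `ngram_precision` is a hypothetical helper name.

```python
from collections import Counter


def ngram_precision(candidate, reference, n=2):
    """Clipped n-gram precision of candidate against a single reference."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    total = sum(cand_ngrams.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram counts at most as often as it appears in the reference.
    overlap = sum(min(count, ref_ngrams[gram]) for gram, count in cand_ngrams.items())
    return overlap / total


# Invented sentences, for illustration only.
reference = "machine learning is a field of artificial intelligence"
close_copy = "machine learning is a field of computer science"
reworded = "it is the study of systems that learn from data"

print(ngram_precision(close_copy, reference))
print(ngram_precision(reworded, reference))
```

An answer that copies the reference's wording scores high, while a differently worded answer on the same topic shares no bigrams with it, so a model tuned toward preferred style rather than reference wording can lose BLEU points.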
How to Use
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

tokenizer = AutoTokenizer.from_pretrained(
    "Shaheer05/qwen3-0.6b-dpo-ultrafeedback",
    trust_remote_code=True
)

# Load SFT first, then DPO on top
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)
sft_model = PeftModel.from_pretrained(base, "Shaheer05/qwen3-0.6b-sft-dolly")
sft_model = sft_model.merge_and_unload()
model = PeftModel.from_pretrained(sft_model, "Shaheer05/qwen3-0.6b-dpo-ultrafeedback")

# Inference
messages = [{"role": "user", "content": "What is machine learning?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
Training Platform
- Google Colab (Free Tier) — NVIDIA T4 GPU
Model provider
- Shaheer05
Model tree
- Base: Shaheer05/qwen3-0.6b-sft-dolly
- Adapter: this model
Modalities
- Input: Text
- Output: Text