Shaheer05/qwen3-0.6b-sft-dolly API & Inference Endpoint

Pipeline

Qwen/Qwen3-0.6B (base)
↓ SFT with LoRA
qwen3-0.6b-sft-dolly (this model)
↓ DPO
qwen3-0.6b-dpo-ultrafeedback

LoRA Configuration

Parameter	Value
Rank (r)	32
Alpha	64
Target Modules	q_proj, k_proj, v_proj, o_proj
Dropout	0.1
Learning Rate	3e-4
Epochs	2
Batch Size	2 per step × 4 gradient accumulation steps (effective 8)
Optimizer	adamw_8bit
Quantization	4-bit NF4
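
As a rough illustration of what these settings imply, the sketch below collects them in a plain dictionary and derives the LoRA scaling factor, the effective batch size, and an estimate of the trainable adapter parameters. The key names mirror PEFT's `LoraConfig` arguments (`r`, `lora_alpha`, `lora_dropout`, `target_modules`). The dictionary itself, the helper `lora_param_count`, and the layer dimensions (hidden size 1024, 16 query heads, 8 key/value heads, head dimension 128, 28 layers) are assumptions about Qwen3-0.6B, not values stated in this card.

```python
# Hyperparameters from the table above; key names follow peft.LoraConfig.
lora_cfg = {
    "r": 32,
    "lora_alpha": 64,
    "lora_dropout": 0.1,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
}

# Standard LoRA scales the low-rank update by alpha / r.
scaling = lora_cfg["lora_alpha"] / lora_cfg["r"]

# Effective batch size on a single GPU: batch per step times accumulation steps.
effective_batch = 2 * 4

# ASSUMED Qwen3-0.6B attention shapes (not stated in this card).
HIDDEN, N_HEADS, N_KV_HEADS, HEAD_DIM, N_LAYERS = 1024, 16, 8, 128, 28
shapes = {  # module name -> (in_features, out_features)
    "q_proj": (HIDDEN, N_HEADS * HEAD_DIM),
    "k_proj": (HIDDEN, N_KV_HEADS * HEAD_DIM),
    "v_proj": (HIDDEN, N_KV_HEADS * HEAD_DIM),
    "o_proj": (N_HEADS * HEAD_DIM, HIDDEN),
}


def lora_param_count(cfg, shapes, n_layers):
    """Each adapted linear layer adds A (r x in) and B (out x r)."""
    per_layer = sum(
        cfg["r"] * (shapes[m][0] + shapes[m][1]) for m in cfg["target_modules"]
    )
    return per_layer * n_layers


trainable = lora_param_count(lora_cfg, shapes, N_LAYERS)
print(f"scaling={scaling}, effective_batch={effective_batch}, trainable={trainable:,}")
```

With PEFT installed, the same dictionary could be passed along as `LoraConfig(task_type="CAUSAL_LM", **lora_cfg)`. Under the assumed shapes, the adapter adds roughly 9 million trainable parameters.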

Dataset

Name: databricks/databricks-dolly-15k
Subset: 3,000 samples (seed=42)
Format: Instruction + Context → Response
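
One way to turn a Dolly record into the Instruction + Context → Response format is sketched below. The field names `instruction`, `context`, and `response` are the dataset's columns. The exact prompt layout and the sampling method used for this model are not given here, so `to_messages` and `sample_indices` are hypothetical helpers rather than the author's code.

```python
import random


def to_messages(record):
    """Map one Dolly record to chat messages (the layout is an assumption)."""
    prompt = record["instruction"].strip()
    context = (record.get("context") or "").strip()
    if context:  # many Dolly records have an empty context
        prompt = f"{prompt}\n\nContext:\n{context}"
    return [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": record["response"].strip()},
    ]


def sample_indices(n_total, n_samples=3000, seed=42):
    """Reproducibly pick a subset of row indices (the method is an assumption)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_total), n_samples))


example = {
    "instruction": "Summarize the passage.",
    "context": "LoRA trains small low-rank matrices instead of full weights.",
    "response": "LoRA fine-tunes a model by learning low-rank updates.",
}
messages = to_messages(example)
print(messages[0]["content"])
```

The resulting message list can be passed to `tokenizer.apply_chat_template`, the same call used in the inference example in this card.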

Results

Stage	BLEU	BERTScore F1
Baseline (no tuning)	3.70	0.7675
This Model (SFT)	10.22	0.8149
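
To put the two rows in proportion, the short sketch below computes the absolute and relative change for each metric. The numbers are copied from the table; the helper `gains` is hypothetical, and the evaluation set and BLEU implementation behind these scores are not stated here.

```python
# Scores copied from the results table above.
baseline = {"bleu": 3.70, "bertscore_f1": 0.7675}
sft = {"bleu": 10.22, "bertscore_f1": 0.8149}


def gains(before, after):
    """Return the absolute and relative change for each metric."""
    return {
        name: {
            "absolute": after[name] - before[name],
            "relative": (after[name] - before[name]) / before[name],
        }
        for name in before
    }


for name, g in gains(baseline, sft).items():
    print(f"{name}: {g['absolute']:+.4f} absolute, {g['relative']:+.1%} relative")
```

BLEU rises by 6.52 points and BERTScore F1 by 0.0474. The two metrics are on different scales, so the relative changes are not directly comparable.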

How to Use

python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load the tokenizer and base model, then attach the LoRA adapter
tokenizer = AutoTokenizer.from_pretrained(
    "Shaheer05/qwen3-0.6b-sft-dolly",
    trust_remote_code=True
)
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)
model = PeftModel.from_pretrained(
    base,
    "Shaheer05/qwen3-0.6b-sft-dolly"
)

# Inference: format the prompt with the chat template, then decode greedily
messages = [{"role": "user", "content": "What is machine learning?"}]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
# Decode only the newly generated tokens, not the echoed prompt
new_tokens = output[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
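
If a standalone checkpoint is more convenient than loading base and adapter separately, the adapter can be folded into the base weights. The sketch below assumes the adapter repository holds only LoRA weights, as the loading code above suggests. `merge_adapter` and the output directory name are hypothetical, while `merge_and_unload` and `save_pretrained` are the PEFT and Transformers calls.

```python
def merge_adapter(base_id, adapter_id, out_dir):
    """Merge a LoRA adapter into its base model and save the result."""
    # Imported lazily so this file loads without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
    model = PeftModel.from_pretrained(base, adapter_id)
    merged = model.merge_and_unload()  # folds the LoRA update into the base layers
    merged.save_pretrained(out_dir)
    # Save the tokenizer alongside so out_dir is self-contained.
    AutoTokenizer.from_pretrained(adapter_id).save_pretrained(out_dir)
    return out_dir


# Example call ("merged-model" is a hypothetical output directory):
# merge_adapter("Qwen/Qwen3-0.6B", "Shaheer05/qwen3-0.6b-sft-dolly", "merged-model")
```

The merged directory can then be loaded with `AutoModelForCausalLM.from_pretrained` alone, without PEFT at inference time.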

Training Platform

Google Colab (Free Tier) — NVIDIA T4 GPU
