Kerassy/qwen-2.5-3b-smoltalk-sft API & Inference Endpoint

Model Details

Base Model: Qwen/Qwen2.5-3B
Fine-tuning Dataset: HuggingFaceTB/smoltalk (everyday-conversations subset)
Methodology: Supervised Fine-Tuning (SFT) using TRL
Hardware Used: 1 x NVIDIA L4 GPU (24GB VRAM)
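
As a rough sanity check on the hardware figure above, the memory needed just to hold the weights can be estimated from the parameter count and the dtype. This is a minimal sketch: the parameter count is an assumption taken from the "3B" in the model name, and the estimate covers only the weights, not activations, the KV cache, or any training state.

```python
# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Approximate memory (GiB) needed to hold the weights alone."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


# Assumption: roughly 3 billion parameters, as the "3B" in the name suggests.
ASSUMED_PARAMS = 3e9

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gib(ASSUMED_PARAMS, dtype):.1f} GiB")
```

Under these assumptions the 16-bit weights take well under a quarter of the 24GB available on an L4, which leaves headroom for the KV cache during inference.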

How to Get Started

You can load and use this model directly with the Hugging Face pipeline API.

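```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

MODEL_ID = "Kerassy/qwen-2.5-3b-smoltalk-sft"

# Prefer bfloat16 on GPUs that support it; otherwise fall back to float16.
use_bf16 = torch.cuda.is_available() and torch.cuda.is_bf16_supported()
dtype = torch.bfloat16 if use_bf16 else torch.float16

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=dtype,
    device_map="auto",  # requires the `accelerate` package
)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

messages = [
    {"role": "user", "content": "Why is the sky blue?"}
]

# Render the conversation with the model's chat template and
# append the marker that opens the assistant's turn.
formatted_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Stop on "<|end|>" if the tokenizer defines it; otherwise use the default EOS token.
eos_token_id = (
    tokenizer.convert_tokens_to_ids("<|end|>")
    if "<|end|>" in tokenizer.get_vocab()
    else tokenizer.eos_token_id
)

outputs = pipe(
    formatted_prompt,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
    top_k=40,
    clean_up_tokenization_spaces=False,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=eos_token_id,
)

# generated_text contains the prompt followed by the model's reply.
print(outputs[0]["generated_text"])
```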
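
By default the text-generation pipeline returns the prompt together with the completion, so the printed string includes the chat-template markup. One way to keep only the reply is to strip the echoed prompt, sketched below. The helper name `extract_reply` and the sample strings are illustrative only, and the ChatML-style prompt shown is an assumption about the template this checkpoint ships with.

```python
def extract_reply(generated_text: str, prompt: str) -> str:
    """Return only the newly generated text, dropping the echoed prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].strip()
    # Fall back to the full text if the prompt is not echoed verbatim.
    return generated_text.strip()


# Stand-in values for illustration; in the snippet above these would be
# `formatted_prompt` and `outputs[0]["generated_text"]`.
example_prompt = (
    "<|im_start|>user\nWhy is the sky blue?<|im_end|>\n<|im_start|>assistant\n"
)
example_output = example_prompt + "Because of Rayleigh scattering."

print(extract_reply(example_output, example_prompt))
```

Passing `return_full_text=False` to the pipeline call should have the same effect without any post-processing.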
