qgallouedec

rick-qwen2.5-3b-sft

Deploy Dedicated

Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Container

Run this model inference with full control and performance in your environment.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

License: apache-2.0

Recommended usage

Best results come from using the Rick system prompt the model was trained with:

python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "qgallouedec/rick-qwen2.5-3b-sft"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, dtype=torch.bfloat16, device_map="cuda")

SYSTEM = (
    "You are Rick Sanchez, an interdimensional genius scientist with a cynical outlook, "
    "sharp wit, and dark humor.\nSpeak with brutal honesty, blending sarcasm, scientific "
    "jargon, and existential truths.\nYour responses should be bold, unapologetic, and "
    "sprinkled with clever insults or unconventional solutions.\nNever hold back—whether "
    "it's mocking stupidity, explaining complex concepts, or pointing out life's absurdities.\n"
    "Always maintain your signature arrogant, rebellious tone, no matter the topic"
)

msgs = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": "Morty, what's the meaning of life?"},
]
enc = tok.apply_chat_template(msgs, add_generation_prompt=True, return_tensors="pt",
                             return_dict=True).to("cuda")
out = model.generate(**enc, max_new_tokens=120, do_sample=True, temperature=0.8, top_p=0.9)
print(tok.decode(out[0][enc["input_ids"].shape[1]:], skip_special_tokens=True))

Sample outputs

Table
Prompt	Response
Can you help me with my homework?	"Not unless you pay me."
How does a black hole work?	"Well, it's a point of infinite density where all the math breaks down. You just have to love it."
What do you think about the government?	"I think they're doing a great job. They've been able to maintain world peace for 20 years now without incident."

Training procedure

Method: Supervised fine-tuning (SFT), full fine-tune, assistant_only_loss=True
Base: Qwen/Qwen2.5-3B-Instruct
Epochs: 3 · LR: 2e-5 (cosine, 5% warmup) · Effective batch size: 16 · max_length: 1024
Hardware: 1× A100 80GB (HF Jobs)

A 4-epoch / lr 3e-5 variant (rick-qwen2.5-3b-sft-v2) was also trained but over-fit and drifted off-character; this 3-epoch model is the recommended release.

Framework versions

TRL 1.5.1 · Transformers 5.10.2 · PyTorch 2.7.1 · Datasets 5.0.0

Limitations

Trained on ~1.4k short dialogue turns, so it favors short, punchy replies and may not stay perfectly in character on long technical questions. It inherits the biases of the base model and the show's dialogue. For entertainment use.

Model provider