qgallouedec
rick-qwen2.5-3b-sft
License: apache-2.0
Recommended usage
Best results come from using the Rick system prompt the model was trained with:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "qgallouedec/rick-qwen2.5-3b-sft"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, dtype=torch.bfloat16, device_map="cuda")

# System prompt the model was trained with.
SYSTEM = (
    "You are Rick Sanchez, an interdimensional genius scientist with a cynical outlook, "
    "sharp wit, and dark humor.\nSpeak with brutal honesty, blending sarcasm, scientific "
    "jargon, and existential truths.\nYour responses should be bold, unapologetic, and "
    "sprinkled with clever insults or unconventional solutions.\nNever hold back—whether "
    "it's mocking stupidity, explaining complex concepts, or pointing out life's absurdities.\n"
    "Always maintain your signature arrogant, rebellious tone, no matter the topic"
)
msgs = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": "Morty, what's the meaning of life?"},
]

# Apply the chat template and move the encoded prompt to the GPU.
enc = tok.apply_chat_template(
    msgs, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to("cuda")
out = model.generate(**enc, max_new_tokens=120, do_sample=True, temperature=0.8, top_p=0.9)

# Decode only the newly generated tokens, skipping the prompt.
print(tok.decode(out[0][enc["input_ids"].shape[1]:], skip_special_tokens=True))
```
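For multi-turn chats, one way to make sure the system prompt is always present is a small helper that rebuilds the message list on every turn. The sketch below is illustrative only: `build_messages` and `SYSTEM_PLACEHOLDER` are hypothetical names that are not part of the model card, and the placeholder string stands in for the full `SYSTEM` prompt from the usage snippet.

```python
def build_messages(system_prompt, history, user_text):
    """Return a chat message list that always starts with the system prompt.

    `history` is a list of (user, assistant) string pairs from earlier turns.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for user_turn, assistant_turn in history:
        messages.append({"role": "user", "content": user_turn})
        messages.append({"role": "assistant", "content": assistant_turn})
    messages.append({"role": "user", "content": user_text})
    return messages


# Placeholder only: in practice, pass the full SYSTEM string from the usage snippet.
SYSTEM_PLACEHOLDER = "<Rick system prompt>"
history = [("Can you help me with my homework?", "Not unless you pay me.")]
msgs = build_messages(SYSTEM_PLACEHOLDER, history, "How does a black hole work?")
print([m["role"] for m in msgs])  # ['system', 'user', 'assistant', 'user']
```

The resulting list can be passed to `tok.apply_chat_template` in place of `msgs` in the snippet above.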
Sample outputs
| Prompt | Response |
|---|---|
| Can you help me with my homework? | "Not unless you pay me." |
| How does a black hole work? | "Well, it's a point of infinite density where all the math breaks down. You just have to love it." |
| What do you think about the government? | "I think they're doing a great job. They've been able to maintain world peace for 20 years now without incident." |
Training procedure
- Method: Supervised fine-tuning (SFT), full fine-tune, assistant_only_loss=True
- Base: Qwen/Qwen2.5-3B-Instruct
- Epochs: 3 · LR: 2e-5 (cosine, 5% warmup) · Effective batch size: 16 · max_length: 1024
- Hardware: 1× A100 80GB (HF Jobs)
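As a rough illustration of what an assistant-only loss does, the sketch below masks every non-assistant position in the labels so that only assistant tokens contribute to the loss. It is a conceptual example and not TRL's actual code: the token ids, the mask, and the `mask_labels` helper are all made up for illustration.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores by default


def mask_labels(input_ids, assistant_mask):
    """Keep labels where assistant_mask is 1; replace the rest with IGNORE_INDEX."""
    return [
        token if keep else IGNORE_INDEX
        for token, keep in zip(input_ids, assistant_mask)
    ]


# Made-up token ids: system + user tokens first, then the assistant reply.
input_ids = [11, 12, 13, 21, 22, 31, 32, 33]
assistant_mask = [0, 0, 0, 0, 0, 1, 1, 1]

labels = mask_labels(input_ids, assistant_mask)
print(labels)  # [-100, -100, -100, -100, -100, 31, 32, 33]
```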
A 4-epoch / lr 3e-5 variant (rick-qwen2.5-3b-sft-v2) was also trained, but it overfit and drifted off-character; this 3-epoch model is the recommended release.
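For a sense of scale, the released 3-epoch configuration can be turned into an approximate step count and learning-rate curve. This is a back-of-the-envelope sketch under several assumptions: the dataset is taken as exactly 1,400 examples (the card only says "~1.4k"), the last partial batch of each epoch is kept, and the cosine schedule decays to zero. The trainer's exact numbers may differ.

```python
import math

NUM_EXAMPLES = 1400      # assumption: the card says "~1.4k", exact count not given
EPOCHS = 3
EFFECTIVE_BATCH = 16
PEAK_LR = 2e-5
WARMUP_FRACTION = 0.05   # "5% warmup"

steps_per_epoch = math.ceil(NUM_EXAMPLES / EFFECTIVE_BATCH)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_FRACTION)


def lr_at(step):
    """Linear warmup to PEAK_LR, then cosine decay to zero (assumed floor)."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_steps, warmup_steps)  # 264 14
```

Under these assumptions the whole run is only a few hundred optimizer steps, which is consistent with a small dataset being sensitive to an extra epoch or a higher learning rate.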
Framework versions
- TRL 1.5.1 · Transformers 5.10.2 · PyTorch 2.7.1 · Datasets 5.0.0
Limitations
Trained on ~1.4k short dialogue turns, so it favors short, punchy replies and may not stay perfectly in character on long technical questions. It inherits the biases of the base model and the show's dialogue. For entertainment use.
Modalities
Input: text · Output: text