
License: apache-2.0

Quick start

PEFT

python

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model in bf16, placed automatically on the available devices.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-32B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# Attach the LoRA adapter on top of the base weights.
model = PeftModel.from_pretrained(base, "svyatsharov/Role-play-ai")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-32B-Instruct")

messages = [
    {"role": "system", "content": "You are Mira, a warm tavern owner. Witty but firm."},
    {"role": "user", "content": "*sits at the bar* Tough day."},
]
# return_dict=True returns input_ids together with the attention mask.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
out = model.generate(**inputs, max_new_tokens=300, temperature=0.85, top_p=0.9, do_sample=True)
# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs["input_ids"].shape[1]
print(tokenizer.decode(out[0][prompt_len:], skip_special_tokens=True))

Unsloth (faster on a single GPU)

python

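from unsloth import FastLanguageModel

# Load the adapter repo in 4-bit; Unsloth resolves the base model from the adapter config.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="svyatsharov/Role-play-ai",
    max_seq_length=4096,
    load_in_4bit=True,
)
# Switch the model to Unsloth's inference mode.
FastLanguageModel.for_inference(model)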

Model details

| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-32B-Instruct |
| Adapter type | LoRA (PEFT), r=64, alpha=128 |
| Trainable params | 537M (1.6% of 32B) |
| Context length | 6144 |
| Languages | English, Russian |
| Chat template | ChatML |
| License | Apache 2.0 (inherited from Qwen2.5) |
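The trainable-parameter figure can be sanity-checked from the LoRA rank and the target modules listed under Training procedure. This is a minimal sketch, assuming the commonly published Qwen2.5-32B layer shapes (hidden size 5120, MLP intermediate size 27648, 64 layers, 8 key/value heads of dimension 128). Those dimensions are not stated in this card and should be verified against the base model's config.json.

```python
# LoRA adds two matrices per adapted linear layer: A (r x d_in) and B (d_out x r),
# so each adapted layer contributes r * (d_in + d_out) trainable parameters.
R = 64

# Assumed Qwen2.5-32B shapes (not from this card; check the base model's config.json).
HIDDEN = 5120
INTERMEDIATE = 27648
NUM_LAYERS = 64
KV_DIM = 8 * 128  # 8 key/value heads x head_dim 128 (grouped-query attention)

# (d_in, d_out) for each target module inside one transformer block.
MODULE_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(r, shapes, num_layers):
    """Total LoRA parameters across all adapted modules and layers."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers


total = lora_params(R, MODULE_SHAPES, NUM_LAYERS)
print(f"{total:,} trainable parameters (~{total / 1e6:.0f}M)")
# → 536,870,912 trainable parameters (~537M)
```

Under these assumed shapes the count matches the 537M reported in the table.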

Evaluation results

Metrics were computed on the eval set (1,203 examples, 5% of the full dataset), across 5 metric groups:

Loss-based

| Metric | Value | Target |
|---|---|---|
| Perplexity (overall) | 3.31 | 5–12 |
| Perplexity (EN) | 3.24 | 5–12 |
| Perplexity (RU) | 3.58 | 5–15 |
| Token accuracy | 0.678 | >0.55 ✅ |

Reference-based

| Metric | Value | Target |
|---|---|---|
| BLEU-4 | 11.20 | >5 ✅ |
| ROUGE-L | 0.215 | >0.20 ✅ |
| BERTScore F1 | 0.865 | >0.85 ✅ |
| chrF++ | 31.01 | >25 ✅ |

Style match (Albert metrics)

| Metric | Value | Target |
|---|---|---|
| Length JS-divergence | 0.038 | <0.10 ✅ |
| Vocabulary overlap | 0.94 | >0.55 ✅ |
| Style Match Score | 0.798 | >0.7 ✅ |
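The card does not say how the length JS-divergence is computed. One plausible reading is the Jensen-Shannon divergence between the response-length distributions of model outputs and reference replies. The sketch below follows that reading. The whitespace tokenization, the bin width of 10, and the base-2 logarithm are all assumptions, and `length_distribution` and `js_divergence` are hypothetical helper names.

```python
import math
from collections import Counter


def length_distribution(texts, bin_width=10):
    """Histogram of response lengths (in whitespace tokens), normalized to sum to 1."""
    if not texts:
        raise ValueError("texts must be non-empty")
    counts = Counter(len(t.split()) // bin_width for t in texts)
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs, so the result lies in [0, 1]."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        # KL(a || m); m[k] > 0 wherever a[k] > 0, so the ratio is well defined.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


model_outputs = ["short reply here", "another short reply"]
references = ["a short reference", "one more short reference"]
print(js_divergence(length_distribution(model_outputs), length_distribution(references)))
```

Identical length distributions give 0 and fully disjoint ones give 1, so under this reading a value of 0.038 would indicate closely matching lengths.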

Diversity

| Metric | Value | Target |
|---|---|---|
| Avg distinct-2 | 0.940 | >0.7 ✅ |
| Avg distinct-3 | 0.985 | |
| Self-repetition rate | 0.000 | <0.05 ✅ |
| TTR (model) | 0.349 | 0.4–0.6 |
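For reference, distinct-n and type-token ratio (TTR) are usually defined as ratios of unique to total n-grams or tokens. The sketch below uses those standard definitions with whitespace tokenization. The card does not state its tokenization or whether scores are averaged per response or computed over the whole corpus, so treat both as assumptions.

```python
def distinct_n(tokens, n):
    """Fraction of n-grams in the sequence that are unique."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


def type_token_ratio(tokens):
    """Fraction of tokens that are unique word types."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


tokens = "the cat sat on the mat the cat".split()
print(distinct_n(tokens, 2))     # 6 unique bigrams out of 7
print(type_token_ratio(tokens))  # 5 unique words out of 8 → 0.625
```

TTR falls as the amount of text grows, so a corpus-level TTR is not directly comparable to a per-response one. This may be relevant when reading the 0.349 value against its 0.4–0.6 target.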

Caveats

  • The PPL of 3.31 is below the 5–12 target because the eval set is a random 5% sample drawn from the same sources as the training data. On out-of-distribution data PPL will be higher. This is not true overfitting: train_loss = 1.20, eval_loss = 1.21.
  • Russian is underrepresented in the eval set: only 14 RU examples out of 1,203. The PPL_ru metric is statistically weak.
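Perplexity and cross-entropy loss are related by PPL = exp(mean negative log-likelihood per token), which makes it easy to cross-check the numbers above. This is a small sketch. The assumption that the reported losses are natural-log, per-token means is mine, not the card's.

```python
import math


def perplexity(mean_nll):
    """Perplexity from a mean per-token negative log-likelihood (natural log)."""
    return math.exp(mean_nll)


print(round(perplexity(1.21), 2))  # → 3.35
print(round(math.log(3.31), 3))    # loss implied by the reported PPL → 1.197
```

The reported eval loss of 1.21 implies a perplexity of about 3.35, close to the reported 3.31. The small gap could come from rounding or from a different averaging scheme (per token versus per batch), which the card does not specify.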

Training data

Target size: 50k SFW roleplay dialogues. Actually obtained: 24,071.

| Source | Obtained | % | Planned |
|---|---|---|---|
| PygmalionAI/PIPPA (SFW filter) | 13,050 | 54.2% | 35% |
| lemonilia/LimaRP | 0 | 0% | 25% |
| Norquinal/claude_multiround_chat_30k | 7,500 | 31.2% | 15% |
| IlyaGusev/saiga_scored | 3,521 | 14.6% | 25% |

LimaRP failed to load during dataset preparation; the fallback increased PIPPA's share.
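The shift in dataset composition can be reproduced directly from the counts in the table. The snippet below is a plain arithmetic check using only the numbers reported above; the variable names are illustrative.

```python
# Dialogue counts obtained per source, and the originally planned share (%).
obtained = {
    "PygmalionAI/PIPPA": 13_050,
    "lemonilia/LimaRP": 0,
    "Norquinal/claude_multiround_chat_30k": 7_500,
    "IlyaGusev/saiga_scored": 3_521,
}
planned_pct = {
    "PygmalionAI/PIPPA": 35,
    "lemonilia/LimaRP": 25,
    "Norquinal/claude_multiround_chat_30k": 15,
    "IlyaGusev/saiga_scored": 25,
}

total = sum(obtained.values())
shares = {name: 100 * count / total for name, count in obtained.items()}

print(f"total dialogues: {total}")
for name, share in shares.items():
    # Positive drift means the source is overrepresented relative to the plan.
    drift = share - planned_pct[name]
    print(f"{name}: {share:.1f}% actual vs {planned_pct[name]}% planned ({drift:+.1f} pts)")
```

The counts sum to the reported 24,071. PIPPA ends up roughly 19 points above its planned share, and the Russian-language saiga_scored source roughly 10 points below.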

Training procedure

| Parameter | Value |
|---|---|
| Framework | Unsloth + TRL SFTTrainer |
| Method | QLoRA 4-bit (NF4) |
| Effective batch | 16 (2 × 8 grad_accum) |
| Epochs | 3 (early stopping) |
| Learning rate | 1e-4, cosine schedule, warmup 3% |
| Optimizer | AdamW 8-bit |
| Weight decay | 0.01 |
| Gradient checkpointing | Unsloth |
| Precision | bf16 + tf32 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |

Steps & duration

  • Total steps: 4287
  • Train duration: 51 hours on 1× A100 80GB
  • Peak VRAM: 33 GB
  • Final train loss: 1.20
  • Final eval loss: 1.21
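The total step count is consistent with the dataset size and batch settings reported above. The check below is a rough sketch. It assumes the 1,203 eval examples were held out of the 24,071 dialogues and that an incomplete final batch does not count as an optimizer step; the card states neither.

```python
TOTAL_DIALOGUES = 24_071
EVAL_EXAMPLES = 1_203
PER_DEVICE_BATCH = 2
GRAD_ACCUM = 8
EPOCHS = 3
TRAIN_HOURS = 51

train_examples = TOTAL_DIALOGUES - EVAL_EXAMPLES    # 22,868
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM     # 16
steps_per_epoch = train_examples // effective_batch # floor: partial batch dropped (assumption)
total_steps = steps_per_epoch * EPOCHS

print(steps_per_epoch, total_steps)  # → 1429 4287
print(f"{TRAIN_HOURS * 3600 / total_steps:.1f} s per optimizer step")
```

Under these assumptions the result matches the reported 4287 steps, which suggests all three epochs ran to completion without early stopping being triggered.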

Hardware

  • GPU: 1× NVIDIA A100 80GB
  • RAM: 200 GB
  • Storage: 4 TB
  • VRAM limit: 60 GB (33 GB used)

Limitations

  • English is stronger than Russian because of data imbalance
  • The 32B model requires ≥24 GB VRAM for 4-bit inference
  • Context is 6144 tokens, so long RP sessions must be truncated
  • The eval set is close to the training data, so real quality on new data is likely lower than the metrics suggest
  • LimaRP was not included in training, so the dataset composition is skewed
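One common way to handle the context limit is to keep the system prompt and drop the oldest turns until the conversation fits. The sketch below illustrates that approach and is not a procedure from this card. `trim_history` is a hypothetical helper, and the whitespace token counter is a stand-in: in practice you would count tokens with the model's tokenizer and reserve room for the chat template and for `max_new_tokens`.

```python
def trim_history(messages, max_tokens, count_tokens):
    """Keep all system messages plus the most recent turns that fit in max_tokens."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    budget = max_tokens - sum(count_tokens(m["content"]) for m in system)

    kept = []
    # Walk backwards from the newest turn and stop at the first one that does not fit.
    for message in reversed(turns):
        cost = count_tokens(message["content"])
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    return system + kept[::-1]


def whitespace_count(text):
    # Stand-in token counter (assumption); real token counts will differ.
    return len(text.split())


history = [
    {"role": "system", "content": "You are Mira."},
    {"role": "user", "content": "a b c"},
    {"role": "assistant", "content": "d e"},
    {"role": "user", "content": "f g h i"},
]
trimmed = trim_history(history, max_tokens=10, count_tokens=whitespace_count)
print([m["content"] for m in trimmed])
# → ['You are Mira.', 'd e', 'f g h i']
```

Dropping whole turns, rather than cutting text mid-message, keeps the character prompt intact and keeps the most recent context coherent.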

Citation

bibtex

@misc{role-play-ai-2026,
  title  = {Role-play AI: Qwen2.5-32B fine-tune for bilingual SFW roleplay},
  author = {svyatsharov and ichinosekei},
  year   = {2026},
  url    = {https://huggingface.co/svyatsharov/Role-play-ai}
}

Base model citation:

bibtex

@misc{qwen2.5,
  title  = {Qwen2.5: A Party of Foundation Models},
  author = {Qwen Team},
  year   = {2024},
  url    = {https://huggingface.co/Qwen/Qwen2.5-32B-Instruct}
}

Framework versions

  • PEFT 0.19.1
  • Transformers ≥4.46.0
  • TRL ≥0.12.0
  • Unsloth
  • PyTorch 2.5.1

Model provider

svyatsharov

Model tree

Base: Qwen/Qwen2.5-32B-Instruct
Adapter: this model

Modalities

Input: Text
Output: Text
