License: apache-2.0

Results & MLflow
Post-train benchmark: pretrained baseline vs merged final/ · Whisper-small ASR · generation temp=0.3, top_p=0.9, rep_penalty=1.15.
| Metric | Baseline | Finetuned | Δ |
|---|---|---|---|
| WER mean | 1.690 | 0.879 | −0.811 |
| CER mean | 1.317 | 0.398 | −0.919 |
| RTF mean | 2.0 | 2.3 | — |
| eval_loss (training) | 9.50 → | 4.35 | @ step 9,400 |
MLflow: experiment orpheus-turkish-tts · train 6804b44335f347849f26da2736aa73df · eval-v3 f10b12f80a014ba88bf77cf874789dad
| Chart | What it shows |
|---|---|
| dashboard | 4-panel: loss, in-training WER/CER, post-train means, CER Δ |
| eval_wer_cer_bars | Per-sentence WER & CER (10 phrases) |
| eval_cer_delta | CER improvement per sentence |
| training_loss | train/eval loss curve |
| wer_cer_progress | In-training mean WER/CER (4 airline prompts) |
| wer_per_prompt | Per-prompt WER during training |
| cer_per_prompt | Per-prompt CER during training |
Per-sentence benchmark (eval/eval_results.json):
| Phrase | B-WER | F-WER | B-CER | F-CER | Δ CER (B − F, positive = improvement) |
|---|---|---|---|---|---|
| welcome | 1.00 | 0.67 | 0.88 | 0.42 | +0.46 |
| directions | 1.00 | 0.75 | 0.93 | 0.60 | +0.33 |
| news_intro | 1.00 | 0.88 | 0.89 | 0.70 | +0.19 |
| emergency | 1.00 | 1.00 | 0.93 | 0.74 | +0.20 |
| weather | 1.00 | 1.00 | 1.00 | 0.87 | +0.13 |
| tech | 1.00 | 1.00 | 0.92 | 0.87 | +0.05 |
| farewell | 1.00 | 1.00 | 0.90 | 0.88 | +0.02 |
| flight_announce | 1.00 | 1.00 | 0.90 | 0.90 | 0.00 |
| safety | 1.33 | 1.00 | 0.81 | 0.90 | −0.09 |
| question | 1.75 | 1.00 | 0.84 | 1.00 | −0.16 |
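WER and CER are computed from Whisper transcriptions of the generated audio, so they are only a proxy for speech quality; listen to the audio samples below. The in-training best on 4 prompts (steps 6k–8.4k) was welcome CER 0.042 and flight_announce WER 0.29. Those figures were measured under a different protocol than the post-train table and are not directly comparable to it.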
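Several baseline rows above report WER greater than 1.0, which can look like an error. The sketch below shows how that happens with the usual edit-distance definition: the error count is divided by the reference length, so a hypothesis with many inserted words can exceed 100%. This is an illustrative implementation only; it assumes no text normalization (casing, punctuation), and the repo's evaluation script may normalize differently or use a library instead.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Two reference words, four hypothesis words, none matching:
# 2 substitutions + 2 insertions = 4 edits over 2 reference words.
score = wer("hoş geldiniz", "hos gel diniz lütfen")
```

Here `score` is 2.0, the same effect that produces the 1.33 and 1.75 baseline values in the table.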
Audio samples
8 curated phrases on HF (ΔCER ≥ 5pp, finetuned CER ≤ 0.85, duration ≥ 1.5s). Excluded: safety, question. Full benchmark: eval/eval_results.json.
B = baseline · F = finetuned
Welcome — istanbul'a hoş geldiniz. · F-CER 0.21 B F
Flight — sayın yolcularımız, uçuşumuz yaklaşık iki saat sürecektir. · F-CER 0.24 B F
Directions — düz gidin, sonra sağa dönün ve köprüyü geçin. · F-CER 0.27 B F
News intro · Farewell · Weather · Tech · Emergency — see eval/samples_manifest.json for all 8 with metrics.
Quick Start
```bash
pip install torch transformers peft soundfile librosa snac
```
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from snac import SNAC

MODEL = "AbDhumal/orpheus-3b-turkish-tts-v2"
V = 128_256
TOK_SOH, TOK_EOH, TOK_SOA, TOK_SOS, TOK_EOA = V+3, V+4, V+5, V+1, V+6
CODE_OFFSET, N_PER_FRAME = V+10, 7

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16).eval().cuda()
snac = SNAC.from_pretrained("hubertsiuzdak/snac_24khz").eval().cuda()

text = "istanbul'a hoş geldiniz."
ids = tokenizer.encode(text, add_special_tokens=False) + [V+9]
prompt = [TOK_SOH] + ids + [TOK_EOH, TOK_SOA, TOK_SOS]
out = model.generate(
    torch.tensor([prompt]).cuda(), max_new_tokens=1500, min_new_tokens=80,
    do_sample=True, temperature=0.3, top_p=0.9, repetition_penalty=1.15,
    eos_token_id=TOK_EOA,
)
# Decode SNAC tokens → 24 kHz WAV (see repo scripts/evaluate_orpheus.py)
```
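The Quick Start stops at the generated token ids and points to the repo script for decoding. As a rough sketch of what that step might involve, the function below splits the flat token stream into SNAC's three codebook layers. It assumes the frame layout used by the upstream Orpheus examples: 7 tokens per frame, slot `p` stored with an extra `p * 4096` offset, and slots mapped to layers as 1 / 2 / 3 / 3 / 2 / 3 / 3. `CODEBOOK_SIZE` and `deinterleave` are names introduced here for illustration; `scripts/evaluate_orpheus.py` remains the reference and may differ.

```python
V = 128_256
CODE_OFFSET, N_PER_FRAME = V + 10, 7
CODEBOOK_SIZE = 4096  # assumption: per-slot offset used by upstream Orpheus


def deinterleave(token_ids, code_offset=CODE_OFFSET):
    """Split generated token ids into three SNAC codebook layers."""
    # Keep only audio-code tokens; control tokens such as EOA sit below the offset.
    codes = [t - code_offset for t in token_ids if t >= code_offset]
    n_frames = len(codes) // N_PER_FRAME  # a trailing partial frame is dropped
    layer_1, layer_2, layer_3 = [], [], []
    for f in range(n_frames):
        frame = codes[f * N_PER_FRAME:(f + 1) * N_PER_FRAME]
        # Remove the per-slot offset so every code falls in [0, CODEBOOK_SIZE).
        c = [frame[p] - p * CODEBOOK_SIZE for p in range(N_PER_FRAME)]
        layer_1.append(c[0])
        layer_2.extend([c[1], c[4]])
        layer_3.extend([c[2], c[3], c[5], c[6]])
    return layer_1, layer_2, layer_3


# Toy frame with codes 1..7 in slots 0..6, followed by the EOA token.
toy = [CODE_OFFSET + (p + 1) + p * CODEBOOK_SIZE for p in range(N_PER_FRAME)] + [V + 6]
layers = deinterleave(toy)
```

With the Quick Start variables, the input would be the newly generated part of the sequence, `out[0, len(prompt):].tolist()`. Each layer can then be turned into a tensor of shape `(1, T)` with `torch.tensor(layer).unsqueeze(0).cuda()`, passed as a list to `snac.decode(...)` under `torch.inference_mode()`, and the resulting waveform written with `soundfile.write(path, audio.squeeze().cpu().numpy(), 24000)`.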
Specs & training
| Model | | Training | |
|---|---|---|---|
| Arch | Llama-3 3B + SNAC head · ~185M LoRA params | LR | 2e-5 |
| LoRA | r=32, α=64, dropout 0.05 | Epochs / steps | 8 / 9,504 |
| Codec | SNAC 24 kHz · 7 tok/frame | Batch (effective) | 2 × 2 GPU × accum 4 = 16 |
| Data | 20k WAV+transcript · max seq 4,096 | Precision | bfloat16 + FA2 |
| Loss | Audio tokens only (mask through <|start_of_speech|>) | Runtime | Kubeflow TrainJob · 2×A100 |
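The numbers in the table can be cross-checked with a little arithmetic. The sketch below recomputes the effective batch size, derives the steps per epoch from the reported totals, and computes the LoRA scaling factor α / r. The variable names are illustrative, and the samples-per-epoch figure is only implied by these numbers under the assumption that every epoch has the same number of optimizer steps.

```python
# Values taken from the table above.
per_device_batch = 2
num_gpus = 2
grad_accum = 4
epochs, total_steps = 8, 9_504
lora_r, lora_alpha = 32, 64

# Samples consumed per optimizer step.
effective_batch = per_device_batch * num_gpus * grad_accum

# Assumption: steps are spread evenly across epochs.
steps_per_epoch = total_steps // epochs
samples_per_epoch = steps_per_epoch * effective_batch

# Standard LoRA scaling applied to the low-rank update.
lora_scale = lora_alpha / lora_r

print(effective_batch, steps_per_epoch, samples_per_epoch, lora_scale)
```

The implied 19,008 samples per epoch is a little under the "20k" in the Data row. One possible explanation is a held-out evaluation split or length filtering, but the card does not say.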
```python
# Prompt: [SOH] + text_tokens + [EOH, SOA, SOS] → generate SNAC until EOA
ids = tokenizer.encode(text, add_special_tokens=False) + [TOK_EOT]
prompt = [TOK_SOH] + ids + [TOK_EOH, TOK_SOA, TOK_SOS]
sos_idx = input_ids.index(TOK_SOS)
labels = [-100] * (sos_idx + 1) + input_ids[sos_idx + 1:]  # audio-only loss
```
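The snippet above refers to `input_ids` and `TOK_EOT` without defining them. The following self-contained sketch shows one way the masking could work end to end on toy token ids. It assumes that a training sequence is the prompt followed by the audio tokens and a closing EOA, and that `TOK_EOT` is the same `V+9` id the Quick Start appends after the text; `build_example` is a hypothetical helper, not a function from the repo.

```python
IGNORE_INDEX = -100  # label value ignored by PyTorch cross-entropy by default

V = 128_256
TOK_SOH, TOK_EOH, TOK_SOA, TOK_SOS, TOK_EOA = V + 3, V + 4, V + 5, V + 1, V + 6
TOK_EOT = V + 9  # assumption: matches the id appended in the Quick Start


def build_example(text_ids, audio_ids):
    """Return (input_ids, labels) with loss restricted to the audio span."""
    prompt = [TOK_SOH] + text_ids + [TOK_EOT] + [TOK_EOH, TOK_SOA, TOK_SOS]
    input_ids = prompt + audio_ids + [TOK_EOA]  # assumed sequence layout
    sos_idx = input_ids.index(TOK_SOS)
    # Mask everything up to and including SOS; keep audio tokens and EOA.
    labels = [IGNORE_INDEX] * (sos_idx + 1) + input_ids[sos_idx + 1:]
    return input_ids, labels


# Toy ids: two text tokens and two audio-code tokens.
input_ids, labels = build_example([11, 12], [V + 10, V + 11])
```

Masking through SOS means the model is never penalized for the text or control tokens, only for the SNAC codes and the closing EOA that follow.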
Reproduction & artifacts
| Resource | Link |
|---|---|
| Scripts + manifests | examples/tts-finetuning/orpheus-tts |
| Eval data | eval/results.json · eval/mlflow/metrics_export.json · eval/samples_manifest.json |
```bash
oc kustomize examples/tts-finetuning/orpheus-tts | oc apply -f - -n <ns>
oc apply -f manifests/trainjob-orpheus-v2.yaml
oc apply -f manifests/trainjob-orpheus-eval.yaml  # after training
```
Limitations: PoC checkpoint · Whisper metrics are noisy on Turkish · SNAC artifacts possible · TrainJob reported Failed post-train but final/ weights are valid · Not production-certified.
License: follows unsloth/orpheus-3b-0.1-pretrained.
```bibtex
@misc{orpheus-turkish-tts-v2,
  title={Orpheus-3B Turkish TTS (OpenShift AI PoC)},
  author={Abhijeet Dhumal},
  year={2026},
  howpublished={\url{https://huggingface.co/AbDhumal/orpheus-3b-turkish-tts-v2}}
}
```