README

License: apache-2.0

Results & MLflow

Post-train benchmark: pretrained baseline vs merged final/ · Whisper-small ASR · generation temp=0.3, top_p=0.9, rep_penalty=1.15.

| Metric | Baseline | Finetuned | Δ |
|---|---|---|---|
| WER mean | 1.690 | 0.879 | −0.811 |
| CER mean | 1.317 | 0.398 | −0.919 |
| RTF mean | 2.0 | 2.3 | |
| eval_loss (training) | 9.50 → | 4.35 | @ step 9,400 |

MLflow: experiment orpheus-turkish-tts · train 6804b44335f347849f26da2736aa73df · eval-v3 f10b12f80a014ba88bf77cf874789dad

| Chart | What it shows |
|---|---|
| dashboard | 4-panel: loss, in-training WER/CER, post-train means, CER Δ |
| eval_wer_cer_bars | Per-sentence WER & CER (10 phrases) |
| eval_cer_delta | CER improvement per sentence |
| training_loss | train/eval loss curve |
| wer_cer_progress | In-training mean WER/CER (4 airline prompts) |
| wer_per_prompt | Per-prompt WER during training |
| cer_per_prompt | Per-prompt CER during training |

Per-sentence benchmark (eval/eval_results.json):

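| Phrase | B-WER | F-WER | B-CER | F-CER | Δ CER |
|---|---|---|---|---|---|
| welcome | 1.00 | 0.67 | 0.88 | 0.42 | +0.46 |
| directions | 1.00 | 0.75 | 0.93 | 0.60 | +0.33 |
| news_intro | 1.00 | 0.88 | 0.89 | 0.70 | +0.19 |
| emergency | 1.00 | 1.00 | 0.93 | 0.74 | +0.20 |
| weather | 1.00 | 1.00 | 1.00 | 0.87 | +0.13 |
| tech | 1.00 | 1.00 | 0.92 | 0.87 | +0.05 |
| farewell | 1.00 | 1.00 | 0.90 | 0.88 | +0.02 |
| flight_announce | 1.00 | 1.00 | 0.90 | 0.90 | 0.00 |
| safety | 1.33 | 1.00 | 0.81 | 0.90 | −0.09 |
| question | 1.75 | 1.00 | 0.84 | 1.00 | −0.16 |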

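WER and CER are proxies computed from Whisper transcripts, so also listen to the audio samples below. In the per-sentence table, Δ CER is baseline minus finetuned, so positive values are improvements. The best in-training results on the 4 prompts (steps 6k–8.4k) were welcome CER 0.042 and flight_announce WER 0.29; these come from a different protocol than the post-train table and are not directly comparable.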
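Several baseline WER values in the tables exceed 1.0, which is possible because error rates divide the edit distance by the reference length, and a transcript with many inserted words can need more edits than the reference has words. A minimal sketch of the standard definitions follows; it assumes whitespace tokenization and no text normalization, which may differ from what the evaluation script does, and the function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate; assumes a non-empty reference."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate; assumes a non-empty reference."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# A transcript that repeats itself adds three words to a two-word reference.
print(wer("hoş geldiniz", "hoş geldiniz hoş geldiniz hoş"))  # 1.5
```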

Audio samples

8 curated phrases on HF (ΔCER ≥ 5pp, finetuned CER ≤ 0.85, duration ≥ 1.5s). Excluded: safety, question. Full benchmark: eval/eval_results.json.

B = baseline · F = finetuned

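Welcome: istanbul'a hoş geldiniz. ("Welcome to Istanbul.") · F-CER 0.21

Flight: sayın yolcularımız, uçuşumuz yaklaşık iki saat sürecektir. ("Dear passengers, our flight will take approximately two hours.") · F-CER 0.24

Directions: düz gidin, sonra sağa dönün ve köprüyü geçin. ("Go straight, then turn right and cross the bridge.") · F-CER 0.27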

News intro · Farewell · Weather · Tech · Emergency — see eval/samples_manifest.json for all 8 with metrics.
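The curation rule stated at the top of this section can be read as a three-part predicate. The sketch below assumes that "ΔCER ≥ 5pp" means baseline CER minus finetuned CER of at least 0.05; the function and argument names are hypothetical, and the actual fields in eval/samples_manifest.json may differ.

```python
def is_curated(b_cer, f_cer, duration_s,
               min_delta=0.05, max_f_cer=0.85, min_duration_s=1.5):
    """Return True if a sample passes all three curation thresholds."""
    improved_enough = (b_cer - f_cer) >= min_delta   # ΔCER ≥ 5pp
    intelligible = f_cer <= max_f_cer                # finetuned CER ≤ 0.85
    long_enough = duration_s >= min_duration_s       # duration ≥ 1.5 s
    return improved_enough and intelligible and long_enough


# Illustrative values only, not taken from the benchmark files.
print(is_curated(b_cer=0.90, f_cer=0.30, duration_s=2.4))  # True
print(is_curated(b_cer=0.80, f_cer=0.90, duration_s=2.4))  # False
```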

Quick Start

bash

pip install torch transformers peft soundfile librosa snac

python

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from snac import SNAC

MODEL = "AbDhumal/orpheus-3b-turkish-tts-v2"

# Special-token ids are defined as offsets from V.
V = 128_256
TOK_SOH, TOK_EOH, TOK_SOA, TOK_SOS, TOK_EOA = V+3, V+4, V+5, V+1, V+6
TOK_EOT = V+9                        # appended after the text tokens
CODE_OFFSET, N_PER_FRAME = V+10, 7   # offset of SNAC code ids; 7 tokens per frame

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16).eval().cuda()
snac = SNAC.from_pretrained("hubertsiuzdak/snac_24khz").eval().cuda()

text = "istanbul'a hoş geldiniz."
ids = tokenizer.encode(text, add_special_tokens=False) + [TOK_EOT]
prompt = [TOK_SOH] + ids + [TOK_EOH, TOK_SOA, TOK_SOS]

# Sample audio tokens until end-of-audio (EOA).
out = model.generate(
    torch.tensor([prompt]).cuda(),
    max_new_tokens=1500, min_new_tokens=80,
    do_sample=True, temperature=0.3, top_p=0.9, repetition_penalty=1.15,
    eos_token_id=TOK_EOA,
)
# Decode SNAC tokens → 24 kHz WAV (see repo scripts/evaluate_orpheus.py)
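The snippet above stops before audio is produced. As a rough sketch of the missing step, the function below splits generated ids into the three SNAC codebook layers. The slot order within a frame and the per-slot offset of k × 4096 are taken from the upstream Orpheus reference decoder and are an assumption for this checkpoint; `deinterleave` and `CODEBOOK_SIZE` are illustrative names, and scripts/evaluate_orpheus.py remains the authoritative implementation.

```python
V = 128_256
CODE_OFFSET, N_PER_FRAME = V + 10, 7
CODEBOOK_SIZE = 4096  # assumption: slot k of a frame is shifted by k * 4096


def deinterleave(generated_ids, eoa_id=V + 6):
    """Split generated token ids into three SNAC layers (coarse to fine)."""
    codes = []
    for t in generated_ids:
        if t == eoa_id:          # stop at end-of-audio
            break
        if t >= CODE_OFFSET:     # skip prompt and special tokens
            codes.append(t - CODE_OFFSET)

    layer1, layer2, layer3 = [], [], []
    # Integer division drops a trailing partial frame.
    for f in range(len(codes) // N_PER_FRAME):
        c = [codes[N_PER_FRAME * f + k] - k * CODEBOOK_SIZE
             for k in range(N_PER_FRAME)]
        layer1.append(c[0])
        layer2.extend([c[1], c[4]])
        layer3.extend([c[2], c[3], c[5], c[6]])
    return layer1, layer2, layer3


# Usage with the Quick Start objects (needs torch, snac, soundfile, and a GPU):
# import soundfile as sf
# layers = [torch.tensor(l).unsqueeze(0).cuda() for l in deinterleave(out[0].tolist())]
# with torch.inference_mode():
#     audio = snac.decode(layers)
# sf.write("out.wav", audio.squeeze().float().cpu().numpy(), 24000)
```

Each frame contributes one code to the coarsest layer, two to the middle layer, and four to the finest, which accounts for the 7 tokens per frame listed in the specs.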

Specs & training

| Model | | Training | |
|---|---|---|---|
| Arch | Llama-3 3B + SNAC head · ~185M LoRA params | LR | 2e-5 |
| LoRA | r=32, α=64, dropout 0.05 | Epochs / steps | 8 / 9,504 |
| Codec | SNAC 24 kHz · 7 tok/frame | Batch (effective) | 2 × 2 GPU × accum 4 = 16 |
| Data | 20k WAV+transcript · max seq 4,096 | Precision | bfloat16 + FA2 |
| Loss | Audio tokens only (mask through <\|start_of_speech\|>) | Runtime | Kubeflow TrainJob · 2×A100 |
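The training figures above can be cross-checked with a little arithmetic. This sketch assumes the reported steps are optimizer steps split evenly across epochs, and that the LoRA update is scaled by α/r as in standard LoRA; the variable names are illustrative.

```python
per_device_batch, n_gpus, grad_accum = 2, 2, 4
effective_batch = per_device_batch * n_gpus * grad_accum    # samples per optimizer step

epochs, total_steps = 8, 9_504
steps_per_epoch = total_steps // epochs
max_samples_per_epoch = steps_per_epoch * effective_batch   # upper bound on samples seen per epoch

lora_r, lora_alpha = 32, 64
lora_scale = lora_alpha / lora_r                            # multiplier on the low-rank update

print(effective_batch, steps_per_epoch, max_samples_per_epoch, lora_scale)
# 16 1188 19008 2.0
```

Under these assumptions an epoch covers at most 19,008 samples, slightly fewer than the 20k clips listed under Data. That would be consistent with some clips being held out or filtered, though the source does not say.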

python

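# Prompt: [SOH] + text_tokens + [EOT] + [EOH, SOA, SOS] → generate SNAC until EOA
# TOK_EOT = V+9, the id appended after the text tokens in Quick Start.
ids = tokenizer.encode(text, add_special_tokens=False) + [TOK_EOT]
prompt = [TOK_SOH] + ids + [TOK_EOH, TOK_SOA, TOK_SOS]

# Training labels: input_ids is the full training sequence (prompt followed by audio tokens).
# Everything up to and including SOS is set to -100, so only audio tokens contribute to the loss.
sos_idx = input_ids.index(TOK_SOS)
labels = [-100] * (sos_idx + 1) + input_ids[sos_idx + 1:]  # audio-only loss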
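To make the masking concrete, here is a self-contained version run on a toy sequence. The token ids in the example are placeholders rather than real tokenizer output, and `mask_labels` is an illustrative name; -100 is the default ignore index of PyTorch's cross-entropy loss, which is why masked positions drop out of the loss.

```python
V = 128_256
TOK_SOH, TOK_EOH, TOK_SOA, TOK_SOS, TOK_EOA = V + 3, V + 4, V + 5, V + 1, V + 6
TOK_EOT, CODE_OFFSET = V + 9, V + 10
IGNORE_INDEX = -100  # positions with this label are skipped by the loss


def mask_labels(input_ids):
    """Copy input_ids as labels, masking everything through the SOS token."""
    sos_idx = input_ids.index(TOK_SOS)
    return [IGNORE_INDEX] * (sos_idx + 1) + input_ids[sos_idx + 1:]


# Toy example: two placeholder text ids, then two audio codes and EOA.
input_ids = [TOK_SOH, 10, 11, TOK_EOT, TOK_EOH, TOK_SOA, TOK_SOS,
             CODE_OFFSET, CODE_OFFSET + 1, TOK_EOA]
labels = mask_labels(input_ids)
print(labels.count(IGNORE_INDEX), labels[-3:])
# 7 [128266, 128267, 128262]
```

The labels keep the same length as input_ids; Hugging Face causal language models shift them by one position internally when computing the loss.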

Reproduction & artifacts

bash

oc kustomize examples/tts-finetuning/orpheus-tts | oc apply -f - -n <ns>
oc apply -f manifests/trainjob-orpheus-v2.yaml
oc apply -f manifests/trainjob-orpheus-eval.yaml # after training

Limitations: PoC checkpoint · Whisper metrics are noisy on Turkish · SNAC artifacts possible · TrainJob reported Failed post-train but final/ weights are valid · Not production-certified.

License: follows unsloth/orpheus-3b-0.1-pretrained.

bibtex

@misc{orpheus-turkish-tts-v2, title={Orpheus-3B Turkish TTS (OpenShift AI PoC)},
author={Abhijeet Dhumal}, year={2026}, howpublished={\url{https://huggingface.co/AbDhumal/orpheus-3b-turkish-tts-v2}}}

Model provider

AbDhumal
