HossamRizk
Temsah-TTS
License: apache-2.0
Model details
| Base model | unsloth/Spark-TTS-0.5B (Qwen2.5-0.5B LLM + BiCodec) |
| Task | Text-to-speech (Egyptian Arabic, single speaker) |
| Fine-tune type | Full fine-tune of the LLM only (BiCodec frozen) |
| Language | Arabic — Egyptian dialect (not MSA) |
| Output | 16 kHz mono waveform |
| Cloning | Zero-shot — speaker timbre comes from a reference clip at inference time |
Training
- Framework: Unsloth + TRL `SFTTrainer`, full fine-tuning in float32.
- Data: single-speaker Egyptian Arabic, filtered to 1–45 s clips → 52,495 clips (51,682 train / 813 validation), leakage-safe split by source video.
- Text normalization: diacritics (tashkeel) stripped, numbers verbalized, non-speech / laugh / sigh clips dropped, vocal pause/breath markers stripped. The same cleaning must be applied to inference text for in-distribution results.
- Hyperparameters: 2 epochs (3,232 steps), effective batch size 32 (`per_device=4` × `grad_accum=8`), learning rate `1e-5`, `max_seq_length=3072`, `group_by_length=True`, linear schedule, `adamw_8bit`.
- Result: validation loss tracks training loss throughout (≈5.17 at the end, still decreasing), with no overfitting observed.
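The step count in the hyperparameters can be cross-checked from the split size and batch settings. The sketch below assumes a single training device (an assumption, though it is what an effective batch of 4 × 8 = 32 implies) and rounds the last partial batch up; the trainer's exact rounding may differ between library versions.

```python
import math

# Figures taken from the Training section above.
train_clips = 51_682
per_device_batch = 4
grad_accum = 8
epochs = 2

# Assumption: one training device, which is what 4 x 8 = 32 implies.
num_devices = 1

effective_batch = per_device_batch * grad_accum * num_devices
# Round up so the final partial batch counts as one optimizer step.
steps_per_epoch = math.ceil(train_clips / effective_batch)
total_steps = steps_per_epoch * epochs

print(effective_batch, steps_per_epoch, total_steps)
```

Under these assumptions the result agrees with the 3,232 steps reported above.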
Usage
This model is the LLM stage of a Spark-TTS pipeline. Assemble a full Spark model directory, then run inference with the Spark-TTS code.
```bash
git clone https://github.com/SparkAudio/Spark-TTS
pip install -r Spark-TTS/requirements.txt soundfile
```
```python
import os, shutil, sys, torch, soundfile as sf
from huggingface_hub import snapshot_download

# 1. base model -> BiCodec + wav2vec2 + config.yaml
base = snapshot_download("unsloth/Spark-TTS-0.5B")

# 2. this fine-tuned LLM
llm = snapshot_download("HossamRizk/Temsah-TTS")

# 3. assemble a Spark-servable model dir (base layout, our LLM swapped in)
root = "Temsah-TTS-spark"
shutil.rmtree(root, ignore_errors=True)
os.makedirs(root)
shutil.copy(f"{base}/config.yaml", f"{root}/config.yaml")
shutil.copytree(f"{base}/BiCodec", f"{root}/BiCodec")
shutil.copytree(f"{base}/wav2vec2-large-xlsr-53", f"{root}/wav2vec2-large-xlsr-53")
shutil.copytree(llm, f"{root}/LLM")

# 4. synthesize (needs a reference clip of the target speaker for the voice)
sys.path.append("Spark-TTS")
from cli.SparkTTS import SparkTTS

tts = SparkTTS(root, device="cuda:0" if torch.cuda.is_available() else "cpu")
wav = tts.inference(
    "النهارده هنتكلم عن موضوع مهم جدا يخص كل واحد فينا",
    prompt_speech_path="reference_clip.wav",  # a few seconds of the target speaker
)
sf.write("output.wav", wav, 16000)
```
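Before loading the model, it can help to confirm that the assembled directory has the four top-level entries the assembly step copies in. This is a minimal sketch: the helper name `missing_entries` is hypothetical, and it checks only that the entries exist, not that their contents are valid.

```python
from pathlib import Path

# Entry names come from the assembly step above; the helper itself is hypothetical.
EXPECTED_ENTRIES = ("config.yaml", "BiCodec", "wav2vec2-large-xlsr-53", "LLM")


def missing_entries(root):
    """Return the expected top-level entries that are absent from `root`."""
    root_path = Path(root)
    return [name for name in EXPECTED_ENTRIES if not (root_path / name).exists()]


missing = missing_entries("Temsah-TTS-spark")
if missing:
    print("Model directory is incomplete, missing:", ", ".join(missing))
else:
    print("Model directory layout looks complete.")
```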
Tip: clean the input text the same way training did (strip diacritics, verbalize numbers, drop non-speech tags) so it matches the training distribution.
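A rough sketch of that cleaning, not the exact pipeline used in training: it strips tashkeel by Unicode range and removes bracketed tags. The tag format (`[...]` or `<...>`) is an assumption, since the source does not say how non-speech markers were written, and number verbalization is not implemented here; `has_digits` only flags text that still needs it. The function names are illustrative.

```python
import re

# Arabic harakat and tanwin (U+064B to U+0652) plus the superscript alef (U+0670).
TASHKEEL = re.compile("[\u064B-\u0652\u0670]")
# Assumed tag format: markers written as [laugh] or <breath>.
BRACKET_TAG = re.compile(r"\[[^\]]*\]|<[^>]*>")
# ASCII digits and Arabic-Indic digits (U+0660 to U+0669).
DIGITS = re.compile("[0-9\u0660-\u0669]")


def clean_text(text):
    """Strip diacritics and bracketed tags, then collapse whitespace."""
    text = TASHKEEL.sub("", text)
    text = BRACKET_TAG.sub(" ", text)
    return " ".join(text.split())


def has_digits(text):
    """Report whether the text still contains digits that need verbalizing."""
    return DIGITS.search(text) is not None


sample = "\u0645\u064E\u0631\u0652\u062D\u064E\u0628\u064B\u0627 [laugh]"
print(clean_text(sample))
```

Text for which `has_digits` returns `True` should have its numbers written out as words before synthesis.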
Limitations & intended use
- Single voice. Designed to reproduce one speaker. The reference clip supplies the timbre; using a different speaker's clip will not sound like the trained voice.
- Egyptian dialect. Trained on Egyptian Arabic; MSA or other dialects are out of scope.
- Audio band. Source audio is 16 kHz / band-limited (web sources) → clear but not studio-bright output.
- Ethical use / consent. This is a clone of a specific person's voice. Only use it with the speaker's consent and in line with the source dataset's terms. Do not use it to impersonate, deceive, or generate misleading content.
License
Released under apache-2.0 (inherited from the base model). Before redistributing, review this license against the source dataset's terms and the speaker's wishes, and change it if needed.
Acknowledgements
- Spark-TTS (SparkAudio) and `unsloth/Spark-TTS-0.5B`
- Unsloth for fast fine-tuning
- Source data: `oddadmix/arabic-audio-collection-mohamed-khairy`