README

License: mit

Model description

This model is a fine-tuned version of the Whisper Large V3 Turbo model, optimized for multilingual Automatic Speech Recognition (ASR). It has been trained on the ANV (Swivuriso) dataset to improve performance on specific target languages and domains represented in that corpus.

Whisper is a Transformer-based encoder-decoder model, also referred to as a sequence-to-sequence model. The base model was trained with weak supervision on large-scale noisy data; this fine-tuning step adapts it to the languages and accents found in the dsfsi-anv dataset.

Intended uses & limitations

Intended Uses

  • Automatic Speech Recognition (ASR): The model is primarily intended to transcribe audio in the languages present in the training data.
  • Research: Suitable for researchers studying low-resource language adaptation and fine-tuning efficiency.

Limitations

  • Hallucinations: Like the base Whisper model, this model may generate repetitive or hallucinated text, particularly on silent segments or audio with background noise.
  • Domain Specificity: Performance may degrade on audio that differs significantly (in terms of accent, noise, or recording quality) from the ANV dataset.
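One way to catch the repetition failure mode described above is to scan a transcript for a short word sequence that repeats back to back. The sketch below is a heuristic only, not part of the model or of Whisper itself; the function name `find_repetition_loop` and its thresholds are hypothetical choices for illustration, and legitimate speech that repeats a word three times in a row would also be flagged.

```python
from typing import Optional


def find_repetition_loop(
    text: str, max_period: int = 8, min_repeats: int = 3
) -> Optional[str]:
    """Return the first word sequence (up to max_period words long) that
    occurs min_repeats times back to back, or None if there is none.

    Hypothetical helper: the thresholds are illustrative assumptions.
    """
    words = text.lower().split()
    for period in range(1, max_period + 1):
        span = period * min_repeats
        for start in range(len(words) - span + 1):
            unit = words[start:start + period]
            # Check that the next (min_repeats - 1) windows equal the first one.
            if all(
                words[start + k * period:start + (k + 1) * period] == unit
                for k in range(1, min_repeats)
            ):
                return " ".join(unit)
    return None
```

A transcript for which this returns a non-None value is a candidate for manual review or for re-transcription after trimming silence.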

Training and evaluation data

The model was trained on the dsfsi-anv dataset.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • optimizer: AdamW (betas=(0.9,0.98), epsilon=1e-08)
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • training_steps: 10,000
  • framework: PyTorch 2.9.1+cu128 / Transformers 4.57.3
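To make the listed values concrete, the following sketch recomputes the effective batch size and the learning rate at a given step under a linear schedule with warmup. It assumes a single training device (the card does not state the device count, but 8 × 2 = 16 is consistent with one) and assumes the usual linear warmup followed by linear decay to zero; it is an illustration, not the training script that produced this model.

```python
# Values taken from the hyperparameter list above.
LEARNING_RATE = 1e-5
WARMUP_STEPS = 500
TRAINING_STEPS = 10_000
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 2
NUM_DEVICES = 1  # assumption: not stated in the model card


def effective_batch_size() -> int:
    """Number of samples contributing to each optimizer step."""
    return TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES


def linear_lr(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay to 0."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / (TRAINING_STEPS - WARMUP_STEPS)
```

Under these assumptions the rate rises from 0 to 1e-05 over the first 500 steps and then falls linearly, reaching 0 at step 10,000.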

Training results

Epoch  Step   Training Loss  Validation Loss  WER     CER
0.1    1000   0.4108         0.5753           0.3702  0.1237
0.2    2000   0.2326         0.4653           0.2888  0.0881
0.3    3000   0.4429         0.3750           0.2354  0.0782
0.4    4000   0.3309         0.3388           0.2075  0.0674
0.5    5000   0.3298         0.3135           0.1952  0.0635
0.6    6000   0.3238         0.2929           0.1782  0.0592
0.7    7000   0.3926         0.2766           0.1688  0.0545
0.8    8000   0.2261         0.2627           0.1593  0.0519
0.9    9000   0.2197         0.2514           0.1573  0.0506
1.0    10000  0.2276         0.2427           0.1501  0.0510
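For readers unfamiliar with the two metrics, WER and CER are both edit distances normalized by reference length, computed over words and characters respectively. The sketch below is a minimal reference implementation, not the evaluation code used for this model; the card does not say which library or text normalization was applied, and the values in the table appear to be fractions rather than percentages (so a WER of 0.1501 would mean roughly 15 errors per 100 reference words).

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Because both rates divide by the reference length only, a hypothesis with many insertions can produce a value above 1.0.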

Usage

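This model can be used with the Hugging Face transformers library through the pipeline API. Install the dependencies first, then load the fine-tuned weights together with the processor from the base model.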

bash

pip install --upgrade pip
pip install --upgrade transformers "datasets[audio]" accelerate

python

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from datasets import load_dataset

# Use the GPU with half precision when available, otherwise CPU with float32.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# Fine-tuned weights; the processor (tokenizer and feature extractor)
# is loaded from the base model.
model_id = "dsfsi-anv/multilingual-whisper-v3-turbo"
processor_id = "openai/whisper-large-v3-turbo"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(processor_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
)

# Example: transcribe a sample file
# result = pipe("path/to/audio.wav")
# print(result["text"])
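Once a pipeline object like the one above exists, transcribing several files is a small loop. The helper below is a hypothetical convenience wrapper, not part of transformers; it takes any callable that behaves like the ASR pipeline and passes `return_timestamps=True`, which, to the best of our knowledge, Whisper pipelines need for recordings longer than about 30 seconds. The file names in the usage note are placeholders.

```python
from typing import Any, Callable, Dict, Iterable


def transcribe_files(
    pipe: Callable[..., Dict[str, Any]],
    audio_paths: Iterable[str],
) -> Dict[str, str]:
    """Run an ASR pipeline over several audio files.

    `pipe` is expected to accept a file path plus keyword arguments and to
    return a dict with a "text" key, as the transformers ASR pipeline does.
    Returns a mapping from each path to its stripped transcript.
    """
    transcripts: Dict[str, str] = {}
    for path in audio_paths:
        # Timestamps are requested so that long recordings are handled too.
        output = pipe(path, return_timestamps=True)
        transcripts[path] = output["text"].strip()
    return transcripts
```

With a loaded pipeline this would be called as `transcribe_files(pipe, ["first.wav", "second.wav"])`.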

Framework versions

  • Transformers 4.57.3
  • PyTorch 2.9.1+cu128
  • Datasets 4.4.1
  • Tokenizers 0.22.1

BibTeX entry and citation info

bibtex

@misc{radford2022whisper,
  doi = {10.48550/ARXIV.2212.04356},
  url = {https://arxiv.org/abs/2212.04356},
  author = {Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
  title = {Robust Speech Recognition via Large-Scale Weak Supervision},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}

Model provider

more8467394

Model tree

Base

openai/whisper-large-v3-turbo

Fine-tuned

this model

Modalities

Input

Audio

Output

Text
