teckedd

whisper-small-waxal-akan-continuation-v1

License: apache-2.0

Results

The primary comparison uses all 1,522 rows in the original held-out Waxal test split. The test set is excluded from training and checkpoint selection.

Model	WER	CER
Published baseline	33.84%	12.74%
This candidate	32.77%	12.47%
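
For readers unfamiliar with the metrics, WER and CER are edit distance divided by reference length, counted over words and characters respectively. The following is a minimal sketch of a corpus-level computation, not the evaluation code behind this table; the function names are hypothetical, and the card does not say which scoring library was used.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def corpus_error_rate(refs, hyps, unit="word"):
    """Total edits over total reference units; unit is 'word' (WER) or 'char' (CER)."""
    split = (lambda s: s.split()) if unit == "word" else list
    errors = sum(edit_distance(split(r), split(h)) for r, h in zip(refs, hyps))
    total = sum(len(split(r)) for r in refs)
    return errors / total
```

Pooling edits over the whole split, rather than averaging per-utterance rates, keeps long utterances from being under-weighted. Whether the reported figures were pooled this way is an assumption.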

A 5,000-sample paired bootstrap estimates a 99.86% probability that the candidate improves on the baseline. The 95% interval for candidate-minus-baseline WER is -1.90 to -0.33 percentage points, which excludes zero. Per utterance, the candidate improves 524, ties 603, and worsens 395.
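
A paired bootstrap of this kind can be sketched as follows. This is an illustration under assumptions, not the project's evaluation script: it assumes utterances are resampled with replacement, that both systems are rescored on the same resample, and that per-utterance edit counts and reference lengths are already available. The function name, argument names, and toy numbers are hypothetical.

```python
import random


def paired_bootstrap_wer(base_errs, cand_errs, ref_lens, n_samples=5000, seed=0):
    """Bootstrap the candidate-minus-baseline WER difference in percentage points.

    base_errs / cand_errs: per-utterance word edit counts for each system.
    ref_lens: per-utterance reference word counts (assumed positive).
    """
    rng = random.Random(seed)
    n = len(ref_lens)
    deltas = []
    for _ in range(n_samples):
        # Draw the same utterance indices for both systems (the "paired" part).
        idx = [rng.randrange(n) for _ in range(n)]
        words = sum(ref_lens[i] for i in idx)
        base = sum(base_errs[i] for i in idx) / words
        cand = sum(cand_errs[i] for i in idx) / words
        deltas.append(100.0 * (cand - base))
    deltas.sort()
    p_better = sum(d < 0 for d in deltas) / n_samples
    lo = deltas[int(0.025 * n_samples)]
    hi = deltas[int(0.975 * n_samples) - 1]
    return p_better, (lo, hi)


# Placeholder counts for four utterances, not data from this model.
p, interval = paired_bootstrap_wer([2, 1, 3, 2], [1, 1, 2, 2], [5, 5, 5, 5], n_samples=200)
```

A negative difference means the candidate has the lower WER, so `p_better` is the share of resamples in which the candidate wins.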

The smaller frozen benchmark v1 contains 99 utterances, three from each of 33 test-only speakers. On that subset, WER moves from 33.62% to 32.65% and CER from 12.37% to 12.26%. Its paired interval crosses zero, which is why the full held-out comparison is reported above.

On the complete 1,123-row validation split, WER improved from 32.69% before continuation to 31.45% at the selected step-300 checkpoint. Step 400 regressed to 31.83%, so the trainer restored step 300.
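
The selection rule described above amounts to keeping the evaluated checkpoint with the lowest validation WER. A minimal sketch, using only the figures reported in this card (the function name and the use of step 0 for "before continuation" are illustrative choices):

```python
def select_checkpoint(val_wer_by_step):
    """Return the (step, WER) pair with the lowest validation WER."""
    return min(val_wer_by_step.items(), key=lambda kv: kv[1])


# Only values stated in this card; WER at steps 100 and 200 is not reported here.
reported = {0: 32.69, 300: 31.45, 400: 31.83}
best_step, best_wer = select_checkpoint(reported)
```

With the Hugging Face `Trainer`, the same behavior is available through `load_best_model_at_end=True` together with `metric_for_best_model="wer"` and `greater_is_better=False`. Whether this run used those settings is an assumption.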

The stored Yoruba proxy prefix and no-forced-language decoding produced identical frozen benchmark predictions. This does not imply native Yoruba or Akan support in Whisper's original language-token inventory.

Training

Base: teckedd/whisper_small-waxal_akan-asr-v1
Dataset: google/WaxalNLP, aka_asr
Valid rows: 10,106 train and 1,123 validation
Labels: NFC, lowercase, punctuation removed, whitespace collapsed
Method: full-model continuation, FP16, gradient checkpointing
Hardware: NVIDIA L4 on Modal
Effective batch size: 32
Learning rate: 5e-6 with 50 warmup steps
Validation/save interval: 100 steps
Selected checkpoint: step 300 of 400
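
The label normalization listed above could look like the sketch below. The order of the steps and the definition of punctuation (any character in a Unicode `P*` category) are assumptions, and `normalize_label` is a hypothetical name; the linked repository is the authority on what was actually applied.

```python
import re
import unicodedata


def normalize_label(text):
    """NFC-normalize, lowercase, strip punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFC", text)
    text = text.lower()
    # Drop every character whose Unicode category starts with "P" (punctuation).
    text = "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))
    # Collapse runs of whitespace to one space and trim the ends.
    return re.sub(r"\s+", " ", text).strip()
```

Under this definition apostrophes and hyphens are removed without inserting a space, so hyphenated words are joined. Applying the same function to references and hypotheses before scoring keeps WER and CER comparable with training labels.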

Intended Use

Research, evaluation, and prototyping for Akan speech recognition. Domain-specific systems for health, commerce, customer support, or public services require additional representative data, human review, privacy controls, and safety testing.

Limitations

The model was trained on one public corpus and may not generalize across Akan dialects, domains, ages, recording devices, background noise, or code-switching.
The measured full-test improvement is statistically credible but modest: WER falls from 33.84% to 32.77%, about 1.07 percentage points.
Two of 1,522 candidate outputs entered severe token-repetition loops. Production inference should detect repeated-token collapse and retry or fall back to a reviewed baseline.
Whisper's tokenizer fragments Akan text heavily; this run does not change the vocabulary.
Do not use unreviewed transcripts for medical, legal, financial, or emergency decisions.
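
The repetition-loop guard recommended above can be sketched as a check for short n-grams that repeat back-to-back, with a fallback to a second model. This is one possible heuristic, not a tested production rule: the thresholds `max_ngram` and `min_repeats` are hypothetical, and `candidate` and `baseline` stand for any callables that map audio to text.

```python
def has_repetition_loop(tokens, max_ngram=4, min_repeats=4):
    """True if some n-gram (n <= max_ngram) occurs min_repeats times in a row."""
    for n in range(1, max_ngram + 1):
        span = n * min_repeats
        for start in range(len(tokens) - span + 1):
            window = tokens[start:start + span]
            # The window is periodic with period n if every item matches
            # the item at the same offset within the first n-gram.
            if all(window[i] == window[i % n] for i in range(span)):
                return True
    return False


def transcribe_with_fallback(audio, candidate, baseline):
    """Use the candidate transcript unless it looks like a repetition loop."""
    text = candidate(audio)
    if has_repetition_loop(text.split()):
        text = baseline(audio)
    return text
```

Genuine speech can contain short legitimate repeats, so the thresholds should be tuned on reviewed transcripts before relying on this check.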

Training and evaluation code: https://github.com/teckedd-code2save/akan-speech-lab
