teckedd
whisper-small-waxal-akan-continuation-v1
License: apache-2.0
Results
The primary comparison uses all 1,522 rows in the original held-out Waxal test split. The test set is excluded from training and checkpoint selection.
| Model | WER | CER |
|---|---|---|
| Published baseline | 33.84% | 12.74% |
| This candidate | 32.77% | 12.47% |
A 5,000-sample paired bootstrap estimates a 99.86% probability that the candidate improves on the baseline. The 95% interval for candidate-minus-baseline WER is -1.90 to -0.33 percentage points. At the utterance level, the candidate improves 524 utterances, ties 603, and worsens 395.
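The paired bootstrap described above can be sketched roughly as follows. This is a minimal illustration, not the evaluation code from the linked repository: the function name `paired_bootstrap_wer`, the row layout `(ref_words, baseline_errors, candidate_errors)`, and the percentile-style interval are all assumptions made for the example.

```python
import random


def paired_bootstrap_wer(rows, n_resamples=5000, seed=0):
    """Paired bootstrap over utterances for a WER difference.

    rows: list of (ref_words, baseline_errors, candidate_errors), one tuple
    per utterance, where errors are word-level edit counts against the same
    reference. This layout is a hypothetical choice for the sketch.
    Returns (p_candidate_better, ci_low, ci_high), with the interval given in
    percentage points of candidate-minus-baseline WER.
    """
    if not rows or any(r[0] <= 0 for r in rows):
        raise ValueError("every utterance needs at least one reference word")
    rng = random.Random(seed)
    n = len(rows)
    diffs = []
    for _ in range(n_resamples):
        # Resample utterances with replacement; both systems share the draw,
        # which is what makes the comparison paired.
        sample = rng.choices(rows, k=n)
        ref = sum(r[0] for r in sample)
        base = sum(r[1] for r in sample)
        cand = sum(r[2] for r in sample)
        # Corpus-level WER difference for this resample.
        diffs.append(100.0 * (cand - base) / ref)
    diffs.sort()
    p_better = sum(d < 0 for d in diffs) / n_resamples
    # Simple percentile interval (an assumed choice, not a confirmed one).
    ci_low = diffs[int(0.025 * n_resamples)]
    ci_high = diffs[min(n_resamples - 1, int(0.975 * n_resamples))]
    return p_better, ci_low, ci_high
```

The sketch pools error and reference-word counts within each resample instead of averaging per-utterance WER, so long utterances carry proportionally more weight, as they do in corpus-level WER.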
The smaller frozen benchmark v1 contains 99 utterances, three from each of 33 test-only speakers. On that subset, WER moves from 33.62% to 32.65% and CER from 12.37% to 12.26%. Its paired interval crosses zero, which is why the full held-out comparison is reported above.
On the complete 1,123-row validation split, WER improved from 32.69% before continuation to 31.45% at the selected step-300 checkpoint. Step 400 regressed to 31.83%, so the trainer restored step 300.
The stored Yoruba proxy prefix and no-forced-language decoding produced identical frozen benchmark predictions. This does not imply native Yoruba or Akan support in Whisper's original language-token inventory.
Training
- Base: teckedd/whisper_small-waxal_akan-asr-v1
- Dataset: google/WaxalNLP, aka_asr
- Valid rows: 10,106 train and 1,123 validation
- Labels: NFC, lowercase, punctuation removed, whitespace collapsed
- Method: full-model continuation, FP16, gradient checkpointing
- Hardware: NVIDIA L4 on Modal
- Effective batch size: 32
- Learning rate: 5e-6 with 50 warmup steps
- Validation/save interval: 100 steps
- Selected checkpoint: step 300 of 400
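The label normalization listed above (NFC, lowercase, punctuation removed, whitespace collapsed) could look roughly like the sketch below. The function name `normalize_label` is hypothetical, and treating "punctuation" as every character in a Unicode `P*` category is an assumption; the repository's exact rule may differ, for example in how it handles apostrophes or hyphens.

```python
import unicodedata


def normalize_label(text: str) -> str:
    """Normalize a transcript: NFC, lowercase, drop punctuation, collapse spaces."""
    # Compose characters so that visually identical strings compare equal.
    text = unicodedata.normalize("NFC", text)
    text = text.lower()
    # Assumption: "punctuation" means any Unicode category starting with "P".
    # Symbols such as currency signs (category "S*") are left untouched.
    text = "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))
    # split() with no arguments drops leading/trailing and repeated whitespace.
    return " ".join(text.split())
```

Python's `str.lower` maps the Akan capitals Ɛ and Ɔ to ɛ and ɔ, so no language-specific casing table is needed in this sketch.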
Intended Use
Research, evaluation, and prototyping for Akan speech recognition. Domain-specific systems for health, commerce, customer support, or public services require additional representative data, human review, privacy controls, and safety testing.
Limitations
- Trained on one public corpus and may not generalize across Akan dialects, domains, ages, recording devices, background noise, or code-switching.
- The measured full-test improvement (1.07 percentage points of WER, from 33.84% to 32.77%) is statistically credible but modest.
- Two of 1,522 candidate outputs entered severe token-repetition loops. Production inference should detect repeated-token collapse and retry or fall back to a reviewed baseline.
- Whisper's tokenizer fragments Akan text heavily; this run does not change the vocabulary.
- Do not use unreviewed transcripts for medical, legal, financial, or emergency decisions.
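One way to detect the repeated-token collapse mentioned in the limitations is a periodicity check on the decoded output, sketched below. The names `has_repetition_loop` and `transcribe_with_fallback`, the whitespace tokenization, and the thresholds (`max_period=4`, `min_repeats=4`) are illustrative assumptions, not values taken from this model's evaluation; they would need tuning against real outputs, since Akan has legitimate reduplication.

```python
from typing import Callable, Sequence


def has_repetition_loop(tokens: Sequence, max_period: int = 4, min_repeats: int = 4) -> bool:
    """Return True if some n-gram (n <= max_period) repeats back to back
    at least min_repeats times anywhere in the token sequence."""
    if min_repeats < 2:
        raise ValueError("min_repeats must be at least 2")
    for period in range(1, max_period + 1):
        # A run of `needed` positions where tokens[i] == tokens[i - period]
        # means a span of period * min_repeats tokens is periodic.
        needed = period * (min_repeats - 1)
        streak = 0
        for i in range(period, len(tokens)):
            streak = streak + 1 if tokens[i] == tokens[i - period] else 0
            if streak >= needed:
                return True
    return False


def transcribe_with_fallback(audio, candidate: Callable, fallback: Callable) -> str:
    """Use the candidate transcript unless it looks like a repetition loop.

    `candidate` and `fallback` are hypothetical callables mapping audio to text,
    e.g. wrappers around this model and a reviewed baseline.
    """
    text = candidate(audio)
    if has_repetition_loop(text.split()):
        return fallback(audio)
    return text
```

The check runs on whitespace-separated words for simplicity; applying the same function to the model's token IDs before decoding would also work, since it only compares sequence elements for equality.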
Training and evaluation code: https://github.com/teckedd-code2save/akan-speech-lab