teckedd

whisper-small-waxal-akan-continuation-v1

README

License: apache-2.0

Results

The primary comparison uses all 1,522 rows in the original held-out Waxal test split. The test set is excluded from training and checkpoint selection.

Model               WER      CER
Published baseline  33.84%   12.74%
This candidate      32.77%   12.47%
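For readers who want to reproduce figures of this kind, the following is a minimal sketch of corpus-level WER and CER. It assumes the rates are total edit distance divided by total reference length, and that CER counts spaces as characters; the card does not state either convention. The helper names `edit_distance` and `error_rate` are illustrative and are not taken from the linked repository.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def error_rate(refs, hyps, unit="word"):
    """Corpus-level error rate: summed edits over summed reference length.

    Assumption: pooling over the corpus rather than averaging per utterance.
    """
    split = str.split if unit == "word" else list
    edits = sum(edit_distance(split(r), split(h)) for r, h in zip(refs, hyps))
    total = sum(len(split(r)) for r in refs)
    return edits / total


wer = error_rate(["me da wo ase"], ["me da ase"])           # one deleted word
cer = error_rate(["me da wo ase"], ["me da ase"], "char")   # three deleted characters
```

Both transcripts should pass through the same label normalization before scoring, otherwise casing and punctuation differences count as errors.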

A 5,000-sample paired bootstrap estimates a 99.86% probability that the candidate improves on the baseline WER. The 95% bootstrap interval for candidate-minus-baseline WER is -1.90 to -0.33 percentage points. At the utterance level, the candidate improves 524 utterances, ties 603, and worsens 395.
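A paired bootstrap of this kind can be sketched as follows. The sketch assumes utterances are resampled with replacement, both systems are scored on the same resample, and the interval is a simple percentile interval; the card does not spell out these details. The function names and the per-utterance inputs (word edit counts and reference word counts) are illustrative.

```python
import random


def paired_bootstrap_wer(base_errs, cand_errs, ref_lens, n_resamples=5000, seed=0):
    """Return (P(candidate WER < baseline WER), 95% percentile interval).

    base_errs / cand_errs: per-utterance word edit counts for each system.
    ref_lens: per-utterance reference word counts (assumed not all zero).
    Deltas are candidate minus baseline, in percentage points.
    """
    rng = random.Random(seed)
    n = len(ref_lens)
    deltas = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]   # same draw for both systems
        words = sum(ref_lens[i] for i in idx)
        base = sum(base_errs[i] for i in idx) / words
        cand = sum(cand_errs[i] for i in idx) / words
        deltas.append(100.0 * (cand - base))
    deltas.sort()
    p_improve = sum(d < 0 for d in deltas) / n_resamples
    low = deltas[int(0.025 * n_resamples)]
    high = deltas[int(0.975 * n_resamples) - 1]
    return p_improve, (low, high)


def win_tie_loss(base_errs, cand_errs):
    """Count utterances where the candidate has fewer, equal, or more errors."""
    wins = sum(c < b for b, c in zip(base_errs, cand_errs))
    ties = sum(c == b for b, c in zip(base_errs, cand_errs))
    return wins, ties, len(base_errs) - wins - ties
```

Pairing matters here: because each resample scores both systems on the same utterances, utterance difficulty cancels out of the difference, which is what allows a modest gain to be separated from noise.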

The smaller frozen benchmark v1 contains 99 utterances, three from each of 33 test-only speakers. On that subset, WER moves from 33.62% to 32.65% and CER from 12.37% to 12.26%. Its paired interval crosses zero, which is why the full held-out comparison is reported above.
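One way to build a fixed per-speaker subset like this is sketched below. It is an illustration only: the card does not describe how the 99 utterances were chosen, and the `speaker_id` field name, the seed, and the `freeze_benchmark` helper are all hypothetical.

```python
import random
from collections import defaultdict


def freeze_benchmark(rows, per_speaker=3, seed=0):
    """Pick a reproducible, fixed number of utterances from each speaker.

    rows: iterable of dicts with a hypothetical "speaker_id" key.
    Speakers with fewer than per_speaker utterances are skipped.
    """
    by_speaker = defaultdict(list)
    for row in rows:
        by_speaker[row["speaker_id"]].append(row)
    rng = random.Random(seed)          # fixed seed keeps the subset stable
    picked = []
    for speaker in sorted(by_speaker):  # sorted for a deterministic order
        utterances = by_speaker[speaker]
        if len(utterances) >= per_speaker:
            picked.extend(rng.sample(utterances, per_speaker))
    return picked
```

Storing the selected row identifiers alongside the benchmark, rather than re-sampling each time, is what makes such a subset "frozen" in practice.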

On the complete 1,123-row validation split, WER improved from 32.69% before continuation to 31.45% at the selected step-300 checkpoint. Step 400 regressed to 31.83%, so the trainer restored step 300.
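The selection rule described here amounts to keeping the checkpoint with the lowest validation WER. A minimal sketch using only the values reported above follows; steps 100 and 200 are left out because the card does not report them, and `select_checkpoint` is an illustrative name. Hugging Face `TrainingArguments` offers `load_best_model_at_end` for this purpose, though whether this run relied on it is an assumption.

```python
def select_checkpoint(val_wer_by_step):
    """Return the step with the lowest validation WER."""
    return min(val_wer_by_step, key=val_wer_by_step.get)


# Validation WER (%) from the card; step 0 is the model before continuation.
history = {0: 32.69, 300: 31.45, 400: 31.83}
best_step = select_checkpoint(history)  # 300
```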

Decoding with the stored Yoruba proxy language prefix and decoding with no forced language produced identical predictions on the frozen benchmark. This does not imply native Yoruba or Akan support in Whisper's original language-token inventory.

Training

  • Base: teckedd/whisper_small-waxal_akan-asr-v1
  • Dataset: google/WaxalNLP, aka_asr
  • Valid rows: 10,106 train and 1,123 validation
  • Labels: NFC, lowercase, punctuation removed, whitespace collapsed
  • Method: full-model continuation, FP16, gradient checkpointing
  • Hardware: NVIDIA L4 on Modal
  • Effective batch size: 32
  • Learning rate: 5e-6 with 50 warmup steps
  • Validation/save interval: 100 steps
  • Selected checkpoint: step 300 of 400
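The label normalization listed above (NFC, lowercase, punctuation removed, whitespace collapsed) could look roughly like the sketch below. It assumes "punctuation" means every character in a Unicode punctuation category, which also removes apostrophes and hyphens; the actual rule set in the training code may differ. `normalize_label` is an illustrative name.

```python
import re
import unicodedata


def normalize_label(text):
    """Apply NFC, lowercase, strip punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFC", text).lower()
    # Assumption: drop every character whose Unicode category starts with "P".
    text = "".join(ch for ch in text
                   if not unicodedata.category(ch).startswith("P"))
    return re.sub(r"\s+", " ", text).strip()


normalized = normalize_label("  Me  dɔ, WO! ")  # "me dɔ wo"
```

NFC matters for Akan because letters such as ɛ and ɔ may carry combining marks, and composed and decomposed forms would otherwise be scored as different characters.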

Intended Use

Research, evaluation, and prototyping for Akan speech recognition. Domain-specific systems for health, commerce, customer support, or public services require additional representative data, human review, privacy controls, and safety testing.

Limitations

  • Trained on one public corpus and may not generalize across Akan dialects, domains, ages, recording devices, background noise, or code-switching.
  • The measured full-test improvement is statistically credible but modest: about 1.07 percentage points of WER (33.84% to 32.77%).
  • Two of 1,522 candidate outputs entered severe token-repetition loops. Production inference should detect repeated-token collapse and retry or fall back to a reviewed baseline.
  • Whisper's tokenizer fragments Akan text heavily; this run does not change the vocabulary.
  • Do not use unreviewed transcripts for medical, legal, financial, or emergency decisions.
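The repeated-token guard recommended above can be sketched as a check for short n-grams that repeat back-to-back, with a fallback to a second transcriber. The thresholds (`max_ngram=4`, `min_repeats=4`) and both function names are illustrative assumptions rather than settings from this project, and `candidate` and `baseline` stand for any callables that map audio to text.

```python
def has_repetition_loop(tokens, max_ngram=4, min_repeats=4):
    """True if some n-gram (n <= max_ngram) occurs min_repeats times in a row."""
    for n in range(1, max_ngram + 1):
        span = n * min_repeats
        for start in range(len(tokens) - span + 1):
            gram = tokens[start:start + n]
            if all(tokens[start + k * n:start + (k + 1) * n] == gram
                   for k in range(1, min_repeats)):
                return True
    return False


def transcribe_with_fallback(audio, candidate, baseline):
    """Use the candidate output unless it collapses into a repetition loop."""
    text = candidate(audio)
    if has_repetition_loop(text.split()):
        text = baseline(audio)  # assumption: a reviewed fallback model exists
    return text
```

Thresholds should be tuned on real transcripts, since natural speech can legitimately repeat a word a few times in a row.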

Training and evaluation code: https://github.com/teckedd-code2save/akan-speech-lab


Modalities

Input: Audio
Output: Text
