README
License: apache-2.0
Model Overview
This model is Whisper Small fine-tuned for Bengali automatic speech recognition (ASR) on noisy speech with a 15 dB signal-to-noise ratio (SNR).
Training Configuration
| Parameter | Value |
|---|---|
| Base Model | Whisper Small |
| Language | Bengali |
| Training Samples | 5,000 |
| Validation Samples | 500 |
| Sampling Rate | 16 kHz |
| Learning Rate | 1e-5 |
| Batch Size | 32 |
| Epochs | 10 |
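To relate the step numbers reported in the evaluation table to the epoch count above, the sketch below converts between the two. It assumes one optimizer step per batch of 32 on a single device, with no gradient accumulation and the final partial batch kept; the model card does not state any of this, so treat the numbers as illustrative.

```python
import math

# Values copied from the Training Configuration table.
TRAIN_SAMPLES = 5_000
BATCH_SIZE = 32
EPOCHS = 10


def steps_per_epoch(samples: int, batch_size: int) -> int:
    """Optimizer steps in one epoch, assuming the last partial batch is kept."""
    return math.ceil(samples / batch_size)


def step_to_epoch(step: int, samples: int, batch_size: int) -> float:
    """Fractional number of epochs completed after `step` optimizer steps."""
    return step / steps_per_epoch(samples, batch_size)


per_epoch = steps_per_epoch(TRAIN_SAMPLES, BATCH_SIZE)
total_steps = per_epoch * EPOCHS

print(f"steps per epoch: {per_epoch}")
print(f"steps for {EPOCHS} epochs: {total_steps}")
print(f"step 300 is about epoch {step_to_epoch(300, TRAIN_SAMPLES, BATCH_SIZE):.2f}")
```

Under these assumptions an epoch is 157 steps and the full schedule would be 1,570 steps. If the training run used gradient accumulation or several devices, the mapping changes accordingly.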
Evaluation Results
| Step | Validation Loss | WER (%) | CER (%) |
|---|---|---|---|
| 100 | 0.6018 | 82.21 | 35.25 |
| 200 | 0.3004 | 71.96 | 26.54 |
| 300 | 0.2693 | 67.14 | 24.48 |
| 400 | 0.2851 | 64.73 | 23.89 |
| 500 | 0.3187 | 63.62 | 22.28 |
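For readers unfamiliar with the two error metrics in the table, here is a minimal pure-Python sketch of how WER and CER are commonly defined: the edit distance between reference and hypothesis, divided by the reference length, over words and characters respectively. This is not the evaluation script used for this model; text normalization and the choice to count whitespace as a character are assumptions made here for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (row-by-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent, splitting on whitespace."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent, counting every code point (spaces included)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / len(reference)


# Hypothetical transcript pair, not taken from the model's evaluation data.
reference = "আমি ভাত খাই"
hypothesis = "আমি ভাত খাব"
print(f"WER: {wer(reference, hypothesis):.2f}%  CER: {cer(reference, hypothesis):.2f}%")
```

Because this version counts Unicode code points, Bengali conjuncts and vowel signs each contribute separately; a grapheme-level CER would give different values.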
Selected Checkpoint
Checkpoint-300 was selected as the final model because it achieved the lowest validation loss (0.2693). After step 300, validation loss rose (0.2851 at step 400, 0.3187 at step 500) while training loss continued to decrease, indicating the onset of overfitting. WER and CER kept improving through step 500 (63.62% and 22.28%), so this choice prioritizes validation loss over error rate in order to favor generalization.
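The selection rule described here can be sketched in a few lines. The rows below are copied from the Evaluation Results table; the function name `select_checkpoint` and the dictionary keys are illustrative and do not come from the author's training code.

```python
# Rows copied from the Evaluation Results table.
EVAL_LOG = [
    {"step": 100, "val_loss": 0.6018, "wer": 82.21, "cer": 35.25},
    {"step": 200, "val_loss": 0.3004, "wer": 71.96, "cer": 26.54},
    {"step": 300, "val_loss": 0.2693, "wer": 67.14, "cer": 24.48},
    {"step": 400, "val_loss": 0.2851, "wer": 64.73, "cer": 23.89},
    {"step": 500, "val_loss": 0.3187, "wer": 63.62, "cer": 22.28},
]


def select_checkpoint(log, metric="val_loss"):
    """Return the row with the smallest value of `metric` (lower is better for all three)."""
    return min(log, key=lambda row: row[metric])


best_by_loss = select_checkpoint(EVAL_LOG)
best_by_wer = select_checkpoint(EVAL_LOG, metric="wer")

print(f"lowest validation loss: checkpoint-{best_by_loss['step']}")
print(f"lowest WER:             checkpoint-{best_by_wer['step']}")
```

Selecting on validation loss picks checkpoint-300, while selecting on WER would pick checkpoint-500. With the Hugging Face `Trainer`, a comparable rule can be requested through the `load_best_model_at_end` and `metric_for_best_model` training arguments; whether the author used them is not stated.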
Intended Use
- Bengali Speech Recognition
- Academic Research
- Bengali ASR Benchmarking
- Noisy Speech Transcription (15 dB SNR)
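To make the 15 dB SNR condition concrete, the sketch below scales a noise signal so that the clean-to-noise power ratio of the mixture equals a target SNR. The model card does not describe how its noisy data was produced, so this is only one common way to construct such a mixture; the sine tone and Gaussian noise are stand-ins for real speech and recorded noise.

```python
import math
import random

SAMPLE_RATE = 16_000  # Hz, matching the sampling rate in the configuration table


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean`, scaled so the result has the requested SNR in dB."""
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    noise_power = power(noise)
    if noise_power == 0:
        raise ValueError("noise must not be silent")
    # SNR_dB = 10 * log10(P_clean / P_noise), solved for the required noise power.
    target_noise_power = power(clean) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    return [c + gain * n for c, n in zip(clean, noise)]


def measure_snr_db(clean, noisy):
    """SNR in dB of `noisy` relative to the known `clean` signal."""
    residual = [y - c for c, y in zip(clean, noisy)]
    return 10 * math.log10(power(clean) / power(residual))


# Stand-in signals: one second of a 440 Hz tone and seeded Gaussian noise.
rng = random.Random(0)
clean = [0.5 * math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE)]
noise = [rng.gauss(0.0, 1.0) for _ in range(SAMPLE_RATE)]

noisy = mix_at_snr(clean, noise, snr_db=15.0)
print(f"measured SNR: {measure_snr_db(clean, noisy):.2f} dB")
```

At 15 dB the speech power is roughly 32 times the noise power, which is a moderate noise level; performance at lower SNRs is not covered by the results above.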
Model provider
TurjoDutta5555
Model tree
- Base: openai/whisper-small
- Fine-tuned: this model
Modalities
- Input: Audio
- Output: Text