pr0mila-gh0sh

MediBeng-Whisper-Tiny-FL

License: apache-2.0

Model Description

MediBeng Whisper Tiny FL is the federated learning (v3) release of Whisper Tiny for automatic speech translation of code-switched Bengali–English clinical conversations into English. Training uses FedProx across 4 simulated hospital clients with speaker-based non-IID data partitions. Raw audio never leaves a client; each client sends only its weight updates to the server, which aggregates them.

This release is the best-performing FL checkpoint (FedProx, speaker partitioning). Under the same free-generation evaluation protocol it reaches 3.12% test WER, compared with 28.20% for centralised v2.

What’s New in v3 (Federated Learning)

| Area | v2 (centralised) | v3 FL (this release) |
|---|---|---|
| Training paradigm | Single-server centralised fine-tuning | Federated learning (4 clients, 3 rounds) |
| Privacy | All training data pooled centrally | Raw audio stays on clients; only weight updates aggregated |
| Algorithms | Seq2SeqTrainer only | FedAvg + FedProx (this model: FedProx) |
| Client partitioning | N/A | Speaker non-IID (Male/Female hospital shards) |
| Non-IID handling | N/A | FedProx proximal term (μ = 0.01) reduces client drift |
| Local training | 500 central steps | 50 steps/round × 3 rounds × 4 clients = 600 client steps |
| Eval protocol | Free + forced generation (v2 discrepancy) | Free generation (canonical); free/forced consistent |
| Best test WER | 28.20% (centralised v2, free gen) | 3.12% (FedProx speaker, free gen) |
| Best test BLEU | 73.06 | 96.33 |
| Statistical testing | Bootstrap / McNemar vs baseline | Full suite for FL vs baseline and vs centralised v2 |
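
The FedProx proximal term listed in the table adds a penalty (μ/2)·‖w − w_global‖² to each client's local loss, which pulls the local weights back toward the global model broadcast at the start of the round. The following is a minimal sketch in plain Python, assuming the weights are flattened into a list of floats. The function names are illustrative and are not taken from the training code.

```python
def fedprox_penalty(local_w, global_w, mu=0.01):
    """Proximal term (mu / 2) * ||w - w_global||^2 added to the local loss."""
    return 0.5 * mu * sum((w - g) ** 2 for w, g in zip(local_w, global_w))


def fedprox_step(local_w, global_w, task_grad, lr=1e-5, mu=0.01):
    """One plain gradient step on task loss + proximal term.

    The gradient of the proximal term is mu * (w - w_global). The model card
    lists AdamW as the optimizer; plain SGD is used here only to keep the
    sketch short.
    """
    return [
        w - lr * (g_task + mu * (w - g))
        for w, g, g_task in zip(local_w, global_w, task_grad)
    ]


global_w = [0.0, 1.0]
local_w = [1.0, 3.0]
penalty = fedprox_penalty(local_w, global_w)  # 0.5 * 0.01 * (1 + 4)
```

With μ = 0 the step reduces to an ordinary local update, which is the FedAvg case.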

Federated architecture

markdown

                ┌─────────────────────────┐
                │ FL Server (Aggregator)  │
                │   FedProx (μ = 0.01)    │
                └────────────┬────────────┘
                             │ broadcast global weights
       ┌─────────────────────┼─────────────────────┐
       │                     │                     │
┌──────▼──────┐       ┌──────▼──────┐       ┌──────▼──────┐
│  Client 0   │       │  Client 1   │  ...  │  Client 3   │
│ Male shard  │       │ Male shard  │       │ Female shard│
│ 863 samples │       │ 863 samples │       │ 843 samples │
│  50 local   │       │  50 local   │       │  50 local   │
│ steps/round │       │ steps/round │       │ steps/round │
└──────┬──────┘       └──────┬──────┘       └──────┬──────┘
       └─────────────────────┼─────────────────────┘
                             │ upload weight updates
                ┌────────────▼────────────┐
                │  Weighted Aggregation   │
                └─────────────────────────┘
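
The weighted aggregation step at the bottom of the diagram can be sketched as a sample-count-weighted average of the client weights. This is an assumption: the card says only that aggregation is weighted, not which weights are used. The weight vectors below are toy values and `weighted_average` is a hypothetical helper.

```python
def weighted_average(client_weights, client_sizes):
    """Average per-client weight vectors, weighting each by its sample count."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Sample counts from the model card; the weight vectors are toy values.
sizes = [863, 863, 843, 843]
clients = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
global_w = weighted_average(clients, sizes)
```

Because the shards are nearly equal in size, this weighting is close to a plain mean for the partition described here.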

Training configuration

| Parameter | Value |
|---|---|
| Base checkpoint | openai/whisper-tiny |
| Algorithm | FedProx (μ = 0.01) |
| Clients | 4 (speaker non-IID) |
| FL rounds | 3 |
| Local steps per round | 50 |
| Local learning rate | 1e-5 |
| Local batch size | 1 |
| Optimizer | AdamW |
| Total FL training time | ~21 min (CPU) |
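
For reference, the configuration above can be captured in a small dataclass with the derived step counts computed from it. This is a sketch only: `FLConfig` and its field names are hypothetical and do not come from the training code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FLConfig:
    base_checkpoint: str = "openai/whisper-tiny"
    algorithm: str = "fedprox"
    mu: float = 0.01
    num_clients: int = 4
    num_rounds: int = 3
    local_steps: int = 50
    learning_rate: float = 1e-5
    batch_size: int = 1

    @property
    def total_client_steps(self) -> int:
        # Local steps summed over all rounds and all clients.
        return self.local_steps * self.num_rounds * self.num_clients

    @property
    def samples_seen_per_client(self) -> int:
        # Assumes no gradient accumulation, which the card does not state.
        return self.local_steps * self.num_rounds * self.batch_size


cfg = FLConfig()
```

Here `cfg.total_client_steps` is 600, the same figure the card gives for local training. Under the no-accumulation assumption, `cfg.samples_seen_per_client` is 150, so each client would see only part of its shard.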

Usage

Install dependencies:

bash

pip install transformers librosa torch datasets

Run inference (free generation — matches evaluation protocol):

python

from transformers import WhisperProcessor, WhisperForConditionalGeneration
import librosa

MODEL_ID = "pr0mila-gh0sh/MediBeng-Whisper-Tiny-FL"

# Load the processor (feature extractor + tokenizer) and the fine-tuned model
processor = WhisperProcessor.from_pretrained(MODEL_ID)
model = WhisperForConditionalGeneration.from_pretrained(MODEL_ID)

# Free generation (canonical FL evaluation protocol)
model.config.forced_decoder_ids = None
model.generation_config.forced_decoder_ids = None

# Whisper expects 16 kHz mono audio; librosa resamples on load
audio, _ = librosa.load("path_to_audio.wav", sr=16000)
input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features

# Generate token IDs and decode them to English text
predicted_ids = model.generate(input_features, max_length=225)
translation = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print("Translation:", translation)
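
Whisper's feature extractor works on 30-second inputs and by default pads or truncates audio to that length, so the snippet above only covers the first 30 seconds of a longer recording. One simple workaround is to split the waveform into consecutive windows and translate each one in turn. The helper below is a hypothetical sketch of that splitting step, not part of the released code.

```python
SAMPLE_RATE = 16_000
WINDOW_SECONDS = 30


def split_into_windows(audio, sample_rate=SAMPLE_RATE, seconds=WINDOW_SECONDS):
    """Split a 1-D sequence of samples into consecutive windows of at most `seconds`."""
    size = sample_rate * seconds
    return [audio[start:start + size] for start in range(0, len(audio), size)]


# 70 seconds of silence stands in for a real recording.
windows = split_into_windows([0.0] * (70 * SAMPLE_RATE))
window_lengths = [len(w) / SAMPLE_RATE for w in windows]  # seconds per window
```

Each window can then go through `processor(...)` and `model.generate(...)` as in the usage example, with the decoded strings joined afterwards. Fixed boundaries may cut a word in half, so overlapping windows or splitting on silence may give better results.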

Intended Use

For researchers and developers building privacy-preserving clinical automatic speech translation (AST) systems:

  • Multi-hospital deployment without centralising patient audio
  • Federated fine-tuning research on code-switched clinical speech
  • Bengali–English medical translation in regulated environments

Training Data

Fine-tuned via federated learning on the MediBeng training split, partitioned across 4 simulated clients:

| Client | Partition | Samples |
|---|---|---|
| 0 | Male speaker shard A | 863 |
| 1 | Male speaker shard B | 863 |
| 2 | Female speaker shard A | 843 |
| 3 | Female speaker shard B | 843 |
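
A speaker-based non-IID partition like the one in the table can be sketched by grouping samples by speaker label and cutting each group into equal shards. The `speaker` key, the label strings, and the helper name are assumptions made for illustration; the card does not describe how the shards were actually built.

```python
def partition_by_speaker(samples, shards_per_group=2):
    """Group samples by speaker label, then split each group into equal contiguous shards."""
    groups = {}
    for sample in samples:
        groups.setdefault(sample["speaker"], []).append(sample)

    clients = []
    for label in sorted(groups, reverse=True):  # "male" before "female", as in the table
        items = groups[label]
        shard_size = len(items) // shards_per_group  # any remainder is dropped
        for k in range(shards_per_group):
            clients.append(items[k * shard_size:(k + 1) * shard_size])
    return clients


# Toy records with the group sizes implied by the table (2 x 863 and 2 x 843).
toy = [{"id": i, "speaker": "male"} for i in range(1726)]
toy += [{"id": i, "speaker": "female"} for i in range(1726, 3412)]
client_sizes = [len(c) for c in partition_by_speaker(toy)]
```

Each client ends up with a single speaker group, which is what makes the partition non-IID with respect to speaker.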

Test evaluation: held-out 960-sample Hugging Face test split (identical to v2).

Evaluation Results

Full test set (n = 960, free generation)

| Setting | WER ↓ | BLEU-4 ↑ | chrF++ ↑ |
|---|---|---|---|
| Baseline (Whisper Tiny, unfine-tuned) | 81.12% | 32.20 | 45.12 |
| Centralised v2 | 28.20% | 73.06 | 79.28 |
| FL FedAvg IID | 6.16% | 93.66 | 94.38 |
| FL FedProx speaker (this model) | 3.12% | 96.33 | 96.70 |
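
For readers unfamiliar with the WER column, word error rate is the number of word-level substitutions, insertions, and deletions needed to turn the hypothesis into the reference, divided by the number of reference words. The sketch below computes a corpus-level WER in plain Python. It assumes whitespace tokenisation with no text normalisation, which may differ from the evaluation script behind the numbers above.

```python
def word_error_rate(references, hypotheses):
    """Corpus-level WER: total word edits divided by total reference words."""
    total_edits = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        r, h = ref.split(), hyp.split()
        # Single-row Levenshtein distance over words.
        row = list(range(len(h) + 1))
        for i in range(1, len(r) + 1):
            prev_diag, row[0] = row[0], i
            for j in range(1, len(h) + 1):
                cost = 0 if r[i - 1] == h[j - 1] else 1
                prev_diag, row[j] = row[j], min(
                    row[j] + 1,        # deletion
                    row[j - 1] + 1,    # insertion
                    prev_diag + cost,  # substitution or match
                )
        total_edits += row[len(h)]
        total_words += len(r)
    return total_edits / total_words


refs = ["the patient has a fever", "take the tablet twice daily"]
hyps = ["the patient has fever", "take the tablets twice daily"]
wer = word_error_rate(refs, hyps)  # 2 edits over 10 reference words
```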

FedProx convergence (quick eval, n = 200 per round)

| Round | Quick WER | Quick BLEU | Quick chrF++ |
|---|---|---|---|
| 1 | 36.64% | 57.94 | 63.87 |
| 2 | 7.02% | 92.25 | 93.39 |
| 3 | 3.09% | 96.35 | 96.71 |

Statistical significance (FedProx vs baseline)

| Test | Result |
|---|---|
| Bootstrap 95% CI (WER) | [2.41%, 3.46%] |
| Paired t-test | p = 6.95 × 10⁻¹⁶ |
| McNemar (≤5% WER) | 840/960 FL correct vs 0/960 baseline |
| Effect size (Cohen's d) | 0.27 |
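
A bootstrap confidence interval for WER, like the one in the first row, is typically obtained by resampling test utterances with replacement and recomputing WER on each resample. The sketch below shows a percentile bootstrap under that assumption. The card does not say how many resamples or which bootstrap variant was used, and the counts here are toy data, not the MediBeng results.

```python
import random


def bootstrap_wer_ci(edits, ref_words, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for corpus WER, resampling utterances with replacement."""
    rng = random.Random(seed)
    n = len(edits)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(sum(edits[i] for i in idx) / sum(ref_words[i] for i in idx))
    stats.sort()
    lower = stats[round((alpha / 2) * n_resamples)]
    upper = stats[round((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Toy per-utterance edit counts and reference lengths, not the MediBeng results.
edits = [0, 1, 0, 0, 2, 0, 1, 0, 0, 0] * 20
ref_words = [10] * 200
low, high = bootstrap_wer_ci(edits, ref_words)
```

The inputs are the per-utterance edit counts and reference word counts that a WER computation already produces, so the interval can be computed without re-running inference.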

Limitations

  • Simulated clients — partitions use synthetic TTS speaker labels, not real multi-hospital data.
  • 3 FL rounds — convergence may improve with more rounds.
  • Centralised upper bound — comparison vs under-trained centralised v2; fully converged centralised baseline pending.
  • Full weight transmission — ~156 MB per client per round; LoRA-FL not yet implemented.
  • No differential privacy — formal ε–δ guarantees not yet added.
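
To put the full-weight transmission figure in context, the total traffic for this run follows from the numbers already given (about 156 MB per client per round, 4 clients, 3 rounds). The doubling for the server broadcast is an assumption, since the card does not say whether the 156 MB covers one direction or both.

```python
MB_PER_UPLOAD = 156   # approximate full-model size quoted above
NUM_CLIENTS = 4
NUM_ROUNDS = 3

upload_per_round_mb = MB_PER_UPLOAD * NUM_CLIENTS   # all clients, one round
upload_total_mb = upload_per_round_mb * NUM_ROUNDS  # whole run, uploads only
# Assumption: the server also broadcasts the full model to every client each round.
round_trip_total_mb = 2 * upload_total_mb
```

That is roughly 1.9 GB of uploads for the whole run, which grows linearly with the number of clients and rounds.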

Ethical Considerations

  • Federated learning reduces raw-data exposure but does not eliminate all privacy risks (model inversion, membership inference).
  • Training data may reflect demographic biases; validate before clinical deployment.
  • Human review required for all clinical translations.

Blog Post

MediBeng Whisper-Tiny: Translating Code-Switched Bengali-English Speech for Healthcare

Citation

Preprint on medRxiv.

bibtex

@article{ghosh2025medibeng,
  title={MediBeng Whisper Tiny: A fine-tuned code-switched Bengali-English translator for clinical applications},
  author={Ghosh, Promila and Talukder, Sunipun},
  journal={medRxiv},
  year={2025},
  doi={10.1101/2025.04.25.25326406},
  url={https://www.medrxiv.org/content/10.1101/2025.04.25.25326406v2}
}
