pr0mila-gh0sh/MediBeng-Whisper-Tiny-FL API & Inference Endpoint

Model Description

MediBeng Whisper Tiny FL is the federated learning (v3) release of Whisper Tiny for automatic speech translation (AST) of code-switched Bengali–English clinical conversations into English. Training uses FedProx across 4 simulated hospital clients with speaker-based non-IID data partitions. Raw audio never leaves a client; only weight updates are sent to the server for aggregation.

This release is the best-performing FL checkpoint (FedProx with speaker non-IID partitions). On the 960-sample test set it reaches 3.12% WER and 96.33 BLEU, against 28.20% WER and 73.06 BLEU for centralised v2 under the same free-generation evaluation protocol.

What’s New in v3 (Federated Learning)

Area	v2 (centralised)	v3 FL (this release)
Training paradigm	Single-server centralised fine-tuning	Federated learning (4 clients, 3 rounds)
Privacy	All training data pooled centrally	Raw audio stays on clients; only weight updates aggregated
Algorithms	Seq2SeqTrainer only	FedAvg + FedProx (this model: FedProx)
Client partitioning	N/A	Speaker non-IID (Male/Female hospital shards)
Non-IID handling	N/A	FedProx proximal term (μ = 0.01) reduces client drift
Local training	500 central steps	50 steps/round × 3 rounds × 4 clients = 600 client steps
Eval protocol	Free + forced generation (v2 discrepancy)	Free generation (canonical); free/forced consistent
Best test WER	28.20% (centralised v2, free gen)	3.12% (FedProx speaker, free gen)
Best test BLEU	73.06	96.33
Statistical testing	Bootstrap / McNemar vs baseline	Full suite for FL vs baseline and vs centralised v2

Federated architecture

markdown
                 ┌─────────────────────────┐
                 │  FL Server (Aggregator) │
                 │  FedProx (μ = 0.01)     │
                 └────────────┬────────────┘
                              │ broadcast global weights
        ┌─────────────────────┼─────────────────────┐
        │                     │                     │
 ┌──────▼──────┐      ┌──────▼──────┐      ┌──────▼──────┐
 │  Client 0   │      │  Client 1   │ ...  │  Client 3   │
 │ Male shard  │      │ Male shard  │      │ Female shard│
 │ 863 samples │      │ 863 samples │      │ 843 samples │
 │ 50 local    │      │ 50 local    │      │ 50 local    │
 │ steps/round │      │ steps/round │      │ steps/round │
 └──────┬──────┘      └──────┬──────┘      └──────┬──────┘
        └─────────────────────┼─────────────────────┘
                              │ upload weight updates
                 ┌────────────▼────────────┐
                 │  Weighted Aggregation   │
                 └─────────────────────────┘
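
The aggregation step in the diagram can be illustrated with a minimal sketch. It assumes the server weights each client's parameters by its sample count, as in standard FedAvg-style aggregation; the card does not state the exact weighting used. The function name `weighted_average`, the parameter name `layer.weight`, and the parameter values are hypothetical. Only the shard sizes come from this card.

```python
# Sample-count-weighted averaging of client parameters (FedAvg-style).
# Parameters are plain dicts of name -> list of floats so the sketch has
# no dependencies; a real run would average tensors from model state dicts.

def weighted_average(client_weights, client_sizes):
    """Return the sample-weighted average of per-client parameter dicts."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    averaged = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Shard sizes reported for the four clients; parameter values are made up.
sizes = [863, 863, 843, 843]
clients = [
    {"layer.weight": [1.0, 0.0]},
    {"layer.weight": [1.0, 0.0]},
    {"layer.weight": [0.0, 1.0]},
    {"layer.weight": [0.0, 1.0]},
]
global_weights = weighted_average(clients, sizes)
print(global_weights)
```

With these shard sizes the two male shards contribute 1726/3412 of the average and the two female shards 1686/3412, so the weighting is close to uniform.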

Training configuration

Parameter	Value
Base checkpoint	`openai/whisper-tiny`
Algorithm	FedProx (μ = 0.01)
Clients	4 (speaker non-IID)
FL rounds	3
Local steps per round	50
Local learning rate	1e-5
Local batch size	1
Optimizer	AdamW
Total FL training time	~21 min (CPU)
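
For reference, the FedProx local objective in its standard form is written out below. The notation is an assumption, not taken from this card: F_k is the local task loss on client k, w^t the global weights broadcast at round t, and μ the proximal coefficient (0.01 here). The second expression is the gradient used in each local step, where the extra term pulls local weights back toward the global model and so limits client drift.

```latex
\min_{w}\; h_k(w; w^{t}) = F_k(w) + \frac{\mu}{2}\,\lVert w - w^{t} \rVert^{2},
\qquad
\nabla h_k(w; w^{t}) = \nabla F_k(w) + \mu\,(w - w^{t}),
\qquad \mu = 0.01
```

Setting μ = 0 recovers plain FedAvg local training.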

Usage

Install dependencies:

bash
pip install transformers librosa torch datasets

Run inference (free generation — matches evaluation protocol):

python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import librosa

MODEL_ID = "pr0mila-gh0sh/MediBeng-Whisper-Tiny-FL"

processor = WhisperProcessor.from_pretrained(MODEL_ID)
model = WhisperForConditionalGeneration.from_pretrained(MODEL_ID)

# Free generation (canonical FL evaluation protocol)
model.config.forced_decoder_ids = None
model.generation_config.forced_decoder_ids = None

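# Load the audio as mono and resample to 16 kHz, the rate Whisper expects
audio, _ = librosa.load("path_to_audio.wav", sr=16000)
# Convert the waveform to log-Mel input features
input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features

# Generate token IDs, then decode them to English text
predicted_ids = model.generate(input_features, max_length=225)
translation = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]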

print("Translation:", translation)

Intended Use

For researchers and developers building privacy-preserving clinical AST systems:

Multi-hospital deployment without centralising patient audio
Federated fine-tuning research on code-switched clinical speech
Bengali–English medical translation in regulated environments

Training Data

Fine-tuned via federated learning on the MediBeng training split, partitioned across 4 simulated clients:

Client	Partition	Samples
0	Male speaker shard A	863
1	Male speaker shard B	863
2	Female speaker shard A	843
3	Female speaker shard B	843

Test evaluation: held-out 960-sample Hugging Face test split, identical to the one used for v2.
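
A speaker-based partition like the one in the table can be sketched as follows. This is an illustration under assumptions, not the authors' partitioning script: the record field `speaker`, the function `partition_by_speaker`, and the even split of each speaker group into two shards are all hypothetical. The synthetic records only reproduce the group sizes implied by the table (1726 male, 1686 female).

```python
from collections import defaultdict


def partition_by_speaker(samples, shards_per_group=2, key="speaker"):
    """Group samples by speaker label, then split each group into equal shards."""
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[key]].append(sample)

    clients = []
    # Groups are visited in order of first appearance in `samples`.
    for label, members in groups.items():
        base, extra = divmod(len(members), shards_per_group)
        start = 0
        for shard in range(shards_per_group):
            end = start + base + (1 if shard < extra else 0)
            clients.append((label, members[start:end]))
            start = end
    return clients


# Synthetic stand-in records matching the group sizes in the table.
samples = [{"id": i, "speaker": "Male"} for i in range(1726)]
samples += [{"id": i, "speaker": "Female"} for i in range(1726, 3412)]

clients = partition_by_speaker(samples)
for client_id, (label, shard) in enumerate(clients):
    print(client_id, label, len(shard))
```

Each client sees only one speaker group, which is what makes the partition non-IID with respect to speaker.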

Evaluation Results

Full test set (n = 960, free generation)

Setting	WER ↓	BLEU-4 ↑	chrF++ ↑
Baseline (Whisper Tiny, unfine-tuned)	81.12%	32.20	45.12
Centralised v2	28.20%	73.06	79.28
FL FedAvg IID	6.16%	93.66	94.38
FL FedProx speaker (this model)	3.12%	96.33	96.70
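
As a reminder of what the WER column measures, here is a dependency-free sketch of corpus-level word error rate: total word-level edit distance divided by total reference words. The card does not say which WER implementation or text normalisation was used, so the lowercasing and whitespace tokenisation here are assumptions, and the example sentences are invented.

```python
def word_edit_distance(ref_words, hyp_words):
    """Levenshtein distance between two word lists."""
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion (reference word missing)
                current[j - 1] + 1,      # insertion (extra hypothesis word)
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def corpus_wer(references, hypotheses):
    """Total word errors divided by total reference words."""
    errors = 0
    ref_word_count = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = ref.lower().split()
        hyp_words = hyp.lower().split()
        errors += word_edit_distance(ref_words, hyp_words)
        ref_word_count += len(ref_words)
    if ref_word_count == 0:
        raise ValueError("references contain no words")
    return errors / ref_word_count


# Invented examples: one deleted word out of ten reference words.
references = ["the patient has a fever", "take the tablet twice daily"]
hypotheses = ["the patient has fever", "take the tablet twice daily"]
wer = corpus_wer(references, hypotheses)
print(f"{wer:.2%}")  # prints 10.00%
```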

FedProx convergence (quick eval, n = 200 per round)

Round	Quick WER	Quick BLEU	Quick chrF++
1	36.64%	57.94	63.87
2	7.02%	92.25	93.39
3	3.09%	96.35	96.71

Statistical significance (FedProx vs baseline)

Test	Result
Bootstrap 95% CI (WER)	[2.41%, 3.46%]
Paired t-test	p = 6.95 × 10⁻¹⁶
McNemar (≤5% WER)	840/960 FL correct vs 0/960 baseline
Effect size (Cohen's d)	0.27
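
A percentile bootstrap interval for WER, like the one in the table, can be sketched as below. The card does not describe the resampling unit or the number of resamples, so this sketch assumes utterance-level resampling with 1000 resamples. The function `bootstrap_wer_ci` and the per-utterance error counts are hypothetical.

```python
import random


def bootstrap_wer_ci(errors, ref_lengths, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for corpus WER, resampling utterances with replacement."""
    rng = random.Random(seed)
    n = len(errors)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        total_errors = sum(errors[i] for i in idx)
        total_words = sum(ref_lengths[i] for i in idx)
        stats.append(total_errors / total_words)
    stats.sort()
    lo_idx = int(alpha / 2 * (n_resamples - 1))
    hi_idx = n_resamples - 1 - lo_idx
    return stats[lo_idx], stats[hi_idx]


# Made-up per-utterance data: 200 utterances of 10 words, one error in every fifth.
errors = [0, 0, 0, 0, 1] * 40
ref_lengths = [10] * 200
point_estimate = sum(errors) / sum(ref_lengths)

low, high = bootstrap_wer_ci(errors, ref_lengths)
print(f"WER {point_estimate:.2%}, 95% CI [{low:.2%}, {high:.2%}]")
```

Resampling whole utterances keeps each utterance's errors and reference length together, which matters because WER is a ratio of two sums.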

Limitations

Simulated clients — partitions use synthetic TTS speaker labels, not real multi-hospital data.
3 FL rounds — convergence may improve with more rounds.
Centralised upper bound — comparison vs under-trained centralised v2; fully converged centralised baseline pending.
Full weight transmission — ~156 MB per client per round; LoRA-FL not yet implemented.
No differential privacy — formal ε–δ guarantees not yet added.
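
The ~156 MB figure in the transmission limitation is consistent with sending every parameter in 32-bit floats. The arithmetic below assumes roughly 39 million parameters for Whisper Tiny and float32 storage; both are assumptions, not values stated in this card. The total counts client uploads only, not the server's broadcasts.

```python
PARAMS = 39_000_000   # approximate Whisper Tiny parameter count (assumption)
BYTES_PER_PARAM = 4   # float32 (assumption)
CLIENTS = 4           # from the training configuration
ROUNDS = 3            # from the training configuration

payload_mb = PARAMS * BYTES_PER_PARAM / 1e6
total_upload_mb = payload_mb * CLIENTS * ROUNDS

print(f"{payload_mb:.0f} MB per client per round")       # prints 156 MB per client per round
print(f"{total_upload_mb:.0f} MB uploaded over the run")  # prints 1872 MB uploaded over the run
```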

Ethical Considerations

Federated learning reduces raw-data exposure but does not eliminate all privacy risks (model inversion, membership inference).
Training data may reflect demographic biases; validate before clinical deployment.
Human review required for all clinical translations.

Blog Post

MediBeng Whisper-Tiny: Translating Code-Switched Bengali-English Speech for Healthcare

Citation

Preprint on medRxiv.

bibtex
@article{ghosh2025medibeng,
  title={MediBeng Whisper Tiny: A fine-tuned code-switched Bengali-English translator for clinical applications},
  author={Ghosh, Promila and Talukder, Sunipun},
  journal={medRxiv},
  year={2025},
  doi={10.1101/2025.04.25.25326406},
  url={https://www.medrxiv.org/content/10.1101/2025.04.25.25326406v2}
}
