veyatia/whisper-creole-medical-v4

License: apache-2.0

Model summary

| Field | Value |
|---|---|
| Architecture | OpenAI Whisper large-v3-turbo + PEFT LoRA (merged into full weights) |
| Base | `jsbeaudry/oswald-large-v3-turbo-m1` |
| Training | LoRA `r=32`, `alpha=64`; 4 epochs; grouped script-level validation split |
| Final validation WER | 9.21% (epoch 4) |
| Training region | Modal `us-central` (US), A100-80GB |
| Output language | Haitian Creole (`ht`), medical phrasing |

Intended use

Inbound/outbound clinical phone interpretation (triage, pharmacy, vitals, anatomy).
Telephony-degraded audio (8 kHz band-limited synthetic + real 16 kHz phone captures).
Research and internal QA on Haitian Creole medical ASR.

Not for: diagnosis, autonomous clinical decision-making, or use without human review of transcripts in care settings.

Limitations

Trained on scripted / semi-scripted phrases, not full spontaneous clinical dialogue.
Mixed French loanwords and orthography variants remain challenging.
10 in-car driving clips (auto-labeled with Oswald) add noise diversity, but their transcripts are not script-verified.
No guarantee of HIPAA “safe harbor” de-identification in runtime transcripts; operators must follow their organization's policies.
CT2 deployment (`*-ct2` repos) requires a separate conversion; this repo contains Hugging Face Transformers weights.
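The separate CT2 conversion mentioned above could look roughly like the sketch below. It assumes the `ctranslate2` and `faster-whisper` Python packages, int8 quantization for CPU, and that this repo ships `tokenizer.json` and `preprocessor_config.json`; the output directory name and helper functions are hypothetical, not Veyatia's published tooling.

```python
# Sketch only: assumes `pip install ctranslate2 faster-whisper transformers`.
MODEL_ID = "veyatia/whisper-creole-medical-v4"
CT2_DIR = "whisper-creole-medical-v4-ct2"  # hypothetical local output directory


def convert_to_ct2(model_id: str = MODEL_ID, output_dir: str = CT2_DIR) -> str:
    """Convert this repo's Transformers weights into a CTranslate2 model directory."""
    from ctranslate2.converters import TransformersConverter  # lazy import

    converter = TransformersConverter(
        model_id,
        # Assumes these files exist in the repo; faster-whisper reads them from the CT2 dir.
        copy_files=["tokenizer.json", "preprocessor_config.json"],
    )
    converter.convert(output_dir, quantization="int8")  # int8 is a common CPU choice
    return output_dir


def transcribe_ct2(audio_path: str, model_dir: str = CT2_DIR) -> str:
    """Transcribe one file with faster-whisper, forcing Haitian Creole decoding."""
    from faster_whisper import WhisperModel  # lazy import

    model = WhisperModel(model_dir, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(audio_path, language="ht")
    return " ".join(segment.text.strip() for segment in segments)
```

Imports are deferred into the functions so the file loads even where only one of the two packages is installed (for example, conversion on a build machine, inference on a CPU server).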

Training data (1,592 clips)

Unified corpus: ht_training_combined (Veyatia internal assembly, May 2026).

| Source | Clips | Sample rate | Description |
|---|---:|---|---|
| Telephony TTS | 793 | 8 kHz mono | PSTN-simulated medical phrases (300–3400 Hz band-pass, line noise) |
| Synthetic TTS | 739 | 16 kHz mono | ElevenLabs clinical combinator corpus (filler and real-duplicate rows excluded) |
| Real phone | 50 | 16 kHz mono | Native speaker, scripted medical phrases (iPhone capture) |
| Driving record | 10 | 16 kHz mono | In-car, operator-recorded medical phrase practice (Oswald auto-labels, not script-verified) |
| Total | 1,592 | | |
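The exact telephony simulation pipeline is not documented here. As an illustration only, the sketch below shows one way to turn 16 kHz audio into 8 kHz PSTN-style audio: a Hamming-windowed sinc band-pass at 300-3400 Hz, decimation by 2, and Gaussian noise standing in for "line noise". The filter length, noise level, and function names are assumptions; `scipy.signal` offers faster equivalents for real use.

```python
import math
import random

SOURCE_RATE = 16_000  # Hz, assumed rate of the clean TTS audio
PHONE_BAND = (300.0, 3400.0)  # Hz, PSTN voice band listed in the corpus table


def bandpass_taps(low_hz, high_hz, sample_rate, num_taps=201):
    """Hamming-windowed sinc band-pass FIR (odd length, linear phase)."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd")
    f1, f2 = low_hz / sample_rate, high_hz / sample_rate  # cycles per sample
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        m = n - mid
        if m == 0:
            ideal = 2.0 * (f2 - f1)
        else:
            # Difference of two ideal low-pass responses is an ideal band-pass.
            ideal = (math.sin(2 * math.pi * f2 * m) - math.sin(2 * math.pi * f1 * m)) / (math.pi * m)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def fir_filter(signal, taps):
    """Zero-padded convolution, re-centred so the output is not delayed."""
    half = len(taps) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, tap in enumerate(taps):
            j = i + half - k
            if 0 <= j < len(signal):
                acc += tap * signal[j]
        out.append(acc)
    return out


def simulate_telephony(signal_16k, noise_std=0.003, seed=0):
    """Band-limit to 300-3400 Hz, decimate 16 kHz -> 8 kHz, add Gaussian 'line noise'."""
    band_limited = fir_filter(signal_16k, bandpass_taps(*PHONE_BAND, SOURCE_RATE))
    # Keeping every other sample halves the rate; the band-pass has already
    # suppressed content near and above the new 4 kHz Nyquist limit.
    narrowband = band_limited[::2]
    rng = random.Random(seed)
    return [s + rng.gauss(0.0, noise_std) for s in narrowband]
```

A 1 kHz tone passes nearly unchanged, while mains-like hum far below 300 Hz is strongly attenuated; the seeded generator keeps the added noise reproducible across runs.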

Excluded from the synthetic manifest: 104 filler templates, 9 duplicates of real-phone text, and 148 rows with missing WAV files (not used in the training run).
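The three exclusion rules above can be expressed as a single manifest filter. This is a minimal sketch assuming a hypothetical row schema (`text`, `wav_path`, `is_filler`) and simple case and whitespace normalization for the duplicate check; each row is counted under the first rule it matches, which may differ from how the published counts were tallied.

```python
import os
from collections import Counter


def normalize(text: str) -> str:
    # Case-fold and collapse whitespace so trivially different copies match.
    return " ".join(text.casefold().split())


def filter_synthetic_manifest(rows, real_phone_texts):
    """Drop filler templates, duplicates of real-phone text, and rows with missing WAVs.

    `rows` is a list of dicts with hypothetical keys "text", "wav_path", "is_filler".
    Returns (kept_rows, Counter of exclusion reasons).
    """
    real = {normalize(t) for t in real_phone_texts}
    kept, reasons = [], Counter()
    for row in rows:
        if row.get("is_filler", False):
            reasons["filler_template"] += 1
        elif normalize(row["text"]) in real:
            reasons["real_phone_duplicate"] += 1
        elif not os.path.isfile(row["wav_path"]):
            reasons["missing_wav"] += 1
        else:
            kept.append(row)
    return kept, reasons
```

Returning the reason counts alongside the kept rows makes it easy to report exclusion totals like the ones listed above.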

PHI: Training text is synthetic or scripted; no patient-identifiable fields were intentionally collected. Driving clips are operator-recorded medical phrase practice, not live patient encounters.

Evaluation methodology

Metric: word error rate (WER), computed with jiwer on the validation set.
Split: ~10% of script groups held out (rather than random clips), so no sentence appears in both train and validation.
Eval: `predict_with_generate=True` at every epoch; the best checkpoint is the one with the lowest `eval_wer`.
Hardware: NVIDIA A100-80GB; mixed precision (fp16 autocast, float32 weights).
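The script-level split described above can be sketched as follows. It assumes each clip carries a hypothetical `script_id` identifying its sentence; whole scripts, not individual clips, are shuffled with a fixed seed and ~10% of them are held out. scikit-learn's `GroupShuffleSplit` is a library alternative.

```python
import random
from collections import defaultdict


def group_split(clips, val_fraction=0.10, seed=42):
    """Hold out whole script groups so no sentence appears in both train and val.

    `clips` is a list of (clip_id, script_id) pairs; script_id identifies the sentence.
    """
    by_script = defaultdict(list)
    for clip_id, script_id in clips:
        by_script[script_id].append(clip_id)
    scripts = sorted(by_script)  # sort first so the seeded shuffle is reproducible
    random.Random(seed).shuffle(scripts)
    n_val = max(1, round(len(scripts) * val_fraction))
    val_scripts = set(scripts[:n_val])
    train = [c for s in scripts if s not in val_scripts for c in by_script[s]]
    val = [c for s in scripts if s in val_scripts for c in by_script[s]]
    return train, val
```

Because scripts are held out whole, the validation share of clips can drift from exactly 10% when scripts have different numbers of recordings.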

WER by epoch (validation)

| Epoch | Validation WER |
|---:|---:|
| 1 | 14.89% |
| 2 | 10.45% |
| 4 | 9.21% |

(Epoch 3 checkpoint not retained as best; epoch 4 selected.)
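The card's numbers come from jiwer. For reference, the dependency-free sketch below computes corpus-level WER as total word-level edits (substitutions, deletions, insertions) divided by total reference words, using plain whitespace tokenization; it should agree with `jiwer.wer` on the same strings when no extra text normalization is applied. The example sentences are hypothetical.

```python
def word_edit_distance(ref_words, hyp_words):
    """Minimum substitutions + deletions + insertions turning ref_words into hyp_words."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp in enumerate(hyp_words, start=1):
            curr[j] = min(
                prev[j] + 1,                 # delete a reference word
                curr[j - 1] + 1,             # insert a hypothesis word
                prev[j - 1] + (ref != hyp),  # substitute (free if the words match)
            )
        prev = curr
    return prev[-1]


def corpus_wer(references, hypotheses):
    """Corpus-level WER: total word edits / total reference words (whitespace tokens)."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must be paired")
    errors = sum(word_edit_distance(r.split(), h.split()) for r, h in zip(references, hypotheses))
    ref_words = sum(len(r.split()) for r in references)
    if ref_words == 0:
        raise ValueError("references contain no words")
    return errors / ref_words


refs = ["mwen gen lafyèv", "ki kote ou fè mal"]
hyps = ["mwen gen lafyèv", "ki kote w fè mal"]
print(f"{corpus_wer(refs, hyps):.2%}")
```

Here one substitution over eight reference words gives 12.5%; the 9.21% above corresponds to a WER of 0.0921 on the validation set.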

Training procedure

Base model: jsbeaudry/oswald-large-v3-turbo-m1 (HF transformers, not CT2).
LoRA on attention projections (q_proj, v_proj, k_proj, out_proj).
Optimizer: AdamW, LR 8e-5, warmup 5%, batch 4 × grad accum 4 (effective 16).
Audio: librosa load at 16 kHz; 8 kHz telephony rows upsampled at train time.
Merge: LoRA adapter merged into base weights → this repository.
Infra: Modal serverless GPU; corpus on private volume veyatia-training-data.
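The merge step folds each LoRA adapter into its base projection as W + (alpha / r) · B · A; with this card's `r=32` and `alpha=64` the scale is 2.0. PEFT's `merge_and_unload()` applies this to every adapted module (it uses alpha / r by default, alpha / sqrt(r) with rsLoRA). The plain-Python sketch below shows the arithmetic for a single weight matrix; shapes follow the usual convention (A is r × d_in, B is d_out × r).

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(base_weight, lora_a, lora_b, r=32, alpha=64):
    """Return W + (alpha / r) * B @ A, the standard LoRA merge for one weight matrix.

    base_weight: d_out x d_in, lora_a: r x d_in, lora_b: d_out x r.
    """
    scale = alpha / r  # 64 / 32 = 2.0 with this card's settings
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(base_weight, delta)
    ]
```

After merging, inference needs no adapter code, which is why this repository loads as a plain Whisper checkpoint.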

HIPAA & deployment notes (operators)

Training: No intentional PHI in corpus; US-region compute (us-central).
Inference: Transcripts may contain user-spoken PHI at runtime — treat outputs as sensitive; use encryption, access controls, audit logging, and BAAs as required.
Retention: Do not use this model card or public HF artifacts to store patient audio or transcripts.
BAA: Hugging Face Hub hosting is separate from your clinical BAA stack; confirm compliance with your security team.
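As one illustration of audit logging that keeps PHI out of the log itself, the sketch below writes a JSON line that references a transcript by its HMAC-SHA256 digest instead of its text. The field names, event type, and key handling are hypothetical; they must be adapted to your organization's logging and key-management policies.

```python
import hashlib
import hmac
import json
import time


def audit_record(secret_key: bytes, user_id: str, call_id: str, transcript: str) -> str:
    """Build one JSON audit-log line that identifies a transcript without storing it.

    The HMAC-SHA256 digest lets an auditor who holds the key and the transcript
    confirm which transcript an event refers to, while the log holds no transcript text.
    """
    digest = hmac.new(secret_key, transcript.encode("utf-8"), hashlib.sha256).hexdigest()
    record = {
        "ts": time.time(),
        "event": "transcript_created",
        "user_id": user_id,
        "call_id": call_id,
        "transcript_hmac": digest,
    }
    return json.dumps(record, sort_keys=True)
```

A keyed HMAC is used rather than a bare hash so that short, guessable phrases cannot be recovered by hashing candidate sentences.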

How to load

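```python
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model_id = "veyatia/whisper-creole-medical-v4"
# Processor = feature extractor (log-mel input features) + tokenizer.
processor = WhisperProcessor.from_pretrained(model_id)
# LoRA weights are already merged, so this loads like any Whisper checkpoint (no PEFT needed).
model = WhisperForConditionalGeneration.from_pretrained(model_id)
```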

Forced language: Haitian Creole (`ht`); set it explicitly at generation time. For production telephony, prefer the CT2 export for CPU inference (see the Veyatia RunPod docs).
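A minimal transcription sketch with the language forced to Haitian Creole, assuming `librosa` for loading and a single clip of at most 30 seconds (Whisper's input window; longer calls need chunking). The `transcribe` helper is hypothetical, and it reloads the model on every call only for brevity; a service should load once and reuse.

```python
MODEL_ID = "veyatia/whisper-creole-medical-v4"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one clip (up to 30 s) with decoding forced to Haitian Creole."""
    import librosa
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    processor = WhisperProcessor.from_pretrained(model_id)
    model = WhisperForConditionalGeneration.from_pretrained(model_id)
    # Resample to 16 kHz on load; 8 kHz telephony audio is upsampled here as well.
    audio, sr = librosa.load(audio_path, sr=16_000)
    inputs = processor(audio, sampling_rate=sr, return_tensors="pt")
    # language="ht" forces the Haitian Creole language token instead of auto-detection.
    ids = model.generate(inputs.input_features, language="ht", task="transcribe")
    return processor.batch_decode(ids, skip_special_tokens=True)[0]
```

Forcing the language avoids misdetection as French on loanword-heavy phrases, one of the difficulties noted under Limitations.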

Version history

| Version | Base | Notes |
|---|---|---|
| v1 | `openai/whisper-large-v3-turbo` | Early Creole medical experiments |
| v2 | Internal Creole HF checkpoint | Pilot WER ~27.3% on clip set |
| v3 | `veyatia/whisper-creole-medical-v3` (+ `-ct2`) | Prior production CT2 path |
| v4 | `jsbeaudry/oswald-large-v3-turbo-m1` + LoRA | This release: 1,592-clip corpus, 9.21% validation WER |

Citation / contact

Developed by Veyatia Health for Haitian Creole medical interpretation infrastructure.
For issues: open a discussion on this repo or contact your Veyatia operator.

License

Apache 2.0 (aligned with base Oswald / Whisper ecosystem). Verify third-party voice data licenses for any new fine-tuning you perform on top of this checkpoint.

