Model Details
- Model type: Whisper sequence-to-sequence ASR model
- Base model: openai/whisper-small
- Release group: Core Whisper Small merges
- Checkpoint kind: Headwise Selective Attention (HSA) merged checkpoint
- Manuscript role: Core headwise selective attention merge
- Source artifact: 02_core_merges_small/hsa_merge
Method Context
This is a structured model merge for compositional domain adaptation. It composes task-specific adaptations without additional retraining by restricting parameter arithmetic to the salient attention heads where the adaptations are concentrated. The `k60` in the folder name records K=60%, the setting used for the core and scaling-law HSA releases.
Training/adaptation context: compositional OGI child-speech setting, merging the spontaneous 3-5 and scripted 0-2 adaptations.
The broader manuscript studies whether speech foundation model adaptations
for different distribution shifts, such as acoustic condition, speaking style,
speaker population, and dialect, can be recombined for low-resource and
intersectional ASR without direct joint-supervision data.
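The head-restricted parameter arithmetic described at the start of this section can be illustrated with a toy sketch. It is not the manuscript's implementation. The function name `headwise_selective_merge`, the use of the L2 norm of each head's parameter delta as the salience score, the summing of the two deltas, and the choice to leave unselected heads at their base values are all assumptions made for illustration. Each head is represented as a flat list of floats rather than as real Whisper weight slices.

```python
import math


def head_norm(delta):
    """L2 norm of one head's parameter delta (assumed salience score)."""
    return math.sqrt(sum(x * x for x in delta))


def headwise_selective_merge(base_heads, adapted_a, adapted_b, keep_fraction=0.6):
    """Toy headwise merge: apply both adaptations only on the most salient heads.

    Each argument is a list with one flat parameter list per attention head.
    keep_fraction plays the role of K (0.6 mirrors the K=60% release setting).
    """
    n_heads = len(base_heads)

    # Per-head deltas of each adapted model relative to the base model.
    deltas_a = [[a - w for a, w in zip(ha, hw)] for ha, hw in zip(adapted_a, base_heads)]
    deltas_b = [[b - w for b, w in zip(hb, hw)] for hb, hw in zip(adapted_b, base_heads)]

    # Assumed salience: how far each head moved, summed over both adaptations.
    salience = [head_norm(da) + head_norm(db) for da, db in zip(deltas_a, deltas_b)]

    # Keep the top keep_fraction of heads by salience (at least one head).
    n_keep = max(1, round(keep_fraction * n_heads))
    ranked = sorted(range(n_heads), key=lambda h: salience[h], reverse=True)
    selected = set(ranked[:n_keep])

    merged = []
    for h in range(n_heads):
        if h in selected:
            # Parameter arithmetic restricted to the selected head.
            merged.append([w + da + db for w, da, db in
                           zip(base_heads[h], deltas_a[h], deltas_b[h])])
        else:
            # Assumption: unselected heads keep the base weights.
            merged.append(list(base_heads[h]))
    return merged, sorted(selected)
```

With five heads and `keep_fraction=0.6`, three heads receive the combined deltas and the other two stay at the base weights. The released checkpoint already contains the merged weights, so no such step is needed to use it.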
Intended Use
Use this checkpoint to reproduce or extend the paper's ASR model-merging
experiments. It is intended for research on child ASR, compositional domain
adaptation, robustness, cross-corpus transfer, dialectal variation, and scaling
behavior across Whisper model sizes.
How To Load
from transformers import WhisperForConditionalGeneration, WhisperProcessor
model_id = "balaji1312/whisper_small_hsa_k60_spon_3_5_script_0_2"
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)
For local use before upload:
from pathlib import Path
from transformers import WhisperForConditionalGeneration, WhisperProcessor
model_dir = Path("final_release_models") / "02_core_merges_small" / "whisper_small_hsa_k60_spon_3_5_script_0_2"
processor = WhisperProcessor.from_pretrained(model_dir)
model = WhisperForConditionalGeneration.from_pretrained(model_dir)
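Once `processor` and `model` are loaded by either route above, transcription can be sketched as follows. The helper name `transcribe` is hypothetical and not part of the release. The sketch assumes a mono waveform already resampled to 16 kHz, the rate Whisper's feature extractor expects, and uses default generation settings, which may differ from the manuscript's decoding configuration.

```python
def transcribe(waveform, processor, model, sampling_rate=16000):
    """Transcribe one mono waveform with a loaded Whisper processor and model.

    waveform: 1-D sequence of float samples at `sampling_rate` (16 kHz assumed).
    """
    # Convert raw audio into log-mel input features.
    inputs = processor(waveform, sampling_rate=sampling_rate, return_tensors="pt")

    # Autoregressively decode token ids from the input features.
    predicted_ids = model.generate(inputs.input_features)

    # Map token ids back to text, dropping special tokens.
    text = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
    return text.strip()
```

Audio loading and resampling are left to the caller. For example, `transcribe(samples, processor, model)` returns the hypothesis string for one utterance.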
Release Files
This model card was generated for the curated release tree. The model-loading
payload consists of:
config.json, generation_config.json, preprocessor_config.json, tokenizer_config.json, vocab.json, merges.txt, normalizer.json, special_tokens_map.json, added_tokens.json, model.safetensors
Training state, optimizer state, decode logs, hypotheses, references, and
intermediate experiment outputs were intentionally omitted.
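Before loading from a local copy, it can help to confirm that the model-loading payload listed above is complete. The following is a minimal sketch using only the standard library. The helper name `missing_release_files` is hypothetical, and the file list is copied from this section.

```python
from pathlib import Path

# File names taken from the model-loading payload listed in this section.
RELEASE_FILES = [
    "config.json",
    "generation_config.json",
    "preprocessor_config.json",
    "tokenizer_config.json",
    "vocab.json",
    "merges.txt",
    "normalizer.json",
    "special_tokens_map.json",
    "added_tokens.json",
    "model.safetensors",
]


def missing_release_files(model_dir):
    """Return the payload files that are absent from model_dir, in list order."""
    model_dir = Path(model_dir)
    return [name for name in RELEASE_FILES if not (model_dir / name).is_file()]
```

An empty return value means every listed file is present. It does not verify file contents or integrity.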
Limitations
The checkpoint is released for research reproducibility. Results outside the
paper's child ASR, robustness, cross-corpus, dialectal, and scaling-law
settings are not characterized here. Reproducing WER numbers requires the
manuscript evaluation pipeline and authorized access to the relevant speech
corpora; no evaluation audio or transcripts are redistributed in this model
folder.
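For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between hypothesis and reference, divided by the number of reference words. The sketch below is a generic illustration, not the manuscript evaluation pipeline. The function name `word_error_rate` is hypothetical, and the sketch applies no text normalization, so its values will not match reported numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

For instance, a hypothesis that drops one word from a four-word reference scores 0.25.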
Citation
If you use this checkpoint, please cite the manuscript:
@article{shankara2026compositional,
title = {Compositional Domain Adaptation for Automatic Speech Recognition with Headwise Selective Attention Merging},
author = {Shankara, Natarajan Balaji and Wang, Zilai and Eren, Eray and Alwan, Abeer},
year = {2026},
note = {Manuscript submitted to Computer Speech \& Language}
}