README
License: apache-2.0
Model Description
- Developed by: didiudom94
- Model type: Whisper (Sequence-to-Sequence Audio Transformer)
- Language(s) (ISOs): Korean (ko) to English (en)
- License: Apache 2.0
- Finetuned from model: openai/whisper-small
Training Hyperparameters & Infrastructure
- Hardware: NVIDIA A100 GPU
- Quantization: 4-bit NormalFloat (nf4) with double quantization
- LoRA Configurations: Rank (r) = 32, Alpha (α) = 64
- Learning Rate: 1e-4
- Precision: Mixed Precision (bf16)
- Optimizer updates: Direct weights optimization (Batch Size = 32, Gradient Accumulation = 1)
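A few quantities follow directly from the hyperparameters listed above. The sketch below works them out under stated assumptions: it assumes standard LoRA scaling (alpha / r), and it uses a 768 x 768 projection only as an illustration, since 768 is the hidden size of whisper-small and the card does not say which modules were adapted. `TrainingSetup` is a hypothetical helper, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSetup:
    """Hypothetical container for the hyperparameters listed in the card."""
    lora_rank: int = 32
    lora_alpha: int = 64
    batch_size: int = 32
    grad_accum_steps: int = 1

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA multiplies the low-rank update by alpha / r (assumption).
        return self.lora_alpha / self.lora_rank

    @property
    def effective_batch_size(self) -> int:
        # Samples contributing to each optimizer step.
        return self.batch_size * self.grad_accum_steps

    def adapter_params(self, d_in: int, d_out: int) -> int:
        # LoRA adds A (r x d_in) and B (d_out x r) per adapted weight matrix.
        return self.lora_rank * (d_in + d_out)


setup = TrainingSetup()
print(setup.lora_scaling)              # 2.0
print(setup.effective_batch_size)      # 32
print(setup.adapter_params(768, 768))  # 49152
```

With gradient accumulation set to 1, the effective batch size equals the per-step batch size, so the optimizer updates the adapter weights after every batch.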
How to Load and Use
You can easily reload this model for inference using the code snippet below:
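```python
from transformers import (
    BitsAndBytesConfig,
    WhisperForConditionalGeneration,
    WhisperProcessor,
)
from peft import PeftModel

model_id = "openai/whisper-small"
peft_model_id = "didiudom94/whisper-small-ko-to-en-translator"

# Load the processor (feature extractor + tokenizer) stored with the adapter
processor = WhisperProcessor.from_pretrained(peft_model_id)

# Load the base architecture in 4-bit NF4 with double quantization,
# mirroring the quantization settings listed above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)
base_model = WhisperForConditionalGeneration.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach the fine-tuned LoRA adapter weights to the base model
model = PeftModel.from_pretrained(base_model, peft_model_id)
```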
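Once `model` and `processor` are loaded as in the snippet above, translation could look roughly like the following. This is a minimal sketch, not the author's pipeline: `split_into_windows` and `translate_ko_to_en` are hypothetical helpers, the input is assumed to be 16 kHz mono samples, and the quantized model is assumed to sit on a CUDA device. Whisper's feature extractor pads or truncates each input to 30 seconds, so longer audio is split naively into 30-second windows here, which may cut words at window boundaries.

```python
SAMPLE_RATE = 16_000   # Whisper feature extractors expect 16 kHz mono audio
WINDOW_SECONDS = 30    # Whisper encodes audio in 30-second windows


def split_into_windows(samples, sample_rate=SAMPLE_RATE, window_seconds=WINDOW_SECONDS):
    """Split a 1-D sequence of samples into consecutive windows of at most 30 s."""
    step = sample_rate * window_seconds
    return [samples[i:i + step] for i in range(0, len(samples), step)]


def translate_ko_to_en(model, processor, samples):
    """Translate Korean speech (16 kHz mono samples) into English text.

    Hypothetical helper: `model` and `processor` are the objects loaded above.
    """
    import torch  # imported lazily so the windowing helper works without torch

    texts = []
    for window in split_into_windows(samples):
        inputs = processor(window, sampling_rate=SAMPLE_RATE, return_tensors="pt")
        features = inputs.input_features.to(model.device)
        # Autocast reconciles float32 features with half-precision weights;
        # this assumes the model was placed on a CUDA device.
        with torch.no_grad(), torch.autocast("cuda"):
            token_ids = model.generate(
                input_features=features,
                language="ko",      # source language of the audio
                task="translate",   # Whisper's speech-to-English task
                max_new_tokens=255,
            )
        text = processor.batch_decode(token_ids, skip_special_tokens=True)[0]
        texts.append(text.strip())
    return " ".join(texts)
```

A waveform loaded with any audio library and resampled to 16 kHz can be passed as `samples`, for example `translate_ko_to_en(model, processor, waveform)`.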
Model provider: didiudom94
Model tree
- Base: openai/whisper-small
- Adapter: this model
Modalities
- Input: Audio
- Output: Text