AK04-IXR/sarvam1-hinglish-g2p-lora


README

License: MIT

Results

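On a held-out split (n=60), the adapter reproduces the espeak-ng reference exactly: 0.00% phoneme error rate (PER) and 100% exact phoneme match. In other words, it has learned the phonemizer's deterministic mapping well enough to apply it to sentences it did not see during training. Note that this is a small test set.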
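The card does not say exactly how PER and exact match were computed. The sketch below shows one common convention, written as an assumption: Levenshtein distance between reference and predicted phoneme sequences, summed over the test set and divided by the total reference length, plus the fraction of sentences whose output matches exactly. The helper names (`edit_distance`, `per_and_exact_match`) are hypothetical. By default it tokenizes IPA per Unicode character, which treats multi-character phonemes such as `tʃ` as two units. Pass a different `tokenize` function to score true phoneme units.

```python
from typing import Callable, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: substitutions, insertions and deletions each cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h from the hypothesis
                prev[j - 1] + (r != h),  # substitute (free when equal)
            )
        prev = curr
    return prev[-1]


def per_and_exact_match(
    refs: Sequence[str],
    hyps: Sequence[str],
    tokenize: Callable[[str], Sequence[str]] = list,  # assumption: per-character units
) -> tuple[float, float]:
    """Corpus-level PER (total edits / total reference units) and exact-match rate."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    errors = ref_len = exact = 0
    for ref, hyp in zip(refs, hyps):
        r_toks, h_toks = list(tokenize(ref)), list(tokenize(hyp))
        errors += edit_distance(r_toks, h_toks)
        ref_len += len(r_toks)
        exact += r_toks == h_toks
    return errors / max(ref_len, 1), exact / max(len(refs), 1)
```

A perfect run, like the one reported above, yields `(0.0, 1.0)`. Passing `tokenize=str.split` instead scores space-separated tokens.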

Usage

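```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model and tokenizer, then attach the LoRA adapter on top.
tokenizer = AutoTokenizer.from_pretrained("sarvamai/sarvam-1")
model = AutoModelForCausalLM.from_pretrained("sarvamai/sarvam-1")
model = PeftModel.from_pretrained(model, "AK04-IXR/sarvam1-hinglish-g2p-lora")
model.eval()

# Prompt format: normalized text after "Input:", the model writes IPA after "Output:".
prompt = "Input: Mera flight ticket pee-en-aar eight three nine two hai.\nOutput:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding, since the target mapping is deterministic.
out = model.generate(**inputs, max_new_tokens=160, do_sample=False)

# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = out[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```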
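When phonemizing many sentences, it can help to wrap the usage example in small helpers. The sketch below is hedged. It assumes that the adapter was trained on exactly the `Input: ...\nOutput:` template used in the example, and that the IPA answer is the first non-empty line of the continuation. Truncating there is a defensive choice in case greedy decoding runs past the answer; the card does not document this. `build_prompt`, `first_line` and `phonemize` are hypothetical names, not part of the repository.

```python
PROMPT_TEMPLATE = "Input: {text}\nOutput:"  # template copied from the usage example


def build_prompt(text: str) -> str:
    """Format one normalized Latin-script sentence with the usage-example template."""
    return PROMPT_TEMPLATE.format(text=text.strip())


def first_line(continuation: str) -> str:
    """Keep only the first non-empty line of the generated text (assumed to be the IPA)."""
    for line in continuation.splitlines():
        if line.strip():
            return line.strip()
    return ""


def phonemize(model, tokenizer, text: str, max_new_tokens: int = 160) -> str:
    """Greedy-decode one sentence with an already-loaded model and tokenizer."""
    inputs = tokenizer(build_prompt(text), return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    new_tokens = out[0][inputs["input_ids"].shape[1]:]
    return first_line(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

With `model` and `tokenizer` loaded as in the usage example, `phonemize(model, tokenizer, "Mera flight ticket confirm hai.")` returns a single line of text.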

Training

LoRA adapter (r=16, α=32; 0.94% of the model's parameters trainable), trained for 3 epochs in bf16 on a single A100 using ~7k (text → IPA) pairs whose IPA targets were generated by espeak-ng with the en-us voice.
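To show what r=16 and α=32 control, here is a minimal NumPy sketch of a single LoRA-adapted linear layer. It illustrates the general technique, not this repository's training code. The frozen weight `W` is augmented by a low-rank update `B @ A`, scaled by α/r (32/16 = 2 here; PEFT's `LoraConfig(r=16, lora_alpha=32)` uses the same α/r scaling by default). Because `B` starts at zero, the adapter is a no-op at initialization. The matrix sizes below are hypothetical. The card does not list which modules were adapted, which is what determines the 0.94% figure.

```python
import numpy as np


def lora_forward(x, W, A, B, alpha):
    """Adapted linear layer: W x + (alpha / r) * B (A x).

    W: frozen base weight, shape (d_out, d_in)
    A: trainable down-projection, shape (r, d_in)
    B: trainable up-projection, shape (d_out, r)
    """
    r = A.shape[0]
    return W @ x + (alpha / r) * (B @ (A @ x))


def lora_param_count(d_out, d_in, r):
    """Trainable parameters LoRA adds to one (d_out x d_in) weight matrix."""
    return r * (d_in + d_out)


rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 64, 48, 16, 32   # hypothetical sizes; r and alpha match the card
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))                 # zero init: the adapter starts as a no-op
x = rng.standard_normal(d_in)

y0 = lora_forward(x, W, A, B, alpha)
assert np.allclose(y0, W @ x)            # identical to the frozen layer before training

print("scaling alpha/r =", alpha / r)    # 2.0
print("LoRA params for a 2048x2048 matrix:", lora_param_count(2048, 2048, r))  # 65536
print("full matrix params:", 2048 * 2048)  # 4194304
```

For a square 2048×2048 matrix, rank 16 adds 65,536 trainable values against 4,194,304 frozen ones (about 1.6%). The whole-model fraction depends on which layers receive adapters.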

Limitations

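Because the model is distilled from espeak-ng, it can at best match that reference, not surpass it: it inherits espeak-ng's pronunciations, including any errors. It was trained on normalized Latin-script text only (lines containing Devanagari were held out). Code-switched phonemization, i.e. identifying the language of each span and phonemizing it accordingly, remains an open problem; all training targets come from the single en-us voice.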
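The card does not publish its data pipeline. Below is a hedged sketch of how a corpus matching the description could be prepared: hold out lines that contain Devanagari, then phonemize the remaining Latin-script lines with the espeak-ng CLI using the en-us voice. Everything here is an assumption rather than the author's method: the Unicode range (only the main Devanagari block, U+0900 to U+097F), the CLI flags (`-q` for no audio, `--ipa` to print IPA, `-v en-us` for the voice), and whitespace normalization of the output. All function names are hypothetical.

```python
import re
import shutil
import subprocess

# Main Devanagari Unicode block only (assumption; extended blocks are ignored).
DEVANAGARI = re.compile(r"[\u0900-\u097F]")


def has_devanagari(line: str) -> bool:
    """True if the line carries any character from the Devanagari block."""
    return DEVANAGARI.search(line) is not None


def split_by_script(lines):
    """Separate Latin-script lines from Devanagari-carrier lines (held out)."""
    latin, held_out = [], []
    for line in lines:
        (held_out if has_devanagari(line) else latin).append(line)
    return latin, held_out


def espeak_ipa_argv(text: str, voice: str = "en-us") -> list[str]:
    """Command line for espeak-ng: -q (no audio), --ipa (IPA to stdout), -v (voice)."""
    return ["espeak-ng", "-q", "--ipa", "-v", voice, text]


def phonemize_line(text: str) -> str:
    """Run espeak-ng and collapse its output (which may span lines) to single spaces."""
    result = subprocess.run(
        espeak_ipa_argv(text), capture_output=True, text=True, check=True
    )
    return " ".join(result.stdout.split())


if __name__ == "__main__":
    lines = ["Mera flight ticket confirm hai.", "मेरा ticket confirm hai."]
    latin, held_out = split_by_script(lines)
    print("held out:", held_out)
    if shutil.which("espeak-ng"):  # only phonemize if the CLI is installed
        for text in latin:
            print(text, "->", phonemize_line(text))
```

Because the en-us voice applies English letter-to-sound rules to every word, romanized Hindi words receive English-style phonemizations. That is why per-span language identification would be needed to handle code-switching properly.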

Model provider: AK04-IXR

Model tree

Base: sarvamai/sarvam-1
Adapter: this model

Modalities

Input: Text
Output: Text