License: apache-2.0
Model Details
| Field | Value |
|---|---|
| Author | Faizan Iqbal (@Faizaniqbal) |
| Base model | Faizaniqbal/KoshurAI_Tarjuma_v2 |
| Adapter type | LoRA (QLoRA training) |
| Architecture | Gemma3ForCausalLM + PEFT LoRA |
| Languages | Kashmiri (ks · kas_Arab), English (en) |
| License | Apache-2.0 |
| Training data | 16,637 curated bidirectional EN↔KS sentence pairs |
| Training compute | Google Colab GPU |
Model Tree
```markdown
google/gemma-3-4b-pt
└─ google/gemma-3-4b-it
   └─ sarvamai/sarvam-translate
      └─ Faizaniqbal/KoshurAI_Tarjuma_v2   ← 2.8M Kashmiri pretraining
         └─ Faizaniqbal/KoshurAI_Tarjuma_v3   ← this adapter (SFT)
```
Quickstart
Install
```bash
pip install transformers peft accelerate bitsandbytes sentencepiece
```
Load & Translate
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

BASE_MODEL = "Faizaniqbal/KoshurAI_Tarjuma_v2"
ADAPTER = "Faizaniqbal/KoshurAI_Tarjuma_v3"

# 4-bit NF4 quantization with bfloat16 compute, matching the training setup
bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tok = AutoTokenizer.from_pretrained(BASE_MODEL)
tok.pad_token = tok.eos_token

# Load the quantized base model, then attach the LoRA adapter on top of it
base = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, quantization_config=bnb_cfg, device_map="auto"
)
model = PeftModel.from_pretrained(base, ADAPTER)
model.eval()


def translate(text, direction="en2ks"):
    prefix = "Translate to Kashmiri: " if direction == "en2ks" else "Translate to English: "
    prompt = f"<start_of_turn>user\n{prefix}{text}<end_of_turn>\n<start_of_turn>model\n"
    # Move inputs to the model's device instead of hard-coding "cuda"
    inputs = tok(prompt, return_tensors="pt", truncation=True, max_length=512).to(model.device)
    with torch.no_grad():
        out = model.generate(
            **inputs,
            max_new_tokens=150,
            min_new_tokens=5,
            do_sample=False,
            repetition_penalty=1.1,
        )
    # Decode only the newly generated tokens, skipping the prompt
    return tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True).strip()


print(translate("The dog is sleeping.", "en2ks"))
print(translate("ہونٛد چھُ شُنٛگِتھ", "ks2en"))
```
Training
Stage 1 — Kashmiri Pretraining (base model)
The base model (KoshurAI_Tarjuma_v2) was continually pretrained on 2.8 million tokens of Kashmiri text from publicly available sources (literature, journalism, academic texts, and religious scholarship). This stage was intended to strengthen the model's command of Kashmiri before translation fine-tuning.
Stage 2 — SFT for Translation (this adapter)
This LoRA adapter was trained on 16,637 curated bidirectional sentence pairs (EN→KS and KS→EN) to teach the model to translate explicitly in both directions.
| Split | Records |
|---|---|
| Base SFT corpus (v2) | 15,527 |
| New pairs (v3) | 1,110 |
| Total | 16,637 |
Training Configuration
| Hyperparameter | Value |
|---|---|
| Base model | Faizaniqbal/KoshurAI_Tarjuma_v2 |
| LoRA rank (r) | 16 |
| LoRA alpha | 16 |
| LoRA dropout | 0.05 |
| LoRA target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Quantization | 4-bit NF4 (BitsAndBytes) |
| Compute dtype | bfloat16 |
| Epochs | 2 |
| Learning rate | 1e-4 |
| Effective batch size | 8 (per-device batch size 2 × gradient accumulation 4) |
| Max sequence length | 512 tokens |
| Optimizer | paged_adamw_8bit |
| LR scheduler | Cosine |
| Warmup steps | 100 |
| Weight decay | 0.01 |
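To make the schedule-related rows concrete, here is a minimal sketch of how the learning rate might evolve over training. The card only states "Cosine" with 100 warmup steps and a peak of 1e-4; the linear warmup shape, the decay to zero, and the total step count (derived from 16,637 pairs, batch size 8, 2 epochs, assuming every pair is used for training on a single device) are assumptions, and the exact step count depends on the trainer.

```python
import math

PEAK_LR = 1e-4       # "Learning rate" from the table
WARMUP_STEPS = 100   # "Warmup steps" from the table
# ASSUMPTION: all 16,637 pairs are training data, effective batch size 8, 2 epochs.
STEPS_PER_EPOCH = math.ceil(16637 / 8)
TOTAL_STEPS = STEPS_PER_EPOCH * 2


def lr_at(step, peak_lr=PEAK_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Learning rate at an optimizer step: linear warmup, then cosine decay to 0."""
    if step < warmup:
        # ASSUMPTION: warmup is linear from 0 up to the peak rate.
        return peak_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    progress = min(1.0, progress)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 50, 100, TOTAL_STEPS // 2, TOTAL_STEPS):
        print(step, lr_at(step))
```

Under these assumptions warmup covers only about 2.4% of training (100 of 4,160 steps), so nearly the whole run is spent on the cosine decay.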
Evaluation — FLORES-200 Devtest (1,012 sentences)
| Direction | Model | BLEU | COMET |
|---|---|---|---|
| KS→EN | KoshurAI v3 (ours) | 15.74 | 0.6982 ✅ |
| KS→EN | NLLB-200 distilled-600M | 16.28 | 0.6741 |
| EN→KS | KoshurAI v3 (ours) | 30.37¹ | 0.6604 ✅ |
| EN→KS | NLLB-200 distilled-600M | 39.65¹ | 0.6431 |
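¹ EN→KS BLEU is character-level (tokenize='char'), a common choice for Arabic-script output, so it should not be compared directly with the KS→EN BLEU scores.
COMET = Unbabel/wmt22-comet-da system score.
KoshurAI v3 scores higher than NLLB-200 distilled-600M on COMET in both directions (✅), while NLLB-200 scores higher on BLEU in both.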
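Character-level and word-level BLEU live on different scales, which is why the EN→KS and KS→EN BLEU columns should be read separately. The sketch below is a simplified illustration of clipped n-gram precision (the building block of BLEU) computed over words and over characters. It is not the scoring code used for the table, the helper names are hypothetical, and the English example pair is made up purely for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(hyp_tokens, ref_tokens, n):
    """Fraction of hypothesis n-grams also found in the reference (counts clipped)."""
    hyp = ngrams(hyp_tokens, n)
    ref = ngrams(ref_tokens, n)
    total = sum(hyp.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in hyp.items())
    return matched / total


def char_tokens(text):
    """Character tokens with whitespace dropped (an assumption of this sketch)."""
    return [c for c in text if not c.isspace()]


hyp = "the dog sleeps"
ref = "the dog is sleeping"

word_p1 = clipped_precision(hyp.split(), ref.split(), 1)
char_p1 = clipped_precision(char_tokens(hyp), char_tokens(ref), 1)

if __name__ == "__main__":
    print(f"word unigram precision: {word_p1:.3f}")
    print(f"char unigram precision: {char_p1:.3f}")
```

The same hypothesis earns a much higher precision at the character level because partially correct words still contribute matching characters, so character-level scores tend to run higher than word-level ones for the same output.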
Sample Translations (EN→KS)
| English | KoshurAI v3 |
|---|---|
| They include the Netherlands, with Anna Jochemsen finishing ninth. | تِیَم چھُ نیدرلینڈس شامِل کَران اَینا جوکیمسن فِنِشِنگ نائنتھ سیتھ |
| Hershey and Chase used phages, or viruses, to implant their own DNA. | ۂرشے تہٕ چیسن کٔرۍ فیگ تہٕ جَراثیم منٛز پنُن ڈی این اے اَزناوُنہِ خٲطر |
| They usually have special food, drink and entertainment offers. | تِیَمَن چھُ اکثر خاص کھٮ۪ن، چیٖز تہٕ تفریح پیش کَرنہِ یِوان |
Inference Settings
| Parameter | Value |
|---|---|
| do_sample | False (greedy) |
| max_new_tokens | 150 (EN→KS) / 200 (KS→EN) |
| min_new_tokens | 5 |
| repetition_penalty | 1.1 |
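The settings above differ by direction only in `max_new_tokens`, whereas the Quickstart function uses 150 for both. One way to apply the per-direction values is a small helper that returns keyword arguments for `model.generate`. This is a sketch: `generation_kwargs` is a hypothetical helper name, and the `en2ks` / `ks2en` labels follow the Quickstart.

```python
def generation_kwargs(direction):
    """Return generate() keyword arguments for a translation direction."""
    if direction not in ("en2ks", "ks2en"):
        raise ValueError(f"unknown direction: {direction!r}")
    return {
        "do_sample": False,  # greedy decoding
        # 150 new tokens for EN→KS, 200 for KS→EN, per the table above
        "max_new_tokens": 150 if direction == "en2ks" else 200,
        "min_new_tokens": 5,
        "repetition_penalty": 1.1,
    }


if __name__ == "__main__":
    print(generation_kwargs("en2ks"))
    print(generation_kwargs("ks2en"))
```

Inside the Quickstart's `translate` function this would be used as `model.generate(**inputs, **generation_kwargs(direction))`.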
Hardware Requirements
| Setting | VRAM |
|---|---|
| 4-bit inference (recommended) | ~6–8 GB |
| Colab free tier (T4) | ✅ with 4-bit |
| Colab L4 / A100 | ✅ comfortable |
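The ~6–8 GB figure can be put in context with a back-of-the-envelope estimate of weight memory alone. The sketch below assumes roughly 4.5 billion parameters (the size quoted for sarvam-translate in the Acknowledgements) and ignores quantization constants, any layers left unquantized, the LoRA adapter, activations, the KV cache, and the CUDA context, so it gives a floor rather than a prediction.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Approximate weight storage in GB (10^9 bytes) at a given precision."""
    return n_params * bits_per_param / 8 / 1e9


# ASSUMPTION: ~4.5B parameters, taken from the "Gemma 3, 4.5B" note in the card.
N_PARAMS = 4.5e9

four_bit_gb = weight_memory_gb(N_PARAMS, 4)   # NF4 weights
bf16_gb = weight_memory_gb(N_PARAMS, 16)      # unquantized bfloat16 weights

if __name__ == "__main__":
    print(f"4-bit weights: ~{four_bit_gb:.2f} GB")
    print(f"bf16 weights:  ~{bf16_gb:.2f} GB")
```

The gap between this floor for 4-bit weights and the ~6–8 GB in the table is presumably taken up by the overheads listed above; the bf16 estimate suggests why unquantized inference is a tighter fit on a 16 GB T4.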
Limitations
- Trained on sentence-level pairs (≤ 512 tokens); long-form translation unsupported.
- Performance on technical, legal, or dialectal Kashmiri is unverified.
- No human evaluation conducted; COMET and BLEU are automatic metrics only.
- 4-bit quantization used for inference; full-precision may yield higher scores.
Citation
If you use this model, please cite:
```bibtex
@misc{iqbal2026koshurai,
  title        = {KoshurAI v3: A Fine-Tuned Neural Machine Translation System for Kashmiri--English Bidirectional Translation},
  author       = {Iqbal, Faizan},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/Faizaniqbal/KoshurAI_Tarjuma_v3}},
  note         = {LoRA adapter fine-tuned from Faizaniqbal/KoshurAI_Tarjuma_v2}
}
```
This work fine-tunes the model by Malik & Nissar; please also cite:
```bibtex
@misc{malik2026koshurkouter,
  title        = {Koshur Kouter KS-EN v1: A Merged QLoRA Kashmiri--English Translation Model},
  author       = {Malik, Haq Nawaz and Nissar, Nahfid},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/Omarrran/koshur-kouter-ks-en_v1}},
  note         = {Fine-tuned from sarvamai/sarvam-translate}
}
```
And the original base model:
```bibtex
@misc{sarvam2025translate,
  title        = {Sarvam-Translate},
  author       = {{Sarvam AI}},
  howpublished = {\url{https://huggingface.co/sarvamai/sarvam-translate}}
}
```
Acknowledgements
This model builds on Omarrran/koshur-kouter-ks-en_v1, fine-tuned by Haq Nawaz Malik & Nahfid Nissar (2026), which is itself built on sarvamai/sarvam-translate (Gemma 3, 4.5B) by Sarvam AI.
Evaluation used the FLORES-200 devtest set; COMET was scored with Unbabel/wmt22-comet-da.