
README

License: apache-2.0

Model Details

| Field | Value |
|---|---|
| Author | Faizan Iqbal (@Faizaniqbal) |
| Base model | Faizaniqbal/KoshurAI_Tarjuma_v2 |
| Adapter type | LoRA (QLoRA training) |
| Architecture | Gemma3ForCausalLM + PEFT LoRA |
| Languages | Kashmiri (ks · kas_Arab), English (en) |
| License | Apache-2.0 |
| Training data | 16,637 curated bidirectional EN↔KS sentence pairs |
| Training compute | Google Colab GPU |

Model Tree

```
google/gemma-3-4b-pt
└─ google/gemma-3-4b-it
   └─ sarvamai/sarvam-translate
      └─ Faizaniqbal/KoshurAI_Tarjuma_v2   ← 2.8M Kashmiri pretraining
         └─ Faizaniqbal/KoshurAI_Tarjuma_v3   ← this adapter (SFT)
```

Quickstart

Install

```bash
pip install transformers peft accelerate bitsandbytes sentencepiece
```

Load & Translate

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

BASE_MODEL = "Faizaniqbal/KoshurAI_Tarjuma_v2"
ADAPTER = "Faizaniqbal/KoshurAI_Tarjuma_v3"

# 4-bit NF4 quantization with bfloat16 compute, matching the training setup.
bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tok = AutoTokenizer.from_pretrained(BASE_MODEL)
tok.pad_token = tok.eos_token

# Load the quantized base model, then attach the LoRA adapter.
base = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, quantization_config=bnb_cfg, device_map="auto"
)
model = PeftModel.from_pretrained(base, ADAPTER)
model.eval()


def translate(text, direction="en2ks"):
    prefix = "Translate to Kashmiri: " if direction == "en2ks" else "Translate to English: "
    prompt = f"<start_of_turn>user\n{prefix}{text}<end_of_turn>\n<start_of_turn>model\n"
    # Send inputs to the model's device instead of hard-coding "cuda".
    inputs = tok(prompt, return_tensors="pt", truncation=True, max_length=512).to(model.device)
    with torch.no_grad():
        out = model.generate(
            **inputs,
            max_new_tokens=150,
            min_new_tokens=5,
            do_sample=False,
            repetition_penalty=1.1,
        )
    # Decode only the newly generated tokens, skipping the prompt.
    return tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True).strip()


print(translate("The dog is sleeping.", "en2ks"))
print(translate("ہونٛد چھُ شُنٛگِتھ", "ks2en"))
```

Training

Stage 1 — Kashmiri Pretraining (base model)

The base model (KoshurAI_Tarjuma_v2) went through continued pretraining on 2.8 million tokens of Kashmiri text drawn from publicly available sources (literature, journalism, academic texts, religious scholarship). The aim of this stage was to strengthen the model's Kashmiri language knowledge before translation fine-tuning.

Stage 2 — SFT for Translation (this adapter)

This LoRA adapter was trained on 16,637 curated bidirectional sentence pairs (EN→KS and KS→EN) to teach the model to translate explicitly in both directions.

| Split | Records |
|---|---|
| Base SFT corpus (v2) | 15,527 |
| New pairs (v3) | 1,110 |
| Total | 16,637 |
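As a minimal sketch of what "bidirectional" can mean in practice, the snippet below turns one parallel sentence pair into one training example per direction, reusing the prompt template from the Quickstart. The card does not publish its data-preparation code, so the helper names `build_prompt` and `expand_pair`, the `<end_of_turn>` suffix on targets, and the assumption that training used the same template as inference are all illustrative.

```python
# Prompt prefixes copied from the Quickstart's translate() helper.
PREFIXES = {
    "en2ks": "Translate to Kashmiri: ",
    "ks2en": "Translate to English: ",
}


def build_prompt(text, direction):
    """Wrap source text in the Gemma chat-turn markers used in the Quickstart."""
    return (
        f"<start_of_turn>user\n{PREFIXES[direction]}{text}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )


def expand_pair(en_text, ks_text):
    """Return one (prompt, target) example per translation direction.

    Hypothetical helper: the end-of-turn suffix on targets is an assumption.
    """
    return [
        {"prompt": build_prompt(en_text, "en2ks"), "target": ks_text + "<end_of_turn>"},
        {"prompt": build_prompt(ks_text, "ks2en"), "target": en_text + "<end_of_turn>"},
    ]


# The Kashmiri side is a placeholder, not real data.
examples = expand_pair("The dog is sleeping.", "<Kashmiri translation>")
for ex in examples:
    print(repr(ex["prompt"]), "->", repr(ex["target"]))
```

The card does not say whether the 16,637 figure counts sentence pairs or directed examples, so this sketch makes no claim about the final dataset size.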

Training Configuration

| Hyperparameter | Value |
|---|---|
| Base model | Faizaniqbal/KoshurAI_Tarjuma_v2 |
| LoRA rank (r) | 16 |
| LoRA alpha | 16 |
| LoRA dropout | 0.05 |
| LoRA target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Quantization | 4-bit NF4 (BitsAndBytes) |
| Compute dtype | bfloat16 |
| Epochs | 2 |
| Learning rate | 1e-4 |
| Effective batch size | 8 (per-device batch 2 × gradient accumulation 4) |
| Max sequence length | 512 tokens |
| Optimizer | paged_adamw_8bit |
| LR scheduler | Cosine |
| Warmup steps | 100 |
| Weight decay | 0.01 |
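To make the schedule implied by these values concrete, here is a rough sketch that derives the effective batch size, an approximate optimizer step count, the LoRA scaling factor (alpha / r), and a linear-warmup-plus-cosine learning-rate curve. It assumes all 16,637 records were used for training and that the last partial batch was kept; the card does not confirm either point, and the exact step count reported by a trainer may differ slightly.

```python
import math

# Values taken from the Training Configuration table.
NUM_EXAMPLES = 16_637
PER_DEVICE_BATCH = 2
GRAD_ACCUM = 4
EPOCHS = 2
BASE_LR = 1e-4
WARMUP_STEPS = 100
LORA_R, LORA_ALPHA = 16, 16

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM
# Assumption: every record is a training example and partial batches are kept.
steps_per_epoch = math.ceil(NUM_EXAMPLES / effective_batch)
total_steps = steps_per_epoch * EPOCHS
lora_scale = LORA_ALPHA / LORA_R  # standard LoRA scaling factor


def lr_at(step):
    """Linear warmup to BASE_LR, then cosine decay to zero at total_steps."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"effective batch: {effective_batch}, total steps: ~{total_steps}")
print(f"LoRA scale (alpha / r): {lora_scale}")
for step in (0, 50, WARMUP_STEPS, total_steps // 2, total_steps):
    print(f"step {step:>4}: lr = {lr_at(step):.2e}")
```

Under these assumptions the 100 warmup steps cover only a small fraction of training, so most updates happen on the cosine decay.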

Evaluation — FLORES-200 Devtest (1,012 sentences)

| Direction | Model | BLEU | COMET |
|---|---|---|---|
| KS→EN | KoshurAI v3 (ours) | 15.74 | 0.6982 |
| KS→EN | NLLB-200 distilled-600M | 16.28 | 0.6741 |
| EN→KS | KoshurAI v3 (ours) | 30.37¹ | 0.6604 |
| EN→KS | NLLB-200 distilled-600M | 39.65¹ | 0.6431 |

¹ EN→KS BLEU is character-level (tokenize='char'), standard for Arabic-script output. COMET = Unbabel/wmt22-comet-da system score.

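KoshurAI v3 scores higher than NLLB-200 distilled-600M on COMET in both directions (+0.0241 KS→EN, +0.0173 EN→KS), while NLLB-200 scores higher on BLEU in both directions.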
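The footnote's distinction between character-level and word-level BLEU can be illustrated with clipped n-gram precision, the building block of BLEU. The sketch below is not full BLEU (no brevity penalty, no averaging over n-gram orders) and is not the evaluation script used for this card; the function names and the toy English sentences are illustrative only.

```python
from collections import Counter


def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(hyp_tokens, ref_tokens, n=1):
    """Modified n-gram precision: hypothesis counts are clipped by reference counts."""
    hyp_counts = Counter(ngrams(hyp_tokens, n))
    ref_counts = Counter(ngrams(ref_tokens, n))
    matched = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    total = sum(hyp_counts.values())
    return matched / total if total else 0.0


def word_tokens(text):
    return text.split()


def char_tokens(text):
    # Every non-space character becomes its own token.
    return [ch for ch in text if not ch.isspace()]


ref = "the dog is sleeping"
hyp = "the dogs is sleeping"

word_p1 = clipped_precision(word_tokens(hyp), word_tokens(ref))
char_p1 = clipped_precision(char_tokens(hyp), char_tokens(ref))
print(f"word unigram precision: {word_p1:.3f}")
print(f"char unigram precision: {char_p1:.3f}")
```

Character-level scoring gives partial credit for near-miss word forms, so the EN→KS BLEU figures are not numerically comparable with the word-level KS→EN figures.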

Sample Translations (EN→KS)

| English | KoshurAI v3 |
|---|---|
| They include the Netherlands, with Anna Jochemsen finishing ninth. | تِیَم چھُ نیدرلینڈس شامِل کَران اَینا جوکیمسن فِنِشِنگ نائنتھ سیتھ |
| Hershey and Chase used phages, or viruses, to implant their own DNA. | ۂرشے تہٕ چیسن کٔرۍ فیگ تہٕ جَراثیم منٛز پنُن ڈی این اے اَزناوُنہِ خٲطر |
| They usually have special food, drink and entertainment offers. | تِیَمَن چھُ اکثر خاص کھٮ۪ن، چیٖز تہٕ تفریح پیش کَرنہِ یِوان |

Inference Settings

| Parameter | Value |
|---|---|
| do_sample | False (greedy) |
| max_new_tokens | 150 (EN→KS) / 200 (KS→EN) |
| min_new_tokens | 5 |
| repetition_penalty | 1.1 |
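Since `max_new_tokens` differs by direction, one option is a small helper that returns the matching keyword arguments for `model.generate()`. This is a sketch: the helper name `generation_kwargs` is hypothetical, and the direction labels `en2ks` / `ks2en` follow the Quickstart.

```python
def generation_kwargs(direction):
    """Return generate() keyword arguments for a translation direction."""
    if direction not in ("en2ks", "ks2en"):
        raise ValueError(f"unknown direction: {direction!r}")
    return {
        "do_sample": False,  # greedy decoding
        "max_new_tokens": 150 if direction == "en2ks" else 200,
        "min_new_tokens": 5,
        "repetition_penalty": 1.1,
    }


print(generation_kwargs("en2ks"))
print(generation_kwargs("ks2en"))
```

With the Quickstart objects loaded, this could be used as `model.generate(**inputs, **generation_kwargs(direction))`.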

Hardware Requirements

| Setting | VRAM |
|---|---|
| 4-bit inference (recommended) | ~6–8 GB |
| Colab free tier (T4) | ✅ with 4-bit |
| Colab L4 / A100 | ✅ comfortable |
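As a back-of-the-envelope check on these figures, the sketch below estimates the memory needed for the weights alone at different precisions. It assumes the 4.5B parameter count quoted for sarvam-translate in the Acknowledgements and ignores quantization constants; real usage is higher because layers that bitsandbytes leaves unquantized (such as embeddings), the KV cache, activations, and the CUDA context all add to the total.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


# Assumption: parameter count as quoted in the Acknowledgements.
N_PARAMS = 4.5e9

print(f"4-bit weights: ~{weight_memory_gb(N_PARAMS, 4):.2f} GB")
print(f"bf16 weights:  ~{weight_memory_gb(N_PARAMS, 16):.2f} GB")
```

The gap between the 4-bit weight estimate and the ~6–8 GB recommendation is the headroom taken by those other components.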

Limitations

  • Trained on sentence-level pairs (≤ 512 tokens); long-form translation unsupported.
  • Performance on technical, legal, or dialectal Kashmiri is unverified.
  • No human evaluation conducted; COMET and BLEU are automatic metrics only.
  • 4-bit quantization used for inference; full-precision may yield higher scores.
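Given the sentence-level limitation, one possible workaround for longer input is to split it into sentences and translate each one separately. The sketch below shows only the control flow: `split_sentences` and `translate_long` are hypothetical helpers, the regex is a naive splitter that will mis-split abbreviations, and translation quality on text handled this way has not been evaluated.

```python
import re

# Split after Latin sentence punctuation or the Arabic-script full stop (U+06D4).
_SENTENCE_END = re.compile(r"(?<=[.!?\u06D4])\s+")


def split_sentences(text):
    return [s for s in _SENTENCE_END.split(text.strip()) if s]


def translate_long(text, translate_fn, direction="en2ks"):
    """Translate text one sentence at a time and rejoin the results."""
    return " ".join(translate_fn(s, direction) for s in split_sentences(text))


# Stand-in for the Quickstart's translate(); upper-casing just makes the flow visible.
def fake_translate(sentence, direction):
    return sentence.upper()


print(translate_long("Hello there. How are you? Fine.", fake_translate))
```

In real use, the Quickstart's `translate` function would be passed in place of `fake_translate`.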

Citation

If you use this model, please cite:

```bibtex
@misc{iqbal2026koshurai,
  title        = {KoshurAI v3: A Fine-Tuned Neural Machine Translation System for Kashmiri--English Bidirectional Translation},
  author       = {Iqbal, Faizan},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/Faizaniqbal/KoshurAI_Tarjuma_v3}},
  note         = {LoRA adapter fine-tuned from Faizaniqbal/KoshurAI_Tarjuma_v2}
}
```

This work fine-tunes the model by Malik & Nissar — also cite:

```bibtex
@misc{malik2026koshurkouter,
  title        = {Koshur Kouter KS-EN v1: A Merged QLoRA Kashmiri--English Translation Model},
  author       = {Malik, Haq Nawaz and Nissar, Nahfid},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/Omarrran/koshur-kouter-ks-en_v1}},
  note         = {Fine-tuned from sarvamai/sarvam-translate}
}
```

And the original base model:

```bibtex
@misc{sarvam2025translate,
  title        = {Sarvam-Translate},
  author       = {{Sarvam AI}},
  howpublished = {\url{https://huggingface.co/sarvamai/sarvam-translate}}
}
```

Acknowledgements

This model builds on Omarrran/koshur-kouter-ks-en_v1, which was fine-tuned by Haq Nawaz Malik & Nahfid Nissar (2026), itself built on sarvamai/sarvam-translate (Gemma 3, 4.5B) by Sarvam AI. Evaluated on FLORES-200 devtest. COMET scored using Unbabel/wmt22-comet-da.
