License: apache-2.0

What it fixes

Trained on Kazakh error-correction pairs spanning 10 grammar categories, plus spelling, capitalization, and punctuation errors:

  • Септік жалғау (case endings) · Жіктік жалғау (personal/predicative endings)
  • Тәуелдік жалғау (possessive) · Көптік жалғау (plural)
  • Шылау (postpositions/particles) · Шақ (tense) · Болымсыздық (negation)
  • Сөз тәртібі (word order) · Құрмалас сөйлем (compound sentences) · Үндестік заңы (vowel/consonant harmony)
  • Kazakh-letter confusions (жанбыр→жаңбыр, бул→бұл, окыдым→оқыдым), capitalization, missing punctuation.
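To make the letter-confusion category concrete, here is a minimal sketch of how a clean sentence could be corrupted into an (erroneous, clean) pair. It is an illustration only, not the pipeline used to build the training data. The function name `inject_letter_errors`, the `rate` and `seed` parameters, and the `CONFUSIONS` map are hypothetical; the three substitutions are simply the reverse of the examples listed above.

```python
import random

# Hypothetical confusion map (assumption): each Kazakh-specific letter is
# replaced by the plain Cyrillic letter seen in the examples above.
CONFUSIONS = {"ң": "н", "ұ": "у", "қ": "к"}


def inject_letter_errors(clean: str, rate: float = 0.5, seed: int = 0) -> str:
    """Return a corrupted copy of `clean` with some Kazakh letters swapped."""
    rng = random.Random(seed)
    out = []
    for ch in clean:
        sub = CONFUSIONS.get(ch.lower())
        if sub is not None and rng.random() < rate:
            # Preserve the case of the original character.
            out.append(sub.upper() if ch.isupper() else sub)
        else:
            out.append(ch)
    return "".join(out)


clean = "бұл жаңбыр"
pair = (inject_letter_errors(clean, rate=1.0), clean)  # (erroneous, clean)
print(pair)
```

With `rate=1.0` every listed letter is replaced, so the corrupted side reproduces the misspellings shown in the list; lower rates give a mix of correct and incorrect letters.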

Model details

  • Base: TilQazyna/Til-2B (1977M parameters, dense + MLA)
  • Format: ChatML, assistant-only loss
  • Data: ~316K Kazakh GEC pairs: human-annotated grammar errors (10 categories) plus synthetic typed-error injection over clean sentences
  • Epochs / LR: 3 / 1e-5 (cosine schedule), bf16
  • Context: 4096 tokens
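"Assistant-only loss" conventionally means that only the tokens of the assistant turn contribute to the training loss, while the prompt tokens are masked out. The sketch below shows that convention on plain lists of token ids. It is an assumption about how the masking works, not the training code for this model; `mask_non_assistant` is a hypothetical helper, and `-100` is the default `ignore_index` of PyTorch's `CrossEntropyLoss`.

```python
from typing import List, Tuple

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_non_assistant(prompt_ids: List[int],
                       response_ids: List[int]) -> Tuple[List[int], List[int]]:
    """Build (input_ids, labels) so that only response tokens are scored."""
    input_ids = list(prompt_ids) + list(response_ids)
    # Prompt positions are ignored by the loss; response positions keep their ids.
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels


# Toy ids: three prompt tokens followed by two assistant tokens.
input_ids, labels = mask_non_assistant([11, 12, 13], [21, 22])
print(input_ids, labels)
```

For a GEC model this keeps the loss focused on producing the corrected sentence rather than on reproducing the erroneous input.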

The training instruction (fixed, Kazakh):

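Мәтіндегі грамматикалық, орфографиялық және пунктуациялық қателерді түзет. Тек түзетілген мәтінді қайтар.

In English: "Correct the grammatical, spelling, and punctuation errors in the text. Return only the corrected text." Pass the Kazakh string verbatim at inference time, since it is the exact instruction the model was trained with.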
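For readers who want to see what the model receives, the sketch below assembles a single-turn prompt in the generic ChatML layout around this instruction. It assumes the standard `<|im_start|>` / `<|im_end|>` markers with no system turn; the tokenizer's real chat template may differ, so `tok.apply_chat_template` remains the safer route in practice. `build_chatml_prompt` is a hypothetical helper, and the `Мәтін: ` prefix follows the Usage snippet.

```python
INSTR = ("Мәтіндегі грамматикалық, орфографиялық және пунктуациялық қателерді түзет. "
         "Тек түзетілген мәтінді қайтар.")


def build_chatml_prompt(text: str) -> str:
    """Format one user turn in generic ChatML and open the assistant turn."""
    user = f"{INSTR}\n\nМәтін: {text}"
    # Assumption: plain ChatML, no system message.
    return f"<|im_start|>user\n{user}<|im_end|>\n<|im_start|>assistant\n"


print(build_chatml_prompt("бул кітап"))
```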

Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "TilQazyna/Til-2B-GEC"
tok = AutoTokenizer.from_pretrained(repo)
# On older transformers releases, pass torch_dtype= instead of dtype=.
model = AutoModelForCausalLM.from_pretrained(repo, dtype=torch.bfloat16, device_map="auto")

# Fixed training instruction; keep it verbatim.
INSTR = ("Мәтіндегі грамматикалық, орфографиялық және пунктуациялық қателерді түзет. "
         "Тек түзетілген мәтінді қайтар.")


def correct(text: str) -> str:
    msgs = [{"role": "user", "content": f"{INSTR}\n\nМәтін: {text}"}]
    prompt = tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
    # The chat template already inserts special tokens, so do not add them again.
    ids = tok(prompt, return_tensors="pt", add_special_tokens=False).input_ids.to(model.device)
    # Greedy decoding, stopping at the ChatML end-of-turn token.
    out = model.generate(ids, max_new_tokens=128, do_sample=False, pad_token_id=0,
                         eos_token_id=tok.convert_tokens_to_ids("<|im_end|>"))
    # Decode only the newly generated tokens and keep the first line.
    text_out = tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True)
    return text_out.strip().split("\n")[0].strip()


print(correct("Алматыда ауа райы жақсы болды кеше жанбыр жауды"))
```

Intended use & limitations

  • Intended: correcting single sentences or short paragraphs of Kazakh text.
  • Best on single sentences; correct long inputs sentence-by-sentence.
  • May occasionally rephrase rather than minimally correct.
  • Kazakh only; not a style checker or translator.
  • No safety alignment has been applied.
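Because the model works best on single sentences, longer inputs can be split and corrected one sentence at a time. The sketch below is one simple way to do that, under stated assumptions: `split_sentences` and `correct_long` are hypothetical helpers, the splitter is a naive regex on terminal punctuation, and `correct_fn` stands for any single-sentence corrector such as the `correct` function from Usage.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after ., !, ? or … when followed by whitespace."""
    parts = re.split(r"(?<=[.!?…])\s+", text.strip())
    return [p for p in parts if p]


def correct_long(text: str, correct_fn: Callable[[str], str]) -> str:
    """Apply a single-sentence corrector to each sentence and rejoin."""
    return " ".join(correct_fn(s) for s in split_sentences(text))


# Stand-in corrector for demonstration; replace with the real `correct`.
demo = correct_long("Бул кітап. Ол окыды!", lambda s: s.replace("Бул", "Бұл"))
print(demo)
```

Note that this splitter relies on punctuation already being present. Text with missing sentence-final punctuation, which is one of the error types the model fixes, will pass through as a single chunk.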

License

Apache 2.0. Access is gated (manual approval) for usage tracking.

Model provider

TilQazyna

Model tree

  • Base: TilQazyna/Til-2B
  • Fine-tuned: this model

Modalities

  • Input: Text
  • Output: Text
