TilQazyna/Til-2B-GEC API & Inference Endpoint

What it fixes

Trained on Kazakh error pairs spanning 10 grammar categories:

Септік жалғау (case endings) · Жіктік жалғау (personal/predicative endings)
Тәуелдік жалғау (possessive) · Көптік жалғау (plural)
Шылау (postpositions/particles) · Шақ (tense) · Болымсыздық (negation)
Сөз тәртібі (word order) · Құрмалас сөйлем (compound sentences) · Үндестік заңы (vowel/consonant harmony)
It also handles Kazakh-letter confusions (жанбыр→жаңбыр, бул→бұл, окыдым→оқыдым), capitalization errors, and missing punctuation.
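
The three letter confusions listed above all swap a Kazakh-specific letter for a similar-looking basic Cyrillic one (ң→н, ұ→у, қ→к). The sketch below reproduces that kind of error from clean text. It is only an illustration: `CONFUSIONS` and `inject_letter_errors` are hypothetical names, the map contains just the three pairs from the examples, and the actual procedure used to build the training data is not described here.

```python
import random
from typing import Optional

# Letter swaps taken from the examples above (ң→н, ұ→у, қ→к).
# Lowercase only; a fuller map would be an assumption beyond this README.
CONFUSIONS = {"ң": "н", "ұ": "у", "қ": "к"}

def inject_letter_errors(text: str, p: float = 0.5, seed: Optional[int] = None) -> str:
    """Replace each confusable letter with its lookalike with probability p."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        swap = CONFUSIONS.get(ch)
        # rng.random() is in [0, 1), so p=1.0 always swaps and p=0.0 never does.
        if swap is not None and rng.random() < p:
            out.append(swap)
        else:
            out.append(ch)
    return "".join(out)

clean = "бұл жаңбыр"
print(inject_letter_errors(clean, p=1.0))  # бул жанбыр
```

Pairing each noisy string with its clean original gives one (error, correction) example of the letter-confusion type.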

Model details


Base	TilQazyna/Til-2B (1977M parameters, dense + MLA)
Format	ChatML, assistant-only loss
Data	~316K Kazakh GEC pairs: human-annotated grammar errors (10 categories) + synthetic typed-error injection over clean sentences
Epochs / LR	3 / 1e-5 (cosine), bf16
Context	4096 tokens
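
"Assistant-only loss" usually means that only the tokens of the assistant turn contribute to the training loss, while the prompt tokens are masked out. A minimal sketch of that labeling, assuming the common convention of marking ignored positions with -100 (the default `ignore_index` of PyTorch's cross-entropy loss); `mask_prompt_labels` and the token IDs are hypothetical, and the actual training code for this model is not shown here:

```python
from typing import List, Tuple

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss

def mask_prompt_labels(prompt_ids: List[int],
                       response_ids: List[int]) -> Tuple[List[int], List[int]]:
    """Build (input_ids, labels) where only response tokens carry a loss."""
    input_ids = list(prompt_ids) + list(response_ids)
    # Prompt positions are ignored by the loss; response positions keep their IDs.
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels

# Hypothetical token IDs, for illustration only.
input_ids, labels = mask_prompt_labels([11, 12, 13], [21, 22, 2])
print(labels)  # [-100, -100, -100, 21, 22, 2]
```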

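The training instruction (fixed, Kazakh; in English: "Correct the grammatical, spelling, and punctuation errors in the text. Return only the corrected text."):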

Мәтіндегі грамматикалық, орфографиялық және пунктуациялық қателерді түзет. Тек түзетілген мәтінді қайтар.

Usage

python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "TilQazyna/Til-2B-GEC"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, dtype=torch.bfloat16, device_map="auto")

INSTR = ("Мәтіндегі грамматикалық, орфографиялық және пунктуациялық қателерді түзет. "
         "Тек түзетілген мәтінді қайтар.")

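def correct(text: str) -> str:
    """Return the corrected version of a single Kazakh sentence."""
    # Same prompt layout as in training: fixed instruction, blank line, "Мәтін: " ("Text: ") + input.
    msgs = [{"role": "user", "content": f"{INSTR}\n\nМәтін: {text}"}]
    prompt = tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
    # The chat template already inserts the special tokens, so do not add them a second time.
    ids = tok(prompt, return_tensors="pt", add_special_tokens=False).input_ids.to(model.device)
    # Greedy decoding, stopping at the ChatML end-of-turn token.
    out = model.generate(ids, max_new_tokens=128, do_sample=False, pad_token_id=0,
                         eos_token_id=tok.convert_tokens_to_ids("<|im_end|>"))
    # Decode only the newly generated tokens, not the prompt.
    text_out = tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True)
    # Keep only the first output line.
    return text_out.strip().split("\n")[0].strip()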

print(correct("Алматыда ауа райы жақсы болды кеше жанбыр жауды"))
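
The `correct` helper above keeps only the first output line and caps generation at 128 new tokens, so longer text is better processed one sentence at a time. A minimal sketch of such a wrapper follows; `split_sentences` and `correct_long` are hypothetical helpers, and the regex split on sentence-final punctuation is a simple heuristic, not a real Kazakh sentence segmenter.

```python
import re
from typing import Callable, List

# Split on whitespace that follows ., !, ? or … (heuristic only).
_SENT_END = re.compile(r"(?<=[.!?…])\s+")

def split_sentences(text: str) -> List[str]:
    """Split text into sentences at sentence-final punctuation."""
    return [s for s in _SENT_END.split(text.strip()) if s]

def correct_long(text: str, correct_fn: Callable[[str], str]) -> str:
    """Apply correct_fn to each sentence and join the results with spaces."""
    return " ".join(correct_fn(s) for s in split_sentences(text))

# Identity function as a stand-in for the model-backed `correct` helper.
demo = correct_long("Бұл кітап. Мен оқыдым!", lambda s: s)
print(demo)  # Бұл кітап. Мен оқыдым!
```

In practice `correct_fn` would be the `correct` function defined above. Input that lacks sentence punctuation altogether stays in one chunk, since there is nothing to split on.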

Intended use & limitations

Intended: correcting single sentences or short paragraphs of Kazakh text.
Best on single sentences; correct long inputs sentence-by-sentence.
May occasionally rephrase rather than minimally correct.
Kazakh only; not a style checker or translator.
No safety alignment has been applied.
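
Because the model may occasionally rephrase instead of making a minimal correction, a caller can compare input and output and fall back to the original when they drift too far apart. The sketch below uses character-level similarity from Python's standard `difflib`; `is_minimal_edit`, `safe_correct`, and the 0.6 threshold are hypothetical and would need tuning on real data.

```python
import difflib
from typing import Callable

def is_minimal_edit(source: str, corrected: str, min_ratio: float = 0.6) -> bool:
    """True if the corrected text is still character-wise close to the source."""
    ratio = difflib.SequenceMatcher(None, source, corrected).ratio()
    return ratio >= min_ratio

def safe_correct(text: str, correct_fn: Callable[[str], str],
                 min_ratio: float = 0.6) -> str:
    """Return the correction, or the original text if it looks like a rewrite."""
    corrected = correct_fn(text)
    return corrected if is_minimal_edit(text, corrected, min_ratio) else text

# A spelling fix stays close to the input; an unrelated string does not.
print(is_minimal_edit("бул жанбыр", "бұл жаңбыр"))  # True
print(is_minimal_edit("abc", "xyz"))                # False
```

Legitimate corrections that add punctuation or fix capitalization also lower the similarity, so a strict threshold would reject valid output.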

License

Apache 2.0. Access is gated (manual approval) for usage tracking.
