TilQazyna/Til-mini-1B-GEC API & Inference Endpoint

What it fixes

Kazakh-specific letter confusions: жанбыр → жаңбыр, кызыкты → қызық(ты), окыдым → оқыдым, бул → бұл
Capitalization at sentence start
Missing end-of-sentence punctuation
Common agreement/spelling slips in informal text

Model details


Base	TilQazyna/Til-mini-1B (956.3M, dense + MLA)
Fine-tune format	ChatML (`<
Loss	assistant tokens only
Data	Kazakh GEC pairs: organic social-media sentences + synthetic errors (~900 pairs)
Epochs / LR	5 / 1e-5 (cosine), bf16
Context	2048
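
The "Loss: assistant tokens only" row means that only the corrected text contributes to the training loss, not the instruction or the input. The training code is not part of this card, so the following is a minimal sketch of the common convention and not the authors' implementation: prompt positions get the label -100, which PyTorch's cross-entropy loss ignores by default. The function name `mask_non_assistant` and the toy token ids are hypothetical.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss

def mask_non_assistant(prompt_ids, answer_ids):
    """Concatenate prompt and answer; only answer positions keep real labels."""
    input_ids = list(prompt_ids) + list(answer_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(answer_ids)
    return input_ids, labels

# Toy ids standing in for a tokenized prompt (3 tokens) and correction (2 tokens).
input_ids, labels = mask_non_assistant([11, 12, 13], [21, 22])
print(input_ids)  # [11, 12, 13, 21, 22]
print(labels)     # [-100, -100, -100, 21, 22]
```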

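The training instruction (fixed, Kazakh; in English: "Correct the grammatical, spelling, and punctuation errors in the text. Return only the corrected text."):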

Мәтіндегі грамматикалық, орфографиялық және пунктуациялық қателерді түзет. Тек түзетілген мәтінді қайтар.

Usage

python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "TilQazyna/Til-mini-1B-GEC"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, dtype=torch.bfloat16, device_map="auto")

INSTR = ("Мәтіндегі грамматикалық, орфографиялық және пунктуациялық қателерді түзет. "
         "Тек түзетілген мәтінді қайтар.")

def correct(text: str) -> str:
    msgs = [{"role": "user", "content": f"{INSTR}\n\nМәтін: {text}"}]
    prompt = tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
    ids = tok(prompt, return_tensors="pt", add_special_tokens=False).input_ids.to(model.device)
    out = model.generate(ids, max_new_tokens=128, do_sample=False, pad_token_id=0,
                         eos_token_id=tok.convert_tokens_to_ids("<|im_end|>"))
    text_out = tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True)
    # The model may continue after the corrected sentence, so keep only the first line.
    return text_out.strip().split("\n")[0].strip()

print(correct("Алматыда ауа райы жақсы болды кеше жанбыр жауды"))
# → Алматыда ауа райы жақсы болды кеше жаңбыр жауды.
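
The `correct` function above keeps only the first line of the output. If the model repeats the corrected sentence on the same line, a stricter cut at the first sentence-ending punctuation mark may help. The helper below is a sketch under that assumption: `first_sentence` is a hypothetical name, and the rule is naive (it would also truncate a legitimate multi-sentence correction and can mis-split on abbreviations).

```python
import re

# Lazily match up to the first run of ., ! or ? that is followed by
# whitespace or the end of the string.
_FIRST_SENT = re.compile(r"^.*?[.!?]+(?=\s|$)")

def first_sentence(text: str) -> str:
    """Return the first sentence of the first line, or the whole line if
    no sentence-ending punctuation is found."""
    line = text.strip().split("\n")[0].strip()
    m = _FIRST_SENT.match(line)
    return m.group(0) if m else line

print(first_sentence("Жаңбыр жауды. Жаңбыр жауды."))  # Жаңбыр жауды.
```

To use it, apply `first_sentence` to the decoded text in place of the final `split("\n")[0]` step.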

Real examples (greedy decoding):

Input	Output (first sentence)
`мен мектепке барамын деп айтты ол кеше`	`Мен мектепке барамын деп айтты ол кеше.`
`Алматыда ауа райы жақсы болды кеше жанбыр жауды`	`Алматыда ауа райы жақсы болды кеше жаңбыр жауды.`
`бул кітап өте кызыкты екен мен оны окыдым`	`Кітап өте қызық екен мен оны оқыдым.`
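
The third row shows the model dropping a word instead of correcting it. A word-level diff can make such cases visible. The sketch below uses Python's standard `difflib`; the function name `word_changes` is hypothetical, and tokens are compared case-insensitively with edge punctuation stripped, so capitalization and added end punctuation are not reported as changes.

```python
import difflib

def _norm(word: str) -> str:
    # Ignore case and edge punctuation when comparing words.
    return word.strip(".,!?").casefold()

def word_changes(src: str, out: str):
    """Return (source words, output words) pairs for every span that differs."""
    a, b = src.split(), out.split()
    sm = difflib.SequenceMatcher(a=[_norm(w) for w in a],
                                 b=[_norm(w) for w in b],
                                 autojunk=False)
    return [(" ".join(a[i1:i2]), " ".join(b[j1:j2]))
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]

print(word_changes("Алматыда ауа райы жақсы болды кеше жанбыр жауды",
                   "Алматыда ауа райы жақсы болды кеше жаңбыр жауды."))
# [('жанбыр', 'жаңбыр')]
```

A pair whose right-hand side is empty signals a word that was dropped, not corrected, which is one way to flag outputs that rephrase the input.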

Intended use & limitations

Intended: correcting single sentences or short paragraphs of informal Kazakh text.
Stopping: the fine-tune corpus is small (~900 pairs), so the model does not reliably emit <|im_end|> and may repeat the corrected sentence. Always post-process the output to the first line or sentence, as in the snippet above. Correct long multi-sentence inputs sentence by sentence.
The model may occasionally rephrase rather than minimally correct.
Not a style checker and not a translator; Kazakh only.
No safety alignment has been applied.
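
The advice to handle long inputs sentence by sentence can be sketched as below. This is an illustration, not part of the released code: `split_sentences` and `correct_long` are hypothetical names, `correct_fn` stands for a single-sentence corrector such as the `correct` function from Usage, and the splitter is naive (it breaks after `.`, `!` or `?` followed by whitespace, and at newlines, so text with no punctuation passes through as one chunk).

```python
import re
from typing import Callable, List

# Split after sentence-ending punctuation followed by whitespace, or at newlines.
_SENT_BREAK = re.compile(r"(?<=[.!?])\s+|\n+")

def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter; drops empty chunks."""
    return [s.strip() for s in _SENT_BREAK.split(text) if s.strip()]

def correct_long(text: str, correct_fn: Callable[[str], str]) -> str:
    """Correct each sentence independently and join the results with spaces."""
    return " ".join(correct_fn(s) for s in split_sentences(text))

# Stub corrector for demonstration only; pass the real `correct` in practice.
print(correct_long("a b. c d", lambda s: s.capitalize()))  # A b. C d
```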

License

Apache 2.0. Access is gated (manual approval) for usage tracking.
