License: apache-2.0
Model details
| Base | TilQazyna/Til-2B (1977M, dense + MLA) |
| Format | ChatML (`< |
| Loss | assistant tokens only |
| Data | ~345K instruction–response pairs (≈70% Kazakh; also ru/en/code/math) |
| Epochs / LR | 3 / 1e-5 (cosine), bf16 |
| Context | 4096 |
| Hardware | 8×H200, DDP |
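The "assistant tokens only" loss in the table can be illustrated with a minimal sketch. The training code is not part of this card, so the helper name `mask_non_assistant`, the span format, and the toy token ids below are all hypothetical; the only fixed convention is that `-100` is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def mask_non_assistant(token_ids, assistant_spans):
    """Return labels that copy token_ids inside assistant spans and ignore the rest.

    assistant_spans is a list of (start, end) index pairs, end exclusive.
    This is an illustrative helper, not the authors' training code.
    """
    labels = [IGNORE_INDEX] * len(token_ids)
    for start, end in assistant_spans:
        labels[start:end] = token_ids[start:end]
    return labels


# Toy example with made-up token ids:
# positions 0-3 are the user turn, positions 4-6 are the assistant reply.
token_ids = [11, 12, 13, 14, 21, 22, 23]
labels = mask_non_assistant(token_ids, [(4, 7)])
print(labels)  # [-100, -100, -100, -100, 21, 22, 23]
```

With labels built this way, prompt tokens still serve as context in the forward pass but contribute nothing to the gradient.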
The instruction mix covers QA, summarization, rephrasing, classification, NER, keyword extraction, simplification, title generation and general chat, with Russian/English/code/math instructions for cross-lingual robustness.
Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "TilQazyna/Til-2B-instruct"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, dtype=torch.bfloat16, device_map="auto")

# Build the chat prompt and append the assistant header.
msgs = [{"role": "user", "content": "Денсаулықты сақтау үшін 3 кеңес бер."}]
text = tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
# The rendered template is tokenized without adding further special tokens.
ids = tok(text, return_tensors="pt", add_special_tokens=False).input_ids.to(model.device)

out = model.generate(
    ids, max_new_tokens=300, do_sample=True, temperature=0.7,
    top_p=0.9, repetition_penalty=1.1, pad_token_id=0,
    eos_token_id=tok.convert_tokens_to_ids("<|im_end|>"),
)
# Decode only the newly generated tokens.
print(tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True))
```
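For readers who want to see roughly what the chat template produces, here is a minimal sketch of the conventional ChatML layout. It is an assumption that this model's template matches it exactly: the card only confirms the ChatML format and the `<|im_end|>` token, while `<|im_start|>` and the helper name `render_chatml` are taken from the common ChatML convention, and the real template may differ (for example by adding a default system message). Prefer `tok.apply_chat_template` in practice.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render messages in the conventional ChatML layout (illustrative only)."""
    parts = []
    for message in messages:
        # Each turn: start marker, role, newline, content, end marker, newline.
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # An open assistant header tells the model to start its reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([{"role": "user", "content": "Денсаулықты сақтау үшін 3 кеңес бер."}])
print(prompt)
```

Printing `text` from the usage snippet and comparing it with this output is a quick way to check whether the assumption holds for this model.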
Sample outputs
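Q: Абай Құнанбайұлы кім болған?
A: Абай Құнанбайұлы — қазақ халқының ұлы ақыны, ойшыл және қоғам қайраткері. Ол қазіргі Қазақстан аумағында дүниеге келген, өмірінің көп бөлігін Семей қаласында өткізген… (өлеңдері мен қара сөздері туралы жалғасады).
English: Q: Who was Abai Qunanbaiuly? A: Abai Qunanbaiuly was a great poet of the Kazakh people, a thinker and public figure. He was born in what is now Kazakhstan and spent most of his life in the city of Semey… (continues about his poems and prose "Words").

Q: Денсаулықты сақтау үшін 3 кеңес бер.
A: 1. Көбірек су ішіңіз… 2. Дұрыс тамақтаныңыз… 3. Темекіден бас тартыңыз…
English: Q: Give 3 tips for staying healthy. A: 1. Drink more water… 2. Eat properly… 3. Give up smoking…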
Intended use & limitations
- Intended: Kazakh-first assistant for QA, writing, summarization, rewriting.
- Reasoning/math: arithmetic and multi-step reasoning are weak (a known limit at this scale); the model may switch to Russian on math prompts.
- Factuality: can hallucinate; verify facts and numbers.
- No safety alignment / RLHF has been applied.
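Because the model may switch to Russian on some prompts, a caller might want a cheap check on the reply language. The sketch below is a heuristic, not a language detector: it counts Cyrillic letters that Kazakh uses and Russian does not. The function name `looks_kazakh` and the `min_ratio` threshold are hypothetical choices and would need tuning on real outputs.

```python
# Letters used in Kazakh Cyrillic but not in Russian: ә ғ қ ң ө ұ ү һ і
KAZAKH_LETTERS = set("\u04d9\u0493\u049b\u04a3\u04e9\u04b1\u04af\u04bb\u0456")


def looks_kazakh(text, min_ratio=0.02):
    """Heuristic: True if enough Cyrillic letters are Kazakh-specific.

    min_ratio is an arbitrary illustrative threshold, not a tuned value.
    """
    cyrillic = [ch for ch in text if "\u0400" <= ch <= "\u04ff"]
    if not cyrillic:
        return False  # no Cyrillic at all (e.g. digits or Latin only)
    kazakh = sum(ch.lower() in KAZAKH_LETTERS for ch in cyrillic)
    return kazakh / len(cyrillic) >= min_ratio


print(looks_kazakh("Көбірек су ішіңіз"))   # True
print(looks_kazakh("Пейте больше воды"))   # False
```

A reply that fails the check could be regenerated or flagged for review. Short answers and answers that are mostly numbers will be misclassified, so this should be treated as a hint only.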
License
Apache 2.0. Access is gated (manual approval) for usage tracking.
Model provider: TilQazyna
Model tree: base TilQazyna/Til-2B; fine-tuned: this model
Modalities: text input, text output