KETI-AIR
keti-llama-7b-v0.1
License: other
Model Details
- Architecture: `LlamaForCausalLM`
- Parameters: 8B-class
- Context length in config: 131,072 tokens
- Hidden size: 4096
- Layers: 32
- Attention heads: 32
- KV heads: 8
- Vocabulary size: 128,256
- Recommended dtype: `bfloat16`
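The figures above are enough for a rough estimate of KV-cache memory at long context. The sketch below is an approximation under stated assumptions, not a measurement: it assumes the head dimension equals hidden size divided by attention heads (the usual Llama layout, though a config can override it), that the cache is stored in `bfloat16` (2 bytes per element), a batch size of 1, and no cache quantization or allocator overhead. The function name `kv_cache_bytes` is illustrative.

```python
def kv_cache_bytes(
    num_tokens: int,
    num_layers: int = 32,
    num_kv_heads: int = 8,
    hidden_size: int = 4096,
    num_attention_heads: int = 32,
    bytes_per_element: int = 2,  # assumption: cache kept in bfloat16
) -> int:
    """Estimate KV-cache size in bytes for a single sequence."""
    # Assumption: head_dim = hidden_size / num_attention_heads.
    head_dim = hidden_size // num_attention_heads
    # One key and one value vector per KV head, per layer, per token.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_element
    return num_tokens * per_token


if __name__ == "__main__":
    for tokens in (8_192, 32_768, 131_072):
        gib = kv_cache_bytes(tokens) / 1024**3
        print(f"{tokens:>7} tokens: about {gib:.1f} GiB of KV cache")
```

Under these assumptions the cache costs 128 KiB per token, so filling the full 131,072-token window takes about 16 GiB on top of the model weights. With 8 KV heads instead of 32, grouped-query attention makes this a quarter of what full multi-head attention would need.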
Evaluation
Evaluation timestamp: 20260604_202553
| Category | Dataset | Version | Metric | Mode | Score |
|---|---|---|---|---|---|
| Core | core_average | - | naive_average | gen | 27.77 |
| Instruction Following | IFEval | 353ae7 | Prompt-level-strict-accuracy | gen | 50.65 |
| Math Calculation | aime2024 | bc6078 | accuracy | gen | 16.67 |
| Math Calculation | aime2025 | 5e9f4f | accuracy | gen | 3.33 |
| Math Calculation | math_prm800k_500 | 11c4b5 | accuracy | gen | 60.20 |
| General Reasoning | bbh | - | naive_average | gen | 11.87 |
| General Reasoning | GPQA_diamond | 5aeece | accuracy | gen | 20.71 |
| Knowledge | mmlu_pro | - | naive_average | gen | 28.26 |
| Code | openai_humaneval | dcae0e | humaneval_pass@1 | gen | 60.98 |
| Code | lcb_code_generation | b5b6c5 | pass@1 | gen | 6.00 |
| Long Context Reasoning | leval | - | naive_average | gen | 39.37 |
| Long Context Reasoning | longbench | - | naive_average | gen | 20.57 |
| Long Context Reasoning | LongBenchv2 | 75fbba | accuracy | gen | 24.85 |
| Long Context Reasoning | keti_long_ctx_gutenberg | - | naive_average | gen | 17.62 |
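For a quick per-category view, the scores in the table can be grouped and averaged. The sketch below is illustrative only: it takes an unweighted mean of the per-dataset rows and leaves out the `core_average` row, which is already an aggregate. The card does not say which datasets `core_average` covers, so these category means are not a reconstruction of it.

```python
from collections import defaultdict
from statistics import mean

# (category, dataset, score) rows copied from the evaluation table.
ROWS = [
    ("Instruction Following", "IFEval", 50.65),
    ("Math Calculation", "aime2024", 16.67),
    ("Math Calculation", "aime2025", 3.33),
    ("Math Calculation", "math_prm800k_500", 60.20),
    ("General Reasoning", "bbh", 11.87),
    ("General Reasoning", "GPQA_diamond", 20.71),
    ("Knowledge", "mmlu_pro", 28.26),
    ("Code", "openai_humaneval", 60.98),
    ("Code", "lcb_code_generation", 6.00),
    ("Long Context Reasoning", "leval", 39.37),
    ("Long Context Reasoning", "longbench", 20.57),
    ("Long Context Reasoning", "LongBenchv2", 24.85),
    ("Long Context Reasoning", "keti_long_ctx_gutenberg", 17.62),
]


def category_means(rows):
    """Return the unweighted mean score for each category."""
    grouped = defaultdict(list)
    for category, _dataset, score in rows:
        grouped[category].append(score)
    return {category: mean(scores) for category, scores in grouped.items()}


if __name__ == "__main__":
    for category, value in category_means(ROWS).items():
        print(f"{category}: {value:.2f}")
```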
Quick Start
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "KETI-AIR/keti-llama-7b-v0.1"

# Load the tokenizer and the model in bfloat16, placing weights automatically.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Build a chat-formatted prompt and move it to the model's device.
messages = [{"role": "user", "content": "Explain why long-context reasoning is useful."}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Sample a completion.
outputs = model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
Intended Use
This model is intended for research and development on instruction following, code generation, mathematical reasoning, and long-context generation tasks.
Limitations
The model can generate incorrect, unsafe, or biased content. Users should evaluate the model for their own deployment setting and apply appropriate safety filters and human review where needed.
Training Framework
- Transformers: 5.8.1
- PyTorch: 2.11.0+cu130
- Datasets: 4.8.5
- Tokenizers: 0.22.2
- TRL: 1.4.0
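To compare a local environment against the versions listed above, a small check with the standard library's `importlib.metadata` can help. This is a sketch under assumptions: the card reports these as the training framework versions and does not state that inference requires exact matches, and the helper name `compare_versions` is illustrative.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions as listed in the card; keys are the PyPI distribution names.
EXPECTED = {
    "transformers": "5.8.1",
    "torch": "2.11.0+cu130",
    "datasets": "4.8.5",
    "tokenizers": "0.22.2",
    "trl": "1.4.0",
}


def compare_versions(expected):
    """Map each package to (wanted, installed, matches); installed is None if missing."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        report[package] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    for package, (wanted, installed, matches) in compare_versions(EXPECTED).items():
        status = "ok" if matches else "differs"
        print(f"{package}: listed {wanted}, installed {installed} ({status})")
```

A mismatch is not necessarily a problem. The check only reports where the local setup differs from the listed one.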
Citation
If you use this model, please cite the corresponding KETI-AIR release and the training/evaluation resources used in your work.
Model provider: KETI-AIR
Model tree: Base (this model)
Modalities: text input, text output