KETI-AIR

keti-llama-7b-v0.1

README

License: other

Model Details

  • Architecture: LlamaForCausalLM
  • Parameters: 8B-class
  • Context length in config: 131,072 tokens
  • Hidden size: 4096
  • Layers: 32
  • Attention heads: 32
  • KV heads: 8
  • Vocabulary size: 128,256
  • Recommended dtype: bfloat16
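The figures above are enough for a rough estimate of key-value cache memory at long context. The sketch below is illustrative only: it assumes the head dimension equals hidden size divided by attention heads (the usual Llama convention, not stated in this card) and that the cache is stored in bfloat16 at 2 bytes per value. The function name `kv_cache_bytes` is hypothetical.

```python
# Figures taken from the Model Details list.
NUM_LAYERS = 32
NUM_ATTENTION_HEADS = 32
NUM_KV_HEADS = 8
HIDDEN_SIZE = 4096
CONTEXT_LENGTH = 131_072
BYTES_PER_VALUE = 2  # bfloat16

# Assumption: head_dim = hidden_size / attention_heads (standard Llama layout).
HEAD_DIM = HIDDEN_SIZE // NUM_ATTENTION_HEADS


def kv_cache_bytes(num_tokens: int, batch_size: int = 1) -> int:
    """Estimate KV-cache size: keys and values for every layer and KV head."""
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    return per_token * num_tokens * batch_size


print(kv_cache_bytes(1) / 1024)                  # KiB per token -> 128.0
print(kv_cache_bytes(CONTEXT_LENGTH) / 1024**3)  # GiB at full context -> 16.0
```

Under these assumptions, a single sequence at the full configured context needs about 16 GiB for the cache alone, in addition to the model weights. With 8 KV heads instead of 32, the cache is a quarter of what full multi-head attention would require.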

Evaluation

Evaluation timestamp: 20260604_202553

| Category | Dataset | Version | Metric | Mode | Score |
|---|---|---|---|---|---|
| Core | core_average | - | naive_average | gen | 27.77 |
| Instruction Following | IFEval | 353ae7 | Prompt-level-strict-accuracy | gen | 50.65 |
| Math Calculation | aime2024 | bc6078 | accuracy | gen | 16.67 |
| Math Calculation | aime2025 | 5e9f4f | accuracy | gen | 3.33 |
| Math Calculation | math_prm800k_500 | 11c4b5 | accuracy | gen | 60.20 |
| General Reasoning | bbh | - | naive_average | gen | 11.87 |
| General Reasoning | GPQA_diamond | 5aeece | accuracy | gen | 20.71 |
| Knowledge | mmlu_pro | - | naive_average | gen | 28.26 |
| Code | openai_humaneval | dcae0e | humaneval_pass@1 | gen | 60.98 |
| Code | lcb_code_generation | b5b6c5 | pass@1 | gen | 6.00 |
| Long Context Reasoning | leval | - | naive_average | gen | 39.37 |
| Long Context Reasoning | longbench | - | naive_average | gen | 20.57 |
| Long Context Reasoning | LongBenchv2 | 75fbba | accuracy | gen | 24.85 |
| Long Context Reasoning | keti_long_ctx_gutenberg | - | naive_average | gen | 17.62 |
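The card does not say how core_average is computed. One plausible reading, given the naive_average metric name, is an unweighted mean over the other 13 rows. The sketch below checks that reading against the reported scores; the dictionary keys are the dataset names from the table, and the helper `naive_average` is a hypothetical name.

```python
# Scores copied from the evaluation table (every row except core_average).
scores = {
    "IFEval": 50.65,
    "aime2024": 16.67,
    "aime2025": 3.33,
    "math_prm800k_500": 60.20,
    "bbh": 11.87,
    "GPQA_diamond": 20.71,
    "mmlu_pro": 28.26,
    "openai_humaneval": 60.98,
    "lcb_code_generation": 6.00,
    "leval": 39.37,
    "longbench": 20.57,
    "LongBenchv2": 24.85,
    "keti_long_ctx_gutenberg": 17.62,
}
REPORTED_CORE_AVERAGE = 27.77


def naive_average(values) -> float:
    """Unweighted arithmetic mean of the given scores."""
    values = list(values)
    return sum(values) / len(values)


computed = naive_average(scores.values())
print(f"computed={computed:.3f} reported={REPORTED_CORE_AVERAGE}")
```

The computed mean is within 0.01 of the reported value, which is consistent with this reading once rounding of the individual scores to two decimals is taken into account. It does not prove that the evaluation harness uses exactly this set of rows.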

Quick Start

python

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "KETI-AIR/keti-llama-7b-v0.1"

# Load the tokenizer and the weights in bfloat16, the recommended dtype.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {"role": "user", "content": "Explain why long-context reasoning is useful."}
]

# return_dict=True returns input_ids and attention_mask together, so the
# call behaves the same regardless of the library's default return type.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

# Decode only the newly generated tokens, skipping the prompt.
prompt_length = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))
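For long prompts, it can help to check the token budget before calling generate. The following is a minimal sketch, assuming the 131,072-token context length listed under Model Details is the effective limit for prompt plus generated tokens. The helper `generation_budget` is hypothetical and not part of any library.

```python
CONTEXT_LENGTH = 131_072  # "Context length in config" from Model Details


def generation_budget(
    prompt_tokens: int,
    max_new_tokens: int,
    context_length: int = CONTEXT_LENGTH,
) -> int:
    """Return how many new tokens fit in the context window after the prompt."""
    if prompt_tokens >= context_length:
        raise ValueError(
            f"Prompt has {prompt_tokens} tokens, which leaves no room "
            f"in a {context_length}-token context."
        )
    # Clamp the request so prompt + generated tokens stay within the window.
    return min(max_new_tokens, context_length - prompt_tokens)


print(generation_budget(1_000, 512))    # -> 512
print(generation_budget(131_000, 512))  # -> 72
```

In the Quick Start, the prompt length is the last dimension of the tokenized input, and the returned value could be passed as max_new_tokens.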

Intended Use

This model is intended for research and development on instruction following, code generation, mathematical reasoning, and long-context generation tasks.

Limitations

The model can generate incorrect, unsafe, or biased content. Users should evaluate the model for their own deployment setting and apply appropriate safety filters and human review where needed.

Training Framework

  • Transformers: 5.8.1
  • PyTorch: 2.11.0+cu130
  • Datasets: 4.8.5
  • Tokenizers: 0.22.2
  • TRL: 1.4.0
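To compare a local environment against the versions listed above, a small check with the standard library is usually enough. This is a sketch only: it assumes the listed names correspond to the PyPI distributions transformers, torch, datasets, tokenizers, and trl, and the function name `compare_versions` is hypothetical. The card does not state that these exact versions are required for inference.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the Training Framework list.
# Assumption: these are the matching PyPI distribution names.
EXPECTED = {
    "transformers": "5.8.1",
    "torch": "2.11.0+cu130",
    "datasets": "4.8.5",
    "tokenizers": "0.22.2",
    "trl": "1.4.0",
}


def compare_versions(expected: dict) -> dict:
    """Map each package to (expected, installed or None, exact match)."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (wanted, installed, installed == wanted)
    return report


for package, (wanted, installed, ok) in compare_versions(EXPECTED).items():
    status = "ok" if ok else "differs"
    print(f"{package}: expected {wanted}, installed {installed} ({status})")
```

An exact string match is deliberately strict. A mismatch only signals that the environment differs from the one used for training, not that the model will fail to load.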

Citation

If you use this model, please cite the corresponding KETI-AIR release and the training/evaluation resources used in your work.

Model provider

KETI-AIR

Model tree

  • Base: this model

Modalities

  • Input: Text
  • Output: Text
