khayyam-math

khayyam-math-qwen2.5-7b-v6

Deploy Dedicated

Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Container

Run this model inference with full control and performance in your environment.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

What it is

A 161 MB PEFT adapter that turns Qwen2.5-7B-Instruct into a voice-narrated math figure generator. Single inference produces {problem_statement, solution, math_claims, svg, narration, title} — the SVG and narration are ready for the Khayyam Math viewer to render with phrase-timed audio highlighting.

The adapter is trained against the same chat-format messages that the Khayyam Math production runtime uses today, so it slots into the existing chain (CP-SAT layout planner, vision audit, math verifier chain) without code changes.

How to load

Python (transformers + peft)

python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct", torch_dtype="bfloat16", device_map="auto"
)
tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
model = PeftModel.from_pretrained(base, "khayyam-math/khayyam-math-qwen2.5-7b-v6")
model.eval()

Via the `khayyam-math` Python package

bash
pip install "khayyam-math[qwen] @ git+https://github.com/khayyam-math/khayyam-math"

python
from khayyam_math import KhayyamMath

client = KhayyamMath(provider="qwen",
                     model="khayyam-math/khayyam-math-qwen2.5-7b-v6")
result = client.generate("Solve x^2 - 5x + 6 = 0")
print(result.svg[:200])
print(result.narration[:3])

Via vLLM (production)

bash
vllm serve Qwen/Qwen2.5-7B-Instruct \
  --enable-lora \
  --lora-modules khayyam-v6=khayyam-math/khayyam-math-qwen2.5-7b-v6 \
  --max-lora-rank 16 --dtype bfloat16

Then point the Khayyam Math client at it:

python
client = KhayyamMath(provider="qwen-vllm",
                     base_url="http://localhost:8000/v1",
                     model="khayyam-v6")

Training summary

Table with columns: v4, v5.1, v6
	v4	v5.1	v6
Corpus	teacher_v6_mini (3,395 ex)	teacher_v7 (2,402 ex)	teacher_v7 (2,402 ex)
Production telemetry	❌	✅ (52 turns)	✅ (52 turns)
Rank	16	8	16
Alpha	32	16

Practical-test battery (20-prompt held-out)

Table with columns: v4, v5.1, v6
	v4	v5.1	v6
Valid figures	18 / 20	20 / 20	pending
Min ship threshold	16 / 20	16 / 20	16 / 20

Eval will run via scripts/eval_lora_variant.py once a held-out slice is freshly screenshot-captured. Until then v6 ships as a candidate, not as the default adapter — available_loras.json keeps the production default at the prior promoted model.

Architecture & data lineage

For the design that this adapter feeds into — the ten-route express pipeline, the FDL primitives, the five-tier math-correctness chain, the structural critic, REFINEMENT MODE — see the Khayyam Math ARCHITECTURE.md.

For the data lineage of this checkpoint:

Synthetic teacher (2,350 examples): gpt-4o-mini solving the PROMPTS_V5 pool, filtered through the inspector + the SymPy verifier chain. See docs/finetune.md.
Production telemetry (52 turns under ToS §5 anonymisation): sft_clean.jsonl (39 turns the structural critic + math verifier accepted) + sft_corrected.jsonl (13 turns a human reviewer corrected). Hash-anonymised; no user identifiers in the corpus.

License

MIT (same as the Khayyam Math source). The Qwen 2.5-7B-Instruct base model carries the Tongyi Qianwen License Agreement which you must comply with when using this adapter — only the LoRA delta in this repo is MIT.

Citation

bibtex
@software{khayyam_math_qwen_v6,
  title  = {Khayyam Math (Qwen 2.5-7B + v6 LoRA)},
  author = {Kermani Kolankeh, Arash},
  year   = {2026},
  url    = {https://github.com/khayyam-math/khayyam-math},
  note   = {LoRA adapter on Qwen/Qwen2.5-7B-Instruct, MIT licence}
}

Model provider

khayyam-math

Model tree

Base

Qwen/Qwen2.5-7B-Instruct

Adapter

this model

Modalities

Input

Text

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Model card

Explore FriendliAI today

Get started Talk to an engineer

What it is

How to load

Python (transformers + peft)

python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct", torch_dtype="bfloat16", device_map="auto"
)
tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
model = PeftModel.from_pretrained(base, "khayyam-math/khayyam-math-qwen2.5-7b-v6")
model.eval()

Via the `khayyam-math` Python package

bash
pip install "khayyam-math[qwen] @ git+https://github.com/khayyam-math/khayyam-math"

python
from khayyam_math import KhayyamMath

client = KhayyamMath(provider="qwen",
                     model="khayyam-math/khayyam-math-qwen2.5-7b-v6")
result = client.generate("Solve x^2 - 5x + 6 = 0")
print(result.svg[:200])
print(result.narration[:3])

Via vLLM (production)

bash
vllm serve Qwen/Qwen2.5-7B-Instruct \
  --enable-lora \
  --lora-modules khayyam-v6=khayyam-math/khayyam-math-qwen2.5-7b-v6 \
  --max-lora-rank 16 --dtype bfloat16

Then point the Khayyam Math client at it:

python
client = KhayyamMath(provider="qwen-vllm",
                     base_url="http://localhost:8000/v1",
                     model="khayyam-v6")

Training summary

Table with columns: v4, v5.1, v6
	v4	v5.1	v6
Corpus	teacher_v6_mini (3,395 ex)	teacher_v7 (2,402 ex)	teacher_v7 (2,402 ex)
Production telemetry	❌	✅ (52 turns)	✅ (52 turns)
Rank	16	8	16
Alpha	32	16

Practical-test battery (20-prompt held-out)

Table with columns: v4, v5.1, v6
	v4	v5.1	v6
Valid figures	18 / 20	20 / 20	pending
Min ship threshold	16 / 20	16 / 20	16 / 20

Architecture & data lineage

For the data lineage of this checkpoint:

Synthetic teacher (2,350 examples): gpt-4o-mini solving the PROMPTS_V5 pool, filtered through the inspector + the SymPy verifier chain. See docs/finetune.md.
Production telemetry (52 turns under ToS §5 anonymisation): sft_clean.jsonl (39 turns the structural critic + math verifier accepted) + sft_corrected.jsonl (13 turns a human reviewer corrected). Hash-anonymised; no user identifiers in the corpus.

License

Citation

bibtex
@software{khayyam_math_qwen_v6,
  title  = {Khayyam Math (Qwen 2.5-7B + v6 LoRA)},
  author = {Kermani Kolankeh, Arash},
  year   = {2026},
  url    = {https://github.com/khayyam-math/khayyam-math},
  note   = {LoRA adapter on Qwen/Qwen2.5-7B-Instruct, MIT licence}
}

khayyam-math-qwen2.5-7b-v6

Get help setting up a custom Dedicated Endpoints.

README

What it is

How to load

Python (transformers + peft)

Via the `khayyam-math` Python package

Via vLLM (production)

Training summary

Practical-test battery (20-prompt held-out)

Architecture & data lineage

License

Citation

Explore FriendliAI today

README

What it is

How to load

Python (transformers + peft)

Via the `khayyam-math` Python package

Via vLLM (production)

Training summary

Practical-test battery (20-prompt held-out)

Architecture & data lineage

License

Citation

khayyam-math-qwen2.5-7b-v6

Get help setting up a custom Dedicated Endpoints.

What it is

How to load

Python (transformers + peft)

Via the khayyam-math Python package

Via vLLM (production)

Training summary

Practical-test battery (20-prompt held-out)

Architecture & data lineage

License

Citation

Explore FriendliAI today

What it is

How to load

Python (transformers + peft)

Via the khayyam-math Python package

Via vLLM (production)

Training summary

Practical-test battery (20-prompt held-out)

Architecture & data lineage

License

Citation

Via the `khayyam-math` Python package

Via the `khayyam-math` Python package