License: apache-2.0

Key Results (Rigorous Evaluation, 95% CI)
| Metric | Score | 95% CI | n |
|---|---|---|---|
| Socratic question rate | 100% | [98%, 100%] | 200 |
| Relevance to specific student error | 74.5% | [68%, 80%] | 200 |
| Answer avoidance rate | 96% | [92%, 98%] | 200 |
| Answer leak rate | 1% | [0.2%, 5.4%] | 100 |
| Grade-appropriate language | 100% | [98%, 100%] | 200 |
All metrics were evaluated with heuristic scoring (no LLM-as-judge) under production conditions, including mission context, vocabulary hints, and misconception targeting.
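The card does not say which method produced the 95% intervals. As a sketch, assuming the Wilson score interval (an assumption, not something the source states), the bounds can be recomputed from the success counts implied by each rate and n (for example, 74.5% of 200 = 149). Rounded to the table's precision, the results agree with the reported intervals, which suggests, but does not confirm, that Wilson intervals were used.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95% coverage)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to absorb floating-point error at p = 0 or p = 1
    return max(0.0, center - half_width), min(1.0, center + half_width)


# (metric, successes, n) reconstructed from the rates and sample sizes in the table
rows = [
    ("Socratic question rate", 200, 200),
    ("Relevance to specific student error", 149, 200),
    ("Answer avoidance rate", 192, 200),
    ("Answer leak rate", 1, 100),
    ("Grade-appropriate language", 200, 200),
]

for name, k, n in rows:
    lo, hi = wilson_interval(k, n)
    print(f"{name}: {k / n:.1%} [{lo:.1%}, {hi:.1%}]")
```

Unlike the normal-approximation (Wald) interval, the Wilson interval stays inside [0, 1] and does not collapse to zero width at 100% or near 0%, which matters for rates like the 1% leak rate on n=100.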
How It Works
The model is trained to be Socratic: when a student makes an error, instead of correcting them, it asks a question that helps them discover the error themselves.
Student: "I think 1/3 + 1/4 = 2/7 because I added the tops and bottoms."
Model: "If you had 1/3 of a pizza and 1/4 of the same pizza, would you really have less than 1/3 of a pizza total? Try drawing both fractions on the same circle."
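The arithmetic behind this exchange can be checked with Python's standard `fractions` module. Adding tops and bottoms separately produces 2/7, which is smaller than 1/3; that contradiction is what the tutor's pizza question invites the student to notice. The helper `add_tops_and_bottoms` is a hypothetical name used only to model the misconception.

```python
from fractions import Fraction


def add_tops_and_bottoms(a: Fraction, b: Fraction) -> Fraction:
    """Hypothetical helper modelling the misconception: add numerators and denominators separately."""
    return Fraction(a.numerator + b.numerator, a.denominator + b.denominator)


a, b = Fraction(1, 3), Fraction(1, 4)
wrong = add_tops_and_bottoms(a, b)  # the student's answer
correct = a + b                     # common-denominator sum

print(f"student: {wrong}, correct: {correct}")
# The contradiction the tutor points at: the "sum" is smaller than one of the addends
print(f"{wrong} < {a}: {wrong < a}")
```

The student's result is the mediant of the two fractions, which for positive fractions always lies between them (here 1/4 < 2/7 < 1/3), so it can never be as large as a true sum.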
Usage
With PEFT (recommended)
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load base model (requires Llama access)
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "QuantumLearningMachines/qlm-math-tutor")

# Build prompt
system = "You are a Socratic math tutor for grade 6-8 students. Never give the answer. Ask guiding questions. Keep responses to 2-3 sentences."
messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": "I think 1/3 + 1/4 = 2/7 because I added the tops and bottoms"},
]
# add_generation_prompt=True appends the assistant header so the model replies as the tutor
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(
        input_ids,
        max_new_tokens=150,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,  # Llama 3.1 sets no pad token by default
    )

# Decode only the newly generated tokens, not the prompt
response = tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```
With 4-bit Quantization (for consumer GPUs)
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit weights with float16 compute (requires the bitsandbytes package)
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    quantization_config=quantization_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "QuantumLearningMachines/qlm-math-tutor")

# Same tokenizer and generation code as above
```
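A rough, weights-only estimate shows why 4-bit loading helps on consumer GPUs. This sketch assumes about 8.03 billion parameters for Llama 3.1 8B and ignores activations, the KV cache, the LoRA weights, and layers that bitsandbytes leaves unquantized (such as the embeddings and output head), so real memory use is higher.

```python
# Rough weights-only memory estimate; ignores activations, KV cache, LoRA weights,
# and layers that bitsandbytes leaves unquantized.
PARAMS = 8.03e9  # approximate parameter count of Llama 3.1 8B

BYTES_PER_PARAM = {
    "float16": 2.0,
    "4-bit": 0.5,
}


def weight_memory_gib(params: float, bytes_per_param: float) -> float:
    """Return the memory needed to hold the weights, in GiB."""
    return params * bytes_per_param / 2**30


for dtype, nbytes in BYTES_PER_PARAM.items():
    print(f"{dtype}: ~{weight_memory_gib(PARAMS, nbytes):.1f} GiB")
```

Under these assumptions, 16-bit weights alone need roughly 15 GiB, more than many consumer cards offer, while 4-bit weights need under 4 GiB.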
System Prompt
The model expects the standard Llama 3.1 chat format, with a system prompt that instructs Socratic tutoring behavior. A simple system prompt is enough:
```markdown
You are a Socratic math tutor. Never give the answer. Ask guiding questions. Keep responses to 2-3 sentences.
```
Training
- Base model: meta-llama/Llama-3.1-8B-Instruct
- Method: LoRA
- Training data: Synthetic tutoring interactions across K-12 mathematics
- Hardware: NVIDIA L4 GPU (24 GB) on Hugging Face
- Training time: ~4 hours
- Final loss: 0.306
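LoRA freezes the base weights and learns a low-rank update for each adapted matrix: for a weight W of shape d_out x d_in, it trains B (d_out x r) and A (r x d_in), so r * (d_in + d_out) parameters, and the effective weight is W + (alpha / r) * B @ A. The card does not list the rank or target modules, so the sketch below uses hypothetical values (r = 16 on the four attention projections) together with Llama 3.1 8B's shapes (hidden size 4096, 8 key/value heads of dimension 128, 32 layers) to show the order of magnitude.

```python
# Hypothetical LoRA settings: the card does not state rank or target modules.
RANK = 16
NUM_LAYERS = 32
HIDDEN = 4096
KV_DIM = 8 * 128  # 8 key/value heads x head_dim 128 (grouped-query attention)

# (d_in, d_out) of each attention projection in one Llama 3.1 8B decoder layer
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters for one LoRA pair: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out, RANK) for d_in, d_out in TARGET_SHAPES.values())
total = per_layer * NUM_LAYERS
print(f"trainable LoRA parameters: {total:,} ({total / 8.03e9:.2%} of ~8.03B)")
```

Under these assumptions only about 0.17% of the weights are trained, which is why a LoRA adapter is small to distribute and cheap to optimize compared with full fine-tuning.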
Limitations
- Synthetic training data: The model was trained on synthetic data, not real classroom tutoring transcripts. This limits scaffolding specificity: 28% of responses target the specific error, while 68% ask relevant but generic guiding questions.
- Answer leak rate: 1% of responses contain the correct answer (detected by exact numeric matching). An answer-leak filter is deployed in production.
- Math only: Trained exclusively on K-12 mathematics. Performance on other STEM subjects is untested.
- No longitudinal validation: No classroom outcome data yet. Benchmark results measure response quality, not learning gains.
- Heuristic evaluation: All evaluation uses keyword/heuristic scoring, not human expert annotation. Human evaluation with math teachers is planned.
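The answer-leak limitation mentions detection by exact numeric matching. The production filter is not published; what follows is a minimal sketch of that idea, assuming the correct answer is known as a fraction and that numbers in a response appear as integers, decimals, or a/b fractions. The regex and the function names `extract_numbers` and `leaks_answer` are hypothetical.

```python
import re
from fractions import Fraction

# Matches integers, decimals, and simple a/b fractions, e.g. "7", "0.25", "7/12"
NUMBER_PATTERN = re.compile(r"\d+(?:\.\d+)?(?:\s*/\s*\d+)?")


def extract_numbers(text: str) -> list[Fraction]:
    """Parse every number-like token in the text into an exact Fraction."""
    values = []
    for token in NUMBER_PATTERN.findall(text):
        token = re.sub(r"\s+", "", token)
        try:
            values.append(Fraction(token))
        except (ValueError, ZeroDivisionError):
            continue  # e.g. "1.5/2" or "3/0" cannot be parsed exactly
    return values


def leaks_answer(response: str, answer: Fraction) -> bool:
    """Hypothetical filter: flag a response that states the exact correct value."""
    return any(value == answer for value in extract_numbers(response))


tutor_reply = "If you had 1/3 of a pizza and 1/4 of the same pizza, would you really have less than 1/3 total?"
leaky_reply = "Close! The answer is 7/12."
print(leaks_answer(tutor_reply, Fraction(7, 12)), leaks_answer(leaky_reply, Fraction(7, 12)))
```

Because `Fraction` normalizes values, an equivalent form such as 14/24 is also flagged, but a rounded decimal such as 0.583 is not, which is one reason exact matching is a heuristic rather than a guarantee.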
Evaluation Methodology
All metrics are reported with 95% confidence intervals. The tutor model was evaluated on n=200 (Socratic quality), n=50 (scaffolding), and n=100 (answer leak). No LLM-as-judge is used: all scoring is heuristic to avoid circularity.
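As an illustration of what heuristic scoring (as opposed to LLM-as-judge) can look like, the sketch below computes a Socratic question rate: a response counts if it contains at least one question and stays within the three-sentence limit from the system prompt. These rules and the function names are assumptions for illustration, not the card's actual scoring rubric.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter on '.', '?' or '!' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]


def is_socratic(response: str, max_sentences: int = 3) -> bool:
    """Hypothetical heuristic: asks at least one question and stays brief."""
    sentences = split_sentences(response)
    asks_question = any(s.endswith("?") for s in sentences)
    return asks_question and len(sentences) <= max_sentences


def socratic_question_rate(responses: list[str]) -> float:
    """Fraction of responses that pass the heuristic."""
    if not responses:
        raise ValueError("need at least one response")
    return sum(is_socratic(r) for r in responses) / len(responses)


responses = [
    "If you had 1/3 of a pizza and 1/4 of the same pizza, would you really have "
    "less than 1/3 of a pizza total? Try drawing both fractions on the same circle.",
    "The answer is 7/12.",
]
print(socratic_question_rate(responses))
```

Rules like these are cheap and deterministic, so the same responses always get the same score, but they can reward a question that is irrelevant to the student's error, which is why the card lists human evaluation as future work.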
Full benchmark results: quantumlearningmachines.com/research/external-benchmark-results
Part of a Larger System
This tutor model is one component of the QLM platform, an integrated system for adaptive math learning. The model weights are open; the measurement and orchestration systems that train and improve the model are proprietary.
Citation
```bibtex
@misc{qlm-math-tutor-2026,
  title={QLM Socratic Math Tutor: An Open-Source Llama 3.1 8B LoRA for K-12 Mathematics},
  author={{Quantum Learning Machines}},
  year={2026},
  url={https://huggingface.co/QuantumLearningMachines/qlm-math-tutor},
}
```
Contact
- Try the tutor: quantumlearningmachines.com/try-math-tutor
- Benchmarks: quantumlearningmachines.com/research
- Partnerships: hello@quantumlearningmachines.com
Model provider: QuantumLearningMachines
Model tree: base meta-llama/Llama-3.1-8B-Instruct; adapter: this model
Modalities: text input, text output