License: apache-2.0
Performance
Accuracy (%) on standard medical benchmarks. See the paper for full evaluation details, confidence intervals, and open-ended Auto-MOOVE results.
| Benchmark | Apertus-70B-Instruct | Apertus-70B-MeditronFO | Δ |
|---|---|---|---|
| MedMCQA | 52.43 | 56.32 | +3.89 |
| MedQA | 60.64 | 68.58 | +7.94 |
| PubMedQA | 66.80 | 75.20 | +8.40 |
| MedXpertQA | 12.33 | 16.90 | +4.57 |
| HealthBench Hard | 32.28 | 40.14 | +7.86 |
| Average | 44.90 | 51.43 | +6.53 |
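As a quick consistency check on the table, the snippet below recomputes the Δ column as (MeditronFO minus Instruct) and the Average row as an unweighted mean over the five benchmarks. The equal weighting is inferred from the reported numbers (it reproduces 44.90 and 51.43); the card does not state it explicitly.

```python
# Scores copied from the table: (Apertus-70B-Instruct, Apertus-70B-MeditronFO)
scores = {
    "MedMCQA": (52.43, 56.32),
    "MedQA": (60.64, 68.58),
    "PubMedQA": (66.80, 75.20),
    "MedXpertQA": (12.33, 16.90),
    "HealthBench Hard": (32.28, 40.14),
}


def mean(values):
    """Unweighted arithmetic mean (assumed to match the card's Average row)."""
    return sum(values) / len(values)


# Per-benchmark improvement, rounded to two decimals as in the table.
deltas = {name: round(ft - base, 2) for name, (base, ft) in scores.items()}

# Averages over the five benchmarks, rounded to two decimals.
base_avg = round(mean([base for base, _ in scores.values()]), 2)
ft_avg = round(mean([ft for _, ft in scores.values()]), 2)

for name, delta in deltas.items():
    print(f"{name}: {delta:+.2f}")
print(f"Average: {base_avg:.2f} -> {ft_avg:.2f} ({ft_avg - base_avg:+.2f})")
```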
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "EPFLiGHT/Apertus-70B-MeditronFO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load weights in bf16 and let Accelerate place layers across the available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "A 62-year-old woman presents with a three-day history of dyspnea on exertion and a productive cough. What is the differential diagnosis?"},
]

# Apply the model's chat template and tokenize in one step.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Greedy decoding (do_sample=False) gives deterministic output.
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)

# Decode only the newly generated tokens, dropping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
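Loading with `torch_dtype=torch.bfloat16` stores each weight in 2 bytes, and `device_map="auto"` lets Accelerate spread the layers over the available GPUs (and CPU memory if needed). The rough sketch below estimates the weight footprint, assuming a nominal 70e9 parameters read off the "70B" in the model name; the exact parameter count may differ, and KV cache, activations, and runtime overhead are excluded, so treat the numbers as a lower bound. The per-GPU memory sizes are illustrative only.

```python
import math

# Assumptions: nominal 70e9 parameters, 2 bytes per bf16 parameter.
# Only raw weights are counted (no KV cache, activations, or overhead).
NUM_PARAMS = 70e9
BYTES_PER_PARAM_BF16 = 2


def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Size of the raw weights in decimal gigabytes (1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def min_gpus(total_gb: float, gpu_memory_gb: float) -> int:
    """Smallest number of equal GPUs whose combined memory fits the weights."""
    return math.ceil(total_gb / gpu_memory_gb)


total = weight_memory_gb(NUM_PARAMS, BYTES_PER_PARAM_BF16)
print(f"bf16 weights: ~{total:.0f} GB")
for gpu_gb in (40, 80):  # illustrative per-GPU memory sizes
    print(f"{gpu_gb} GB GPUs needed for weights alone: {min_gpus(total, gpu_gb)}")
```

On this estimate (about 140 GB of weights), a single 80 GB accelerator cannot hold the model in bf16, which is where sharding layers across devices with `device_map="auto"` helps.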
Training
- Base model: Apertus-70B-Instruct
- Corpus: Fully Open Meditron, ~601k examples (~150M tokens) combining eight public medical QA datasets with three clinician-vetted synthetic components: exam-style QA, guideline-grounded QA built from 46,469 clinical practice guidelines, and open-ended clinical vignettes
- Hardware: NVIDIA GH200 nodes
- Framework: Axolotl with FSDP v2 / DeepSpeed ZeRO-3, Flash Attention 2, bf16 mixed precision
- Decontamination: System-wide two-stage n-gram and token-alignment decontamination against all evaluation benchmarks
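The card does not spell out how the two-stage decontamination works. As a generic illustration of the n-gram idea only, the sketch below flags training examples that share any word-level n-gram with an evaluation item. The tokenizer, the n-gram length `N = 8`, the toy examples, and all function names are hypothetical; the token-alignment stage and the actual settings used for Fully Open Meditron are not reproduced here.

```python
import re
from typing import Iterable, List, Set, Tuple

NGram = Tuple[str, ...]


def normalize_tokens(text: str) -> List[str]:
    """Lowercase and split on non-alphanumeric characters (illustrative tokenizer)."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def ngrams(tokens: List[str], n: int) -> Set[NGram]:
    """All contiguous n-grams; empty if the text is shorter than n tokens."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_benchmark_index(benchmark_texts: Iterable[str], n: int) -> Set[NGram]:
    """Union of the n-grams of every evaluation item."""
    index: Set[NGram] = set()
    for text in benchmark_texts:
        index |= ngrams(normalize_tokens(text), n)
    return index


def is_contaminated(example: str, index: Set[NGram], n: int) -> bool:
    """True if the training example shares at least one n-gram with the benchmarks."""
    return not ngrams(normalize_tokens(example), n).isdisjoint(index)


N = 8  # hypothetical n-gram length, for illustration only
benchmarks = ["A 45-year-old man presents with crushing chest pain radiating to the left arm."]
train = [
    "Patient: 45-year-old man presents with crushing chest pain radiating to the left arm, what next?",
    "Explain the mechanism of action of metformin in type 2 diabetes.",
]

index = build_benchmark_index(benchmarks, N)
kept = [ex for ex in train if not is_contaminated(ex, index, N)]
print(len(kept))
```

Exact n-gram matching misses reworded items, which is one reason a pipeline might add a finer-grained second stage.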
Full hyperparameters are in Appendix I of the paper.
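FSDP v2 and DeepSpeed ZeRO-3 both shard model states (weights, gradients, optimizer states) across data-parallel workers. As a rough illustration, the sketch below applies the ZeRO paper's per-parameter accounting for mixed-precision Adam (2 B half-precision weights + 2 B gradients + 12 B fp32 optimizer states = 16 B) to a nominal 70e9 parameters. The Adam-style optimizer, the parameter count, and the GPU counts are assumptions, not the configuration in Appendix I, and FSDP's real footprint also depends on its mixed-precision settings.

```python
# Assumptions: Adam-style optimizer with mixed precision, nominal 70e9 params.
# Activations, communication buffers, and fragmentation are ignored.
BYTES_PER_PARAM = 2 + 2 + 12  # bf16 weights + bf16 grads + fp32 master/momentum/variance


def model_state_gb(num_params: float, num_gpus: int, sharded: bool = True) -> float:
    """Approximate per-GPU model-state memory in decimal GB."""
    total_bytes = num_params * BYTES_PER_PARAM
    # Full sharding splits all model states evenly across GPUs.
    per_gpu_bytes = total_bytes / num_gpus if sharded else total_bytes
    return per_gpu_bytes / 1e9


params = 70e9
print(f"Unsharded: {model_state_gb(params, 1, sharded=False):.0f} GB per GPU")
for gpus in (8, 16, 32):  # hypothetical GPU counts
    print(f"{gpus} GPUs, fully sharded: {model_state_gb(params, gpus):.0f} GB per GPU")
```

Without sharding, the model states alone (about 1,120 GB on these assumptions) would not fit on any single accelerator, which is why full sharding is the usual choice at this scale.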
Intended Use
Research only. This model is intended to support research on medical LLMs, auditing of clinical AI systems, and reproducibility of the Fully Open Meditron pipeline.
It is not validated for clinical deployment, individual patient advice, autonomous decision-making, or any other deployment-adjacent use. Conduct independent domain-specific safety evaluation before any such use.
Citation
If you use this model, please cite:
```bibtex
@misc{theimerlienhard2026fullyopenmeditronauditable,
  title         = {Fully Open Meditron: An Auditable Pipeline for Clinical LLMs},
  author        = {Xavier Theimer-Lienhard and Mushtaha El-Amin and Fay Elhassan and Sahaj Vaidya and Victor Cartier-Negadi and David Sasu and Lars Klein and Mary-Anne Hartley},
  year          = {2026},
  eprint        = {2605.16215},
  archivePrefix = {arXiv},
  primaryClass  = {cs.AI},
  url           = {https://arxiv.org/abs/2605.16215}
}
```
License
Released under the apache-2.0 license, which permits commercial use subject to its attribution and notice requirements.
Model provider: EPFLiGHT
Model tree
- Base: swiss-ai/Apertus-70B-Instruct-2509
- Fine-tuned: this model
Modalities
- Input: Text
- Output: Text