EPFLiGHT

Apertus-70B-MeditronFO

License: apache-2.0

Benchmark

Accuracy (%) on standard medical benchmarks. See the paper for full evaluation details, confidence intervals, and open-ended Auto-MOOVE results.

Benchmark	Apertus-70B-Instruct	Apertus-70B-MeditronFO	Δ
MedMCQA	52.43	56.32	+3.89
MedQA	60.64	68.58	+7.94
PubMedQA	66.80	75.20	+8.40
MedXpertQA	12.33	16.90	+4.57
HealthBench Hard	32.28	40.14	+7.86
Average	44.90	51.43	+6.53
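The Δ column and the Average row follow directly from the per-benchmark scores. The sketch below recomputes them from the table, assuming that Average is the unweighted mean over the five benchmarks; the function and variable names are illustrative and not part of any released code.

```python
# Scores copied from the table above:
# benchmark -> (Apertus-70B-Instruct, Apertus-70B-MeditronFO)
scores = {
    "MedMCQA": (52.43, 56.32),
    "MedQA": (60.64, 68.58),
    "PubMedQA": (66.80, 75.20),
    "MedXpertQA": (12.33, 16.90),
    "HealthBench Hard": (32.28, 40.14),
}


def summarize(scores):
    """Return per-benchmark deltas and the unweighted mean of each column."""
    deltas = {name: round(tuned - base, 2) for name, (base, tuned) in scores.items()}
    n = len(scores)
    base_avg = sum(base for base, _ in scores.values()) / n
    tuned_avg = sum(tuned for _, tuned in scores.values()) / n
    return deltas, round(base_avg, 2), round(tuned_avg, 2)


deltas, base_avg, tuned_avg = summarize(scores)
for name, delta in deltas.items():
    print(f"{name}: {delta:+.2f}")
print(f"Average: {base_avg:.2f} -> {tuned_avg:.2f} ({tuned_avg - base_avg:+.2f})")
```

Under that assumption, the recomputed deltas and averages agree with the table to two decimal places.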

Usage

python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "EPFLiGHT/Apertus-70B-MeditronFO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load the weights in bf16; device_map="auto" spreads them across the
# available devices (requires the accelerate package).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "A 62-year-old woman presents with a three-day history of dyspnea on exertion and a productive cough. What is the differential diagnosis?"},
]
# Apply the model's chat template and append the assistant generation prompt.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Greedy decoding (do_sample=False) for reproducible output.
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))

Training

Base model: Apertus-70B-Instruct (swiss-ai/Apertus-70B-Instruct-2509)
Corpus: Fully Open Meditron, 601k examples (~150M tokens), combining eight public medical QA datasets with three clinician-vetted synthetic components: exam-style QA, guideline-grounded QA from 46,469 clinical practice guidelines, and open-ended clinical vignettes
Hardware: 8 nodes of 4 NVIDIA GH200 GPUs each
Framework: Axolotl with FSDP v2 / DeepSpeed ZeRO-3, Flash Attention 2, bf16 mixed precision
Decontamination: System-wide two-stage n-gram and token-alignment decontamination against all evaluation benchmarks

Full hyperparameters are in Appendix I of the paper.
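The paper is the reference for the actual decontamination procedure. As a rough illustration of the n-gram stage only, the sketch below flags a training example when it shares any word-level n-gram with a benchmark item. The regex tokenization, the lowercasing, the window size `n=8`, the function names, and the sample texts are all assumptions made for this example, not the authors' settings or data.

```python
import re


def ngrams(text, n=8):
    """Return the set of word-level n-grams in text (lowercased, punctuation dropped)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_benchmark_index(benchmark_items, n=8):
    """Collect every n-gram that appears in any benchmark item."""
    index = set()
    for item in benchmark_items:
        index |= ngrams(item, n)
    return index


def is_contaminated(example, index, n=8):
    """True if the example shares at least one n-gram with the benchmark index."""
    return not ngrams(example, n).isdisjoint(index)


# Made-up texts, for illustration only.
benchmark = [
    "A 45-year-old man presents with crushing chest pain radiating to the left arm. "
    "What is the next best step?",
]
train = [
    "Question: A 45-year-old man presents with crushing chest pain radiating to the "
    "left arm. Answer: obtain an ECG.",
    "Metformin is a first-line oral agent for type 2 diabetes in adults without "
    "contraindications.",
]

index = build_benchmark_index(benchmark)
clean = [example for example in train if not is_contaminated(example, index)]
print(f"kept {len(clean)} of {len(train)} examples")
```

Exact n-gram matching misses paraphrased or lightly edited benchmark items, which is presumably why a second, alignment-based stage is applied on top of it.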

Compute & footprint

Training ran on 8 nodes of 4 NVIDIA GH200 GPUs each for approximately 6 hours at the Swiss National Supercomputing Centre (CSCS). Our training runs have a carbon-neutral footprint because the CSCS data center is carbon neutral (see CSCS energy efficiency).
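For a rough sense of scale, these figures can be converted into GPU-hours. This is back-of-the-envelope arithmetic that takes the stated node count, GPUs per node, and approximate wall-clock time at face value; the variable names are illustrative.

```python
# Figures as stated in this section; the wall-clock time is approximate.
nodes = 8
gpus_per_node = 4
wall_clock_hours = 6

total_gpus = nodes * gpus_per_node
gpu_hours = total_gpus * wall_clock_hours
print(f"{total_gpus} GPUs x ~{wall_clock_hours} h = ~{gpu_hours} GPU-hours")
```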

Limitations & intended use

MeditronFO can produce text on a variety of topics, but the generated content may not always be factually accurate, logically consistent, or free from biases present in the training data. MeditronFO is specialised for medicine and is intended for evaluation on medicine-related tasks. It should be used as an assistive tool rather than a definitive source of information. Users should always verify important information and critically evaluate any generated content.

Citation

If you find MeditronFO useful in your research, please cite our preprint:

bibtex
@misc{theimerlienhard2026fullyopenmeditronauditable,
  title         = {Fully Open Meditron: An Auditable Pipeline for Clinical LLMs},
  author        = {Xavier Theimer-Lienhard and Mushtaha El-Amin and Fay Elhassan and Sahaj Vaidya and Victor Cartier-Negadi and David Sasu and Lars Klein and Mary-Anne Hartley},
  year          = {2026},
  eprint        = {2605.16215},
  archivePrefix = {arXiv},
  primaryClass  = {cs.AI},
  url           = {https://arxiv.org/abs/2605.16215}
}

Acknowledgments and Disclosure of Funding

This work was supported under project ID #27 as part of the Swiss AI Initiative, through a grant from the ETH Domain and computational resources provided by the Swiss National Supercomputing Centre (CSCS) under the Alps infrastructure. We thank the physician review panel within the LiGHT laboratory for their clinical auditing, methodological review, and validation of the synthetic generation and evaluation pipelines. We additionally thank the many physicians and clinical experts who contributed to the MOOVE initiative through expert review, pairwise evaluation, benchmarking, and clinical vignette development across diverse international settings.

Contact

Please use the community tab for any discussion or issues related to this model. Questions related to the project can be sent to xavier.theimer-lienhard@epfl.ch or mary-anne.hartley@epfl.ch.

Model Details

Model Provider

EPFLiGHT

Model Tree

Base

swiss-ai/Apertus-70B-Instruct-2509

Fine-tuned

this model

Input Modalities

Text

Output Modalities