README

License: apache-2.0

Performance

Accuracy (%) on standard medical benchmarks. See the paper for full evaluation details, confidence intervals, and open-ended Auto-MOOVE results.

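| Benchmark | Apertus-70B-Instruct | Apertus-70B-MeditronFO | Δ |
|---|---|---|---|
| MedMCQA | 52.43 | 56.32 | +3.89 |
| MedQA | 60.64 | 68.58 | +7.94 |
| PubMedQA | 66.80 | 75.20 | +8.40 |
| MedXpertQA | 12.33 | 16.90 | +4.57 |
| HealthBench Hard | 32.28 | 40.14 | +7.86 |
| Average | 44.90 | 51.43 | +6.53 |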
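The Δ column and the Average row can be re-derived from the per-benchmark scores. The following is a minimal sketch with the numbers copied from the table above. The variable names are illustrative, and the average is assumed to be an unweighted mean over the five benchmarks, which agrees with the reported values to two decimals.

```python
# Scores copied from the table: (Apertus-70B-Instruct, Apertus-70B-MeditronFO)
scores = {
    "MedMCQA": (52.43, 56.32),
    "MedQA": (60.64, 68.58),
    "PubMedQA": (66.80, 75.20),
    "MedXpertQA": (12.33, 16.90),
    "HealthBench Hard": (32.28, 40.14),
}

# Per-benchmark gain of the fine-tuned model over its base, in accuracy points
deltas = {name: round(tuned - base, 2) for name, (base, tuned) in scores.items()}

# Assumption: the Average row is an unweighted mean over the five benchmarks
base_avg = round(sum(base for base, _ in scores.values()) / len(scores), 2)
tuned_avg = round(sum(tuned for _, tuned in scores.values()) / len(scores), 2)
avg_delta = round(tuned_avg - base_avg, 2)

for name, delta in deltas.items():
    print(f"{name}: {delta:+.2f}")
print(f"Average: {base_avg:.2f} -> {tuned_avg:.2f} ({avg_delta:+.2f})")
```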

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "EPFLiGHT/Apertus-70B-MeditronFO"

# Load the tokenizer and the model weights in bf16, sharded across available devices
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "A 62-year-old woman presents with a three-day history of dyspnea on exertion and a productive cough. What is the differential diagnosis?"},
]

# Render the conversation with the model's chat template and tokenize it
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Greedy decoding; print only the tokens generated after the prompt
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
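The last line of the example decodes only the newly generated tokens: for a decoder-only model, `generate` returns the prompt followed by the continuation, so the slice starts at the prompt length. The sketch below shows that indexing with plain Python lists standing in for tensors. The token IDs and the helper name `strip_prompt` are made up for illustration and are not part of the model card.

```python
def strip_prompt(output_ids, prompt_length):
    """Return only the tokens that follow the first `prompt_length` tokens."""
    return output_ids[prompt_length:]


# Hypothetical token IDs, chosen only to illustrate the slicing
prompt_ids = [101, 7592, 2088]
continuation_ids = [2023, 2003, 102]

# A generate-style output: the prompt followed by the continuation
output_ids = prompt_ids + continuation_ids

new_tokens = strip_prompt(output_ids, len(prompt_ids))
print(new_tokens)
```

In the usage example, `inputs["input_ids"].shape[-1]` plays the role of `len(prompt_ids)`.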

Training

  • Base model: Apertus-70B-Instruct
  • Corpus: Fully Open Meditron, ~601k examples (~150M tokens) that combine eight public medical QA datasets with three clinician-vetted synthetic components: exam-style QA, guideline-grounded QA from 46,469 clinical practice guidelines, and open-ended clinical vignettes
  • Hardware: NVIDIA GH200 nodes
  • Framework: Axolotl with FSDP v2 / DeepSpeed ZeRO-3, Flash Attention 2, bf16 mixed precision
  • Decontamination: System-wide two-stage n-gram and token-alignment decontamination against all evaluation benchmarks
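The decontamination bullet names two stages, n-gram matching and token alignment, without giving details here. The sketch below is one plausible way such a check could be structured, not the procedure from the paper: a cheap shared-n-gram filter followed by a token-level alignment score from Python's `difflib`. The function names, whitespace tokenization, `n=8`, and the `0.8` threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def tokenize(text):
    # Assumption: lowercase whitespace tokens stand in for a real tokenizer
    return text.lower().split()


def ngrams(tokens, n):
    # Set of all contiguous n-token windows (empty if the text is shorter than n)
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def is_contaminated(train_text, benchmark_texts, n=8, align_threshold=0.8):
    """Flag a training example that closely overlaps any benchmark item."""
    train_tokens = tokenize(train_text)
    train_grams = ngrams(train_tokens, n)
    for bench_text in benchmark_texts:
        bench_tokens = tokenize(bench_text)
        # Stage 1: skip benchmark items that share no n-gram with the example
        if not train_grams & ngrams(bench_tokens, n):
            continue
        # Stage 2: token-level alignment score in [0, 1] over the two sequences
        matcher = SequenceMatcher(None, train_tokens, bench_tokens, autojunk=False)
        if matcher.ratio() >= align_threshold:
            return True
    return False


# Made-up texts for illustration only
benchmark = [
    "a 45 year old man presents with chest pain radiating to the left arm "
    "what is the most likely diagnosis"
]
leaked = (
    "A 45 year old man presents with chest pain radiating to the left arm "
    "what is the best next step"
)
clean = (
    "a 30 year old woman reports seasonal sneezing and itchy eyes "
    "which medication class is first line"
)

print(is_contaminated(leaked, benchmark))
print(is_contaminated(clean, benchmark))
```

Running the n-gram filter first keeps the more expensive alignment step limited to candidate pairs that already share a long exact span.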

Full hyperparameters are in Appendix I of the paper.

Intended Use

Research only. This model is intended to support research on medical LLMs, auditing of clinical AI systems, and reproducibility of the Fully Open Meditron pipeline.

It is not validated for clinical deployment, individual patient advice, autonomous decision-making, or any other deployment-adjacent use. Conduct independent domain-specific safety evaluation before any such use.

Citation

If you use this model, please cite:

```bibtex
@misc{theimerlienhard2026fullyopenmeditronauditable,
  title         = {Fully Open Meditron: An Auditable Pipeline for Clinical LLMs},
  author        = {Xavier Theimer-Lienhard and Mushtaha El-Amin and Fay Elhassan and Sahaj Vaidya and Victor Cartier-Negadi and David Sasu and Lars Klein and Mary-Anne Hartley},
  year          = {2026},
  eprint        = {2605.16215},
  archivePrefix = {arXiv},
  primaryClass  = {cs.AI},
  url           = {https://arxiv.org/abs/2605.16215}
}
```

License

Released under the Apache-2.0 license, which permits use, including commercial use, subject to attribution.

Model provider: EPFLiGHT

Model tree

Base: swiss-ai/Apertus-70B-Instruct-2509
Fine-tuned: this model

Modalities

Input: Text
Output: Text
