README

License: MIT

Intended Use

This adapter is intended for:

  • research on medical QA reliability
  • selective prediction and abstention experiments
  • evaluation of confidence-based and learned abstention methods
  • educational and portfolio demonstration

It is not intended for clinical decision-making.


Out-of-Scope Use

Do not use this model for:

  • real patient diagnosis
  • treatment recommendations
  • clinical triage
  • emergency medical decisions
  • replacing medical professionals

Training Data

Dataset:

GBaker/MedQA-USMLE-4-options

Task format:

  • Medical question
  • Four answer options: A, B, C, D
  • Model predicts the correct option

The dataset contains USMLE-style multiple-choice questions and does not represent open-ended clinical consultation.
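As an illustration of the task format, the sketch below renders one record as a multiple-choice prompt. The field names (`question`, `options`, `answer_idx`), the helper `format_prompt`, and the prompt template are assumptions made for illustration. The template actually used for training is not stated here, and the example record is invented rather than taken from the dataset.

```python
def format_prompt(record: dict) -> str:
    """Render a question and its four options as a single prompt string."""
    lines = [f"Question: {record['question']}", ""]
    for letter in ("A", "B", "C", "D"):
        lines.append(f"{letter}. {record['options'][letter]}")
    lines.append("")
    lines.append("Answer:")
    return "\n".join(lines)


# Invented record for illustration only (not taken from the dataset).
example = {
    "question": "Which vitamin deficiency causes scurvy?",
    "options": {
        "A": "Vitamin A",
        "B": "Vitamin B12",
        "C": "Vitamin C",
        "D": "Vitamin D",
    },
    "answer_idx": "C",
}

prompt = format_prompt(example)
print(prompt)
```

Ending the prompt at "Answer:" means the model's next token can be compared directly against the four option letters.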


Method

Base model:

mistralai/Mistral-7B-v0.3

Fine-tuning method:

  • QLoRA
  • 4-bit quantization
  • LoRA rank: 16
  • LoRA alpha: 32
  • Target modules: q_proj, v_proj
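To give a sense of scale for these settings, the sketch below estimates how many parameters the LoRA adapter trains. The rank, alpha, and target modules come from the list above. The Mistral-7B dimensions (hidden size 4096, 32 layers, 8 key/value heads, head dimension 128) are assumptions taken from the published base-model configuration, not from this card, and the scaling factor assumes the standard alpha / rank convention.

```python
# Hyperparameters listed above.
LORA_RANK = 16
LORA_ALPHA = 32
TARGET_MODULES = ("q_proj", "v_proj")

# Assumed Mistral-7B dimensions (from the base-model config, not this card).
HIDDEN_SIZE = 4096
NUM_LAYERS = 32
NUM_KV_HEADS = 8
HEAD_DIM = 128

# (in_features, out_features) of each targeted projection.
PROJ_SHAPES = {
    "q_proj": (HIDDEN_SIZE, HIDDEN_SIZE),
    "v_proj": (HIDDEN_SIZE, NUM_KV_HEADS * HEAD_DIM),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """Parameters in one LoRA pair: A is (rank x in), B is (out x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(*PROJ_SHAPES[m], LORA_RANK) for m in TARGET_MODULES)
trainable = per_layer * NUM_LAYERS
scaling = LORA_ALPHA / LORA_RANK  # factor applied to the low-rank update

print(f"trainable LoRA parameters: {trainable:,}")
print(f"LoRA scaling (alpha / rank): {scaling}")
```

Under these assumed dimensions the adapter trains roughly 6.8 million parameters, a small fraction of the 7B base model, which stays frozen in 4-bit form.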

Evaluation Summary

On the MedQA-USMLE test split:

| Model | Accuracy | Coverage | Dataset Wrong Rate |
|---|---|---|---|
| Base Mistral-7B | 49.33% | 100% | 50.67% |
| SFT adapter | 52.24% | 100% | 47.76% |
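The three reported quantities can be computed as in the sketch below. The definitions are assumptions, since this card does not spell them out: coverage is the fraction of questions answered, and accuracy and wrong rate are both measured over all questions, so at 100% coverage they sum to 100% as in the results above. The function name `selective_metrics` and the use of `None` to mark an abstention are illustrative choices.

```python
from typing import Optional, Sequence


def selective_metrics(preds: Sequence[Optional[str]], golds: Sequence[str]) -> dict:
    """Compute coverage, accuracy, and dataset-level wrong rate.

    A prediction of None means the model abstained on that question.
    All three values are fractions of the full dataset (assumed definition).
    """
    if len(preds) != len(golds) or not golds:
        raise ValueError("preds and golds must be non-empty and the same length")
    total = len(golds)
    answered = sum(p is not None for p in preds)
    correct = sum(p is not None and p == g for p, g in zip(preds, golds))
    wrong = answered - correct
    return {
        "coverage": answered / total,
        "accuracy": correct / total,
        "wrong_rate": wrong / total,
    }


# Toy example: one correct, one wrong, one abstention, one correct.
metrics = selective_metrics(["A", "B", None, "D"], ["A", "C", "B", "D"])
print(metrics)
```

With this definition, abstaining lowers coverage without adding to the wrong rate, which is the trade-off the abstention experiments explore.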

The SFT adapter was also used as the starting point for post-hoc confidence thresholding and later warm-start + DPO learned-abstention experiments.

Note: this Hugging Face repository contains the SFT QLoRA adapter. DPO learned-abstention checkpoints are reported in the GitHub repository results, but are not hosted here.


Selective Prediction Context

The larger project studies two abstention approaches:

  1. Post-hoc confidence thresholding using A/B/C/D answer probabilities.
  2. Learned abstention using warm-start SFT + DPO, where the model learns an explicit abstention completion:

I cannot answer confidently.

Final learned-abstention checkpoints reduced the dataset-level wrong rate from about 48% (the SFT adapter at 100% coverage) to 7-17%, depending on the selected coverage/safety operating point.
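The first approach, post-hoc confidence thresholding, can be sketched as follows. This is a minimal illustration, not the project's implementation: it assumes one logit per option letter has already been extracted from the model, and the function names and threshold values are hypothetical.

```python
import math
from typing import Optional, Sequence

OPTIONS = ("A", "B", "C", "D")


def softmax(logits: Sequence[float]) -> list:
    """Convert logits to probabilities, subtracting the max for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_or_abstain(option_logits: Sequence[float], threshold: float) -> Optional[str]:
    """Return the most probable option, or None (abstain) if its
    probability falls below the confidence threshold."""
    probs = softmax(option_logits)
    best = max(range(len(OPTIONS)), key=probs.__getitem__)
    return OPTIONS[best] if probs[best] >= threshold else None


# Hypothetical logits for the four options of a single question.
confident = [2.0, 0.1, 0.0, -1.0]
uncertain = [0.2, 0.1, 0.0, 0.1]

print(predict_or_abstain(confident, threshold=0.6))  # answers
print(predict_or_abstain(uncertain, threshold=0.6))  # abstains
```

Raising the threshold trades coverage for fewer wrong answers; sweeping it over a held-out set is one way to choose an operating point.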


Limitations

  • Evaluated only on multiple-choice USMLE-style questions.
  • Not evaluated for open-ended clinical advice.
  • Not clinically validated.
  • May produce incorrect medical answers.
  • Should not be used for real medical decisions.
  • Performance depends on prompt format and evaluation method.

Ethical and Safety Notes

Medical QA models can produce plausible but incorrect answers. This adapter is provided for research and educational purposes only.

Any clinical use would require:

  • domain expert validation
  • extensive safety testing
  • calibration on held-out data
  • uncertainty estimation
  • regulatory and institutional review

Citation / Attribution

If you use this adapter or project, please cite or link to the GitHub repository:

https://github.com/Tharun2908/mistral-medqa-abstention

Model provider

Primeinvincible

Model tree

Base

mistralai/Mistral-7B-v0.3

Adapter

this model

Modalities

Input

Text

Output

Text
