Primeinvincible

mistral-medqa-lora-v3


Intended Use

This adapter is intended for:

research on medical QA reliability
selective prediction and abstention experiments
evaluation of confidence-based and learned abstention methods
educational and portfolio demonstration

It is not intended for clinical decision-making.

Out-of-Scope Use

Do not use this model for:

real patient diagnosis
treatment recommendations
clinical triage
emergency medical decisions
replacing medical professionals

Training Data

Dataset:

GBaker/MedQA-USMLE-4-options

Task format:

Medical question
Four answer options: A, B, C, D
Model predicts the correct option

The dataset contains USMLE-style multiple-choice questions and does not represent open-ended clinical consultation.
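The sketch below illustrates this task format. It renders one item as a text prompt and pulls a single option letter back out of a model completion. The prompt template and the helper names `format_prompt` and `parse_answer` are hypothetical, because this card does not document the exact template used to train the adapter.

```python
from __future__ import annotations

import re

# Hypothetical template: the prompt format actually used for this adapter
# is not documented in the card.
OPTION_LETTERS = ("A", "B", "C", "D")


def format_prompt(question: str, options: dict[str, str]) -> str:
    """Render a four-option USMLE-style item as one text prompt."""
    missing = [k for k in OPTION_LETTERS if k not in options]
    if missing:
        raise ValueError(f"missing options: {missing}")
    lines = [f"Question: {question}", ""]
    lines += [f"{k}. {options[k]}" for k in OPTION_LETTERS]
    lines += ["", "Answer:"]
    return "\n".join(lines)


def parse_answer(completion: str) -> str | None:
    """Return the first standalone A-D letter in a completion, else None."""
    match = re.search(r"\b([ABCD])\b", completion)
    return match.group(1) if match else None
```

Suppose the dataset rows expose fields such as `question` and `options` (an assumption worth checking against the dataset card). Then `format_prompt(row["question"], row["options"])` would yield one prompt per item. `parse_answer` returns None for any completion without a standalone A-D letter, and that includes the abstention string used later in the project.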

Method

Base model:

mistralai/Mistral-7B-v0.3

Fine-tuning method:

QLoRA
4-bit quantization
LoRA rank: 16
LoRA alpha: 32
Target modules: q_proj, v_proj
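The sketch below counts the trainable LoRA parameters implied by this configuration, to give a sense of scale. The Mistral-7B-v0.3 shapes it uses come from the public base-model config, not from this card, so treat them as assumptions: hidden size 4096, 32 query heads and 8 key/value heads of dimension 128, and 32 layers. Bias terms are ignored.

```python
# Assumed Mistral-7B-v0.3 attention shapes (from the public model config,
# not from this card).
HIDDEN = 4096
HEAD_DIM = 128
N_Q_HEADS = 32
N_KV_HEADS = 8
N_LAYERS = 32

RANK = 16   # LoRA rank from this card
ALPHA = 32  # LoRA alpha from this card


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank); only these are trained."""
    return rank * in_features + out_features * rank


q_proj = lora_params(HIDDEN, N_Q_HEADS * HEAD_DIM, RANK)   # 4096 -> 4096
v_proj = lora_params(HIDDEN, N_KV_HEADS * HEAD_DIM, RANK)  # 4096 -> 1024 (grouped KV heads)
total = N_LAYERS * (q_proj + v_proj)
scaling = ALPHA / RANK  # the low-rank update B @ A is multiplied by alpha / r

print(f"trainable LoRA parameters: {total:,}")
print(f"LoRA scaling alpha/r: {scaling}")
```

Under these assumptions the script reports 6,815,744 trainable parameters (about 6.8 million) and a scaling factor of 2.0. In QLoRA only these small A and B matrices are trained, while the frozen base weights are stored in 4-bit form.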

Evaluation Summary

On the MedQA-USMLE test split:

Model             Accuracy   Coverage   Dataset Wrong Rate
Base Mistral-7B   49.33%     100%       50.67%
SFT adapter       52.24%     100%       47.76%
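The three columns can be computed from per-question predictions, as sketched below. The card does not define accuracy for the case where the model abstains. Here it is assumed to mean accuracy on answered questions, while the dataset wrong rate counts wrong answers over all questions. At 100% coverage the wrong rate is simply 1 minus accuracy, which is why each row above sums to 100%. `selective_metrics` is a hypothetical helper name.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SelectiveMetrics:
    accuracy: float            # correct / answered (assumed definition)
    coverage: float            # answered / total
    dataset_wrong_rate: float  # wrong / total; abstentions do not count as wrong


def selective_metrics(predictions: list[str | None], gold: list[str]) -> SelectiveMetrics:
    """predictions holds an option letter, or None where the model abstained."""
    if len(predictions) != len(gold) or not gold:
        raise ValueError("need equal-length, non-empty lists")
    total = len(gold)
    answered = [(p, g) for p, g in zip(predictions, gold) if p is not None]
    correct = sum(p == g for p, g in answered)
    wrong = len(answered) - correct
    return SelectiveMetrics(
        # if the model abstains on everything, accuracy is reported as 0.0
        accuracy=correct / len(answered) if answered else 0.0,
        coverage=len(answered) / total,
        dataset_wrong_rate=wrong / total,
    )
```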

The SFT adapter also served as the starting point for post-hoc confidence thresholding and, later, for learned-abstention experiments that combine warm-start SFT with DPO.

Note: this Hugging Face repository contains only the SFT QLoRA adapter. Results for the DPO learned-abstention checkpoints are reported in the GitHub repository, but those checkpoints are not hosted here.

Selective Prediction Context

The larger project studies two abstention approaches:

Post-hoc confidence thresholding using A/B/C/D answer probabilities.
Learned abstention using warm-start SFT + DPO, where the model learns to emit an explicit abstention completion:

    I cannot answer confidently.
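Post-hoc thresholding can be sketched in plain Python. The sketch assumes you already have the model's next-token logits for the four option letters after the prompt; how those logits are extracted is not specified here. The softmax is taken over A/B/C/D only. The model answers with the most probable letter and abstains when that probability falls below a threshold. Sweeping the threshold traces out the trade-off between coverage and dataset wrong rate. All function names are hypothetical.

```python
from __future__ import annotations

import math

LETTERS = ("A", "B", "C", "D")


def option_probs(logits: dict[str, float]) -> dict[str, float]:
    """Softmax restricted to the four option tokens (numerically stable)."""
    m = max(logits[k] for k in LETTERS)
    exp = {k: math.exp(logits[k] - m) for k in LETTERS}
    z = sum(exp.values())
    return {k: v / z for k, v in exp.items()}


def predict_or_abstain(logits: dict[str, float], threshold: float) -> str | None:
    """Answer with the most probable option, or abstain (None) below threshold."""
    probs = option_probs(logits)
    best = max(probs, key=probs.get)
    return best if probs[best] >= threshold else None


def sweep(items: list[dict[str, float]], gold: list[str],
          thresholds: list[float]) -> list[tuple[float, float, float]]:
    """Return (threshold, coverage, dataset_wrong_rate) for each threshold."""
    n = len(gold)
    rows = []
    for t in thresholds:
        preds = [predict_or_abstain(lg, t) for lg in items]
        answered = sum(p is not None for p in preds)
        wrong = sum(p is not None and p != g for p, g in zip(preds, gold))
        rows.append((t, answered / n, wrong / n))
    return rows
```

Raising the threshold can only lower (or keep) both coverage and the dataset wrong rate, since the set of answered questions shrinks. The threshold itself would normally be chosen on held-out data rather than on the test split.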

The final learned-abstention checkpoints reduced the dataset-level wrong-answer rate from about 48% (the SFT adapter) to roughly 7% to 17%, depending on the chosen coverage/safety operating point.
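On the learned-abstention side, the standard per-pair DPO objective can be written from sequence log-probabilities under the policy and a frozen reference model, which is typically the SFT starting point. The `build_pair` rule is a hypothetical illustration, not necessarily how the project built its preference data: it prefers the abstention string over a wrong SFT answer, and the correct answer over abstaining. `beta=0.1` is only a commonly used default.

```python
from __future__ import annotations

import math

ABSTAIN = "I cannot answer confidently."  # abstention completion from this card


def dpo_loss(pol_chosen: float, pol_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for one pair: -log sigmoid(beta * [(pi_c - ref_c) - (pi_r - ref_r)]).

    Inputs are summed log-probabilities of each completion given the prompt.
    """
    margin = beta * ((pol_chosen - ref_chosen) - (pol_rejected - ref_rejected))
    # -log(sigmoid(m)) == softplus(-m), computed in a numerically stable way
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def build_pair(prompt: str, sft_answer: str, gold: str) -> dict[str, str]:
    """Hypothetical pairing rule for abstention preference data."""
    if sft_answer == gold:
        return {"prompt": prompt, "chosen": gold, "rejected": ABSTAIN}
    return {"prompt": prompt, "chosen": ABSTAIN, "rejected": sft_answer}


def is_abstention(completion: str) -> bool:
    """True if the completion starts with the abstention string."""
    return completion.strip().startswith(ABSTAIN)
```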

Limitations

Evaluated only on multiple-choice USMLE-style questions.
Not evaluated for open-ended clinical advice.
Not clinically validated.
May produce incorrect medical answers.
Should not be used for real medical decisions.
Performance depends on prompt format and evaluation method.

Ethical and Safety Notes

Medical QA models can produce plausible but incorrect answers. This adapter is provided for research and educational purposes only.

Any clinical use would require:

domain expert validation
extensive safety testing
calibration on held-out data
uncertainty estimation
regulatory and institutional review
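As one example of what "calibration on held-out data" might involve, the sketch below computes expected calibration error (ECE) with equal-width confidence bins. ECE is a common choice, but this card does not name it, and the 10-bin default is an assumption. The confidences could be, for instance, the top A/B/C/D probabilities.

```python
from __future__ import annotations


def expected_calibration_error(confidences: list[float], correct: list[bool],
                               n_bins: int = 10) -> float:
    """Equal-width binned ECE: sum over bins of (bin size / N) * |acc - conf|."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need equal-length, non-empty inputs")
    bins: list[list[int]] = [[] for _ in range(n_bins)]
    for i, c in enumerate(confidences):
        # a confidence of exactly 1.0 goes into the last bin
        bins[min(int(c * n_bins), n_bins - 1)].append(i)
    n = len(confidences)
    ece = 0.0
    for idx in bins:
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)
        conf = sum(confidences[i] for i in idx) / len(idx)
        ece += len(idx) / n * abs(acc - conf)
    return ece
```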

Citation / Attribution

If you use this adapter or project, please cite or link to the GitHub repository:

https://github.com/Tharun2908/mistral-medqa-abstention


Model Details

Model Provider

Primeinvincible

Model Tree

Base

mistralai/Mistral-7B-v0.3

Adapter

this model

Input Modalities

Text

Output Modalities

Text

Supported Functionality

Dedicated Endpoints, Container (on FriendliAI)
