
README

License: apache-2.0

Results (held-out test split)

| Metric | Value |
|---|---|
| Eval samples | 100 |
| Accuracy | 39% |
| No-answer rate | 0% |
| Random chance (4-option) | 25% |
| Random chance (mixed 4/5-option) | ~22% |
| Target (planned full run) | ≥50% |

Confidence interval at n=100: about ±10pp (95% CI, normal approximation; the standard error is about 5pp). True accuracy is plausibly in [29%, 49%].
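
As a check on the interval, the normal-approximation (Wald) formula for a binomial proportion can be computed directly. This is a minimal sketch: the helper name `wald_ci` is hypothetical, and the Wald interval is only one of several common choices for a proportion.

```python
import math

def wald_ci(accuracy: float, n: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (half_width, low, high) of a normal-approximation CI for a proportion."""
    se = math.sqrt(accuracy * (1.0 - accuracy) / n)  # binomial standard error
    half = z * se
    return half, accuracy - half, accuracy + half

# Reported result: 39% accuracy on 100 eval samples.
half, low, high = wald_ci(0.39, 100)
print(f"n=100:  +/-{half * 100:.1f}pp, [{low * 100:.1f}%, {high * 100:.1f}%]")

# The planned 1K-sample eval would narrow the interval (same accuracy assumed).
half_1k, low_1k, high_1k = wald_ci(0.39, 1000)
print(f"n=1000: +/-{half_1k * 100:.1f}pp, [{low_1k * 100:.1f}%, {high_1k * 100:.1f}%]")
```

At n=100 and 39% accuracy the standard error is about 4.9pp, so the 95% half-width comes out near 9.6pp; a 1K-sample eval at the same accuracy would bring it down to roughly 3pp.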

What worked:

  • Format adherence is solid — 0% no-answer rate. The model reliably starts responses with "The correct answer is X)".
  • Clear lift above chance even with ~15% of planned training compute.

What didn't:

  • Accuracy below the +5pp-vs-base target. Primary cause: undertraining.
  • Test set includes some 5-option (A–E) samples that the training filter missed (the regex caught "E)" but not "E:"); the model only saw 4-option questions in training, so 5-option samples are harder.
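
The filter gap described above can be illustrated with a small regular expression. This is only a sketch: the original training filter is not shown in this card, so the pattern, the `has_option_e` helper, and the sample strings are all hypothetical.

```python
import re

# Hypothetical pattern: an option label "E" followed by ")" or ":",
# at the start of the text or after whitespace.
OPTION_E = re.compile(r"(?:^|\s)E\s*[):]")

def has_option_e(question: str) -> bool:
    """True if the question text appears to contain a fifth option labelled E."""
    return OPTION_E.search(question) is not None

samples = [
    "Which drug is indicated? A) X B) Y C) Z D) W",
    "Which drug is indicated? A) X B) Y C) Z D) W E) V",
    "Which drug is indicated? A: X B: Y C: Z D: W E: V",
]

# Keep only questions with no detectable fifth option.
four_option = [q for q in samples if not has_option_e(q)]
print(len(four_option))
```

A pattern like this can over-match (for example, text such as "vitamin E)" inside an option), which for a filter means dropping a few extra samples rather than letting 5-option questions through.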

Training details

| Item | Value |
|---|---|
| Base model | mistralai/Mistral-7B-Instruct-v0.2 |
| Dataset | medalpaca/medical_meadow_medqa (filtered) |
| Train samples | 9,158 (after 5-option filter + 90/10 split) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Quantization | 4-bit NF4 (bitsandbytes) |
| Optimizer | paged_adamw_32bit |
| Learning rate | 2e-4 (cosine, 3% warmup) |
| Max seq length | 1024 |
| Effective batch size | 16 (4 × 4 grad accum) |
| Planned steps | 1144 (2 epochs) |
| Completed steps | ~171 (~0.3 epochs) |
| Hardware | Kaggle 2× T4 (model-parallel) |
| Final training loss | ~0.78 (from initial 1.67) |
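
The step counts in the table follow from the sample count and the effective batch size. A quick check, assuming a micro-batch of 4 with 4 gradient-accumulation steps and that the final partial batch of each epoch is dropped (the card does not state either detail explicitly; variable names are illustrative):

```python
train_samples = 9158
micro_batch = 4          # assumed per-step batch size
grad_accum = 4           # gradient accumulation steps
epochs = 2
completed_steps = 171    # approximate, from the table

effective_batch = micro_batch * grad_accum           # 4 x 4 = 16
steps_per_epoch = train_samples // effective_batch   # floor division drops the partial batch
planned_steps = steps_per_epoch * epochs
epochs_done = completed_steps / steps_per_epoch
fraction_done = completed_steps / planned_steps

print(planned_steps, round(epochs_done, 2), f"{fraction_done:.0%}")
```

Under these assumptions the planned total is 1144 steps, and 171 completed steps correspond to about 0.3 epochs, or roughly 15% of the planned run.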

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

BASE = "mistralai/Mistral-7B-Instruct-v0.2"
ADAPTER = "anksriv/mistral-7b-medical-medqa-qlora"  # this repo

# 4-bit NF4 quantization, as listed in the training details.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

# Load the quantized base model, then attach the LoRA adapter on top of it.
model = AutoModelForCausalLM.from_pretrained(BASE, quantization_config=bnb, device_map="auto")
model = PeftModel.from_pretrained(model, ADAPTER)
tokenizer = AutoTokenizer.from_pretrained(BASE)

system = (
    "You are a knowledgeable medical AI assistant. "
    "When given a clinical multiple-choice question, analyze the case carefully, "
    "identify the correct answer (A, B, C, or D), and provide a clear explanation. "
    "Always begin your response with 'The correct answer is X)' where X is the letter."
)
messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": "A 32-year-old woman presents with ... A) ... B) ... C) ... D) ..."},
]

# Some versions of the base model's chat template reject a "system" role.
# If this call raises, prepend the system prompt to the user message instead.
inputs = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
).to(model.device)
out = model.generate(inputs, max_new_tokens=200, temperature=0.1, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs.shape[1]:], skip_special_tokens=True))
```
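
Because the system prompt asks for responses that begin with 'The correct answer is X)', the predicted letter can be pulled out of the decoded text with a regular expression when scoring. A minimal sketch follows; the evaluation script is not part of this card, so `extract_answer`, `score`, and the sample data are hypothetical.

```python
import re
from typing import Optional

# Matches the required response prefix and captures the answer letter.
ANSWER_RE = re.compile(r"The correct answer is\s*\(?([A-E])\)", re.IGNORECASE)

def extract_answer(response: str) -> Optional[str]:
    """Return the answer letter, or None when the response does not follow the format."""
    match = ANSWER_RE.search(response)
    return match.group(1).upper() if match else None

def score(responses: list[str], gold: list[str]) -> tuple[float, float]:
    """Return (accuracy, no_answer_rate) over paired responses and gold letters."""
    preds = [extract_answer(r) for r in responses]
    correct = sum(p == g for p, g in zip(preds, gold))
    missing = sum(p is None for p in preds)
    return correct / len(gold), missing / len(gold)

# Toy data for illustration only.
responses = [
    "The correct answer is B) because ...",
    "The correct answer is D) since ...",
    "I am not sure.",
]
accuracy, no_answer = score(responses, ["B", "A", "C"])
print(accuracy, no_answer)
```

A response that does not match the pattern counts toward the no-answer rate and is scored as incorrect.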

Intended use & limitations

This is a research/educational artifact, not a clinical tool. Do not use it for medical decision-making. The 39% accuracy on USMLE-style questions, combined with partial training, makes this strictly inappropriate for any patient-facing or diagnostic application.

The adapter inherits all biases of:

  • The base Mistral-7B-Instruct-v0.2 model
  • The MedQA dataset (USMLE-skewed, US-centric, English-only)

Reproducibility

Planned full-run improvements

  1. Run the full 2 epochs (~1144 steps) on a single A40/L40S/4090 (~30-45 min target).
  2. Tighten the 5-option filter (catch both "E)" and "E:").
  3. Evaluate on the full 1K held-out set instead of 100.
  4. Compare against base Mistral-7B-Instruct-v0.2 (no adapter) on the same eval set to compute the actual delta.
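
For item 4, a paired comparison on the same questions might be sketched as below. The function name and the toy correctness lists are hypothetical; real inputs would be per-question correctness flags from the base and adapter runs, in the same question order.

```python
def paired_delta(base_correct: list[bool], adapter_correct: list[bool]) -> dict:
    """Compare two models scored on the same eval questions, in the same order."""
    if len(base_correct) != len(adapter_correct):
        raise ValueError("both runs must cover the same questions")
    n = len(base_correct)
    base_acc = sum(base_correct) / n
    adapter_acc = sum(adapter_correct) / n
    # Discordant pairs: questions where exactly one of the two models is right.
    gained = sum(a and not b for b, a in zip(base_correct, adapter_correct))
    lost = sum(b and not a for b, a in zip(base_correct, adapter_correct))
    return {
        "base_acc": base_acc,
        "adapter_acc": adapter_acc,
        "delta_pp": (adapter_acc - base_acc) * 100,
        "gained": gained,
        "lost": lost,
    }

# Toy data for illustration only.
base = [True, False, False, True, False]
adapter = [True, True, False, False, True]
result = paired_delta(base, adapter)
print(result)
```

Reporting the gained and lost counts alongside the delta shows whether a small net change hides a larger amount of churn between the two models.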

Model provider: anksriv

Model tree: base mistralai/Mistral-7B-Instruct-v0.2, adapter: this model

Modalities: text input, text output
