Run this model inference on single tenant GPU with unmatched speed and reliability at scale.
Run this model inference with full control and performance in your environment.
Get help setting up a custom Dedicated Endpoints.
Talk with our engineer to get a quote for reserved GPU instances with discounts.
README
License: apache-2.0Results (held-out test split)
| Metric | Value |
|---|---|
| Eval samples | 100 |
| Accuracy | 39% |
| No-answer rate | 0% |
| Random chance (4-option) | 25% |
| Random chance (mixed 4/5-option) | ~22% |
| Target (planned full run) | ≥50% |
Confidence interval at n=100: ±5pp (95% CI). True accuracy is plausibly in [34%, 44%].
What worked:
- Format adherence is solid — 0% no-answer rate. The model reliably starts responses with "The correct answer is X)".
- Clear lift above chance even with ~15% of planned training compute.
What didn't:
- Accuracy below the +5pp-vs-base target. Primary cause: undertraining.
- Test set includes some 5-option (A–E) samples the training filter missed (regex caught
E)but notE:); the model only saw 4-option questions in training, so 5-option samples are harder.
Training details
| Item | Value |
|---|---|
| Base model | mistralai/Mistral-7B-Instruct-v0.2 |
| Dataset | medalpaca/medical_meadow_medqa (filtered) |
| Train samples | 9,158 (after 5-option filter + 90/10 split) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Quantization | 4-bit NF4 (bitsandbytes) |
| Optimizer | paged_adamw_32bit |
| Learning rate | 2e-4 (cosine, 3% warmup) |
| Max seq length | 1024 |
| Effective batch size | 16 (4 × 4 grad accum) |
| Planned steps | 1144 (2 epochs) |
| Completed steps | ~171 (~0.3 epochs) |
| Hardware | Kaggle 2× T4 (model-parallel) |
| Final training loss | ~0.78 (from initial 1.67) |
Usage
python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfigfrom peft import PeftModelimport torchBASE = "mistralai/Mistral-7B-Instruct-v0.2"ADAPTER = "anksriv/mistral-7b-medical-medqa-qlora" # this repobnb = BitsAndBytesConfig(load_in_4bit=True,bnb_4bit_compute_dtype=torch.float16,bnb_4bit_quant_type="nf4",bnb_4bit_use_double_quant=True,)model = AutoModelForCausalLM.from_pretrained(BASE, quantization_config=bnb, device_map="auto")model = PeftModel.from_pretrained(model, ADAPTER)tokenizer = AutoTokenizer.from_pretrained(BASE)system = ("You are a knowledgeable medical AI assistant. ""When given a clinical multiple-choice question, analyze the case carefully, ""identify the correct answer (A, B, C, or D), and provide a clear explanation. ""Always begin your response with 'The correct answer is X)' where X is the letter.")messages = [{"role": "system", "content": system},{"role": "user", "content": "A 32-year-old woman presents with ... A) ... B) ... C) ... D) ..."},]inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to(model.device)out = model.generate(inputs, max_new_tokens=200, temperature=0.1, do_sample=True)print(tokenizer.decode(out[0][inputs.shape[1]:], skip_special_tokens=True))
Intended use & limitations
This is a research/educational artifact, not a clinical tool. Do not use it for medical decision-making. The 39% accuracy on USMLE-style questions, combined with partial training, makes this strictly inappropriate for any patient-facing or diagnostic application.
The adapter inherits all biases of:
- The base Mistral-7B-Instruct-v0.2 model
- The MedQA dataset (USMLE-skewed, US-centric, English-only)
Reproducibility
- Code: https://github.com/ankitsriv89/03-llm-finetuning
- Training notebook:
notebooks/02_medical.ipynb - Single-GPU RunPod script:
scripts/train_runpod_medical.py - Eval script:
scripts/evaluate.py --mode mcq - Seed: 42 (train/test split)
Planned full-run improvements
- Run the full 2 epochs (~1144 steps) on a single A40/L40S/4090 (~30-45 min target).
- Tighten the 5-option filter (catch both
E)andE:). - Evaluate on the full 1K held-out set instead of 100.
- Compare against base Mistral-7B-Instruct-v0.2 (no adapter) on the same eval set to compute the actual delta.
Model provider
anksriv
Model tree
Base
mistralai/Mistral-7B-Instruct-v0.2
Adapter
this model
Modalities
Input
Text
Output
Text
Pricing
Dedicated Endpoints
View detailsSupported Functionality
Model APIs
Dedicated Endpoints
Container
More information