License: apache-2.0
Why Fine-Tune a Model?
Large language models like Mistral 7B are trained on broad, internet-scale data. That makes them capable generalists — but generalists have limitations when applied to specialized domains.
Fine-tuning is the process of continuing the training of a pre-trained model on a smaller, domain-specific dataset. Instead of learning from scratch (which requires massive compute and data), we teach an already-capable model to speak the language of a specific field — in this case, breast cancer medicine.
The goal is not to make the model "smarter" in a general sense, but more:
- Accurate — using the right clinical terminology and referencing real medical concepts
- Consistent — answering breast cancer questions in the structured, informative style of medical Q&A
- Relevant — focusing its generation on domain knowledge rather than generic preambles
Why QLoRA?
Training all 7 billion parameters requires dozens of GB of VRAM and days of compute — out of reach without expensive hardware. QLoRA (Quantized Low-Rank Adaptation) makes it accessible with two techniques:
- 4-bit quantization: the frozen base weights are stored in the 4-bit NormalFloat (NF4) format instead of 32-bit floats, reducing weight memory from ~28 GB to ~5 GB with minimal quality loss.
- LoRA adapters: instead of updating all 7B parameters, small trainable low-rank matrices (adapters) are injected into the attention layers. Only the adapters are trained, which is well under 1% of total parameters (about 0.2% for the rank-16 configuration listed under Model Details). The rest stay frozen.
The result: a meaningful domain fine-tune on a single T4 GPU (Google Colab free tier) in approximately 35 minutes.
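The arithmetic behind these savings can be sketched in a few lines. The layer shapes used below (32 decoder layers, hidden size 4096, key/value projection width 1024, roughly 7.24B total parameters) are assumptions taken from the published Mistral 7B configuration, not from this model card. The estimate counts weights only; activations, quantization constants, and CUDA overhead are why real usage is higher.

```python
# Rough estimate of weight memory and LoRA trainable parameters.
# Assumed Mistral 7B shapes (from the published config, not this card).
TOTAL_PARAMS = 7_241_732_096
NUM_LAYERS = 32
HIDDEN = 4096
KV_DIM = 1024  # grouped-query attention: 8 KV heads x 128 dims


def weight_gb(num_params: int, bits: int) -> float:
    """Memory needed to store the weights alone, in decimal GB."""
    return num_params * bits / 8 / 1e9


def lora_params(rank: int) -> int:
    """Trainable weights when LoRA wraps q_proj, k_proj, v_proj and o_proj.

    An adapter on a (d_in x d_out) linear layer adds two matrices,
    A (d_in x rank) and B (rank x d_out): rank * (d_in + d_out) weights.
    """
    shapes = {
        "q_proj": (HIDDEN, HIDDEN),
        "k_proj": (HIDDEN, KV_DIM),
        "v_proj": (HIDDEN, KV_DIM),
        "o_proj": (HIDDEN, HIDDEN),
    }
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * NUM_LAYERS


fp32_gb = weight_gb(TOTAL_PARAMS, 32)
nf4_gb = weight_gb(TOTAL_PARAMS, 4)
trainable = lora_params(rank=16)
fraction = trainable / TOTAL_PARAMS

print(f"fp32 weights:  {fp32_gb:.1f} GB")
print(f"4-bit weights: {nf4_gb:.1f} GB")
print(f"LoRA trainable parameters: {trainable:,} ({fraction:.2%})")
```

Under these assumptions the rank-16 adapters amount to roughly 13.6M weights.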
Model Details
| Property | Value |
|---|---|
| Base model | mistralai/Mistral-7B-Instruct-v0.2 |
| Fine-tuning method | QLoRA (4-bit quantization + LoRA) |
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| Target modules | q_proj, k_proj, v_proj, o_proj |
| Training epochs | 3 |
| Effective batch size | 8 (2 per device × 4 gradient accumulation steps) |
| Learning rate | 2e-4 |
| LR scheduler | Cosine |
| Max sequence length | 512 tokens |
| Quantization type | NF4 (NormalFloat4) with double quantization |
| Training hardware | NVIDIA T4 (Google Colab free tier) |
| Training time | ~35 minutes |
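Expressed in code, the settings in this table might look like the following. This is a sketch of how such a run is commonly configured with `transformers` and `peft`, not the author's actual training script. `output_dir` and `lora_dropout` are illustrative placeholders that the table does not specify, and the float16 compute dtype is borrowed from the inference example. The 512-token limit is passed to the trainer, and its argument name differs between TRL versions, so it is left out here.

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# NF4 quantization with double quantization (values from the table)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,  # assumption: same as the inference example
)

# LoRA adapters on the four attention projections
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,  # assumption: not stated in the table
    bias="none",
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="./mistral-breast-cancer-qlora",  # hypothetical path
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,  # 2 x 4 = effective batch size of 8
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
)
```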
Training Data
Trained on DiegoDomLarr/breast-cancer-qa — 1,061 breast cancer Q&A pairs from two sources:
| Source | Examples | Description |
|---|---|---|
| PubMedQA (pqa_labeled) | 29 | Human-verified biomedical questions from PubMed abstracts |
| ChatDoctor-HealthCareMagic-100k | 1,032 | Real patient–doctor consultations filtered for breast cancer |
| Total | 1,061 | |
Filter keywords: breast cancer, breast carcinoma, BRCA1, BRCA2, HER2, tamoxifen, mastectomy, lumpectomy, mammogram, ductal carcinoma, lobular carcinoma, triple negative breast, aromatase inhibitor, trastuzumab
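A keyword filter of this kind can be sketched as below. The case-insensitive substring match and the function name `mentions_breast_cancer` are assumptions for illustration; the card does not say how matching was actually done.

```python
KEYWORDS = [
    "breast cancer", "breast carcinoma", "BRCA1", "BRCA2", "HER2",
    "tamoxifen", "mastectomy", "lumpectomy", "mammogram",
    "ductal carcinoma", "lobular carcinoma", "triple negative breast",
    "aromatase inhibitor", "trastuzumab",
]
# Lowercase once so each lookup is a plain substring test
_LOWERED_KEYWORDS = [keyword.lower() for keyword in KEYWORDS]


def mentions_breast_cancer(text: str) -> bool:
    """Return True if any filter keyword occurs in the text (case-insensitive)."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in _LOWERED_KEYWORDS)


# Toy records standing in for rows of a source dataset
records = [
    {"question": "Is tamoxifen safe during pregnancy?"},
    {"question": "How do I treat a sprained ankle?"},
    {"question": "My mother tested positive for brca2. What now?"},
]
kept = [r for r in records if mentions_breast_cancer(r["question"])]
print(f"kept {len(kept)} of {len(records)} records")
```

Substring matching is deliberately loose: it also matches longer words that contain a keyword (for example "mammograms"), which is usually what a recall-oriented filter wants.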
Dataset statistics:
- Average question length: 541 characters
- Average answer length: 621 characters
Training format — all examples were converted to Mistral's instruction template:
markdown
<s>[INST] {question} [/INST] {answer} </s>
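Applying the template is a one-line string format. The helper name `format_example` and the whitespace stripping are illustrative assumptions. If the tokenizer is configured to add special tokens itself, the literal `<s>` can end up duplicated, so it is worth checking the tokenized output.

```python
def format_example(question: str, answer: str) -> str:
    """Wrap one Q&A pair in the Mistral instruction template shown above."""
    return f"<s>[INST] {question.strip()} [/INST] {answer.strip()} </s>"


sample = format_example(
    "What is a lumpectomy?",
    "A lumpectomy removes the tumor and a margin of surrounding tissue.",
)
print(sample)
```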
Evaluation
The fine-tuned model was compared against the base Mistral-7B-Instruct on the prompt:
"What are the main risk factors for breast cancer?"
Fine-tuned model response:
Several factors can increase the risk of developing breast cancer. Here are some of the most common risk factors:
- Gender: Being female is the greatest risk factor for breast cancer.
- Age: The risk of breast cancer increases as women get older. Most breast cancers are diagnosed in women over the age of 50.
- Genetic Factors: Certain genetic mutations, such as those in the BRCA1 and BRCA2 genes, can significantly increase the risk. Women with a family history of breast cancer in first-degree relatives are also at higher risk.
- Lifestyle Factors: A lack of physical activity, a diet high in saturated fat, being overweight or obese, and smoking all contribute to increased risk.
Base Mistral response:
Breast cancer is the most common cancer among women worldwide. Several risk factors can increase a woman's chance of developing breast cancer.
- Age: The risk increases as women get older. Most breast cancers are diagnosed after age 50.
- Genetic factors: A family history of breast cancer increases the risk. Inherited mutations in BRCA1 and BRCA2 significantly increase risk.
- Hormonal factors: Extended exposure to estrogen and progesterone can increase risk. Factors include early menstruation, late menopause, and never having given birth.
ROUGE-L score (base vs fine-tuned): 0.4509
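A ROUGE-L of ~0.45 indicates moderate lexical overlap between the two answers, so the fine-tuned model is not simply reproducing the base model's output. In this example it adds risk factors (gender, lifestyle) that the base answer does not list. The score comes from a single prompt and measures how far the output differs from the base model, not correctness or answer quality.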
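For intuition about what the metric measures, ROUGE-L is an F-score built on the longest common subsequence (LCS) of tokens shared by two texts. The following is a simplified sketch with lowercase whitespace tokenization. The `evaluate` implementation used for the score above applies its own tokenization, so this sketch will not reproduce 0.4509 exactly, and the function names are illustrative.

```python
def lcs_length(a: list, b: list) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1: harmonic mean of LCS-based precision and recall."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


score = rouge_l_f1(
    "Age and genetics raise risk",
    "Age and lifestyle raise risk",
)
print(f"ROUGE-L F1: {score:.4f}")
```

A score of 1.0 means the token sequences are identical, and 0.0 means they share no tokens in order.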
How to Use
Requirements
bash
pip install transformers peft bitsandbytes accelerate
Inference
python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

base_model_id = "mistralai/Mistral-7B-Instruct-v0.2"

# 4-bit NF4 quantization, matching the training setup
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
tokenizer.pad_token = tokenizer.eos_token  # reuse EOS as the padding token

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach the LoRA adapter to the quantized base model
model = PeftModel.from_pretrained(base_model, "DiegoDomLarr/mistral-7b-breast-cancer-qlora")
model.eval()
model.config.use_cache = True

# The tokenizer prepends <s> itself, so the prompt omits it to avoid a double BOS token
prompt = "[INST] What are the side effects of tamoxifen? [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=300,
        temperature=0.7,
        do_sample=True,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
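The decoded string printed by the inference snippet contains the prompt as well as the completion, because `generate` returns both. A small post-processing helper can keep only the answer. This sketch assumes the decoded text still contains the `[/INST]` marker, and the name `extract_answer` is hypothetical.

```python
def extract_answer(decoded: str, marker: str = "[/INST]") -> str:
    """Return the text after the last instruction marker.

    If the marker is absent, the whole string is returned unchanged
    apart from surrounding whitespace.
    """
    _, _, tail = decoded.rpartition(marker)
    return tail.strip()


decoded = "[INST] What are the side effects of tamoxifen? [/INST] Common side effects include hot flashes."
print(extract_answer(decoded))
```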
Limitations and Intended Use
This model is for educational and research purposes only. It is not a medical device and must not be used for clinical diagnosis, treatment decisions, or patient care.
- Responses may contain inaccuracies or outdated medical information — always verify with a licensed healthcare professional.
- The model was trained on ~1,000 examples, which is small by fine-tuning standards. It may hallucinate or generalize poorly on edge-case questions.
- Coverage is limited to breast cancer topics represented in the training data.
- This model has not been audited, validated, or certified for any medical use.
Tech Stack
| Library | Role |
|---|---|
| transformers | Load Mistral-7B and tokenizer |
| peft | Apply and load LoRA adapters |
| bitsandbytes | 4-bit quantization |
| trl | SFTTrainer for supervised fine-tuning |
| datasets | Load and process training data |
| evaluate | ROUGE-L scoring |
| huggingface_hub | Push adapter and dataset to HF Hub |
About This Project
This model was built as an end-to-end learning project covering the full fine-tuning pipeline:
- Curating and publishing a domain-specific dataset to HF Hub
- Loading a 7B model in 4-bit on consumer hardware (Colab T4)
- Applying LoRA adapters with PEFT
- Training with SFTTrainer (TRL library)
- Evaluating with ROUGE-L against the base model
- Publishing the adapter and model card to HuggingFace Hub
- Author: DiegoDomLarr
- Dataset: DiegoDomLarr/breast-cancer-qa
- Base model: mistralai/Mistral-7B-Instruct-v0.2
License
Apache 2.0 — same as the base model.