naazimsnh02

Shifa-4B-SFT-LoRA


License: apache-2.0

Benchmark Results (Stage-1 SFT)

Benchmark	Shifa-4B	MedGemma-4B-IT	Delta
MedQA (USMLE, 4-opt)	0.702	0.691	+0.011
MedMCQA	0.508	0.598	-0.090
PubMedQA	0.374	0.682	-0.308

MedGemma reference scores are taken from the google/medgemma-4b-it model card. Shifa-4B was evaluated on 500 samples per benchmark using greedy decoding.
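
With 500 samples per benchmark, each accuracy carries some sampling noise. The sketch below recomputes the Delta column and estimates a standard error for the Shifa-4B scores. It assumes each question is an independent Bernoulli trial, which is an assumption made here for illustration and not something the model card states. The helper names are hypothetical.

```python
import math

# Scores copied from the benchmark table above: (Shifa-4B, MedGemma-4B-IT).
SCORES = {
    "MedQA": (0.702, 0.691),
    "MedMCQA": (0.508, 0.598),
    "PubMedQA": (0.374, 0.682),
}
N_SAMPLES = 500  # samples per benchmark, as stated above


def standard_error(accuracy: float, n: int) -> float:
    """Binomial standard error of an accuracy measured on n independent items."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n)


def summarize(scores=SCORES, n=N_SAMPLES):
    """Return the score delta and the Shifa-4B standard error per benchmark."""
    rows = {}
    for name, (shifa, medgemma) in scores.items():
        rows[name] = {
            "delta": round(shifa - medgemma, 3),
            "se": standard_error(shifa, n),
        }
    return rows


if __name__ == "__main__":
    for name, row in summarize().items():
        print(f"{name}: delta={row['delta']:+.3f}, SE(Shifa-4B)={row['se']:.3f}")
```

Under this assumption the standard error at n = 500 is roughly 0.02, so the +0.011 MedQA margin sits within one standard error of the Shifa-4B score, while the MedMCQA and PubMedQA gaps are several standard errors wide. The sketch ignores any uncertainty in the MedGemma reference scores.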

Training Details

Data

~395K examples blended from three medical reasoning datasets plus a general-instruction mix to prevent catastrophic forgetting:

Source	Samples	Description
FreedomIntelligence/medical-o1-reasoning-SFT	~20K	HuatuoGPT-o1 gold chain-of-thought traces
OpenMed/Medical-Reasoning-SFT-GPT-OSS-120B	200K	High-effort reasoning from GPT-OSS-120B
Intelligent-Internet/II-Medical-Reasoning-SFT	150K	Sampled for scale and diversity
HuggingFaceH4/ultrachat_200k	~28K (7%)	General instruction mix
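
The blend in the table above can be checked with a few lines of arithmetic. This is a minimal sketch that uses the rounded sample counts from the table; the `mix_shares` helper is hypothetical and only computes each source's share of the total.

```python
# Approximate per-source sample counts copied from the table above.
# Entries marked "~" in the table are rounded, so the totals are approximate too.
DATA_MIX = {
    "FreedomIntelligence/medical-o1-reasoning-SFT": 20_000,
    "OpenMed/Medical-Reasoning-SFT-GPT-OSS-120B": 200_000,
    "Intelligent-Internet/II-Medical-Reasoning-SFT": 150_000,
    "HuggingFaceH4/ultrachat_200k": 28_000,
}


def mix_shares(mix):
    """Return each source's fraction of the total sample count."""
    total = sum(mix.values())
    return {name: count / total for name, count in mix.items()}


if __name__ == "__main__":
    print(f"total samples: {sum(DATA_MIX.values()):,}")
    for name, share in mix_shares(DATA_MIX).items():
        print(f"{name}: {share:.1%}")
```

The rounded counts sum to about 398K, slightly above the ~395K quoted, which is consistent with the "~" entries being approximations. The general-instruction share comes out at about 7%, matching the table.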

Hyperparameters

Parameter	Value
Method	QLoRA (4-bit NF4)
LoRA rank (r)	32
LoRA alpha	64
Target modules	All attention + MLP (language layers only)
Vision tower	Frozen
Batch size	4 per device × 4 gradient accumulation steps (effective 16)
Learning rate	1e-4 (cosine schedule)
Max sequence length	4096
Training steps	10,000
Optimizer	AdamW 8-bit
Warmup	3% of steps
Training time	~21.7 hours on 1× A100-SXM4-80GB
Loss masking	Response-only (assistant turns)
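
The batch size, step count, warmup, and schedule rows above imply a training budget that can be worked out directly. The sketch below does that arithmetic and shows one common form of a cosine schedule with linear warmup. Two assumptions are made here that the model card does not confirm: each sequence holds one example (no packing), and the learning rate decays to zero at the final step. The `lr_at` function is a hypothetical illustration, not the trainer's own scheduler.

```python
import math

# Values copied from the hyperparameter table and the Data section.
PER_DEVICE_BATCH = 4
GRAD_ACCUM_STEPS = 4
TRAIN_STEPS = 10_000
PEAK_LR = 1e-4
WARMUP_FRACTION = 0.03
DATASET_SIZE = 395_000  # "~395K examples"

EFFECTIVE_BATCH = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS   # sequences per optimizer step
SEQUENCES_SEEN = EFFECTIVE_BATCH * TRAIN_STEPS          # sequences over the whole run
WARMUP_STEPS = round(TRAIN_STEPS * WARMUP_FRACTION)     # linear warmup length
# Fraction of the dataset covered, ASSUMING one example per sequence (no packing).
DATASET_FRACTION = SEQUENCES_SEEN / DATASET_SIZE


def lr_at(step, peak_lr=PEAK_LR, warmup_steps=WARMUP_STEPS, total_steps=TRAIN_STEPS):
    """Learning rate at a given optimizer step: linear warmup, then cosine decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(f"effective batch: {EFFECTIVE_BATCH}")
    print(f"sequences seen:  {SEQUENCES_SEEN:,}")
    print(f"warmup steps:    {WARMUP_STEPS}")
    print(f"dataset covered: {DATASET_FRACTION:.1%} (if unpacked)")
    for step in (0, WARMUP_STEPS, TRAIN_STEPS // 2, TRAIN_STEPS):
        print(f"lr at step {step}: {lr_at(step):.2e}")
```

Under the no-packing assumption, 10,000 steps at an effective batch of 16 cover 160,000 sequences, which is well under one pass over the ~395K-example blend.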

Framework

Unsloth 2026.6.7
TRL 0.22.2 (SFTTrainer)
Transformers 5.12.1
PyTorch 2.8.0+cu128

Usage

python
from unsloth import FastModel

# Load the LoRA checkpoint in 4-bit, matching the training sequence length.
model, tokenizer = FastModel.from_pretrained(
    model_name="naazimsnh02/Shifa-4B-SFT-LoRA",
    max_seq_length=4096,
    load_in_4bit=True,
)
# Switch the model to Unsloth's inference mode.
FastModel.for_inference(model)

messages = [
    {"role": "user", "content": "A 45-year-old male presents with sudden chest pain radiating to the left arm. What is the most likely diagnosis and initial workup?"},
]
# Render the chat template as text and append the assistant generation prompt.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to("cuda")
# Greedy decoding (do_sample=False), as used for the benchmark evaluation.
output = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
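
For decoder-only models, `generate` returns the prompt tokens followed by the newly generated ones, so the snippet above prints the prompt together with the answer. A minimal sketch of a hypothetical helper, `strip_prompt`, that keeps only the generated part is shown below. It works on any sliceable sequence of token IDs, and the commented usage lines assume the `inputs`, `output`, and `tokenizer` names from the snippet above.

```python
from typing import Sequence


def strip_prompt(output_ids: Sequence[int], prompt_length: int) -> Sequence[int]:
    """Return only the token IDs generated after the first `prompt_length` tokens."""
    if prompt_length < 0 or prompt_length > len(output_ids):
        raise ValueError("prompt_length must be between 0 and len(output_ids)")
    return output_ids[prompt_length:]


# Usage with the snippet above (not executed here, since it needs the loaded model):
#   prompt_length = inputs["input_ids"].shape[-1]
#   answer_ids = strip_prompt(output[0], prompt_length)
#   print(tokenizer.decode(answer_ids, skip_special_tokens=True))

if __name__ == "__main__":
    # Toy example with plain integers standing in for token IDs.
    print(strip_prompt([101, 102, 103, 7, 8, 9], prompt_length=3))
```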

Roadmap

Stage-2 (GRPO): Reinforcement learning with verifiable MCQ rewards on MedQA + MedMCQA to strengthen overall medical reasoning accuracy.
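
A verifiable MCQ reward can be as simple as checking the model's final answer letter against the gold label. The sketch below is one possible shape for such a reward, not the planned Stage-2 implementation. It assumes a hypothetical output convention in which the completion states its choice as "Answer: X" with an uppercase letter from A to D; the function names and the regular expression are illustrative.

```python
import re
from typing import Optional

# ASSUMED output convention (not specified by the model card): the completion
# states its choice as "Answer: B" or "answer: (B)", with an uppercase letter.
ANSWER_PATTERN = re.compile(r"(?i:answer)\s*:\s*\(?([A-D])\b")


def extract_choice(completion: str) -> Optional[str]:
    """Return the last answer letter stated in the completion, or None if absent."""
    matches = ANSWER_PATTERN.findall(completion)
    return matches[-1] if matches else None


def mcq_reward(completion: str, gold: str) -> float:
    """Binary reward: 1.0 if the extracted letter matches the gold label, else 0.0."""
    return 1.0 if extract_choice(completion) == gold.strip().upper() else 0.0


if __name__ == "__main__":
    sample = "The ECG shows ST elevation in the inferior leads.\nAnswer: B"
    print(extract_choice(sample), mcq_reward(sample, "B"))
```

Taking the last match lets a completion revise an earlier guess, and requiring a word boundary after the letter avoids counting free-text answers such as "Answer: Aspirin" as option A. A completion that never states a letter in the assumed format scores 0.0.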

Limitations

This is a research model. Not intended for clinical use or medical decision-making.
Performance on PubMedQA is weak (0.374 vs. 0.682 for MedGemma-4B-IT); literature-based reasoning requires further tuning.
The model inherits biases and limitations from its base model and training data.