naazimsnh02
Shifa-4B-SFT-LoRA
License: apache-2.0
Benchmark Results (Stage-1 SFT)
| Benchmark | Shifa-4B | MedGemma-4B-IT | Delta |
|---|---|---|---|
| MedQA (USMLE, 4-opt) | 0.702 | 0.691 | +0.011 |
| MedMCQA | 0.508 | 0.598 | -0.090 |
| PubMedQA | 0.374 | 0.682 | -0.308 |
MedGemma-4B-IT scores are taken from the google/medgemma-4b-it model card. Shifa-4B was evaluated on 500 samples per benchmark using greedy decoding.
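The Delta column and the effect of the 500-sample evaluation size can be checked with a few lines of Python. This is a minimal sketch: the scores are copied from the table, while the `std_err` helper and its binomial approximation are assumptions added for illustration, not part of the original evaluation.

```python
import math

# Accuracy scores copied from the benchmark table (0-1 scale).
scores = {
    "MedQA (USMLE, 4-opt)": {"shifa": 0.702, "medgemma": 0.691},
    "MedMCQA": {"shifa": 0.508, "medgemma": 0.598},
    "PubMedQA": {"shifa": 0.374, "medgemma": 0.682},
}

def delta(row):
    """Shifa-4B score minus the MedGemma-4B-IT score, rounded to 3 decimals."""
    return round(row["shifa"] - row["medgemma"], 3)

def std_err(accuracy, n_samples):
    """Binomial standard error of an accuracy estimate (an assumed approximation)."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_samples)

deltas = {name: delta(row) for name, row in scores.items()}
for name, row in scores.items():
    se = std_err(row["shifa"], 500)  # 500 samples per benchmark, as stated
    print(f"{name}: delta={deltas[name]:+.3f}, Shifa-4B std. error ~ {se:.3f}")
```

Under this approximation the standard error of a 500-sample accuracy is about 0.02, so the +0.011 MedQA difference is smaller than the sampling noise, while the MedMCQA and PubMedQA gaps are several times larger.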
Training Details
Data
~395K examples blended from three medical reasoning datasets plus a general-instruction mix to prevent catastrophic forgetting:
| Source | Samples | Description |
|---|---|---|
| FreedomIntelligence/medical-o1-reasoning-SFT | ~20K | HuatuoGPT-o1 gold chain-of-thought traces |
| OpenMed/Medical-Reasoning-SFT-GPT-OSS-120B | 200K | High-effort reasoning from GPT-OSS-120B |
| Intelligent-Internet/II-Medical-Reasoning-SFT | 150K | Sampled for scale and diversity |
| HuggingFaceH4/ultrachat_200k | ~28K (7%) | General instruction mix |
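The blend proportions follow directly from the table. The sketch below uses the rounded counts as listed; they sum to 398K, slightly above the stated ~395K, presumably because the "~20K" and "~28K" entries are approximate.

```python
# Approximate sample counts from the table, in thousands.
counts_k = {
    "FreedomIntelligence/medical-o1-reasoning-SFT": 20,
    "OpenMed/Medical-Reasoning-SFT-GPT-OSS-120B": 200,
    "Intelligent-Internet/II-Medical-Reasoning-SFT": 150,
    "HuggingFaceH4/ultrachat_200k": 28,
}

total_k = sum(counts_k.values())
shares = {name: n / total_k for name, n in counts_k.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```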
Hyperparameters
| Parameter | Value |
|---|---|
| Method | QLoRA (4-bit NF4) |
| LoRA rank (r) | 32 |
| LoRA alpha | 64 |
| Target modules | All attention + MLP (language layers only) |
| Vision tower | Frozen |
| Batch size | 4 × 4 gradient accumulation (effective 16) |
| Learning rate | 1e-4 (cosine schedule) |
| Max sequence length | 4096 |
| Training steps | 10,000 |
| Optimizer | AdamW 8-bit |
| Warmup | 3% of steps |
| Training time | ~21.7 hours on 1× A100-SXM4-80GB |
| Loss masking | Response-only (assistant turns) |
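Several useful quantities can be derived from the table. The sketch below makes three assumptions that the table does not confirm: "Training steps" counts optimizer steps (after gradient accumulation), the LoRA update uses the standard `alpha / r` scaling, and the cosine schedule decays to zero after a linear warmup. The function name `lr_at` is hypothetical.

```python
import math

# Values copied from the hyperparameter table.
PER_DEVICE_BATCH = 4
GRAD_ACCUM = 4
TOTAL_STEPS = 10_000
WARMUP_FRACTION = 0.03
PEAK_LR = 1e-4
LORA_R, LORA_ALPHA = 32, 64
TRAIN_HOURS = 21.7

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM        # sequences per optimizer step
sequences_seen = effective_batch * TOTAL_STEPS         # assumes steps are optimizer steps
warmup_steps = round(WARMUP_FRACTION * TOTAL_STEPS)
lora_scale = LORA_ALPHA / LORA_R                       # assumes standard alpha / r scaling
seconds_per_step = TRAIN_HOURS * 3600 / TOTAL_STEPS

def lr_at(step):
    """Linear warmup to PEAK_LR, then cosine decay to zero (assumed schedule shape)."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / (TOTAL_STEPS - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

print(effective_batch, sequences_seen, warmup_steps, lora_scale, seconds_per_step)
print(lr_at(0), lr_at(warmup_steps), lr_at(TOTAL_STEPS))
```

Under these assumptions the run processes 160,000 sequences, which is less than one pass over the ~395K-example blend.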
Framework
Usage
```python
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="naazimsnh02/Shifa-4B-SFT-LoRA",
    max_seq_length=4096,
    load_in_4bit=True,
)
FastModel.for_inference(model)

messages = [
    {"role": "user", "content": "A 45-year-old male presents with sudden chest pain radiating to the left arm. What is the most likely diagnosis and initial workup?"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to("cuda")
output = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
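For decoder-only models, `generate` returns the prompt tokens followed by the new tokens, so the decoded string in the snippet above includes the prompt. A minimal sketch of slicing off the prompt is shown here on plain lists; the helper name `completion_ids` and the toy token ids are hypothetical.

```python
def completion_ids(output_ids, prompt_length):
    """Return only the token ids generated after the prompt."""
    return list(output_ids[prompt_length:])

# Toy stand-ins for real token ids (hypothetical values, for illustration only).
prompt = [101, 7592, 2088]
generated = prompt + [2023, 2003, 102]
print(completion_ids(generated, len(prompt)))
```

With the objects from the usage snippet, the prompt length would be `inputs["input_ids"].shape[1]`, and the answer alone could be decoded with `tokenizer.decode(output[0][prompt_length:], skip_special_tokens=True)`.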
Roadmap
- Stage-2 (GRPO): Reinforcement learning with verifiable MCQ rewards on MedQA + MedMCQA to strengthen overall medical reasoning accuracy.
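A verifiable MCQ reward can be as simple as comparing the option letter extracted from a completion against the gold label. The sketch below is one possible shape for such a reward, not the planned Stage-2 implementation: the answer pattern, the function names `extract_choice` and `mcq_reward`, and the binary 1.0/0.0 reward are all assumptions.

```python
import re

# Matches phrases such as "answer is B", "Answer: (C)", or "answer D".
# The option letter is case-sensitive so that "answer is a ..." is not misread.
ANSWER_PATTERN = re.compile(r"(?i:answer)\s*(?:is|:)?\s*\(?([A-D])\b")

def extract_choice(completion):
    """Return the last option letter stated in the completion, or None."""
    matches = ANSWER_PATTERN.findall(completion)
    return matches[-1] if matches else None

def mcq_reward(completion, gold):
    """Binary reward: 1.0 if the extracted letter equals the gold label, else 0.0."""
    return 1.0 if extract_choice(completion) == gold.upper() else 0.0

print(mcq_reward("ST elevation suggests MI. The answer is B.", "B"))
print(mcq_reward("I am not sure.", "A"))
```

Taking the last match rewards the model's final stated answer when it revises itself mid-reasoning; completions with no recognizable answer receive zero reward.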
Limitations
- This is a research model. Not intended for clinical use or medical decision-making.
- Performance on PubMedQA is limited; literature-based reasoning requires further tuning.
- The model inherits biases and limitations from its base model and training data.
License
Apache 2.0 — same as the base model (Qwen3.5-4B).
Model provider: naazimsnh02
Model tree: Base: Qwen/Qwen3.5-4B; Adapter: this model
Modalities: Input: Video, Text, Image; Output: Text