naazimsnh02

Shifa-4B-SFT-LoRA

License: apache-2.0

Benchmark Results (Stage-1 SFT)

| Benchmark | Shifa-4B | MedGemma-4B-IT | Delta |
|---|---|---|---|
| MedQA (USMLE, 4-opt) | 0.702 | 0.691 | +0.011 |
| MedMCQA | 0.508 | 0.598 | -0.090 |
| PubMedQA | 0.374 | 0.682 | -0.308 |

MedGemma scores are taken from the google/medgemma-4b-it model card. Shifa-4B was evaluated on 500 samples per benchmark using greedy decoding.
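With 500 samples per benchmark, each accuracy carries sampling noise, which matters when reading the deltas. The sketch below estimates a binomial standard error for the Shifa-4B scores and flags whether each delta falls within two standard errors. It assumes independent samples and covers only the Shifa-4B side, since the sample sizes behind the MedGemma numbers are not stated here; the function and variable names are illustrative.

```python
import math

N_SAMPLES = 500  # samples per benchmark, as stated above

# (Shifa-4B, MedGemma-4B-IT) scores copied from the benchmark table
SCORES = {
    "MedQA": (0.702, 0.691),
    "MedMCQA": (0.508, 0.598),
    "PubMedQA": (0.374, 0.682),
}


def accuracy_standard_error(accuracy: float, n_samples: int) -> float:
    """Binomial standard error of an accuracy measured on n_samples items."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_samples)


def summarize(scores=SCORES, n_samples=N_SAMPLES):
    """Return delta, standard error, and a rough noise check per benchmark."""
    rows = {}
    for name, (shifa, medgemma) in scores.items():
        delta = shifa - medgemma
        se = accuracy_standard_error(shifa, n_samples)
        rows[name] = {
            "delta": delta,
            "standard_error": se,
            # True when the gap is small relative to sampling noise
            "within_two_se": abs(delta) <= 2 * se,
        }
    return rows


if __name__ == "__main__":
    for name, row in summarize().items():
        print(name, row)
```

Under these assumptions the MedQA gap is within sampling noise, while the MedMCQA and PubMedQA gaps are not.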

Training Details

Data

~395K examples blended from three medical reasoning datasets plus a general-instruction mix to prevent catastrophic forgetting:

| Source | Samples | Description |
|---|---|---|
| FreedomIntelligence/medical-o1-reasoning-SFT | ~20K | HuatuoGPT-o1 gold chain-of-thought traces |
| OpenMed/Medical-Reasoning-SFT-GPT-OSS-120B | 200K | High-effort reasoning from GPT-OSS-120B |
| Intelligent-Internet/II-Medical-Reasoning-SFT | 150K | Sampled for scale and diversity |
| HuggingFaceH4/ultrachat_200k | ~28K (7%) | General instruction mix |
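As a quick check on the blend, the sketch below turns the approximate per-source counts from the table into fractions of the total. The counts are the rounded values listed above (in thousands), so the total comes out slightly different from the ~395K headline figure; the dictionary and function names are illustrative.

```python
# Approximate sample counts in thousands, copied from the data table
MIX_THOUSANDS = {
    "FreedomIntelligence/medical-o1-reasoning-SFT": 20,
    "OpenMed/Medical-Reasoning-SFT-GPT-OSS-120B": 200,
    "Intelligent-Internet/II-Medical-Reasoning-SFT": 150,
    "HuggingFaceH4/ultrachat_200k": 28,
}


def mix_fractions(mix):
    """Return each source's share of the blended dataset."""
    total = sum(mix.values())
    return {source: count / total for source, count in mix.items()}


if __name__ == "__main__":
    for source, fraction in mix_fractions(MIX_THOUSANDS).items():
        print(f"{source}: {fraction:.1%}")
```

The general-instruction share works out to about 7%, matching the figure in the table.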

Hyperparameters

| Parameter | Value |
|---|---|
| Method | QLoRA (4-bit NF4) |
| LoRA rank (r) | 32 |
| LoRA alpha | 64 |
| Target modules | All attention + MLP (language layers only) |
| Vision tower | Frozen |
| Batch size | 4 × 4 gradient accumulation (effective 16) |
| Learning rate | 1e-4 (cosine schedule) |
| Max sequence length | 4096 |
| Training steps | 10,000 |
| Optimizer | AdamW 8-bit |
| Warmup | 3% of steps |
| Training time | ~21.7 hours on 1× A100-SXM4-80GB |
| Loss masking | Response-only (assistant turns) |
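The batch, warmup, and schedule settings above imply a few derived quantities. The sketch below computes them and evaluates a cosine learning-rate curve at a given step. It assumes linear warmup followed by cosine decay to zero, which is a common default but is not confirmed by this card, and it is not the training code itself; all names are illustrative.

```python
import math

PER_DEVICE_BATCH = 4
GRAD_ACCUM_STEPS = 4
TOTAL_STEPS = 10_000
PEAK_LR = 1e-4
WARMUP_FRACTION = 0.03

EFFECTIVE_BATCH = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS  # 16 sequences per optimizer step
WARMUP_STEPS = round(TOTAL_STEPS * WARMUP_FRACTION)    # 300 steps
SEQUENCES_SEEN = EFFECTIVE_BATCH * TOTAL_STEPS         # 160,000 sequences in total


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step (assumed: linear warmup, cosine decay to 0)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 150, 300, 5_150, 10_000):
        print(step, f"{lr_at(step):.2e}")
```

If each training sequence corresponds to one example with no packing (an assumption), 160,000 sequences is roughly 0.4 of one pass over the ~395K-example blend.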

Framework

  • Unsloth 2026.6.7
  • TRL 0.22.2 (SFTTrainer)
  • Transformers 5.12.1
  • PyTorch 2.8.0+cu128
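The response-only loss masking listed in the hyperparameters can be illustrated with a toy example: tokens outside assistant turns get the label -100, which PyTorch's cross-entropy loss ignores by default, so only assistant tokens contribute to the loss. This is a simplified sketch with made-up token ids and per-token role tags, not the masking code used by TRL or Unsloth, which work from chat-template boundaries.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_non_assistant(token_ids, roles):
    """Build labels that keep assistant tokens and ignore everything else.

    token_ids: list of int token ids (toy values here)
    roles: list of role names, one per token (hypothetical per-token tags)
    """
    if len(token_ids) != len(roles):
        raise ValueError("token_ids and roles must have the same length")
    return [
        token if role == "assistant" else IGNORE_INDEX
        for token, role in zip(token_ids, roles)
    ]


if __name__ == "__main__":
    token_ids = [11, 12, 13, 21, 22, 23]
    roles = ["user", "user", "user", "assistant", "assistant", "assistant"]
    print(mask_non_assistant(token_ids, roles))
```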

Usage

python

from unsloth import FastModel

# Load the LoRA adapter with 4-bit quantization.
model, tokenizer = FastModel.from_pretrained(
    model_name="naazimsnh02/Shifa-4B-SFT-LoRA",
    max_seq_length=4096,
    load_in_4bit=True,
)
FastModel.for_inference(model)  # switch to inference mode

messages = [
    {"role": "user", "content": "A 45-year-old male presents with sudden chest pain radiating to the left arm. What is the most likely diagnosis and initial workup?"},
]

# Render the chat template to a prompt string, then tokenize it.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to("cuda")

# Greedy decoding, matching the evaluation setup.
output = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
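For decoder-only models, `generate` returns the prompt tokens followed by the newly generated ones, so decoding `output[0]` prints the prompt as well. A minimal sketch of slicing off the prompt is shown below with plain lists and made-up token ids; the helper name is hypothetical.

```python
def strip_prompt(output_ids, prompt_length):
    """Return only the tokens generated after the prompt."""
    if prompt_length < 0 or prompt_length > len(output_ids):
        raise ValueError("prompt_length must be between 0 and len(output_ids)")
    return output_ids[prompt_length:]


# Toy ids standing in for a tokenized prompt and a model continuation
prompt_ids = [101, 7, 8, 9]
full_output = prompt_ids + [42, 43, 2]
new_tokens = strip_prompt(full_output, len(prompt_ids))
```

With the tensors from the usage snippet, the equivalent slice would be `output[0][inputs["input_ids"].shape[1]:]`, passed to `tokenizer.decode` in place of `output[0]`.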

Roadmap

  • Stage-2 (GRPO): Reinforcement learning with verifiable MCQ rewards on MedQA + MedMCQA to strengthen overall medical reasoning accuracy.
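A verifiable MCQ reward of the kind mentioned for Stage-2 could look like the sketch below: extract the last stated answer letter from a completion and compare it with the gold label. The answer format, the regular expression, and the function name are all assumptions made for illustration; the card does not describe the actual reward design.

```python
import re

# Matches phrases such as "answer is B", "Answer: (C)", or "answer D".
# Only uppercase A-D count as option letters (four-option questions assumed).
ANSWER_PATTERN = re.compile(r"(?i:answer)\s*(?:is|:)?\s*\(?([A-D])\b")


def mcq_reward(completion: str, gold: str) -> float:
    """Return 1.0 if the last stated answer letter equals the gold letter, else 0.0."""
    matches = ANSWER_PATTERN.findall(completion)
    if not matches:
        return 0.0  # no parseable answer earns no reward
    return 1.0 if matches[-1] == gold.upper() else 0.0


if __name__ == "__main__":
    print(mcq_reward("Weighing A against C, the answer is B.", "B"))
    print(mcq_reward("Answer: (C)", "B"))
```

Taking the last match rather than the first means an answer revised at the end of a reasoning trace is the one that gets scored.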

Limitations

  • This is a research model. Not intended for clinical use or medical decision-making.
  • Performance on PubMedQA is limited; literature-based reasoning requires further tuning.
  • The model inherits biases and limitations from its base model and training data.

License

Apache 2.0, the same license as the base model (Qwen3.5-4B).

Model provider: naazimsnh02

Base model: Qwen/Qwen3.5-4B (this model is an adapter)

Modalities: Video, Text, and Image input; Text output
