
License: apache-2.0

Model Description

Given a symptom description, the model outputs a structured JSON object with five fields:

  • symptoms — the input symptom vignette
  • most_likely — the primary diagnosis with clinical reasoning
  • possible — exactly two alternative diagnoses with reasoning
  • unlikely_but_serious — a high-stakes diagnosis that must not be missed
  • recommended_workup — the next clinical step a physician would order

The unlikely_but_serious field is the core research contribution of this model. It operationalises the clinical safety principle that low-probability, high-severity diagnoses must be actively considered regardless of base rate.
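The five-field schema described above can be checked mechanically. The following is a minimal sketch of one plausible adherence check; the exact criterion used in the evaluation is not stated here, so the rules below (no missing or extra keys, `possible` being a two-item list, all other fields being non-empty strings) and the name `schema_errors` are assumptions made for illustration.

```python
import json
from typing import List

REQUIRED_FIELDS = (
    "symptoms",
    "most_likely",
    "possible",
    "unlikely_but_serious",
    "recommended_workup",
)


def schema_errors(raw: str) -> List[str]:
    """Return a list of schema problems; an empty list means the output adheres."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(obj, dict):
        return ["top-level value is not a JSON object"]

    errors = []
    missing = [f for f in REQUIRED_FIELDS if f not in obj]
    extra = [k for k in obj if k not in REQUIRED_FIELDS]
    if missing:
        errors.append("missing fields: " + ", ".join(missing))
    if extra:
        errors.append("unexpected fields: " + ", ".join(extra))

    # Assumed rule: "possible" must be a list of exactly two entries.
    if "possible" in obj:
        possible = obj["possible"]
        if not (isinstance(possible, list) and len(possible) == 2):
            errors.append("possible must be a list of exactly two entries")

    # Assumed rule: every other field must be a non-empty string.
    for field in REQUIRED_FIELDS:
        if field != "possible" and field in obj:
            value = obj[field]
            if not isinstance(value, str) or not value.strip():
                errors.append(f"{field} must be a non-empty string")
    return errors


if __name__ == "__main__":
    # An incomplete object, used only to show what the error list looks like.
    for problem in schema_errors('{"symptoms": "chest pain", "possible": ["A"]}'):
        print(problem)
```

Returning a list of problems rather than a single boolean makes it easier to see which part of the schema a given output violated.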

Training Details

| Parameter | Value |
| --- | --- |
| Base model | Qwen2.5-0.5B-Instruct |
| Method | QLoRA (4-bit NF4) |
| LoRA rank | 8 |
| LoRA alpha | 16 |
| Training examples | 200 (from 250-example curated dataset) |
| Clinical domains | 10 |
| Epochs | 3 |
| Hardware | NVIDIA GTX 1650 4GB |
| Training time | ~32 minutes |
| Schema adherence (test set) | 56% (vs 4% base model) |
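To make the LoRA settings above concrete, here is a small arithmetic sketch. It uses the standard LoRA scaling factor (alpha divided by rank) and the usual parameter count for one adapted weight matrix. The hidden size of 896 is an assumption about the base model (check its `config.json`), and which modules were adapted is not stated here, so the per-matrix figure is illustrative only.

```python
def lora_scaling(alpha: int, rank: int) -> float:
    """Standard LoRA scaling factor applied to the low-rank update: alpha / rank."""
    return alpha / rank


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added to one weight matrix.

    LoRA adds two matrices: A with shape (rank, d_in) and B with shape (d_out, rank).
    """
    return rank * (d_in + d_out)


RANK, ALPHA = 8, 16  # values from the Training Details table
HIDDEN = 896         # assumption: hidden size of the base model, not stated in this card

print("scaling factor:", lora_scaling(ALPHA, RANK))
print("params per adapted square matrix:", lora_params(HIDDEN, HIDDEN, RANK))
print("fraction of dataset used for training:", 200 / 250)
```

With rank 8 and alpha 16, the low-rank update is scaled by 2, and each adapted 896 × 896 matrix would gain 14,336 trainable parameters under these assumptions.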

Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

BASE_MODEL = "Qwen/Qwen2.5-0.5B-Instruct"
ADAPTER_PATH = "heuristixai/HAI-DiffDx-0.5B"

# Load the base model in 4-bit NF4.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, trust_remote_code=True)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# Attach the LoRA adapter to the quantized base model.
model = PeftModel.from_pretrained(base_model, ADAPTER_PATH)
model.eval()

SYSTEM_PROMPT = """You are a clinical reasoning assistant trained to generate structured
differential diagnoses. Given a symptom description, output a JSON object with exactly
five fields: symptoms, most_likely, possible (a list of exactly 2 entries),
unlikely_but_serious, and recommended_workup. Be specific, clinically grounded, and
always consider the most dangerous diagnosis that must not be missed."""

symptom = "Your symptom description here."

# Build the prompt in ChatML format, ending with the opening of the assistant turn.
prompt = (
    f"<|im_start|>system\n{SYSTEM_PROMPT}<|im_end|>\n"
    f"<|im_start|>user\n{symptom}<|im_end|>\n"
    "<|im_start|>assistant\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=600,
        temperature=0.1,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
        repetition_penalty=1.1,
    )

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)
print(response)
```
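Because the model does not always produce well-formed output, it can help to parse the generated text defensively before using it. The sketch below, using only the standard library, scans for the first parseable JSON object in a string and returns `None` if there is none. The helper name `extract_json_object` and the sample string are hypothetical; the sample is a stand-in, not real model output.

```python
import json
from typing import Optional


def extract_json_object(text: str) -> Optional[dict]:
    """Return the first parseable JSON object found in text, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at the given index
            # and ignores any trailing text after it.
            obj, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        start = text.find("{", start + 1)
    return None


if __name__ == "__main__":
    # Stand-in string for illustration only.
    sample = 'Some preamble {"most_likely": "example", "possible": ["a", "b"]} trailing text'
    parsed = extract_json_object(sample)
    if parsed is None:
        print("malformed output")
    else:
        print(sorted(parsed))
```

In the usage script, the same helper could be applied to the decoded generation, for example `extract_json_object(response)`, with a `None` result treated as a malformed output.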

Example Output

Input:
A 67-year-old woman with hypertension presents with sudden onset severe headache
described as the worst of her life, associated with neck stiffness and photophobia.
She vomited twice. No fever.

Output:

```json
{
  "symptoms": "A 67-year-old woman with hypertension presents with sudden onset severe headache described as the worst of her life, associated with neck stiffness and photophobia. She vomited twice. No fever.",
  "most_likely": "Bacterial meningitis - fever, neck stiffness, and photophobia with acute headache in an older adult raises concern for meningeal infection.",
  "possible": [
    "Hypertensive emergency - severe headache with known hypertension warrants blood pressure evaluation and assessment for end-organ damage",
    "Migraine with meningismus - severe headache with photophobia and neck stiffness can occur in complex migraine, though first-ever presentation requires exclusion of serious causes"
  ],
  "unlikely_but_serious": "Subarachnoid haemorrhage - sudden onset worst-ever headache is the classic sentinel presentation of SAH; a normal CT does not exclude it and lumbar puncture for xanthochromia is mandatory if clinical suspicion remains.",
  "recommended_workup": "Urgent non-contrast CT head followed by lumbar puncture if CT is negative, blood pressure measurement, and urgent neurology review."
}
```

Evaluation Results

| Model | Schema Adherence |
| --- | --- |
| Baseline (no fine-tune) | 4% (1/25) |
| Version A (full schema) | 56% (14/25) |
| Ablation B (no unlikely_serious) | 52% (13/25) |
| Ablation C (no workup) | 36% (9/25) |
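The percentages above follow directly from the pass counts on the 25-example test set. As a quick sketch, the snippet below recomputes each rate from its count and shows the difference from the baseline in percentage points; the dictionary layout and the function name `adherence` are illustrative choices.

```python
# (passed, total) counts copied from the Evaluation Results table.
results = {
    "Baseline (no fine-tune)": (1, 25),
    "Version A (full schema)": (14, 25),
    "Ablation B (no unlikely_serious)": (13, 25),
    "Ablation C (no workup)": (9, 25),
}


def adherence(passed: int, total: int) -> float:
    """Schema adherence as a percentage of test examples."""
    return 100.0 * passed / total


baseline = adherence(*results["Baseline (no fine-tune)"])
for name, (passed, total) in results.items():
    rate = adherence(passed, total)
    print(f"{name:<34} {rate:5.1f}%  ({rate - baseline:+.1f} pts vs baseline)")
```

With 25 test examples, each example accounts for 4 percentage points, so the gap between Version A and Ablation B corresponds to a single example.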

Research Paper

The full methodology, ablation study, and results are available in the accompanying
research paper published by HeuristixAI Research (2026).


Intended Use and Limitations

This model is a research tool. It is not a medical device. Outputs must not be used for
clinical diagnosis or treatment decisions. The model's accuracy is bounded by its
0.5B-parameter capacity, and it performs most reliably on common presentations. Rare
and complex presentations may produce incorrect or malformed outputs.


Citation

```bibtex
@techreport{tareen2026diffdx,
  title = {Structured Clinical Differential Reasoning in Small Language Models: A Four-Tier Schema Approach via QLoRA Fine-Tuning},
  author = {Tareen, Gibran Khan and Nawaz, Mir Farhan},
  year = {2026},
  institution = {HeuristixAI Research},
  url = {https://huggingface.co/heuristixai/HAI-DiffDx-0.5B}
}
```

HeuristixAI Research · Compact AI. Real Impact. Open Research.

Model provider

heuristixai

Model tree

Base

Qwen/Qwen2.5-0.5B-Instruct

Adapter

this model

Modalities

Input

Text

Output

Text
