heuristixai/HAI-DiffDx-0.5B API & Inference Endpoint

Model Description

Given a symptom description, the model outputs a structured JSON object with five fields:

symptoms — the input symptom vignette
most_likely — the primary diagnosis with clinical reasoning
possible — exactly two alternative diagnoses with reasoning
unlikely_but_serious — a high-stakes diagnosis that must not be missed
recommended_workup — the next clinical step a physician would order

The unlikely_but_serious field is the core research contribution of this model. It operationalises the clinical safety principle that low-probability, high-severity diagnoses must be actively considered regardless of base rate.
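
The five-field structure described above can be checked mechanically. The sketch below is one possible validator, not the check used in the paper: the function name `check_schema` and the exact rules (no extra fields, `possible` must be a two-item list) are assumptions drawn from the field list.

```python
import json

# Field names taken from the model description above.
REQUIRED_FIELDS = (
    "symptoms",
    "most_likely",
    "possible",
    "unlikely_but_serious",
    "recommended_workup",
)


def check_schema(raw):
    """Return a list of schema problems; an empty list means the output conforms.

    Hypothetical helper: the rules here are an assumption, not the paper's metric.
    """
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(obj, dict):
        return ["top-level value is not a JSON object"]

    problems = []
    missing = [name for name in REQUIRED_FIELDS if name not in obj]
    extra = [name for name in obj if name not in REQUIRED_FIELDS]
    if missing:
        problems.append(f"missing fields: {missing}")
    if extra:
        problems.append(f"unexpected fields: {extra}")

    # 'possible' is described as exactly two alternative diagnoses.
    possible = obj.get("possible")
    if "possible" in obj and not (isinstance(possible, list) and len(possible) == 2):
        problems.append("'possible' must be a list of exactly 2 entries")
    return problems
```

An empty return value means the output passed every check; anything else lists what went wrong, which is handy when logging failures across a test set.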

Training Details

Parameter	Value
Base model	Qwen2.5-0.5B-Instruct
Method	QLoRA (4-bit NF4)
LoRA rank	8
LoRA alpha	16
Training examples	200 (from 250-example curated dataset)
Clinical domains	10
Epochs	3
Hardware	NVIDIA GTX 1650 4GB
Training time	~32 minutes
Schema adherence (test set)	56% (vs 4% base model)
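
To give a feel for what rank 8 and alpha 16 mean in practice, the sketch below works through the standard LoRA arithmetic. The scaling rule (alpha divided by rank) and the per-matrix parameter count are general LoRA properties. The hidden size of 896 is an assumption about Qwen2.5-0.5B that should be verified against the base model's config.json, and the table does not say which modules were adapted, so the count covers a single square projection only.

```python
def lora_scaling(rank, alpha):
    """Standard LoRA multiplies the low-rank update B @ A by alpha / rank."""
    return alpha / rank


def lora_params(d_in, d_out, rank):
    """Trainable parameters added to one d_out x d_in weight matrix.

    A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * (d_in + d_out)


RANK, ALPHA = 8, 16  # values from the Training Details table
HIDDEN = 896         # assumption: Qwen2.5-0.5B hidden size, check config.json

print(lora_scaling(RANK, ALPHA))          # 2.0
print(lora_params(HIDDEN, HIDDEN, RANK))  # 14336
```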

Usage

python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

BASE_MODEL = "Qwen/Qwen2.5-0.5B-Instruct"
ADAPTER_PATH = "heuristixai/HAI-DiffDx-0.5B"

# Load the base model in 4-bit NF4, the quantization listed under Training Details.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, trust_remote_code=True)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# Attach the LoRA adapter to the quantized base model and switch to inference mode.
model = PeftModel.from_pretrained(base_model, ADAPTER_PATH)
model.eval()

SYSTEM_PROMPT = """You are a clinical reasoning assistant trained to generate structured
differential diagnoses. Given a symptom description, output a JSON object with exactly
five fields: symptoms, most_likely, possible (a list of exactly 2 entries),
unlikely_but_serious, and recommended_workup. Be specific, clinically grounded, and
always consider the most dangerous diagnosis that must not be missed."""

symptom = "Your symptom description here."

# Build the prompt with the <|im_start|> / <|im_end|> chat markers.
prompt = (
    f"<|im_start|>system\n{SYSTEM_PROMPT}<|im_end|>\n"
    f"<|im_start|>user\n{symptom}<|im_end|>\n"
    f"<|im_start|>assistant\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=600,
        temperature=0.1,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
        repetition_penalty=1.1,
    )

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)
print(response)
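
The generated `response` is plain text, so it still has to be parsed before the fields can be used, and a model of this size will sometimes emit surrounding text or truncated JSON. The helper below is a minimal sketch of one way to handle that. The name `extract_json` is hypothetical and not part of this repository; it relies only on the standard library's `json.JSONDecoder.raw_decode`.

```python
import json


def extract_json(response):
    """Return the first decodable JSON object found in `response`, or None.

    Hypothetical helper: tolerates leading or trailing text around the object
    and returns None when the output is truncated or otherwise malformed.
    """
    decoder = json.JSONDecoder()
    start = response.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever follows it.
            obj, _end = decoder.raw_decode(response, start)
            return obj
        except json.JSONDecodeError:
            # Not a valid object here; try the next opening brace.
            start = response.find("{", start + 1)
    return None
```

A `None` result is a signal to retry generation or to flag the case, rather than to pass partial output downstream.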

Example Output

Input:
A 67-year-old woman with hypertension presents with sudden onset severe headache
described as the worst of her life, associated with neck stiffness and photophobia.
She vomited twice. No fever.

Output:

json
{
  "symptoms": "A 67-year-old woman with hypertension presents with sudden onset severe headache described as the worst of her life, associated with neck stiffness and photophobia. She vomited twice. No fever.",
  "most_likely": "Bacterial meningitis - fever, neck stiffness, and photophobia with acute headache in an older adult raises concern for meningeal infection.",
  "possible": [
    "Hypertensive emergency - severe headache with known hypertension warrants blood pressure evaluation and assessment for end-organ damage",
    "Migraine with meningismus - severe headache with photophobia and neck stiffness can occur in complex migraine, though first-ever presentation requires exclusion of serious causes"
  ],
  "unlikely_but_serious": "Subarachnoid haemorrhage - sudden onset worst-ever headache is the classic sentinel presentation of SAH; a normal CT does not exclude it and lumbar puncture for xanthochromia is mandatory if clinical suspicion remains.",
  "recommended_workup": "Urgent non-contrast CT head followed by lumbar puncture if CT is negative, blood pressure measurement, and urgent neurology review."
}
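
In the example above, each diagnosis string follows a "name - reasoning" pattern. If downstream code needs the diagnosis name on its own, a small splitter like the one below may help. This is a sketch that assumes the " - " separator seen in this one example; nothing in the model card guarantees the model always uses it, so the hypothetical `split_diagnosis` falls back to returning the whole string as the name.

```python
def split_diagnosis(entry):
    """Split 'Diagnosis - reasoning' into (diagnosis, reasoning).

    Assumes the ' - ' separator from the example output. Splits on the first
    occurrence only, so hyphenated words such as 'worst-ever' are untouched.
    """
    name, sep, reasoning = entry.partition(" - ")
    if not sep:
        # No separator found: treat the whole entry as the diagnosis name.
        return entry.strip(), ""
    return name.strip(), reasoning.strip()


name, why = split_diagnosis(
    "Hypertensive emergency - severe headache with known hypertension "
    "warrants blood pressure evaluation and assessment for end-organ damage"
)
print(name)  # Hypertensive emergency
```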

Evaluation Results

Model	Schema Adherence
Baseline (no fine-tune)	4% (1/25)
Version A (full schema)	56% (14/25)
Ablation B (no unlikely_but_serious)	52% (13/25)
Ablation C (no workup)	36% (9/25)
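
The percentages in the table follow directly from the pass counts on the 25-example test set. The short sketch below recomputes them and expresses each ablation as a difference in percentage points from Version A. The dictionary keys are shorthand labels chosen for this example.

```python
# (passed, total) counts copied from the Evaluation Results table.
RESULTS = {
    "Baseline": (1, 25),
    "Version A": (14, 25),
    "Ablation B": (13, 25),
    "Ablation C": (9, 25),
}


def adherence_pct(passed, total):
    """Schema adherence as a percentage of test examples."""
    return 100.0 * passed / total


reference = adherence_pct(*RESULTS["Version A"])
for label, (passed, total) in RESULTS.items():
    pct = adherence_pct(passed, total)
    # Difference from the full-schema model, in percentage points.
    print(f"{label}: {pct:.0f}% ({pct - reference:+.0f} points vs Version A)")
```

With only 25 test examples, each example accounts for 4 percentage points, so the one-example gap between Version A and Ablation B should be read with caution.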

Research Paper

The full methodology, ablation study, and results are available in the accompanying
research paper published by HeuristixAI Research (2026).

Intended Use and Limitations

This model is a research tool. It is not a medical device. Outputs must not be used for
clinical diagnosis or treatment decisions. The model's accuracy is bounded by its
0.5B-parameter capacity, and it performs most reliably on common presentations. For rare
or complex presentations it may produce incorrect or malformed outputs.

Citation

bibtex
@techreport{tareen2026diffdx,
  title       = {Structured Clinical Differential Reasoning in Small Language Models: A Four-Tier Schema Approach via QLoRA Fine-Tuning},
  author      = {Tareen, Gibran Khan and Nawaz, Mir Farhan},
  year        = {2026},
  institution = {HeuristixAI Research},
  url         = {https://huggingface.co/heuristixai/HAI-DiffDx-0.5B}
}
