License: apache-2.0
Model Description
Given a symptom description, the model outputs a structured JSON object with five fields:
- symptoms — the input symptom vignette
- most_likely — the primary diagnosis with clinical reasoning
- possible — exactly two alternative diagnoses with reasoning
- unlikely_but_serious — a high-stakes diagnosis that must not be missed
- recommended_workup — the next clinical step a physician would order
The unlikely_but_serious field is the core research contribution of this model. It operationalises the clinical safety principle that low-probability, high-severity diagnoses must be actively considered regardless of base rate.
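The five-field schema described above can be checked mechanically. The following is a minimal sketch, not the evaluation code behind the reported schema-adherence figures: the function name `validate_diffdx` and the exact rules (all five keys present, no extra keys, `possible` a list of exactly two entries) are assumptions derived from the field list.

```python
import json

# Field names taken from the schema description; the checks are illustrative.
REQUIRED_FIELDS = (
    "symptoms",
    "most_likely",
    "possible",
    "unlikely_but_serious",
    "recommended_workup",
)


def validate_diffdx(raw: str) -> list[str]:
    """Return a list of schema problems; an empty list means the output conforms."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(obj, dict):
        return ["top-level value is not a JSON object"]

    problems = []
    for field in REQUIRED_FIELDS:
        if field not in obj:
            problems.append(f"missing field: {field}")
    for field in obj:
        if field not in REQUIRED_FIELDS:
            problems.append(f"unexpected field: {field}")

    # The schema asks for exactly two alternative diagnoses.
    possible = obj.get("possible")
    if "possible" in obj and not (isinstance(possible, list) and len(possible) == 2):
        problems.append("possible must be a list of exactly two entries")
    return problems
```

An empty return value means the output has the expected shape; it says nothing about whether the diagnoses are clinically correct.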
Training Details
| Parameter | Value |
|---|---|
| Base model | Qwen2.5-0.5B-Instruct |
| Method | QLoRA (4-bit NF4) |
| LoRA rank | 8 |
| LoRA alpha | 16 |
| Training examples | 200 (from 250-example curated dataset) |
| Clinical domains | 10 |
| Epochs | 3 |
| Hardware | NVIDIA GTX 1650 4GB |
| Training time | ~32 minutes |
| Schema adherence (test set) | 56% (vs 4% base model) |
Usage
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

BASE_MODEL = "Qwen/Qwen2.5-0.5B-Instruct"
ADAPTER_PATH = "heuristixai/HAI-DiffDx-0.5B"

# 4-bit NF4 quantization, matching the QLoRA training setup
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

# Load the quantized base model, then attach the LoRA adapter
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, trust_remote_code=True)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base_model, ADAPTER_PATH)
model.eval()

SYSTEM_PROMPT = """You are a clinical reasoning assistant trained to generate structured
differential diagnoses. Given a symptom description, output a JSON object with exactly
five fields: symptoms, most_likely, possible (a list of exactly 2 entries),
unlikely_but_serious, and recommended_workup. Be specific, clinically grounded, and
always consider the most dangerous diagnosis that must not be missed."""

symptom = "Your symptom description here."

# ChatML-style prompt: system turn, user turn, then an open assistant turn
prompt = (
    f"<|im_start|>system\n{SYSTEM_PROMPT}<|im_end|>\n"
    f"<|im_start|>user\n{symptom}<|im_end|>\n"
    f"<|im_start|>assistant\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=600,
        temperature=0.1,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
        repetition_penalty=1.1,
    )

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)
print(response)
```
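Because the generated text is a plain string and may be malformed or wrapped in extra text, it can help to extract the JSON object defensively before using it. The sketch below uses only the standard library; the helper name `extract_json_object` is hypothetical and is not part of the model's tooling.

```python
import json


def extract_json_object(text: str):
    """Return the first JSON object embedded in `text`, or None if none parses."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores any trailing text after it.
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        # Try again from the next opening brace.
        start = text.find("{", start + 1)
    return None
```

With the usage code above, `parsed = extract_json_object(response)` would yield a dict when the model produced a parseable object and `None` otherwise, so callers can handle failures explicitly.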
Example Output
Input:
A 67-year-old woman with hypertension presents with sudden onset severe headache
described as the worst of her life, associated with neck stiffness and photophobia.
She vomited twice. No fever.
Output:
```json
{
  "symptoms": "A 67-year-old woman with hypertension presents with sudden onset severe headache described as the worst of her life, associated with neck stiffness and photophobia. She vomited twice. No fever.",
  "most_likely": "Bacterial meningitis - fever, neck stiffness, and photophobia with acute headache in an older adult raises concern for meningeal infection.",
  "possible": [
    "Hypertensive emergency - severe headache with known hypertension warrants blood pressure evaluation and assessment for end-organ damage",
    "Migraine with meningismus - severe headache with photophobia and neck stiffness can occur in complex migraine, though first-ever presentation requires exclusion of serious causes"
  ],
  "unlikely_but_serious": "Subarachnoid haemorrhage - sudden onset worst-ever headache is the classic sentinel presentation of SAH; a normal CT does not exclude it and lumbar puncture for xanthochromia is mandatory if clinical suspicion remains.",
  "recommended_workup": "Urgent non-contrast CT head followed by lumbar puncture if CT is negative, blood pressure measurement, and urgent neurology review."
}
```
Evaluation Results
| Model | Schema Adherence |
|---|---|
| Baseline (no fine-tune) | 4% (1/25) |
| Version A (full schema) | 56% (14/25) |
| Ablation B (no unlikely_but_serious) | 52% (13/25) |
| Ablation C (no workup) | 36% (9/25) |
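Each percentage in the table is the number of conforming outputs divided by the 25 test cases. As a rough sketch of that arithmetic (the function `adherence_rate` and the pluggable `is_valid` predicate are illustrative assumptions, not the paper's evaluation harness):

```python
from typing import Callable, Iterable


def adherence_rate(
    outputs: Iterable[str], is_valid: Callable[[str], bool]
) -> tuple[int, int, float]:
    """Return (passed, total, percentage) for a batch of raw model outputs."""
    outputs = list(outputs)
    passed = sum(1 for text in outputs if is_valid(text))
    total = len(outputs)
    percentage = 100.0 * passed / total if total else 0.0
    return passed, total, percentage


# Pass counts as reported in the table above.
reported = {
    "Baseline (no fine-tune)": (1, 25),
    "Version A (full schema)": (14, 25),
    "Ablation B": (13, 25),
    "Ablation C (no workup)": (9, 25),
}
for name, (passed, total) in reported.items():
    print(f"{name}: {100 * passed / total:.0f}% ({passed}/{total})")
```

With only 25 test cases, each additional conforming output moves the score by 4 percentage points, which is worth keeping in mind when comparing Version A with Ablation B.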
Research Paper
The full methodology, ablation study, and results are available in the accompanying research
paper published by HeuristixAI Research (2026).
Intended Use and Limitations
This model is a research tool. It is not a medical device. Outputs must not be used for
clinical diagnosis or treatment decisions. The model's accuracy is bounded by its 0.5B-parameter
capacity, and it performs most reliably on common presentations. Rare and
complex presentations may produce incorrect or malformed outputs.
Citation
```bibtex
@techreport{tareen2026diffdx,
  title       = {Structured Clinical Differential Reasoning in Small Language Models: A Four-Tier Schema Approach via QLoRA Fine-Tuning},
  author      = {Tareen, Gibran Khan and Nawaz, Mir Farhan},
  year        = {2026},
  institution = {HeuristixAI Research},
  url         = {https://huggingface.co/heuristixai/HAI-DiffDx-0.5B}
}
```
HeuristixAI Research · Compact AI. Real Impact. Open Research.