dilr/Mira-Q1
License: apache-2.0
Baseline Evaluation (800 gold examples)
| Model | JSON Validity | Notes |
|---|---|---|
| Qwen2.5-3B zero-shot | 0% | Invents own schema, ignores instructions |
| Mira-Q1 (this model) | 98% | Follows extraction schema correctly |
The base model produces 0 of 800 valid extractions. It invents field names such as `labTestResults` and `fullName` instead of following the schema, so the Mira-Q1 fine-tune is essential.
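The card does not spell out how "JSON Validity" is scored. Based on the note that the base model fails by inventing field names, a plausible criterion is that an output counts as valid only if it parses as a JSON object containing every required top-level field from the Schema section. The sketch below assumes that criterion; `is_valid_extraction` and `validity_rate` are hypothetical names, not part of any released evaluation code.

```python
import json

# Top-level fields listed in the Schema section of this card.
REQUIRED_FIELDS = {
    "document_type", "patient", "encounter", "vitals", "labs",
    "medications", "diagnoses", "procedures", "allergies", "extraction_notes",
}


def is_valid_extraction(raw: str) -> bool:
    """Assumed criterion: parses as a JSON object that has every required field."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and REQUIRED_FIELDS.issubset(obj)


def validity_rate(outputs: list[str]) -> float:
    """Fraction of outputs passing is_valid_extraction (0.0 for an empty list)."""
    if not outputs:
        return 0.0
    return sum(is_valid_extraction(o) for o in outputs) / len(outputs)
```

Under this criterion, the base-model output shown later in this card (with `labTestResults` and no `document_type`) would be counted as invalid even though it is well-formed JSON.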
Training
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-3B-Instruct |
| Method | QLoRA (4-bit, r=16, alpha=32) |
| Training data | 3,438 examples (126 curated + 3,312 Synthea-rendered) |
| Epochs | 2 |
| Final loss | 0.14 |
| GPU | Kaggle T4 (free tier) |
| Training time | ~2h 40m |
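For context on the QLoRA row: in the standard LoRA formulation (also PEFT's default), each adapted weight matrix W of shape d_out × d_in is frozen and receives a low-rank update ΔW = (alpha / r) · B · A, where A is r × d_in and B is d_out × r. With r=16 and alpha=32 the scaling factor is 2. The pure-Python sketch below shows that arithmetic; the card does not say which modules were adapted, so the matrix sizes used here are illustrative only.

```python
def lora_scaling(r: int, alpha: int) -> float:
    """Standard LoRA scaling factor applied to the low-rank update."""
    return alpha / r


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters for one adapted d_out x d_in weight (A: r x d_in, B: d_out x r)."""
    return r * d_in + d_out * r


def matmul(x: list[list[float]], y: list[list[float]]) -> list[list[float]]:
    """Plain nested-list matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def lora_delta(B: list[list[float]], A: list[list[float]], r: int, alpha: int) -> list[list[float]]:
    """Weight update delta_W = (alpha / r) * B @ A, added to the frozen base weight."""
    s = lora_scaling(r, alpha)
    return [[s * v for v in row] for row in matmul(B, A)]
```

With this card's settings, `lora_scaling(16, 32)` is 2.0, and a hypothetical 2048 × 2048 projection would gain `lora_param_count(2048, 2048, 16)` = 65,536 trainable parameters, a small fraction of its roughly 4.2M frozen weights.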
Schema
Extracts 10 required fields from clinical documents:
- `document_type`: `lab_report | medication_list | discharge_summary | pathology_report | intake_form | progress_note | other`
- `patient`: `{age, sex}`, de-identified (no names or MRN)
- `encounter`: `{date, department}`
- `vitals[]`
- `labs[]`
- `medications[]`
- `diagnoses[]`
- `procedures[]`
- `allergies[]`
- `extraction_notes`
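Because every output is meant to be reviewed, a lightweight structural check can flag obvious schema violations before a human looks at it. The sketch below encodes only what the list above states: the allowed `document_type` values, `patient` limited to age and sex (assumed here to mean any other key, such as a name or ID, is a violation), `encounter` limited to date and department, and the six list fields. The shape of individual list items is not specified in this card, so it is not checked. `schema_problems` is a hypothetical helper.

```python
# Allowed values and sub-fields, taken from the Schema list in this card.
DOCUMENT_TYPES = {
    "lab_report", "medication_list", "discharge_summary", "pathology_report",
    "intake_form", "progress_note", "other",
}
LIST_FIELDS = ("vitals", "labs", "medications", "diagnoses", "procedures", "allergies")


def schema_problems(obj: dict) -> list[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    if obj.get("document_type") not in DOCUMENT_TYPES:
        problems.append(f"bad document_type: {obj.get('document_type')!r}")

    patient = obj.get("patient")
    if not isinstance(patient, dict):
        problems.append("patient must be an object")
    else:
        # De-identification: anything beyond age/sex (names, MRN, IDs, DOB) is flagged.
        extra = set(patient) - {"age", "sex"}
        if extra:
            problems.append(f"patient has non-schema keys: {sorted(extra)}")

    encounter = obj.get("encounter")
    if not isinstance(encounter, dict) or not set(encounter) <= {"date", "department"}:
        problems.append("encounter must be an object with only date/department")

    for name in LIST_FIELDS:
        if not isinstance(obj.get(name), list):
            problems.append(f"{name} must be a list")

    if "extraction_notes" not in obj:
        problems.append("missing extraction_notes")
    return problems
```

Applied to the base-model output shown later in this card, this check reports the missing `document_type`, the identifier keys under `patient`, and the absent list fields.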
Usage
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Load the base model in 4-bit, matching the QLoRA training setup.
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct", quantization_config=bnb, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")

# Attach the Mira-Q1 LoRA adapter on top of the base model.
model = PeftModel.from_pretrained(base, "dilr/Mira-Q1")

messages = [
    {"role": "system", "content": "You are a clinical information extraction system..."},
    {"role": "user", "content": "Patient: 45/M\nHb 12.5 g/dL (13-17) LOW"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to("cuda")

# Greedy decoding; decode only the newly generated tokens, not the prompt.
outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
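The usage example prints the raw decoded text. To use the extraction downstream you would parse it as JSON. This card does not say whether the model ever wraps its JSON in a Markdown code fence, so the hypothetical helper below tolerates an optional fence and otherwise fails loudly rather than guessing. Treat it as a defensive sketch, not a description of the model's actual output format.

```python
import json

FENCE = "`" * 3  # built this way so the snippet can itself sit inside a Markdown fence


def parse_extraction(generated: str) -> dict:
    """Parse decoded model output into a dict, stripping an optional Markdown fence."""
    text = generated.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (which may carry a language tag) and a closing fence.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        if text.rstrip().endswith(FENCE):
            text = text.rstrip()[: -len(FENCE)]
    obj = json.loads(text)  # raises json.JSONDecodeError (a ValueError) on malformed output
    if not isinstance(obj, dict):
        raise ValueError("expected a JSON object at the top level")
    return obj
```

In the usage example, you would pass the string given to `print` into `parse_extraction` and send any `ValueError` to human review instead of retrying silently.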
Base Model Zero-Shot Output (for comparison)
The base model without fine-tuning produces invalid output like:
```json
{
  "patient": {"id": "87/F", "fullName": null, "dateOfBirth": "2024-01-29"},
  "labTestResults": [{"testName": "Total Cholesterol", ...}]
}
```
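The output uses field names outside the schema (`labTestResults`, `fullName`), has the wrong structure, and includes identifiers such as a date of birth. Mira-Q1 follows the extraction schema instead (98% JSON validity in the baseline evaluation).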
Limitations
- English only (v1)
- Trained on synthetic data (Synthea + curated), not real clinical records
- Every output is a draft for human review
- To be superseded by Mira-Q2 (coming soon), trained on 8,400 examples and taking the schema as input
License
Apache-2.0
Model provider: dilr
Model tree: adapter (this model) on base Qwen/Qwen2.5-3B-Instruct
Modalities: text input, text output