dilr
Mira-Q2
License: apache-2.0
Comprehensive Evaluation (782 docs across 4 test sets)
| Eval Set | N | Type | JSON Validity | Identifier Leak | Field-F1 |
|---|---|---|---|---|---|
| test_gold | 200 | Same distribution (held-out) | 100.0% [100.0-100.0] | 0.0% | 1.000 [0.999-1.000] |
| synthetic_v2 | 150 | Different formatting dialect | 100.0% [100.0-100.0] | 0.0% | n/a (unlabeled) |
| extraction_relevant | 150 | Real physician docs (on-schema) | 94.7% [90.7-98.0] | 0.0% | n/a (unlabeled) |
| mtsamples | 282 | Real physician docs (39 specialties) | 85.8% [81.9-89.7] | 0.0% | n/a (unlabeled) |
95% bootstrap CIs (1000 resamples). Zero identifier leaks across all 782 documents.
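The intervals above can be approximated with a percentile bootstrap over per-document pass/fail outcomes. The sketch below is a minimal illustration, not the evaluation script behind the scorecard: the function name `bootstrap_ci`, the fixed seed, and the count of 242 valid documents (derived from 85.8% of 282) are assumptions made for the example.

```python
import random


def bootstrap_ci(outcomes, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of 0/1 outcomes.

    Returns (point_estimate, lower, upper).
    """
    rng = random.Random(seed)
    n = len(outcomes)
    means = []
    for _ in range(n_resamples):
        # Resample n documents with replacement and record the pass rate.
        resample = [outcomes[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    means.sort()
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(outcomes) / n, lower, upper


# Assumed counts for mtsamples: 242 valid outputs out of 282 documents.
mtsamples = [1] * 242 + [0] * 40
point, lower, upper = bootstrap_ci(mtsamples)
print(f"validity {point:.1%}, 95% CI [{lower:.1%}, {upper:.1%}]")
```

When every document passes, as on test_gold and synthetic_v2, every resample also passes, so the interval collapses to a single point. A degenerate interval of that kind reflects the limits of the bootstrap on all-pass data rather than a guarantee of zero failures on new documents.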
Three-Way Comparison
| Model | Training Data | Validity (test_gold) | F1 (test_gold) |
|---|---|---|---|
| Qwen2.5-3B zero-shot | — | 0% (invents own schema) | 0.0 |
| Mira-Q1 (v1) | 3,438 examples | 98% (50-example eval) | — |
| Mira-Q2 (this model) | 8,400 examples | 100% (200-example eval) | 1.000 |
Training
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-3B-Instruct (via Unsloth) |
| Method | QLoRA (4-bit, r=16, alpha=32) |
| Training data | 8,400 examples (6,400 gold-by-construction + 2,000 schema variants) |
| Data sources | Real ICD-10 codes (71K), NLM drug names, curated lab reference ranges |
| Schema variants | Renamed fields, dropped fields, minimal schemas (for generalization) |
| Epochs | 2 |
| Final train loss | 0.132 |
| Final eval loss | 0.142 |
| Overfit gap | 0.010 (healthy) |
Loss Curve
```
Step 50:   1.0723 (epoch 0.1)
Step 200:  0.1556 (epoch 0.4)
Step 525:  0.1414 (epoch 1.0), checkpoint
Step 750:  0.1318 (epoch 1.4), lowest
Step 1050: 0.1320 (epoch 2.0), final
Eval:      0.1418 (epoch 2.0)
```
What's New vs Mira-Q1
- 2.4x more training data (8,400 vs 3,438)
- Gold-by-construction data — real ICD-10 codes, NLM drugs, real lab reference ranges (not Synthea-rendered)
- Schema-variant training — 2,000 examples with modified schemas for schema-as-input generalization
- 8% lower loss (0.132 vs 0.143)
- 100% validity on 200-example gold eval (vs 98% on 50 examples)
- Comprehensive eval on 782 docs including real physician dictations
- Zero identifier leaks verified across all test sets
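The headline ratios in this list follow directly from the numbers reported in the tables. The short check below recomputes them; the variable names are illustrative, and it assumes the 0.143 figure for Mira-Q1 is a final train loss comparable to the 0.132 reported here.

```python
# Figures copied from the Training table and the list above.
q1_examples, q2_examples = 3438, 8400
q1_loss, q2_loss = 0.143, 0.132        # assumed to be comparable train losses
train_loss, eval_loss = 0.132, 0.142

data_ratio = q2_examples / q1_examples            # relative dataset size
loss_reduction = (q1_loss - q2_loss) / q1_loss    # relative loss improvement
overfit_gap = eval_loss - train_loss              # eval minus train loss

print(f"{data_ratio:.1f}x data, {loss_reduction:.0%} lower loss, gap {overfit_gap:.3f}")
# prints "2.4x data, 8% lower loss, gap 0.010"
```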
Synthetic-to-Real Gap
The honest finding: Mira-Q2 reaches 100% JSON validity on held-out data from the training distribution but only 85.8% on general real physician prose (MTSamples). This is expected for a model trained on synthetic data: it learned our generator's patterns well but struggles with document types it never saw (operative notes, physical exams). On real documents that fit the schema (extraction_relevant), the gap narrows to about 5 points (94.7%).
We expect this gap to narrow with retraining on real partner data (v1), broader document-type coverage in training, and OCR pipeline integration.
Usage
```python
# IMPORTANT: load with Unsloth, not standard PeftModel (quantization mismatch)
from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="dilr/Mira-Q2",
    max_seq_length=4096,
    dtype=torch.float16,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # switch to inference mode

messages = [
    {"role": "system", "content": "You are a clinical information extraction system..."},
    {"role": "user", "content": "Patient: 45/M\nHb 12.5 g/dL (13-17) LOW\nWBC 8.2 x10^9/L (4-11) Normal"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to("cuda")

# Greedy decoding; stop at either the EOS token or the chat end-of-turn marker
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=False,
    eos_token_id=[tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|im_end|>")],
)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
Note: Do NOT load with PeftModel.from_pretrained(base, "dilr/Mira-Q2") — the adapter was trained with Unsloth's quantization which differs from standard bitsandbytes. Use FastLanguageModel as shown above.
Schema
Extracts 10 required fields:
- document_type: lab_report | medication_list | discharge_summary | pathology_report | intake_form | progress_note | other
- patient: {age, sex}, de-identified; never includes names or MRNs
- encounter: {date (ISO), department}
- vitals[], labs[], medications[], diagnoses[], procedures[], allergies[]
- extraction_notes
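A downstream consumer may want to check each output against the ten fields listed above before passing it to a reviewer. The sketch below is one plausible check, not the scoring code behind the JSON Validity column: the card does not define how validity is measured, so the function name `check_output` and its rules (required keys present, known document type, list-valued sections, no patient keys beyond age and sex) are assumptions. The sample values are made up for illustration.

```python
import json

# Field names and document types are copied from the schema listing above.
REQUIRED_FIELDS = [
    "document_type", "patient", "encounter", "vitals", "labs",
    "medications", "diagnoses", "procedures", "allergies", "extraction_notes",
]
DOCUMENT_TYPES = (
    "lab_report", "medication_list", "discharge_summary", "pathology_report",
    "intake_form", "progress_note", "other",
)
LIST_FIELDS = ["vitals", "labs", "medications", "diagnoses", "procedures", "allergies"]


def check_output(raw):
    """Return a list of problems found in one model output (empty list = passes)."""
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(doc, dict):
        return ["top-level value is not a JSON object"]

    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in doc]
    if "document_type" in doc and doc["document_type"] not in DOCUMENT_TYPES:
        problems.append(f"unknown document_type: {doc['document_type']!r}")
    for name in LIST_FIELDS:
        if name in doc and not isinstance(doc[name], list):
            problems.append(f"{name} should be a list")
    # The patient block is described as de-identified: only age and sex are expected.
    patient = doc.get("patient")
    if isinstance(patient, dict):
        extra = sorted(set(patient) - {"age", "sex"})
        if extra:
            problems.append(f"unexpected patient keys: {extra}")
    return problems


# Hypothetical output with made-up values, used only to exercise the check.
sample = json.dumps({
    "document_type": "lab_report",
    "patient": {"age": 45, "sex": "M"},
    "encounter": {"date": "2024-01-01", "department": "pathology"},
    "vitals": [], "labs": [], "medications": [],
    "diagnoses": [], "procedures": [], "allergies": [],
    "extraction_notes": "",
})
print(check_output(sample))  # prints []
```

A key-based patient check like this catches only structural leaks; identifiers embedded in free-text fields would need a separate scan.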
Architecture: Schema-as-Input
Mira-Q2 is trained with schema-variant examples — the model learns to follow any extraction schema injected in the system prompt, not just the clinical one. This enables customer onboarding with zero code changes (schema file + seed examples only).
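In practice, schema-as-input means the schema travels in the system message rather than in code. The sketch below shows one way that could look. The full system prompt used in training is elided in this card, so the wording in `build_messages`, the function name itself, and the JSON encoding of the schema are all illustrative assumptions rather than the trained prompt format.

```python
import json


def build_messages(schema, document_text):
    """Build chat messages with the extraction schema embedded in the system prompt.

    The prompt wording here is a placeholder; the prompt used in training
    is not published in this card.
    """
    system = (
        "You are a clinical information extraction system. "
        "Return a single JSON object that follows this schema:\n"
        + json.dumps(schema, indent=2)
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": document_text},
    ]


# Hypothetical minimal schema variant (dropped fields), in the spirit of the
# schema-variant training examples.
minimal_schema = {
    "document_type": "string",
    "labs": [{"name": "string", "value": "number", "unit": "string"}],
}
messages = build_messages(minimal_schema, "Hb 12.5 g/dL (13-17) LOW")
print(messages[0]["content"])
```

The resulting `messages` list has the same shape as the one passed to `tokenizer.apply_chat_template` in the Usage section, so switching schemas would only require changing the dictionary.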
Eval Data
The eval/ directory contains:
- comprehensive_scorecard.json: full results with bootstrap CIs
- test_gold_200_result.json: test_gold scorecard
- mtsamples_282_result.json: real MTSamples probe
- extraction_relevant_150_result.json: on-schema real docs
- synthetic_v2_150_result.json: format robustness probe
Limitations
- English only
- Trained on synthetic data — real clinical document retraining improves accuracy (v1 with design partner)
- 86% validity on general real docs (39 specialties) — strongest on lab/discharge/med types it was trained on
- Every output is a draft for human review — not for autonomous clinical decisions
- Must load with Unsloth (not vanilla PeftModel)
License
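Apache-2.0 for this adapter. The base model, Qwen/Qwen2.5-3B-Instruct, is released under the Qwen Research License rather than Apache-2.0, so its terms also apply.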