dilr

Mira-Q2

README

License: apache-2.0

Comprehensive Evaluation (782 docs across 4 test sets)

| Eval Set | N | Type | JSON Validity | Identifier Leak | Field-F1 |
|---|---|---|---|---|---|
| test_gold | 200 | Same distribution (held-out) | 100.0% [1.0-1.0] | 0.0% | 1.000 [0.999-1.0] |
| synthetic_v2 | 150 | Different formatting dialect | 100.0% [1.0-1.0] | 0.0% | n/a (unlabeled) |
| extraction_relevant | 150 | Real physician docs (on-schema) | 94.7% [90.7-98.0] | 0.0% | n/a (unlabeled) |
| mtsamples | 282 | Real physician docs (39 specialties) | 85.8% [81.9-89.7] | 0.0% | n/a (unlabeled) |

95% bootstrap CIs (1000 resamples). Zero identifier leaks across all 782 documents.
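For readers who want to reproduce intervals of this kind, here is a minimal sketch of a percentile bootstrap over per-document validity flags. It is an illustration under assumptions, not the evaluation script behind the numbers above: the helper name `bootstrap_ci`, the seed, and the choice of the percentile method are hypothetical, and the flags are rebuilt only from the reported mtsamples rate (242 valid out of 282 gives 85.8%).

```python
import random

def bootstrap_ci(flags, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of 0/1 flags (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(flags)
    means = []
    for _ in range(n_resamples):
        # Resample the documents with replacement and record the validity rate
        sample = rng.choices(flags, k=n)
        means.append(sum(sample) / n)
    means.sort()
    lower = means[int(round((alpha / 2) * n_resamples))]
    upper = means[int(round((1 - alpha / 2) * n_resamples)) - 1]
    return sum(flags) / n, lower, upper

# Assumption: 242 valid and 40 invalid outputs, matching the reported 85.8% on 282 docs
flags = [1] * 242 + [0] * 40
rate, lower, upper = bootstrap_ci(flags)
print(f"validity = {rate:.1%}, 95% CI = [{lower:.1%}, {upper:.1%}]")
```

When every document is valid, each resample also has a rate of 1.0, which is why the 100% rows show a degenerate interval.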

Three-Way Comparison

| Model | Training Data | Validity (test_gold) | F1 (test_gold) |
|---|---|---|---|
| Qwen2.5-3B zero-shot | | 0% (invents own schema) | 0.0 |
| Mira-Q1 (v1) | 3,438 examples | 98% (50-example eval) | |
| Mira-Q2 (this model) | 8,400 examples | 100% (200-example eval) | 1.000 |
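This README does not spell out how Field-F1 is computed. One plausible reading, sketched below purely as an assumption, is a micro-averaged F1 over flattened (path, value) pairs from the predicted and gold JSON. The names `flatten` and `field_f1` are hypothetical and the actual scorer may differ, for example in how it aligns list items.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts/lists into a set of (path, value) pairs."""
    items = set()
    if isinstance(obj, dict):
        for key, value in obj.items():
            items |= flatten(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            items |= flatten(value, f"{prefix}[{index}]")
    else:
        items.add((prefix, obj))
    return items

def field_f1(predicted, gold):
    """F1 over exact (path, value) matches; an assumed definition, not the official one."""
    pred_items, gold_items = flatten(predicted), flatten(gold)
    if not pred_items and not gold_items:
        return 1.0
    true_positives = len(pred_items & gold_items)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_items)
    recall = true_positives / len(gold_items)
    return 2 * precision * recall / (precision + recall)

gold = {"patient": {"age": 45, "sex": "M"}, "labs": [{"name": "Hb", "value": 12.5}]}
pred = {"patient": {"age": 45, "sex": "M"}, "labs": [{"name": "Hb", "value": 12.0}]}
print(field_f1(pred, gold))  # 3 of 4 pairs match, so precision = recall = F1 = 0.75
```

Under this definition, an output that invents its own field names shares no paths with the gold record and scores 0.0, which is consistent with the zero-shot row.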

Training

| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-3B-Instruct (via Unsloth) |
| Method | QLoRA (4-bit, r=16, alpha=32) |
| Training data | 8,400 examples (6,400 gold-by-construction + 2,000 schema variants) |
| Data sources | Real ICD-10 codes (71K), NLM drug names, curated lab reference ranges |
| Schema variants | Renamed fields, dropped fields, minimal schemas (for generalization) |
| Epochs | 2 |
| Final train loss | 0.132 |
| Final eval loss | 0.142 |
| Overfit gap | 0.010 (healthy) |

Loss Curve

markdown

Step 50: 1.0723 (epoch 0.1)
Step 200: 0.1556 (epoch 0.4)
Step 525: 0.1414 (epoch 1.0) — checkpoint
Step 750: 0.1318 (epoch 1.4) — lowest
Step 1050: 0.1320 (epoch 2.0) — final
Eval: 0.1418 (epoch 2.0)
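The overfit gap in the Training table is the final eval loss minus the final train loss. The short sketch below redoes that arithmetic from the logged values; the variable names are illustrative only.

```python
# Logged training losses from the loss curve, keyed by step
train_loss = {50: 1.0723, 200: 0.1556, 525: 0.1414, 750: 0.1318, 1050: 0.1320}
eval_loss_final = 0.1418  # eval loss at epoch 2.0

final_step = max(train_loss)                      # last logged step
overfit_gap = eval_loss_final - train_loss[final_step]
best_step = min(train_loss, key=train_loss.get)   # step with the lowest logged train loss

print(f"final train loss: {train_loss[final_step]:.4f}")
print(f"overfit gap: {overfit_gap:.4f}")
print(f"lowest logged train loss at step {best_step}")
```

The gap comes out at roughly 0.0098, which rounds to the 0.010 shown in the table.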

What's New vs Mira-Q1

  • 2.4x more training data (8,400 vs 3,438)
  • Gold-by-construction data — real ICD-10 codes, NLM drugs, real lab reference ranges (not Synthea-rendered)
  • Schema-variant training — 2,000 examples with modified schemas for schema-as-input generalization
  • 8% lower loss (0.132 vs 0.143)
  • 100% validity on 200-example gold eval (vs 98% on 50 examples)
  • Comprehensive eval on 782 docs including real physician dictations
  • Zero identifier leaks verified across all test sets

Synthetic-to-Real Gap

The honest finding: Mira-Q2 reaches 100% JSON validity on training-distribution data but only 85.8% on general real physician prose (MTSamples). This is expected for a model trained on synthetic data: it learned our generator's patterns well but struggles with document types it never saw, such as operative notes and physical exams. On real documents that fit the schema (extraction_relevant), validity is 94.7%, which narrows the gap to about 5 percentage points.

We expect this gap to close with retraining on real partner data (v1), broader document-type coverage in training, and OCR pipeline integration.

Usage

python

# IMPORTANT: Load with Unsloth (not standard PeftModel, which causes a quantization mismatch)
from unsloth import FastLanguageModel
import torch

# Load the 4-bit base model together with the Mira-Q2 adapter
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="dilr/Mira-Q2",
    max_seq_length=4096,
    dtype=torch.float16,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's inference mode

messages = [
    {"role": "system", "content": "You are a clinical information extraction system..."},
    {"role": "user", "content": "Patient: 45/M\nHb 12.5 g/dL (13-17) LOW\nWBC 8.2 x10^9/L (4-11) Normal"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to("cuda")

# Greedy decoding; stop at the tokenizer's EOS token or the chat end-of-turn token
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=False,
    eos_token_id=[tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|im_end|>")],
)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Note: Do NOT load with PeftModel.from_pretrained(base, "dilr/Mira-Q2") — the adapter was trained with Unsloth's quantization which differs from standard bitsandbytes. Use FastLanguageModel as shown above.

Schema

Extracts 10 required fields:

  • document_type: lab_report | medication_list | discharge_summary | pathology_report | intake_form | progress_note | other
  • patient: {age, sex} — de-identified, never includes names/MRN
  • encounter: {date (ISO), department}
  • vitals[], labs[], medications[], diagnoses[], procedures[], allergies[]
  • extraction_notes
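Since every output is meant to be valid JSON with these ten fields, a downstream consumer may want a quick structural check before passing a draft to human review. The sketch below is one hedged way to do that: `validate_extraction` is a hypothetical helper, it checks only top-level keys and the `document_type` value listed above, and it is not necessarily how the JSON Validity metric in this README was scored.

```python
import json

REQUIRED_FIELDS = [
    "document_type", "patient", "encounter", "vitals", "labs",
    "medications", "diagnoses", "procedures", "allergies", "extraction_notes",
]
DOCUMENT_TYPES = {
    "lab_report", "medication_list", "discharge_summary", "pathology_report",
    "intake_form", "progress_note", "other",
}

def validate_extraction(raw_output):
    """Return a list of problems; an empty list means these checks passed."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as error:
        return [f"invalid JSON: {error.msg}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in data]
    if "document_type" in data:
        doc_type = data["document_type"]
        if not isinstance(doc_type, str) or doc_type not in DOCUMENT_TYPES:
            problems.append(f"unknown document_type: {doc_type!r}")
    return problems

# Placeholder record with empty values, used only to exercise the checks
example = {name: [] for name in REQUIRED_FIELDS}
example["document_type"] = "lab_report"
print(validate_extraction(json.dumps(example)))
print(validate_extraction('{"document_type": "x"}'))
```

Nested shapes such as the entries of `labs[]` are deliberately not checked here, because this README lists only the top-level fields.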

Architecture: Schema-as-Input

Mira-Q2 is trained with schema-variant examples (renamed fields, dropped fields, minimal schemas), so the model learns to follow the extraction schema injected in the system prompt rather than only the default clinical one. This is intended to enable customer onboarding with zero code changes: a new customer supplies a schema file and seed examples only.
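To make the schema-as-input idea concrete, here is a rough sketch of building chat messages with a custom schema embedded in the system prompt. Everything in it is an assumption for illustration: the helper `build_messages`, the prompt wording, and the renamed-field schema are hypothetical, and the exact system prompt used in training is not shown in this README, so real use should follow that format.

```python
import json

def build_messages(schema, document_text):
    """Build chat messages with the extraction schema embedded in the system prompt."""
    system_prompt = (
        # Hypothetical wording; the trained prompt format is not given in this README
        "Extract the fields defined by this schema and reply with JSON only.\n"
        + json.dumps(schema, indent=2)
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": document_text},
    ]

# A minimal schema variant with renamed fields (illustrative only)
custom_schema = {"doc_kind": "string", "lab_results": "array", "notes": "string"}
messages = build_messages(custom_schema, "Patient: 45/M\nHb 12.5 g/dL (13-17) LOW")
print(messages[0]["content"])
```

The resulting `messages` list has the same shape as the one passed to `tokenizer.apply_chat_template` in the Usage section, so swapping schemas would only change the system prompt content.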

Eval Data

The eval/ directory contains:

  • comprehensive_scorecard.json — full results with bootstrap CIs
  • test_gold_200_result.json — test_gold scorecard
  • mtsamples_282_result.json — real MTSamples probe
  • extraction_relevant_150_result.json — on-schema real docs
  • synthetic_v2_150_result.json — format robustness probe

Limitations

  • English only
  • Trained on synthetic data; retraining on real clinical documents is expected to improve accuracy (planned for v1 with a design partner)
  • 85.8% JSON validity on general real docs (MTSamples, 39 specialties); strongest on the lab, discharge, and medication document types it was trained on
  • Every output is a draft for human review — not for autonomous clinical decisions
  • Must load with Unsloth (not vanilla PeftModel)

License

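Apache-2.0 for this adapter. The base model, Qwen/Qwen2.5-3B-Instruct, is released under the Qwen Research License rather than Apache-2.0.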

Model provider

dilr

Model tree

Base

Qwen/Qwen2.5-3B-Instruct

Adapter

this model

Modalities

Input

Text

Output

Text
