License: apache-2.0
Model summary
| Field | Value |
|---|---|
| Architecture | Qwen3VLForConditionalGeneration (qwen3_vl) |
| Parameters | ~8.8B |
| Precision | bfloat16 (dtype in config.json) |
| Base model | Qwen/Qwen3-VL-8B-Instruct |
| SFT init | Merged SFT checkpoint on cardiac VQA |
| RL algorithm | GRPO (EasyR1), LoRA r=64 / α=128 on language layers (vision frozen during LoRA) |
| Transformers | Exported with transformers 5.8.x |
Intended use
- Answer questions about cardiac cine / volumetric MRI when given frame images or short video clips.
- Supports the structured answer format used in CineMR training: final answers in `\boxed{...}` and optional `<tool_call>` blocks for measurement-style reasoning.
Not for clinical decision-making. This model is a research artifact; outputs must not be used for diagnosis or treatment without expert review and appropriate validation.
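The structured answer format described above can be parsed with a few lines of Python. The sketch below is illustrative only: the helper names `extract_boxed` and `extract_tool_calls` are hypothetical, and it assumes that tool calls are closed with `</tool_call>` and usually carry a JSON payload, neither of which is stated in this card.

```python
import json
import re

# Assumption: tool calls are wrapped as <tool_call> ... </tool_call>.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def extract_boxed(text):
    """Return the contents of the last \\boxed{...} in text, or None."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1  # we are inside the opening brace of \boxed{
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
    return None  # braces never balanced


def extract_tool_calls(text):
    """Return each tool-call payload, parsed as JSON when possible."""
    calls = []
    for raw in TOOL_CALL_RE.findall(text):
        raw = raw.strip()
        try:
            calls.append(json.loads(raw))
        except json.JSONDecodeError:
            calls.append(raw)  # keep the raw string if it is not valid JSON
    return calls
```

Taking the last `\boxed{...}` rather than the first is a design choice here, on the assumption that any intermediate boxed values precede the final answer.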
Contents
| Artifact | Purpose |
|---|---|
| `model.safetensors` | Full merged weights (SFT + GRPO LoRA), single shard |
| `config.json` | Model architecture and dtype |
| `tokenizer.json`, `tokenizer_config.json`, `vocab.json`, `merges.txt`, … | Text tokenizer |
| `preprocessor_config.json`, `video_preprocessor_config.json` | Image / video preprocessing for Qwen3VLProcessor |
| `chat_template.jinja` | Chat formatting |
| `generation_config.json` | Default generation settings |
Loading
```python
import torch
from transformers import AutoModelForVision2Seq, AutoProcessor

repo_id = "ai-mind-lab/CineMR"  # or a local path to this directory

model = AutoModelForVision2Seq.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(repo_id, trust_remote_code=True)
```
Example: single-image VQA
```python
from PIL import Image

image = Image.open("path/to/frame.png").convert("RGB")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "What is the left ventricular ejection fraction?"},
        ],
    }
]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(processor.decode(out[0], skip_special_tokens=True))
```
Use the same trust_remote_code=True and bfloat16 settings as in training. For evaluation, match the CineMR prompt template and decoding settings used in your eval script.
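For cine input with several frames, the single-image message above extends naturally to one image entry per frame. The helper below is a minimal sketch; `build_cine_messages` is a hypothetical name, and whether CineMR training presented frames as multiple images or as a video entry is not stated here, so check the training prompt template before relying on it.

```python
def build_cine_messages(frames, question):
    """Build a single-turn chat message with one image entry per frame.

    `frames` may hold PIL images or paths, in temporal order.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    # One {"type": "image"} entry per frame, followed by the question text.
    content = [{"type": "image", "image": frame} for frame in frames]
    content.append({"type": "text", "text": question})
    return [{"role": "user", "content": content}]
```

The returned list can be passed to `processor.apply_chat_template` as in the example above, with the same frames supplied as a flat list through the processor's `images` argument.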
Training procedure (summary)
- SFT on CineMR JSONL (train split) starting from Qwen3-VL-8B-Instruct; weights merged to a full `transformers` checkpoint.
- GRPO in EasyR1 with:
  - Reward: `reward_cardiac_vqa.py` (`compute_score`): accuracy on `\boxed{}` answers plus format / tool-use terms.
  - Rollout: vLLM, `n=2` samples per prompt, max response length 1024.
  - Actor LR `1e-5`, KL coefficient `0.01`, global batch size 4.
  - Image frames from preprocessed cine caches; pixel budget aligned with Qwen3-VL (min/max pixels in training config).
LoRA weights are merged into the base checkpoint for Hub deployment.
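To illustrate what `n=2` samples per prompt means for GRPO, the sketch below computes group-relative advantages in the commonly described form: each rollout's reward minus the group mean, divided by the group standard deviation. It is not taken from EasyR1; the function name, the use of the population standard deviation, and the `eps` value are assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one prompt's group of rollouts.

    Assumption: population std and eps=1e-6; actual trainers may differ.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Two rollouts for one prompt: one correct (reward 1.0), one wrong (0.0).
print(group_relative_advantages([1.0, 0.0]))
# Two rollouts with identical rewards carry no learning signal.
print(group_relative_advantages([0.5, 0.5]))
```

With only two rollouts per group, a prompt contributes a nonzero advantage only when the two responses receive different rewards.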
Evaluation
Evaluate on the CineMR test split with the same frame paths and prompt template as training. Report metrics on extracted \boxed{} answers and, if applicable, tool-call correctness. See the project eval scripts under CardiacCine/cardiac_cine/cine-cogito/ for reference pipelines.
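As a rough illustration of scoring already-extracted `\boxed{}` answers, the sketch below compares predictions to references numerically when both parse as numbers and as case-insensitive strings otherwise. It is not the project's eval script: the function names and the `abs_tol` parameter are hypothetical, and the reference pipelines may normalize answers differently.

```python
def _to_float(s):
    """Parse a numeric answer, ignoring '%' and LaTeX backslashes."""
    try:
        return float(s.replace("%", "").replace("\\", "").strip())
    except ValueError:
        return None


def answers_match(pred, gold, abs_tol=0.0):
    """True if pred matches gold; a missing prediction (None) never matches."""
    if pred is None:
        return False
    p, g = _to_float(pred), _to_float(gold)
    if p is not None and g is not None:
        return abs(p - g) <= abs_tol  # numeric comparison with tolerance
    return pred.strip().lower() == gold.strip().lower()


def accuracy(pairs, abs_tol=0.0):
    """Fraction of (prediction, reference) pairs that match."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(answers_match(p, g, abs_tol) for p, g in pairs) / len(pairs)
```

Counting missing extractions as errors keeps format failures visible in the headline metric instead of silently dropping them.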
Limitations
- Trained on public cardiac MRI challenge-style corpora (ACDC, M&Ms, M&Ms-2); generalization to other scanners, sequences, or pathologies is not guaranteed.
- GRPO training used a small validation set for checkpoint tracking; prefer held-out test evaluation before drawing conclusions.
- Tool-use formatting in outputs may be inconsistent unless prompts and decoding match training.
License
This model inherits the license terms of Qwen3-VL (Apache 2.0). Your use is also subject to the terms of the CineMR data and to any restrictions of the underlying datasets and challenges (ACDC, M&Ms, etc.). Use only for lawful research purposes.
Citation
If you use CineMR, please cite the base Qwen3-VL model and acknowledge the CineMR dataset and cardiac imaging sources:
```bibtex
@misc{cinemr_qwen3vl8b_grpo,
  title        = {CineMR: Cardiac MRI Vision-Language Model (Qwen3-VL-8B, GRPO)},
  author       = {AI Mind Lab},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/ai-mind-lab/CineMR}},
  note         = {GRPO checkpoint; dataset at huggingface.co/datasets/ai-mind-lab/CineMR},
}
```
```bibtex
@article{qwen3vl,
  title  = {Qwen3-VL Technical Report},
  author = {Qwen Team},
  year   = {2025},
}
```
Model provider
ai-mind-lab
Model tree
Base
Qwen/Qwen3-VL-8B-Instruct
Fine-tuned
this model
Modalities
Input
Text, Image
Output
Text