License: apache-2.0

Model summary

Architecture: Qwen3VLForConditionalGeneration (qwen3_vl)
Parameters: ~8.8B
Precision: bfloat16 (dtype in config.json)
Base model: Qwen/Qwen3-VL-8B-Instruct
SFT init: Merged SFT checkpoint on cardiac VQA
RL algorithm: GRPO (EasyR1), LoRA r=64 / α=128 on language layers (vision frozen during LoRA)
Transformers: Exported with transformers 5.8.x
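
As a rough sizing aid, the parameter count and precision above give a back-of-the-envelope estimate of the memory needed for the weights alone. This is a sketch under the assumption that all ~8.8B parameters are stored in bfloat16; the helper name is illustrative, and activations, KV cache, and framework overhead are not included.

```python
# Rough weight-memory estimate from the figures in the model summary.
PARAMS = 8.8e9        # ~8.8B parameters (approximate figure from the summary)
BYTES_PER_PARAM = 2   # bfloat16 stores each value in 2 bytes


def weight_memory_gib(params: float = PARAMS, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Memory for the weights alone, in GiB (no activations or KV cache)."""
    return params * bytes_per_param / 2**30


print(f"{weight_memory_gib():.1f} GiB")
```

Actual GPU memory use during inference will be higher, particularly with many image frames or long responses.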

Intended use

  • Answer questions about cardiac cine / volumetric MRI when given frame images or short video clips.
  • Supports the structured answer format used in CineMR training: final answers in \boxed{...} and optional <tool_call> blocks for measurement-style reasoning.

Not for clinical decision-making. This model is a research artifact; outputs must not be used for diagnosis or treatment without expert review and appropriate validation.
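
The structured answer format described above can be parsed with a few lines of Python. The sketch below is illustrative, not the project's own parser: the helper names are hypothetical, the closing </tool_call> tag is an assumption (only the opening tag is named here), and the tool-call payload in the sample reply is invented for the demonstration.

```python
import re
from typing import List, Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None if absent."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth, chars = 1, []
    # Walk forward, tracking brace depth so nested braces stay inside the answer.
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
    return None  # braces never closed


def extract_tool_calls(text: str) -> List[str]:
    """Return the raw body of every <tool_call>...</tool_call> block (closing tag assumed)."""
    return [m.strip() for m in re.findall(r"<tool_call>(.*?)</tool_call>", text, flags=re.DOTALL)]


# Hypothetical model reply, used only to exercise the helpers.
reply = '<tool_call>{"name": "measure"}</tool_call> The answer is \\boxed{\\frac{1}{2}}'
print(extract_boxed(reply))
print(extract_tool_calls(reply))
```

Taking the last \boxed{...} is a design choice for replies that restate intermediate values; a different convention may be needed if training used another rule.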

Contents

Artifact: Purpose
model.safetensors: Full merged weights (SFT + GRPO LoRA), single shard
config.json: Model architecture and dtype
tokenizer.json, tokenizer_config.json, vocab.json, merges.txt, …: Text tokenizer
preprocessor_config.json, video_preprocessor_config.json: Image / video preprocessing for Qwen3VLProcessor
chat_template.jinja: Chat formatting
generation_config.json: Default generation settings

Loading

python

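import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

repo_id = "ai-mind-lab/CineMR"  # or a local path to this directory

# AutoModelForImageTextToText supersedes the deprecated AutoModelForVision2Seq,
# and `dtype` replaces the older `torch_dtype` argument in recent transformers releases.
model = AutoModelForImageTextToText.from_pretrained(
    repo_id,
    dtype=torch.bfloat16,  # matches the dtype recorded in config.json
    device_map="auto",
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(repo_id, trust_remote_code=True)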

Example: single-image VQA

python

from PIL import Image

# Assumes `torch`, `model`, and `processor` from the Loading section.
image = Image.open("path/to/frame.png").convert("RGB")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "What is the left ventricular ejection fraction?"},
        ],
    }
]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=512, do_sample=False)
# Drop the prompt tokens so only the newly generated answer is printed.
answer_ids = out[0][inputs["input_ids"].shape[1]:]
print(processor.decode(answer_ids, skip_special_tokens=True))

Use the same trust_remote_code=True and bfloat16 settings as in training. For evaluation, match the CineMR prompt template and decoding settings used in your eval script.
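
For cine data, a question usually refers to several frames rather than one. The helper below is a minimal sketch of how a multi-frame prompt could be assembled in the same message layout as the single-image example. The function name is hypothetical, and whether CineMR training used this exact layout (one image entry per frame, followed by the question) is an assumption to check against the training prompt template.

```python
from typing import Any, Dict, List


def build_frame_messages(frames: List[Any], question: str) -> List[Dict[str, Any]]:
    """Build a single-turn chat message with one image entry per frame, then the question."""
    content: List[Dict[str, Any]] = [{"type": "image", "image": frame} for frame in frames]
    content.append({"type": "text", "text": question})
    return [{"role": "user", "content": content}]


# Placeholder strings stand in for loaded PIL images so the sketch runs without any data.
demo_frames = ["frame_00", "frame_01", "frame_02"]
messages = build_frame_messages(demo_frames, "Is the left ventricle dilated?")
print(len(messages[0]["content"]))
```

With real PIL frames, the resulting messages would go through processor.apply_chat_template as in the single-image example, and the same list of frames would be passed to the processor as images=frames in place of images=[image].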

Training procedure (summary)

  1. SFT on CineMR JSONL (train split) starting from Qwen3-VL-8B-Instruct; weights merged to a full transformers checkpoint.
  2. GRPO in EasyR1 with:
    • Reward: reward_cardiac_vqa.py (compute_score) — accuracy on \boxed{} answers plus format / tool-use terms.
    • Rollout: vLLM, n=2 samples per prompt, max response length 1024.
    • Actor LR 1e-5, KL coefficient 0.01, global batch size 4.
    • Image frames from preprocessed cine caches; pixel budget aligned with Qwen3-VL (min/max pixels in training config).
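
To illustrate what n=2 samples per prompt means for GRPO, the sketch below computes group-relative advantages by normalizing each sampled response's reward against the other samples for the same prompt. It is a generic illustration of the idea, not EasyR1's implementation: the function name, the use of the population standard deviation, and the epsilon value are all assumptions.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one prompt's group of sampled responses."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; the trainer's exact choice is an assumption
    return [(r - mu) / (sigma + eps) for r in rewards]


# Two rollouts for one prompt: one scored 1.0 by the reward, the other 0.0.
print(group_relative_advantages([1.0, 0.0]))
# Two rollouts with identical rewards carry no learning signal.
print(group_relative_advantages([0.5, 0.5]))
```

Under this normalization, a group of two always yields advantages close to +1 and -1 when the rewards differ, and zero when they tie, so prompts where both rollouts score the same contribute nothing to the policy update.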

LoRA weights are merged into the base checkpoint for Hub deployment.
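
Merging means folding the low-rank update into each adapted weight matrix so that no adapter is needed at inference time. The NumPy sketch below shows the arithmetic for one linear layer, assuming the standard LoRA scaling of α / r (2.0 for r=64, α=128); the function name and toy dimensions are illustrative, and the actual merge was presumably done with the training framework's own tooling.

```python
import numpy as np

LORA_R, LORA_ALPHA = 64, 128  # values from the model summary; alpha / r = 2.0


def merge_lora(weight: np.ndarray, lora_a: np.ndarray, lora_b: np.ndarray,
               alpha: float, r: int) -> np.ndarray:
    """Return W + (alpha / r) * B @ A for one linear layer (standard LoRA scaling assumed)."""
    return weight + (alpha / r) * (lora_b @ lora_a)


# Toy dimensions keep the demo small; the scaling ratio matches 128 / 64.
rng = np.random.default_rng(0)
d_out, d_in, toy_r = 6, 5, 2
toy_alpha = 2 * toy_r
W = rng.normal(size=(d_out, d_in))
A = rng.normal(size=(toy_r, d_in))   # down-projection
B = rng.normal(size=(d_out, toy_r))  # up-projection

merged = merge_lora(W, A, B, alpha=toy_alpha, r=toy_r)

# The merged layer should match base output plus the scaled adapter output.
x = rng.normal(size=d_in)
adapter_out = W @ x + (toy_alpha / toy_r) * (B @ (A @ x))
print(np.allclose(merged @ x, adapter_out))
```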

Evaluation

Evaluate on the CineMR test split with the same frame paths and prompt template as training. Report metrics on extracted \boxed{} answers and, if applicable, tool-call correctness. See the project eval scripts under CardiacCine/cardiac_cine/cine-cogito/ for reference pipelines.
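
A minimal sketch of scoring extracted \boxed{} answers follows. It is not the project's eval script: the function names, the numeric-then-string comparison rule, and the sample outputs are assumptions for illustration, and the simple regex does not handle nested braces.

```python
import re
from typing import List, Optional


def boxed_answer(text: str) -> Optional[str]:
    """Return the last \\boxed{...} payload without nested braces, or None."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None


def is_correct(pred: Optional[str], gold: str, rel_tol: float = 0.0) -> bool:
    """Compare numerically when both sides parse as floats, else case-insensitively."""
    if pred is None:
        return False
    try:
        p, g = float(pred), float(gold)
    except ValueError:
        return pred.strip().lower() == gold.strip().lower()
    return abs(p - g) <= rel_tol * abs(g)


def boxed_accuracy(outputs: List[str], golds: List[str], rel_tol: float = 0.0) -> float:
    """Fraction of outputs whose boxed answer matches the reference."""
    if not outputs:
        return 0.0
    hits = sum(is_correct(boxed_answer(o), g, rel_tol) for o, g in zip(outputs, golds))
    return hits / len(outputs)


# Invented outputs and references, used only to exercise the scorer.
outputs = ["The EF is \\boxed{55}", "Finding: \\boxed{normal}", "I cannot tell."]
golds = ["55.0", "Normal", "60"]
print(boxed_accuracy(outputs, golds))
```

Responses with no \boxed{} answer count as wrong here, which keeps format failures visible in the headline metric; a relative tolerance can be set through rel_tol for measurement-style questions.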

Limitations

  • Trained on public cardiac MRI challenge-style corpora (ACDC, M&Ms, M&Ms-2); generalization to other scanners, sequences, or pathologies is not guaranteed.
  • GRPO training used a small validation set for checkpoint tracking; prefer held-out test evaluation before drawing conclusions.
  • Tool-use formatting in outputs may be inconsistent unless prompts and decoding match training.

License

This model inherits the Apache 2.0 terms of Qwen3-VL. Your use is also subject to the terms of the CineMR data and to any restrictions of the underlying datasets and challenges (ACDC, M&Ms, etc.). Use only for lawful research purposes.

Citation

If you use CineMR, please cite the base Qwen3-VL model and acknowledge the CineMR dataset and cardiac imaging sources:

bibtex

@misc{cinemr_qwen3vl8b_grpo,
  title        = {CineMR: Cardiac MRI Vision-Language Model (Qwen3-VL-8B, GRPO)},
  author       = {AI Mind Lab},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/ai-mind-lab/CineMR}},
  note         = {GRPO checkpoint; dataset at huggingface.co/datasets/ai-mind-lab/CineMR},
}

bibtex

@article{qwen3vl,
  title  = {Qwen3-VL Technical Report},
  author = {Qwen Team},
  year   = {2025},
}

Model provider: ai-mind-lab

Model tree: base Qwen/Qwen3-VL-8B-Instruct, fine-tuned into this model

Modalities: Text and Image input, Text output
