ai-mind-lab/CineMR API & Inference Endpoint

Model summary


| Field | Value |
|---|---|
| Architecture | `Qwen3VLForConditionalGeneration` (`qwen3_vl`) |
| Parameters | ~8.8B |
| Precision | bfloat16 (`dtype` in `config.json`) |
| Base model | `Qwen/Qwen3-VL-8B-Instruct` |
| SFT init | Merged SFT checkpoint on cardiac VQA |
| RL algorithm | GRPO (EasyR1), LoRA r=64 / α=128 on language layers (vision frozen during LoRA) |
| Transformers | Exported with `transformers` 5.8.x |

Intended use

Answer questions about cardiac cine / volumetric MRI when given frame images or short video clips.
Supports the structured answer format used in CineMR training: final answers in \boxed{...} and optional <tool_call> blocks for measurement-style reasoning.
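
The structured answer format can be parsed with a few lines of code. The following is a minimal sketch, not the project's own parser: the helper names `extract_boxed` and `extract_tool_calls` are hypothetical, and it assumes that the last `\boxed{...}` in a response is the final answer and that tool calls are wrapped in matching `<tool_call>` ... `</tool_call>` tags.

```python
import re

BOXED_MARKER = r"\boxed{"
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def extract_boxed(text):
    """Return the contents of the last \\boxed{...} in text, or None if absent."""
    start = text.rfind(BOXED_MARKER)
    if start == -1:
        return None
    depth = 1  # we are already inside the opening brace of \boxed{
    chars = []
    for ch in text[start + len(BOXED_MARKER):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
    return None  # braces never closed


def extract_tool_calls(text):
    """Return the raw payload of every <tool_call> block, in order of appearance."""
    return [payload.strip() for payload in TOOL_CALL_RE.findall(text)]
```

Brace counting, rather than a simple regular expression, keeps nested LaTeX such as `\boxed{\frac{1}{2}}` intact. The tool-call payload is returned as raw text because its inner format is not specified here.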

Not for clinical decision-making. This model is a research artifact; outputs must not be used for diagnosis or treatment without expert review and appropriate validation.

| Artifact | Purpose |
|---|---|
| `model.safetensors` | Full merged weights (SFT + GRPO LoRA), single shard |
| `config.json` | Model architecture and dtype |
| `tokenizer.json`, `tokenizer_config.json`, `vocab.json`, `merges.txt`, … | Text tokenizer |
| `preprocessor_config.json`, `video_preprocessor_config.json` | Image / video preprocessing for `Qwen3VLProcessor` |
| `chat_template.jinja` | Chat formatting |
| `generation_config.json` | Default generation settings |
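
When working from a local copy, it can help to confirm that the listed artifacts are present before loading. The sketch below is an illustration only: the helper `missing_artifacts` is hypothetical, and `EXPECTED_FILES` covers just the file names spelled out above, not the additional tokenizer files elided with "…".

```python
from pathlib import Path

# Only the file names explicitly listed in the artifact table.
EXPECTED_FILES = [
    "model.safetensors",
    "config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "vocab.json",
    "merges.txt",
    "preprocessor_config.json",
    "video_preprocessor_config.json",
    "chat_template.jinja",
    "generation_config.json",
]


def missing_artifacts(model_dir):
    """Return the expected file names that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list from `missing_artifacts("path/to/CineMR")` means every listed file was found. It does not validate file contents.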

Loading

python
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

repo_id = "ai-mind-lab/CineMR"  # or a local path to this directory

# Qwen3-VL is registered under AutoModelForImageTextToText;
# AutoModelForVision2Seq is deprecated in recent transformers releases.
model = AutoModelForImageTextToText.from_pretrained(
    repo_id,
    dtype=torch.bfloat16,  # matches `dtype` in config.json
    device_map="auto",
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(repo_id, trust_remote_code=True)

Example: single-image VQA

python
from PIL import Image

image = Image.open("path/to/frame.png").convert("RGB")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "What is the left ventricular ejection fraction?"},
        ],
    }
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

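with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=512, do_sample=False)

# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = out[0][inputs["input_ids"].shape[1]:]
print(processor.decode(new_tokens, skip_special_tokens=True))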

Use the same trust_remote_code=True and bfloat16 settings as in training. For evaluation, match the CineMR prompt template and decoding settings used in your eval script.
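
Cine studies usually involve several frames, so the single-image message extends naturally to multiple images. The helper below is a hedged sketch: `build_frame_messages` is a hypothetical name, and the frame count and ordering used in CineMR training are not stated here, so treat the layout (all frames first, then the question) as an assumption.

```python
def build_frame_messages(frames, question):
    """Build a single-turn chat message with one image entry per frame.

    `frames` is a sequence of PIL images (or whatever image objects the
    processor accepts); `question` is the text prompt.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    # Same entry shape as the single-image example: one dict per image.
    content = [{"type": "image", "image": frame} for frame in frames]
    content.append({"type": "text", "text": question})
    return [{"role": "user", "content": content}]
```

The result can be passed to `processor.apply_chat_template(...)` exactly as in the single-image example, with the same frames supplied to the processor call as `images=list(frames)`.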

Training procedure (summary)

SFT on CineMR JSONL (train split) starting from Qwen3-VL-8B-Instruct; weights merged to a full transformers checkpoint.
GRPO in EasyR1 with:
- Reward: reward_cardiac_vqa.py (compute_score) — accuracy on \boxed{} answers plus format / tool-use terms.
- Rollout: vLLM, n=2 samples per prompt, max response length 1024.
- Actor LR 1e-5, KL coefficient 0.01, global batch size 4.
- Image frames from preprocessed cine caches; pixel budget aligned with Qwen3-VL (min/max pixels in training config).
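
To make the rollout setting concrete, the sketch below shows the group-relative advantage used in the standard GRPO formulation: each sampled response's reward is normalized against the other responses for the same prompt. This is an illustration, not EasyR1's code. The function name `group_relative_advantages`, the `eps` value, and the use of the population standard deviation are assumptions, and the actual normalization in the training run may differ.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of one prompt's rollout group.

    Returns (r - mean) / (std + eps) for each reward r in the group.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, see lead-in
    return [(r - mu) / (sigma + eps) for r in rewards]
```

With n=2 samples per prompt, the two responses receive advantages of equal magnitude and opposite sign when their rewards differ (roughly +1 and -1), and both receive zero when their rewards are equal, so a prompt contributes no gradient signal unless the two rollouts are scored differently.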

LoRA weights are merged into the base checkpoint for Hub deployment.
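
Merging a LoRA adapter amounts to adding a scaled low-rank product to each adapted weight matrix. The sketch below illustrates that arithmetic on plain nested lists. It assumes the standard LoRA scaling of α/r (so r=64 and α=128 give a factor of 2.0); the helper names are hypothetical, and the merge for this checkpoint was presumably done with the training tooling rather than code like this.

```python
def lora_scale(rank, alpha):
    """Standard LoRA scaling factor alpha / rank."""
    return alpha / rank


def matmul(x, y):
    """Multiply two matrices given as nested lists (rows of x by columns of y)."""
    return [
        [sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
        for row in x
    ]


def merge_lora(weight, lora_a, lora_b, rank, alpha):
    """Return W + (alpha / rank) * (B @ A) as a new nested list.

    Shapes: weight is (out, in), lora_b is (out, rank), lora_a is (rank, in).
    """
    scale = lora_scale(rank, alpha)
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]
```

After merging, the adapter matrices are no longer needed at inference time, which is why the published checkpoint loads as an ordinary full-weight model.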

Evaluation

Evaluate on the CineMR test split with the same frame paths and prompt template as training. Report metrics on extracted \boxed{} answers and, if applicable, tool-call correctness. See the project eval scripts under CardiacCine/cardiac_cine/cine-cogito/ for reference pipelines.
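
As a rough illustration of scoring extracted answers, the sketch below computes exact-match accuracy over answers that have already been pulled out of the model output. It is not the project's metric: the names `normalize` and `exact_match_accuracy` are hypothetical, the normalization (case and whitespace only) is an assumption, and the reference eval scripts remain the authority on how answers are compared.

```python
def normalize(answer):
    """Lowercase and collapse whitespace so trivial formatting differences match."""
    return " ".join(answer.strip().lower().split())


def exact_match_accuracy(predictions, references):
    """Fraction of predictions that match their reference after normalization.

    A prediction of None (no boxed answer could be extracted) counts as wrong.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(
        1
        for pred, ref in zip(predictions, references)
        if pred is not None and normalize(pred) == normalize(ref)
    )
    return correct / len(references)
```

Counting missing answers as incorrect keeps format failures visible in the headline number instead of silently shrinking the denominator.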

Limitations

Trained on public cardiac MRI challenge-style corpora (ACDC, M&Ms, M&Ms-2); generalization to other scanners, sequences, or pathologies is not guaranteed.
GRPO training used a small validation set for checkpoint tracking; prefer held-out test evaluation before drawing conclusions.
Tool-use formatting in outputs may be inconsistent unless prompts and decoding match training.

License

This model inherits the license terms of Qwen3-VL (Apache 2.0). Your use is also subject to the terms of the CineMR data and to any restrictions imposed by the underlying datasets and challenges (ACDC, M&Ms, etc.). Use only for lawful research purposes.

Citation

If you use CineMR, please cite the base Qwen3-VL model and acknowledge the CineMR dataset and cardiac imaging sources:

bibtex
@misc{cinemr_qwen3vl8b_grpo,
  title        = {CineMR: Cardiac MRI Vision-Language Model (Qwen3-VL-8B, GRPO)},
  author       = {AI Mind Lab},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/ai-mind-lab/CineMR}},
  note         = {GRPO checkpoint; dataset at huggingface.co/datasets/ai-mind-lab/CineMR},
}

bibtex
@article{qwen3vl,
  title  = {Qwen3-VL Technical Report},
  author = {Qwen Team},
  year   = {2025},
}
