How it was obtained
The model was trained with supervised fine-tuning (SFT) using TRL's SFTTrainer. Training samples were selected from the MIMIC-CXR dataset, which contains frontal and lateral chest radiographs paired with free-text radiology reports.
Training details:
| Setting | Value |
|---|---|
| Base model | Qwen/Qwen3-VL-8B-Instruct |
| Training framework | TRL 0.26.2 (SFTTrainer) |
| Epochs | 2 |
| Total steps | 7 120 |
| Eval loss | 0.428 |
| Eval token accuracy | ~88% |
| Precision | bfloat16 |
| Hardware | 1× NVIDIA A100 |
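Two of the figures in the table can be unpacked with simple arithmetic. The sketch below derives the number of steps per epoch from the total step count and converts the eval loss into a perplexity. It assumes that "Total steps" counts optimizer steps split evenly across epochs and that the eval loss is a mean per-token cross-entropy in nats. The effective batch size is not stated in this card, so the value used for the sample estimate is purely hypothetical.

```python
import math

TOTAL_STEPS = 7120  # "Total steps" from the table
EPOCHS = 2          # "Epochs" from the table
EVAL_LOSS = 0.428   # "Eval loss" from the table


def steps_per_epoch(total_steps: int, epochs: int) -> int:
    """Steps in one epoch, assuming steps are split evenly across epochs."""
    return total_steps // epochs


def perplexity(mean_cross_entropy: float) -> float:
    """Perplexity for a mean per-token cross-entropy given in nats."""
    return math.exp(mean_cross_entropy)


def approx_samples(steps: int, effective_batch_size: int) -> int:
    """Rough number of samples seen in `steps` steps (ignores a partial last batch)."""
    return steps * effective_batch_size


if __name__ == "__main__":
    per_epoch = steps_per_epoch(TOTAL_STEPS, EPOCHS)
    print(f"steps per epoch: {per_epoch}")
    print(f"eval perplexity: {perplexity(EVAL_LOSS):.3f}")
    # HYPOTHETICAL effective batch size; the card does not report one.
    print(f"samples per epoch at batch size 8: {approx_samples(per_epoch, 8)}")
```

A perplexity close to 1 means the model assigns high probability to the reference report tokens, which is consistent with the reported token accuracy, but neither number measures clinical correctness.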
Usage
```python
from transformers import AutoProcessor, Qwen3VLForConditionalGeneration
from qwen_vl_utils import process_vision_info
import torch

# Load the model and its processor from the same repository.
model_id = "dmusingu/qwen3-vl-8b-mimic-cxr-sft"

# Qwen3-VL checkpoints are loaded with the Qwen3-VL model class.
model = Qwen3VLForConditionalGeneration.from_pretrained(
    model_id,
    dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

# One user turn: a chest X-ray followed by the instruction.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "<path_or_url_to_cxr>"},
            {"type": "text", "text": "Describe the findings in this chest X-ray."},
        ],
    }
]

# Render the chat template to text and load the referenced image(s).
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, _ = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, return_tensors="pt").to(model.device)

# Generate, then decode only the newly generated tokens (drop the prompt).
output_ids = model.generate(**inputs, max_new_tokens=256)
output = processor.batch_decode(output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(output[0])
```
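The `messages` structure above can also be built programmatically, for example when a study has both a frontal and a lateral view. The helper below is a hypothetical sketch (`build_messages` is not part of any library, and the file names are placeholders). It relies only on the chat format shown above, where each image is its own content entry. Whether this checkpoint was trained on multi-image prompts is not stated in this card, so multi-view input is an assumption to validate before relying on it.

```python
from typing import Any

DEFAULT_PROMPT = "Describe the findings in this chest X-ray."


def build_messages(image_sources: list[str], prompt: str = DEFAULT_PROMPT) -> list[dict[str, Any]]:
    """Build a single user turn with one image entry per source, then the text prompt."""
    if not image_sources:
        raise ValueError("at least one image path or URL is required")
    content: list[dict[str, Any]] = [
        {"type": "image", "image": src} for src in image_sources
    ]
    content.append({"type": "text", "text": prompt})
    return [{"role": "user", "content": content}]


if __name__ == "__main__":
    # Placeholder file names, used for illustration only.
    messages = build_messages(["frontal.png", "lateral.png"])
    for entry in messages[0]["content"]:
        print(entry["type"])
```

The returned list can be passed to `processor.apply_chat_template` and `process_vision_info` in place of the hand-written `messages`.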
Data access
MIMIC-CXR is a credentialed-access dataset. To obtain it, register on PhysioNet, complete the required training, and request access at physionet.org/content/mimic-cxr.
Framework versions
- TRL: 0.26.2
- Transformers: 5.7.0
- PyTorch: 2.11.0
- Datasets: 4.8.4
- Tokenizers: 0.22.2
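To check a local environment against the versions listed above, a short script along these lines may help. It is a sketch that uses only the standard library. The distribution names (`trl`, `transformers`, `torch`, `datasets`, `tokenizers`) are assumed to be the usual PyPI names, and an exact version match is not necessarily required for inference.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Versions listed in this card, keyed by assumed PyPI distribution name.
EXPECTED = {
    "trl": "0.26.2",
    "transformers": "5.7.0",
    "torch": "2.11.0",
    "datasets": "4.8.4",
    "tokenizers": "0.22.2",
}


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def compare_versions(expected: dict[str, str]) -> dict[str, tuple[str, Optional[str], bool]]:
    """Map each package to (expected, installed, matches)."""
    report = {}
    for name, wanted in expected.items():
        found = installed_version(name)
        # Ignore local build suffixes such as "+cu128" when comparing.
        matches = found is not None and found.split("+")[0] == wanted
        report[name] = (wanted, found, matches)
    return report


if __name__ == "__main__":
    for name, (wanted, found, ok) in compare_versions(EXPECTED).items():
        status = "ok" if ok else "differs"
        print(f"{name}: expected {wanted}, installed {found} ({status})")
```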
Citation
If you use this model, please cite MIMIC-CXR:
```bibtex
@article{johnson2019mimic,
  title     = {MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports},
  author    = {Johnson, Alistair EW and Pollard, Tom J and Berkowitz, Seth J and others},
  journal   = {Scientific Data},
  volume    = {6},
  number    = {1},
  pages     = {317},
  year      = {2019},
  publisher = {Nature Publishing Group}
}
```