ZJU-AI4H

Hulu-Med-30A3

🔥 News

[2025-11-27] ⚡ Hulu-Med is now compatible with the latest vLLM, offering faster inference and tensor parallel support! Thank you all for your patience and feedback 💪 See here for installation instructions.
[2025-11-18] 🎊 We released Hulu-Med-4B, a lightweight model with strong multimodal and text reasoning abilities that surpasses MedGemma-4B and Lingshu-7B!
[2025-11-01] 📊 We released our new evaluation code, MedUniEval! Built on MedEvalKit, MedUniEval is designed for comprehensive evaluation of medical vision-language models across modalities, including text, 2D, 3D, and video. More benchmarks are coming soon. Some processed evaluation data are available here.
[2025-10-16] 🚀 Demo Is Live! We've just deployed a demo and we'd love for you to try it! Your insights and feedback are crucial for helping us improve the model in the next version.
[2025-10-15] 🎉 Hulu-Med now supports Transformers integration! Hugging Face-compatible models with simplified loading and inference are now available in the main branch of our Hugging Face repository. Integration with vLLM is ongoing.
You can load the model directly with AutoModelForCausalLM.from_pretrained, which downloads the weights automatically. If access to Hugging Face is limited in your region, set the HF mirror environment variable for more reliable downloads:

```bash
export HF_ENDPOINT=https://hf-mirror.com
```

[2025-10-08] Hulu-Med models and inference code released!
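The mirror variable from the 2025-10-15 note can also be set from Python. The sketch below assumes that huggingface_hub reads HF_ENDPOINT once, when it is first imported (the case in the releases we know of, but worth checking against your installed version), so the variable must be set before transformers or huggingface_hub is imported. use_hf_mirror is a hypothetical helper name, not part of any library.

```python
import os
import sys
import warnings

HF_MIRROR = "https://hf-mirror.com"


def use_hf_mirror(endpoint: str = HF_MIRROR, override: bool = False) -> str:
    """Point Hugging Face downloads at a mirror via the HF_ENDPOINT variable.

    Assumption: huggingface_hub reads HF_ENDPOINT at import time, so call this
    before importing transformers or huggingface_hub.
    """
    if "huggingface_hub" in sys.modules:
        # Setting the variable now may not affect the already-imported library.
        warnings.warn(
            "huggingface_hub is already imported; HF_ENDPOINT may be ignored "
            "in this process.",
            RuntimeWarning,
        )
    # Keep a user-provided endpoint unless override=True.
    if override or not os.environ.get("HF_ENDPOINT"):
        os.environ["HF_ENDPOINT"] = endpoint
    return os.environ["HF_ENDPOINT"]


if __name__ == "__main__":
    print(use_hf_mirror())
    # Import download-capable libraries only after this point, e.g.:
    # from transformers import AutoModelForCausalLM
```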

📖 Overview

Hulu-Med is a transparent medical vision-language model that unifies understanding across diverse modalities including medical text, 2D/3D images, and videos. Built with a focus on transparency and accessibility, Hulu-Med achieves state-of-the-art performance on 30 medical benchmarks while being trained entirely on public data.

Key Features

🌟 Holistic Multimodal Understanding: Seamlessly processes medical text, 2D images, 3D volumes, and surgical videos
🔓 Fully Transparent: Complete open-source pipeline including data curation, training code, and model weights
📊 State-of-the-Art Performance: Outperforms leading open-source models and competes with proprietary systems
⚡ Efficient Training: Only 4,000-40,000 GPU hours are required to train the 7B-32B variants
🗂️ Comprehensive Coverage: Trained on 16.7M samples spanning 12 anatomical systems and 14 imaging modalities
🤗 Transformers Native: Native Hugging Face Transformers support for easier integration

Comprehensive Data Coverage

Our training corpus encompasses:

12 Major Anatomical Systems: Multi-System, Skin/Integumentary, Respiratory, Cellular/Tissue Level, Digestive, Nervous, Cardiovascular, Musculoskeletal, Reproductive, Urinary, Whole Body, Endocrine, Immune/Lymphatic, and Hematologic systems
14 Medical Imaging Modalities: CT, MRI, X-Ray, Ultrasound, PET, OCT, Endoscopy, Microscopy, Histopathology, Fundus, Dermoscopy, Angiography, Digital Photograph, and Medical Chart
Diverse Downstream Tasks: Medical Dialogue, Anomaly Detection, Prognosis Prediction, Treatment Planning, Surgical Skill Assessment, Education, Medical Report Generation, Surgical Phase Recognition, Medical Computation, and more

💻 Quick Start

Note: Because Hulu-30A3 and Hulu-235A22 are Mixture-of-Experts (MoE) models, we recommend serving them with vLLM or SGLang for the best performance and efficiency.

Start the Server

vLLM

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 PYTHONPATH=./Swift-HuluMed/ swift deploy \
    --model Hulu-30A3 \
    --infer_backend vllm \
    --vllm_tensor_parallel_size 8 \
    --vllm_engine_kwargs '{"data_parallel_size": 1, "enable_chunked_prefill": true, "enable_multimodal_encoder_data_parallel": false}' \
    --vllm_max_num_seqs 512 \
    --vllm_enable_expert_parallel \
    --vllm_max_model_len 75538 \
    --vllm_gpu_memory_utilization 0.85 \
    --model_type qwen3_moe_vl \
    --port 8000 \
    --served_model_name hulu
```
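The vLLM command passes extra engine options as a single JSON string through --vllm_engine_kwargs, which is easy to break when quoting by hand. A minimal sketch, assuming a POSIX shell and that swift forwards this JSON object to vLLM unchanged; engine_kwargs_flag is a hypothetical helper, and the option values are copied from the command above.

```python
import json
import shlex

# Engine options copied from the vLLM deploy command above.
engine_kwargs = {
    "data_parallel_size": 1,
    "enable_chunked_prefill": True,
    "enable_multimodal_encoder_data_parallel": False,
}


def engine_kwargs_flag(kwargs: dict) -> str:
    """Render --vllm_engine_kwargs with its JSON value safely shell-quoted."""
    # json.dumps turns Python True/False into JSON true/false;
    # shlex.quote wraps the JSON in single quotes for a POSIX shell.
    return f"--vllm_engine_kwargs {shlex.quote(json.dumps(kwargs))}"


if __name__ == "__main__":
    print(engine_kwargs_flag(engine_kwargs))
```

With the dictionary above, the printed flag matches the one in the command, so options can be edited as Python values instead of hand-escaped JSON.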

SGLang

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 PYTHONPATH=./Swift-HuluMed/ swift deploy \
    --model Hulu-30A3 \
    --infer_backend sglang \
    --max_new_tokens 128000 \
    --sglang_context_length 128000 \
    --sglang_tp_size 8 \
    --model_type qwen3_moe_vl \
    --port 8000 \
    --served_model_name hulu
```
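Loading an MoE model across 8 GPUs can take a while, so requests sent too early will fail. A minimal readiness check, assuming the swift deploy server answers the standard OpenAI-style GET /v1/models route and lists the name given by --served_model_name ("hulu"). wait_for_server and served_model_ids are hypothetical helpers, and the 30-minute default timeout is an arbitrary choice.

```python
import json
import time
import urllib.request


def served_model_ids(payload: dict) -> list:
    """Return the model ids listed in an OpenAI-style GET /v1/models response."""
    return [entry["id"] for entry in payload.get("data", [])]


def wait_for_server(base_url: str = "http://localhost:8000/v1",
                    model: str = "hulu",
                    timeout_s: float = 1800.0,
                    poll_s: float = 10.0) -> bool:
    """Poll {base_url}/models until `model` is listed; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/models", timeout=5) as resp:
                if model in served_model_ids(json.load(resp)):
                    return True
        except (OSError, ValueError):
            # Refused connections, HTTP errors and timeouts are OSError;
            # a non-JSON body raises ValueError. Assume the server is still loading.
            pass
        time.sleep(poll_s)
    return False
```

Calling wait_for_server() before the first request works with either backend, since both commands expose the same port and served model name.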

Inference via OpenAI-Compatible API

Text Example

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="hulu",
    messages=[{"role": "user", "content": "Hello, I have a headache, what should I do?"}],
    max_tokens=1024,
    temperature=0,
)
print(response.choices[0].message.content)
```
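Long answers such as medical reports can also be streamed token by token. A minimal sketch, assuming the server honors stream=True on the OpenAI-compatible chat endpoint (standard for the OpenAI Python client, but not confirmed for this deployment); iter_text and stream_chat are hypothetical helpers.

```python
from typing import Iterable, Iterator


def iter_text(chunks: Iterable) -> Iterator[str]:
    """Yield the text pieces of a streamed chat completion.

    Each chunk carries an incremental delta; chunks without choices (such as a
    trailing usage-only chunk) or with empty content are skipped.
    """
    for chunk in chunks:
        if not chunk.choices:
            continue
        piece = chunk.choices[0].delta.content
        if piece:
            yield piece


def stream_chat(client, messages, model: str = "hulu",
                max_tokens: int = 1024) -> Iterator[str]:
    """Request a streamed completion from an OpenAI-compatible client; yield its text."""
    stream = client.chat.completions.create(
        model=model,
        messages=messages,
        max_tokens=max_tokens,
        temperature=0,
        stream=True,
    )
    yield from iter_text(stream)
```

With the client from the text example: for piece in stream_chat(client, [{"role": "user", "content": "Hello, I have a headache, what should I do?"}]): print(piece, end="", flush=True)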

Image Example

```python
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

with open("./demo/demo.jpg", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="hulu",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_data}"}},
            {"type": "text", "text": "Generate a medical report for this image."},
        ],
    }],
    max_tokens=1024,
    temperature=0,
)
print(response.choices[0].message.content)
```
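The image example hardcodes image/jpeg in the data URL, so a PNG scan would be mislabeled. A minimal sketch that infers the MIME type from the file extension and builds the same message shape; image_message, to_data_url and guess_image_mime are hypothetical helpers. Passing several paths yields several image_url parts, which the OpenAI content format allows, but whether this server accepts more than one image per request is an assumption we have not checked.

```python
import base64
import mimetypes


def guess_image_mime(path: str) -> str:
    """Infer an image MIME type from the file extension (e.g. .png -> image/png)."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type for {path!r}")
    return mime


def to_data_url(data: bytes, mime: str) -> str:
    """Encode raw image bytes as a base64 data URL for an image_url content part."""
    return f"data:{mime};base64,{base64.b64encode(data).decode('ascii')}"


def image_message(paths, prompt: str) -> dict:
    """Build one user message: one image_url part per file, then the text prompt."""
    content = []
    for path in paths:
        with open(path, "rb") as f:
            url = to_data_url(f.read(), guess_image_mime(path))
        content.append({"type": "image_url", "image_url": {"url": url}})
    content.append({"type": "text", "text": prompt})
    return {"role": "user", "content": content}
```

In the request above, the messages argument would become [image_message(["./demo/demo.jpg"], "Generate a medical report for this image.")].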

📋 Supported Tasks

✅ Visual Question Answering (2D/3D/Video)
✅ Medical Report Generation
✅ Disease Diagnosis
✅ Anatomical Understanding
✅ Surgical Phase Recognition
✅ Clinical Dialogue
✅ Medical Text Reasoning
✅ Multilingual Medical QA
✅ Rare Disease Diagnosis
✅ And more

📄 Citation

If you find Hulu-Med useful in your research, please cite:

```bibtex
@misc{jiang2025hulumedtransparentgeneralistmodel,
      title={Hulu-Med: A Transparent Generalist Model towards Holistic Medical Vision-Language Understanding},
      author={Songtao Jiang and Yuan Wang and Sibo Song and Tianxiang Hu and Chenyi Zhou and Bin Pu and Yan Zhang and Zhibo Yang and Yang Feng and Joey Tianyi Zhou and Jin Hao and Zijian Chen and Ruijia Wu and Tao Tang and Junhui Lv and Hongxia Xu and Hongwei Wang and Jun Xiao and Bin Feng and Fudong Zhu and Kenli Li and Weidi Xie and Jimeng Sun and Jian Wu and Zuozhu Liu},
      year={2025},
      eprint={2510.08668},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2510.08668},
}
```

📜 License

This project is released under the Apache 2.0 License.

Model provider: ZJU-AI4H

Modalities
Input: Text, Image
Output: Text
