mochi0314

qwen2vl-2b-cond4-pavr-fused

README

License: apache-2.0

bash
pip install -U mlx-vlm

bash
python -m mlx_vlm.generate --model mlx-community/Qwen2-VL-2B-Instruct-4bit --max-tokens 100 --temp 0.0
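To call the same command from a script, the sketch below assembles it as an argument list. It uses only the flags shown above (`--model`, `--max-tokens`, `--temp`). The helper name `build_generate_command` is hypothetical and is not part of mlx-vlm. Using `sys.executable` in place of `python` is an assumption, made so the command runs under the same interpreter that has mlx-vlm installed.

```python
import shlex
import sys


def build_generate_command(model, max_tokens=100, temp=0.0):
    """Return the argv list for the mlx_vlm.generate invocation shown above.

    Hypothetical helper: it only assembles the command and does not run it.
    """
    return [
        sys.executable, "-m", "mlx_vlm.generate",
        "--model", model,
        "--max-tokens", str(max_tokens),
        "--temp", str(temp),
    ]


cmd = build_generate_command("mlx-community/Qwen2-VL-2B-Instruct-4bit")

# Print a shell-quoted form of the command for inspection.
print(shlex.join(cmd))

# To execute it (requires mlx-vlm to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting issues if the model identifier is later replaced with a local path that contains spaces.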

Available on FriendliAI

Dedicated Endpoints

Run inference with this model on a single-tenant GPU, with unmatched speed and reliability at scale.

Container

Run inference with this model in your own environment, with full control and performance.

Model Details

Model Provider

mochi0314

Model Tree

Base

this model

Input Modalities

Text, Image

Output Modalities

Text

Supported Functionality

Dedicated Endpoints, Container