License: apache-2.0
Model Details
| Property | Value |
|---|---|
| Base model | Qwen3.5-0.8B |
| Type | Vision-Language Model (VLM) |
| Format | MLX, unquantized 16-bit (bfloat16) |
| Size | ~1.75 GB |
| Abliterated | Yes (refusal direction ablated from the weights) |
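"Abliteration" usually refers to directional ablation. First, a "refusal direction" r is estimated in the residual stream, commonly as the difference in mean activations between prompts the model refuses and prompts it answers. Then each weight matrix W that writes into the residual stream is replaced by W' = W - r_hat r_hat^T W, so its outputs have no component along the unit vector r_hat. The card does not say how this particular model was abliterated. The sketch below only illustrates the weight-orthogonalization step on a toy matrix; the names `orthogonalize`, `direction_component`, `W` and `r` are hypothetical, and no real model weights are involved.

```python
import math

Matrix = list[list[float]]


def orthogonalize(W: Matrix, r: list[float]) -> Matrix:
    """Return W' = W - r_hat r_hat^T W for a matrix W of shape (d_out, d_in).

    W is assumed to write into a d_out-dimensional residual stream, so each
    column of W is an output vector; its component along r is removed.
    """
    norm = math.sqrt(sum(x * x for x in r))
    r_hat = [x / norm for x in r]
    d_out, d_in = len(W), len(W[0])
    W_new = [row[:] for row in W]
    for j in range(d_in):
        # Projection of column j (from the original W) onto r_hat.
        proj = sum(r_hat[i] * W[i][j] for i in range(d_out))
        for i in range(d_out):
            W_new[i][j] -= r_hat[i] * proj
    return W_new


def direction_component(W: Matrix, r: list[float]) -> list[float]:
    """Component of each column of W along r, i.e. r_hat^T W."""
    norm = math.sqrt(sum(x * x for x in r))
    return [sum(r[i] / norm * W[i][j] for i in range(len(W)))
            for j in range(len(W[0]))]


# Toy example: a 3x2 "weight matrix" and a hypothetical refusal direction.
W = [[1.0, 2.0],
     [0.5, -1.0],
     [3.0, 0.0]]
r = [1.0, 1.0, 0.0]

W_ablated = orthogonalize(W, r)
print(direction_component(W, r))          # nonzero before ablation
print(direction_component(W_ablated, r))  # approximately zero after ablation
```

Because the edit is a projection, the model keeps the same layers and shapes; only the weights change. Implementations typically apply it to every matrix that writes into the residual stream (such as attention output and MLP down projections), but which matrices were modified here is not documented.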
Variants
| Variant | Size | Quality | Link |
|---|---|---|---|
| fp16 | ~1.75 GB | Highest | This repo |
| MXFP8 | ~0.98 GB | Near-native | mxfp8 |
| MXFP4 | ~0.6 GB | Good | mxfp4 |
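The sizes in the Variants table can be sanity-checked with bits-per-weight arithmetic. The sketch below is an estimate under two assumptions that are not stated on the card. First, the parameter count is back-derived from the ~1.75 GB 16-bit size, because the card gives no exact count. Second, the MX variants are assumed to follow the OCP Microscaling layout, where each block of 32 elements shares one 8-bit scale. Under that layout MXFP8 costs 8 + 8/32 = 8.25 bits per weight and MXFP4 costs 4.25.

```python
# Rough weight-size estimates for the variants in the table above.
# Assumptions (not from the model card): the parameter count is inferred from
# the ~1.75 GB 16-bit size, and MX formats use 32-element blocks sharing one
# 8-bit scale (OCP Microscaling layout).

BYTES_16BIT = 2
FP16_SIZE_GB = 1.75  # from the Variants table


def bits_per_weight_mx(element_bits: int, block_size: int = 32,
                       scale_bits: int = 8) -> float:
    """Effective storage cost per weight for an MX block format."""
    return element_bits + scale_bits / block_size


def estimate_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Weight storage in GB (1 GB = 1e9 bytes), ignoring file metadata."""
    return num_params * bits_per_weight / 8 / 1e9


num_params = FP16_SIZE_GB * 1e9 / BYTES_16BIT  # about 0.875e9 parameters

estimates = {
    "fp16": estimate_size_gb(num_params, 16),
    "MXFP8": estimate_size_gb(num_params, bits_per_weight_mx(8)),
    "MXFP4": estimate_size_gb(num_params, bits_per_weight_mx(4)),
}

for name, size in estimates.items():
    print(f"{name}: ~{size:.2f} GB")
```

This prints roughly 1.75, 0.90 and 0.46 GB. The published MXFP8 and MXFP4 sizes (~0.98 GB and ~0.6 GB) are larger than these all-quantized estimates. That gap would be consistent with some tensors (for example embeddings or vision-encoder weights) being kept at higher precision, but the card does not say which tensors are quantized.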
Usage
```bash
pip install mlx-vlm

# Text generation
python -m mlx_vlm.generate \
  --model AITRADER/Huihui-Qwen3.5-0.8B-abliterated-fp16-MLX \
  --prompt "Describe this image in detail" \
  --image <path-or-url>

# Chat UI
python -m mlx_vlm.chat_ui \
  --model AITRADER/Huihui-Qwen3.5-0.8B-abliterated-fp16-MLX
```
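The text-generation command above can also be driven from a script, for example to run the same prompt over several images. The sketch below shells out to `python -m mlx_vlm.generate` and uses only the flags shown in the Usage section (`--model`, `--prompt`, `--image`). The helper names `build_generate_cmd` and `describe_images` and the image paths are hypothetical. With `dry_run=True` (the default) it only builds and prints the commands.

```python
import shlex
import subprocess
import sys

MODEL_ID = "AITRADER/Huihui-Qwen3.5-0.8B-abliterated-fp16-MLX"


def build_generate_cmd(image: str,
                       prompt: str = "Describe this image in detail",
                       model: str = MODEL_ID) -> list[str]:
    """Build the argv for `python -m mlx_vlm.generate` using the documented flags."""
    return [
        sys.executable, "-m", "mlx_vlm.generate",
        "--model", model,
        "--prompt", prompt,
        "--image", image,
    ]


def describe_images(images: list[str], dry_run: bool = True) -> list[list[str]]:
    """Run the generate CLI once per image; with dry_run=True only build commands."""
    commands = [build_generate_cmd(img) for img in images]
    if not dry_run:
        for cmd in commands:
            # Requires `pip install mlx-vlm` and a working MLX installation.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical image paths; replace with your own files or URLs.
    for cmd in describe_images(["photo1.jpg", "photo2.jpg"]):
        print(shlex.join(cmd))
```

Passing the arguments as a list (rather than one shell string) keeps prompts that contain spaces or quotes intact without manual escaping. `shlex.join` is used only to print a copy-pasteable version of each command.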
Credits
Model provider: monyschuk
Model tree
Base: Qwen/Qwen3.5-0.8B
Fine-tuned: this model
Modalities
Input: Video, Text, Image
Output: Text
Supported functionality: Model APIs, Dedicated Endpoints, Container