License: apache-2.0
Jeethu/Qwen3.5-2B-PARO
Pairwise Rotation Quantization for Efficient Reasoning LLM Inference
ParoQuant is a state-of-the-art INT4 quantization method for LLMs. It closes the accuracy gap with FP16 while running at near-AWQ speed. It supports NVIDIA GPUs (via vLLM and Transformers) and Apple Silicon (via MLX). For more information, see https://github.com/z-lab/paroquant.
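To give a rough intuition for why rotating weights before INT4 quantization can help, here is a minimal toy sketch in plain Python. It is not the ParoQuant algorithm: the weight values, the hand-picked 45-degree angle, the single shared scale, and the helper names `quantize_int4`, `rotate_pair`, and `squared_error` are all illustrative assumptions. It only relies on the fact that a rotation is orthogonal, so it can be undone exactly after dequantization.

```python
import math

def quantize_int4(values):
    """Symmetric round-to-nearest INT4 with one shared scale.

    Returns the dequantized values so the error can be measured directly.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 7 if max_abs else 1.0
    codes = [max(-8, min(7, round(v / scale))) for v in values]
    return [c * scale for c in codes]

def rotate_pair(a, b, theta):
    """Apply a 2x2 rotation by angle theta to one pair of values."""
    c, s = math.cos(theta), math.sin(theta)
    return c * a - s * b, s * a + c * b

def squared_error(xs, ys):
    return sum((x - y) ** 2 for x, y in zip(xs, ys))

weights = [7.0, 0.4, 0.3, -0.2]  # made-up group with one outlier
theta = math.pi / 4              # hand-picked angle, for illustration only

# Baseline: quantize the group as-is.
plain = quantize_int4(weights)

# Rotate the first pair, quantize the group, then rotate the pair back.
r0, r1 = rotate_pair(weights[0], weights[1], theta)
deq = quantize_int4([r0, r1] + weights[2:])
b0, b1 = rotate_pair(deq[0], deq[1], -theta)
rotated = [b0, b1] + deq[2:]

err_plain = squared_error(weights, plain)
err_rotated = squared_error(weights, rotated)
print(f"plain: {err_plain:.3f}, rotated: {err_rotated:.3f}")
```

In this toy case the rotation spreads the outlier across both values of the pair, which shrinks the shared scale and lowers the round-trip error. A different angle or weight group can just as easily make the error worse, so a real method has to choose which pairs to rotate and by how much; see the linked repository for how ParoQuant does this.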
Jeethu/Qwen3.5-2B-PARO is a 4-bit version of Qwen/Qwen3.5-2B, quantized with ParoQuant.
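As a back-of-the-envelope illustration of what 4-bit weights mean for memory, the sketch below compares FP16 and INT4 storage for the weights alone. The parameter count of 2e9 is an assumption taken from the "2B" in the model name, not a measured figure, and `weight_memory_gib` is a hypothetical helper.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate storage for the weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30

NOMINAL_PARAMS = 2e9  # assumption: nominal size implied by "2B" in the name

fp16 = weight_memory_gib(NOMINAL_PARAMS, 16)
int4 = weight_memory_gib(NOMINAL_PARAMS, 4)
print(f"FP16: {fp16:.2f} GiB, INT4: {int4:.2f} GiB, ratio: {fp16 / int4:.1f}x")
```

The actual checkpoint will be somewhat larger than the INT4 estimate, since quantized formats typically also store per-group scales and may keep some layers at higher precision.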
Model provider: Jeethu
Model tree:
  Base: Qwen/Qwen3.5-2B
  Quantized: this model
Modalities:
  Input: Video, Text, Image
  Output: Text