License: apache-2.0

Jeethu/Qwen3.5-2B-PARO

Pairwise Rotation Quantization for Efficient Reasoning LLM Inference

ParoQuant is a state-of-the-art INT4 quantization method for LLMs. It closes the accuracy gap with FP16 while running at near-AWQ speed. It supports NVIDIA GPUs (via vLLM and Transformers) and Apple Silicon (via MLX). For more information, see https://github.com/z-lab/paroquant.

Jeethu/Qwen3.5-2B-PARO is a 4-bit version of Qwen/Qwen3.5-2B, quantized with ParoQuant.
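
To give a rough intuition for the "pairwise rotation" in the name, the sketch below rotates disjoint pairs of weight columns with 2-D (Givens) rotations and then applies plain symmetric INT4 round-to-nearest. It is an illustrative toy, not the ParoQuant implementation: the function names `quantize_int4` and `rotate_pairs`, the per-row scaling, and the hand-picked angle are all assumptions made for this example; the actual method is described in the repository linked above.

```python
import numpy as np

def quantize_int4(w):
    """Symmetric per-row INT4 round-to-nearest; returns the dequantized array.
    (Hypothetical helper for illustration only.)"""
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero on all-zero rows
    q = np.clip(np.round(w / scale), -8, 7)   # signed 4-bit integer grid
    return q * scale

def rotate_pairs(w, pairs, angles):
    """Apply an independent 2-D rotation to each pair of columns.
    `pairs` must be disjoint so the rotations do not interact.
    (Hypothetical helper for illustration only.)"""
    out = w.copy()
    for (i, j), theta in zip(pairs, angles):
        c, s = np.cos(theta), np.sin(theta)
        wi, wj = w[:, i], w[:, j]
        out[:, i] = c * wi - s * wj
        out[:, j] = s * wi + c * wj
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(8, 4))
    w[:, 0] *= 20.0  # make column 0 an outlier channel

    pairs = [(0, 1)]
    angles = [np.pi / 4]  # assumed fixed angle; a real method would choose it

    # Quantize directly, and quantize in the rotated basis then rotate back.
    direct = quantize_int4(w)
    rotated = quantize_int4(rotate_pairs(w, pairs, angles))
    restored = rotate_pairs(rotated, pairs, [-a for a in angles])

    print("direct error :", np.linalg.norm(w - direct))
    print("rotated error:", np.linalg.norm(w - restored))
```

Because each rotation is orthogonal, it can be undone exactly by rotating with the negated angle, so the only error introduced comes from the rounding step. Whether the rotated basis quantizes better depends on the weights and the chosen angles.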

Available on FriendliAI

Dedicated Endpoints

Run inference for this model on a single-tenant GPU with unmatched speed and reliability at scale.

Model Details

Model Provider: Jeethu

Model Tree

Base: Qwen/Qwen3.5-2B

Quantized