License: apache-2.0
Jeethu/Huihui-gemma-4-E2B-it-abliterated-v2-PARO
Pairwise Rotation Quantization for Efficient Reasoning LLM Inference
ParoQuant is a state-of-the-art INT4 quantization method for LLMs. It closes the accuracy gap with FP16 while running at near-AWQ speed. It supports NVIDIA GPUs (vLLM, Transformers) and Apple Silicon (MLX). For more information, see https://github.com/z-lab/paroquant.
Jeethu/Huihui-gemma-4-E2B-it-abliterated-v2-PARO is a 4-bit version of huihui-ai/Huihui-gemma-4-E2B-it-abliterated-v2, quantized with ParoQuant.
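To give a feel for the idea behind "pairwise rotation" plus INT4 quantization, here is a minimal toy sketch in plain Python. It is not ParoQuant's implementation or API: the helper names (`rotate_pair`, `quantize_int4`), the rotation angle, and the weight and activation values are all made up for illustration, and the quantizer is a generic symmetric round-to-nearest scheme. It shows only that rotating a pair of channels in both the weights and the activations leaves their dot product unchanged, and that such a rotation can spread an outlier across two channels before rounding to 4 bits.

```python
import math

def rotate_pair(vec, i, j, theta):
    """Return a copy of vec with a 2D (Givens) rotation applied to coordinates i and j."""
    out = list(vec)
    c, s = math.cos(theta), math.sin(theta)
    out[i] = c * vec[i] - s * vec[j]
    out[j] = s * vec[i] + c * vec[j]
    return out

def quantize_int4(vec):
    """Generic symmetric INT4 quantization: integers in [-8, 7] plus one shared scale."""
    max_abs = max(abs(v) for v in vec)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    q = [max(-8, min(7, round(v / scale))) for v in vec]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical values: channel 0 of the weights is an outlier.
weights = [8.0, 0.5, -0.3, 0.2]
acts = [1.0, 2.0, -1.0, 0.5]
theta = math.pi / 4  # arbitrary angle chosen for the illustration

# Rotate the same channel pair in both vectors; the rotation is orthogonal,
# so the dot product (the layer output for this row) is preserved.
w_rot = rotate_pair(weights, 0, 1, theta)
x_rot = rotate_pair(acts, 0, 1, theta)
exact = dot(weights, acts)
rotated = dot(w_rot, x_rot)

# Quantize with and without the rotation and compare reconstruction error.
q_plain, s_plain = quantize_int4(weights)
q_rot, s_rot = quantize_int4(w_rot)
err_plain = squared_error(weights, dequantize(q_plain, s_plain))
err_rot = squared_error(w_rot, dequantize(q_rot, s_rot))

print("dot product before/after rotation:", exact, rotated)
print("INT4 error without rotation:", err_plain)
print("INT4 error with rotation:", err_rot)
```

Running the script prints both dot products and both reconstruction errors so they can be compared directly. How a real method chooses which channel pairs to rotate and by what angles is outside the scope of this sketch; see the ParoQuant repository for the actual approach.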
Model provider: Jeethu
Model tree
Base: huihui-ai/Huihui-gemma-4-E2B-it-abliterated-v2
Quantized: this model
Modalities
Input: Text, Image
Output: Text
Pricing: Dedicated Endpoints
Supported Functionality: Model APIs, Dedicated Endpoints, Container