Jeethu

MiniCPM5-1B-PARO

License: apache-2.0

Jeethu/MiniCPM5-1B-PARO

Pairwise Rotation Quantization for Efficient Reasoning LLM Inference

ParoQuant is a state-of-the-art INT4 quantization method for LLMs. It closes the accuracy gap with FP16 while running at near-AWQ speed. It supports NVIDIA GPUs (via vLLM and Transformers) and Apple Silicon (via MLX). For more information, see https://github.com/z-lab/paroquant.

Jeethu/MiniCPM5-1B-PARO is a 4-bit version of openbmb/MiniCPM5-1B, quantized with ParoQuant.
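To give a rough intuition for why rotating pairs of weight channels can help 4-bit quantization, here is a minimal NumPy sketch. It is not ParoQuant's actual algorithm or kernels: the helper names (`quantize_int4`, `rotate_pairs`), the toy weight row, the group size, and the hand-picked 45-degree angle are all assumptions made for illustration, whereas a real method would choose its rotations by optimization. The sketch spreads one outlier across two channels with a Givens rotation, quantizes to INT4 with a per-group scale, then undoes the rotation and compares the error against plain round-to-nearest.

```python
import numpy as np

def quantize_int4(w, group_size=4):
    """Symmetric round-to-nearest INT4 with one scale per group.
    Returns the dequantized values so errors can be compared directly."""
    groups = w.reshape(-1, group_size)
    max_abs = np.abs(groups).max(axis=1, keepdims=True)
    scale = np.where(max_abs > 0, max_abs / 7.0, 1.0)
    q = np.clip(np.round(groups / scale), -8, 7)  # signed 4-bit range
    return (q * scale).reshape(w.shape)

def rotate_pairs(w, pairs, angles):
    """Apply an independent Givens rotation to each (disjoint) column pair."""
    out = w.copy()
    for (i, j), theta in zip(pairs, angles):
        c, s = np.cos(theta), np.sin(theta)
        out[:, i] = c * w[:, i] - s * w[:, j]
        out[:, j] = s * w[:, i] + c * w[:, j]
    return out

# Toy weight row (hypothetical): one outlier dominates the group scale.
w = np.array([[8.0, 0.0, 0.5, -0.5]])
pairs = [(0, 1)]          # rotate the outlier channel with its neighbor
angles = [np.pi / 4]      # hand-picked angle, an assumption for this demo

# Baseline: quantize the weights directly.
err_plain = np.linalg.norm(quantize_int4(w) - w)

# Rotate, quantize, then rotate back with the negated angle.
w_rot = rotate_pairs(w, pairs, angles)
w_hat = rotate_pairs(quantize_int4(w_rot), pairs, [-a for a in angles])
err_rot = np.linalg.norm(w_hat - w)

print(f"plain INT4 error:   {err_plain:.3f}")
print(f"rotated INT4 error: {err_rot:.3f}")
```

Because each rotation is orthogonal, it can be undone exactly, so in this toy case the only change is a smaller group scale and therefore finer quantization steps for the small weights.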

Available on FriendliAI

Dedicated Endpoints

Run inference for this model on a single-tenant GPU with unmatched speed and reliability at scale.

Container

Run inference for this model with full control and performance in your own environment.

Model Details

Model Provider: Jeethu
