jkim96
Qwen3.5-9B-DASHQ-INT4-g32
README
License: apache-2.0
Install
bash
pip install git+https://github.com/JaeminK/dashq.git
Load
python
from dashq import load_quantized
model, tokenizer = load_quantized("jkim96/Qwen3.5-9B-DASHQ-INT4-g32", device_map="auto")
Quantization
| Field | Value |
|---|---|
| Base model | Qwen/Qwen3.5-9B |
| Precision | INT4, group size 32 |
| Scale / zero dtype | float16 |
| Calibration | wikitext2, 128 samples x 2048 tokens |
| Size | 9.3068 GB · original 19.3063 GB · 2.1x compression |
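As a rough sanity check on the table above, the sketch below works out the average storage cost per quantized weight and compares an idealized compression ratio with the reported one. It assumes that each group of 32 weights stores one float16 scale and one float16 zero point, and that the original checkpoint holds 16-bit weights; the card does not state either detail explicitly, and the helper name `effective_bits_per_weight` is made up for this illustration.

```python
# Values copied from the Quantization table.
WEIGHT_BITS = 4        # INT4 weights
GROUP_SIZE = 32        # weights sharing one scale and one zero point
SCALE_BITS = 16        # float16 scale (assumed: one per group)
ZERO_BITS = 16         # float16 zero point (assumed: one per group)
QUANTIZED_GB = 9.3068
ORIGINAL_GB = 19.3063


def effective_bits_per_weight(weight_bits, group_size, scale_bits, zero_bits):
    """Average bits stored per quantized weight, including per-group metadata."""
    return weight_bits + (scale_bits + zero_bits) / group_size


bits = effective_bits_per_weight(WEIGHT_BITS, GROUP_SIZE, SCALE_BITS, ZERO_BITS)

# Idealized ratio if every tensor were quantized and the original were 16-bit
# (both are assumptions, not statements from the model card).
ideal_ratio = 16 / bits

# Ratio implied by the two sizes reported in the table.
observed_ratio = ORIGINAL_GB / QUANTIZED_GB

print(f"effective bits per quantized weight: {bits:.2f}")
print(f"ideal ratio for quantized layers:    {ideal_ratio:.2f}x")
print(f"observed whole-checkpoint ratio:     {observed_ratio:.2f}x")
```

Under these assumptions the quantized layers cost about 5 bits per weight, so the observed whole-checkpoint ratio sits below the idealized one. That gap would be consistent with some tensors staying at higher precision, though the card does not say which.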
Benchmarks
Full zero-shot / few-shot results for every DASH-Q checkpoint: github.com/JaeminK/dashq#benchmarks
Modalities
Input: Video, Text, Image
Output: Text