jkim96
gemma-4-26B-A4B-it-DASHQ-INT4-g64
License: apache-2.0
Install
bash
pip install git+https://github.com/JaeminK/dashq.git
Load
python
from dashq import load_quantized
model, tokenizer = load_quantized("jkim96/gemma-4-26B-A4B-it-DASHQ-INT4-g64", device_map="auto")
Quantization
| Field | Value |
|---|---|
| Base model | google/gemma-4-26B-A4B-it |
| Precision | INT4, group size 64 |
| Scale / zero dtype | float16 |
| Calibration | wikitext2, 128 samples x 2048 tokens |
| Size | 16.4048 GB · original 51.6120 GB · 3.1x compression |
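To make the "INT4, group size 64" and "float16 scale / zero" rows concrete, here is a minimal sketch of generic group-wise quantization. It is not the DASH-Q algorithm, which this card does not describe: it uses plain min/max round-to-nearest, and it assumes one float16 scale and one float16 zero offset per group of 64 weights. All function names are hypothetical and are not part of the `dashq` package.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def quantize_group(weights, bits=4):
    """Min/max quantization of one group to unsigned `bits`-bit codes.

    Returns (codes, scale, zero). Scale and zero are rounded to float16,
    mirroring the "Scale / zero dtype" row (an assumption about the layout).
    """
    qmax = (1 << bits) - 1  # 15 for INT4
    lo, hi = min(weights), max(weights)
    scale = to_fp16((hi - lo) / qmax) or 1.0  # avoid dividing by zero on a constant group
    zero = to_fp16(lo)
    codes = [min(qmax, max(0, round((w - zero) / scale))) for w in weights]
    return codes, scale, zero


def dequantize_group(codes, scale, zero):
    """Reconstruct approximate weights from the codes of one group."""
    return [c * scale + zero for c in codes]


def quantize(weights, group_size=64, bits=4):
    """Split a flat list of weights into groups and quantize each one."""
    return [
        quantize_group(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]


def bits_per_weight(group_size=64, bits=4, meta_bits=16 + 16):
    """Average storage per weight: the code plus its share of scale and zero."""
    return bits + meta_bits / group_size


if __name__ == "__main__":
    # Deterministic toy weights in [-0.5, 0.5]; two groups of 64.
    weights = [((i * 37) % 101 - 50) / 100 for i in range(128)]
    codes, scale, zero = quantize(weights)[0]
    restored = dequantize_group(codes, scale, zero)
    worst = max(abs(a - b) for a, b in zip(weights[:64], restored))
    print(f"scale={scale}, zero={zero}, worst error={worst}")
    print(f"bits per weight: {bits_per_weight()}")               # 4.5
    print(f"ratio vs 16-bit: {16 / bits_per_weight():.2f}x")     # 3.56x
```

Under these assumptions a fully quantized tensor costs 4.5 bits per weight, or about 3.56x smaller than a 16-bit original. The 3.1x in the table is lower, which would be consistent with some tensors being kept at higher precision, though the card does not say which.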
Benchmarks
Full zero-shot / few-shot results for every DASH-Q checkpoint: github.com/JaeminK/dashq#benchmarks
Model provider
jkim96
Model tree
Base: google/gemma-4-26B-A4B-it
Quantized: this model
Modalities
Input: Text, Image
Output: Text