Dedicated Endpoints
Run inference for this model on single-tenant GPUs with unmatched speed and reliability at scale.
Get help setting up a custom Dedicated Endpoint.
Talk with one of our engineers to get a quote for discounted reserved GPU instances.
README
License: apache-2.0

Note: This repository contains an FP8 dynamic version of gemma-4-12B-it, quantized so that it runs on GPUs with 16 GB of VRAM or less. Because of the quantization method, it requires an NVIDIA Ada Lovelace or later GPU architecture.
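As a rough illustration of the two hardware claims in the note, the sketch below estimates how much memory the weights need and checks whether a GPU's CUDA compute capability is Ada (8.9) or newer. It assumes a nominal 12 billion parameters (taken from the model name, not an exact count), one byte per FP8 weight versus two bytes for BF16, and that native FP8 support begins at compute capability 8.9. The helper names `weight_memory_gib` and `supports_native_fp8` are hypothetical. The estimate covers weights only; the KV cache, activations, runtime overhead, and any layers left in higher precision all add to real usage.

```python
# Hypothetical pre-flight check for the FP8 checkpoint described above.
# Assumptions (not from the model card): ~12e9 parameters, 1 byte per FP8
# weight vs 2 bytes per BF16 weight, and native FP8 support starting at
# CUDA compute capability 8.9 (Ada Lovelace); Hopper is 9.0.

GIB = 1024 ** 3


def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in GiB (no KV cache, activations, or overhead)."""
    return num_params * bytes_per_param / GIB


def supports_native_fp8(major: int, minor: int) -> bool:
    """True if the CUDA compute capability is Ada (8.9) or newer."""
    return (major, minor) >= (8, 9)


if __name__ == "__main__":
    params = 12e9  # nominal size from the model name; the exact count differs
    for name, nbytes in [("BF16", 2), ("FP8", 1)]:
        print(f"{name}: {weight_memory_gib(params, nbytes):.1f} GiB of weights")

    # Example compute capabilities: A100 = 8.0, RTX 4090 = 8.9, H100 = 9.0.
    for gpu, cc in [("A100 (Ampere)", (8, 0)),
                    ("RTX 4090 (Ada)", (8, 9)),
                    ("H100 (Hopper)", (9, 0))]:
        print(f"{gpu}: native FP8 = {supports_native_fp8(*cc)}")
```

Under the 12B assumption this prints about 11.2 GiB of weights for FP8 versus about 22.4 GiB for BF16, which is why the FP8 checkpoint can fit on a 16 GB card while the BF16 original cannot. On a machine with PyTorch and CUDA available, `torch.cuda.get_device_capability()` returns the `(major, minor)` pair to pass to `supports_native_fp8`.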
Model provider
edornd
Model tree
Base
google/gemma-4-12B-it
Quantized
this model
Modalities
Input
Video, Audio, Text, Image
Output
Text
Pricing
Dedicated Endpoints
Supported Functionality
Model APIs
Dedicated Endpoints
Container