google

gemma-4-31B-it

Gemma 4 31B IT is a 30.7B dense multimodal model from Google DeepMind supporting text and image input. It offers a 256K context window, configurable thinking mode, native function calling, and multilingual coverage across 140+ languages.

API Example

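import os

from openai import OpenAI

# The endpoint is used through the OpenAI SDK by overriding base_url.
client = OpenAI(
    api_key=os.getenv("API_KEY"),  # read the API token from the environment
    base_url="https://api.friendli.ai/serverless/v1",
)

completion = client.chat.completions.create(
    model="google/gemma-4-31B-it",
    extra_body={},  # placeholder for optional provider-specific parameters
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a funny joke."},
    ],
)

# Print the text of the first (and only) returned choice.
print(completion.choices[0].message.content)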
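
Because the model accepts image input as well as text, a request can combine both in one user message. The following is a minimal sketch that builds such a message in the OpenAI Chat Completions content-part format (`image_url` and `text` parts). It assumes this endpoint accepts that format for this model, which the page does not confirm; `build_image_message` and the example URL are hypothetical names used only for illustration.

```python
def build_image_message(prompt: str, image_url: str) -> dict:
    """Return a user message holding one image part and one text part."""
    return {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": image_url}},
            {"type": "text", "text": prompt},
        ],
    }


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    # Hypothetical image URL; replace it with a reachable image.
    build_image_message("Describe this image.", "https://example.com/cat.png"),
]

# To send the request, reuse the client from the example above:
# completion = client.chat.completions.create(
#     model="google/gemma-4-31B-it",
#     messages=messages,
# )
# print(completion.choices[0].message.content)
```

The network call is left commented out so the payload can be inspected without an API key.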

Model APIs

Run inference on this model with a simple API call.

Dedicated Endpoints

Run inference for this model on a single-tenant GPU, with unmatched speed and reliability at scale.

Get help setting up custom Dedicated Endpoints.

Talk with our engineers to get a quote for discounted reserved GPU instances.

README

License: apache-2.0

Models Overview

Gemma 4 models are designed to deliver frontier-level performance at each size, targeting deployment scenarios from mobile and edge devices (E2B, E4B) to consumer GPUs and workstations (26B A4B, 31B). They are well-suited for reasoning, agentic workflows, coding, and multimodal understanding.

The models employ a hybrid attention mechanism that interleaves local sliding window attention with full global attention, ensuring the final layer is always global. This hybrid design delivers the processing speed and low memory footprint of a lightweight model without sacrificing the deep awareness required for complex, long-context tasks. To optimize memory for long contexts, global layers feature unified Keys and Values, and apply Proportional RoPE (p-RoPE).
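
The interleaving described above can be sketched in a few lines. This is an illustration under stated assumptions, not the model's implementation: the local-to-global ratio (`local_per_global`), the layer count, the window size, and the causal masking rule are all placeholder choices, and `layer_pattern` and `can_attend` are hypothetical helper names. The only property taken from the text is that local and global layers alternate and that the final layer is global.

```python
def layer_pattern(num_layers: int, local_per_global: int) -> list[str]:
    """Interleave local and global layers; force the final layer to be global."""
    period = local_per_global + 1
    pattern = [
        "global" if (i + 1) % period == 0 else "local"
        for i in range(num_layers)
    ]
    pattern[-1] = "global"  # the text states the last layer is always global
    return pattern


def can_attend(query: int, key: int, kind: str, window: int) -> bool:
    """Decide whether position `query` may attend to position `key`."""
    if key > query:
        return False  # causal assumption: never attend to future tokens
    if kind == "global":
        return True  # global layers see the whole prefix
    return query - key < window  # local layers see only the last `window` tokens


# Illustrative values only: 8 layers, 3 local layers per global layer.
pattern = layer_pattern(num_layers=8, local_per_global=3)
print(pattern)
print(can_attend(10, 2, "local", window=4), can_attend(10, 2, "global", window=4))
```

In this sketch, a local layer only needs keys and values for the most recent `window` positions, while a global layer needs them for the whole context, which is why the mix affects memory use on long inputs.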

Dense Models

Property	E2B	E4B	31B Dense
Total Parameters	2.3B effective (5.1B with embeddings)	4.5B effective (8B with embeddings)	30.7B
Layers	35	42

Model provider: google

Model tree
Base: google/gemma-4-31B
Fine-tuned: this model

Modalities
Input: Text, Image
Output: Text

Pricing

Model APIs
Input: $0.14 / 1M tokens
Output: $0.40 / 1M tokens
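
As a worked example of the Model APIs prices listed above, the sketch below estimates the cost of a single request. The token counts are made-up inputs, and `estimate_cost` is a hypothetical helper; actual billing may differ in ways this page does not describe.

```python
INPUT_USD_PER_MILLION = 0.14   # input price listed above
OUTPUT_USD_PER_MILLION = 0.40  # output price listed above


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request."""
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


# Example: a 200,000-token prompt with a 2,000-token reply.
cost = estimate_cost(input_tokens=200_000, output_tokens=2_000)
print(f"${cost:.4f}")
```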

Supported Functionality

Model APIs

Dedicated Endpoints

Container


Mixture-of-Experts (MoE) Model

Property	26B A4B MoE
Total Parameters	25.2B
Active Parameters	3.8B
Layers	30
Sliding Window	1024 tokens
Context Length	256K tokens
Vocabulary Size	262K
Expert Count	8 active / 128 total and 1 shared
Supported Modalities	Text, Image
Vision Encoder Parameters	~550M
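
To make the expert counts in the table concrete, here is a minimal sketch of generic top-k expert routing. It is not confirmed to match this model's router: the softmax normalization over the selected experts is an assumption, the router scores are synthetic stand-ins, and `route_top_k` is a hypothetical helper. Only the counts (8 active of 128, plus 1 shared expert) and the parameter totals come from the table.

```python
import math


def route_top_k(router_logits: list[float], k: int) -> dict[int, float]:
    """Select the k highest-scoring experts and softmax-normalize their weights."""
    ranked = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )
    top = ranked[:k]
    peak = max(router_logits[i] for i in top)  # subtract the max for stability
    exps = {i: math.exp(router_logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


NUM_EXPERTS = 128    # total routed experts, from the table
ACTIVE_EXPERTS = 8   # routed experts used per token, from the table

# Synthetic router scores for one token; a real router would compute these.
logits = [math.sin(i) for i in range(NUM_EXPERTS)]
weights = route_top_k(logits, ACTIVE_EXPERTS)

# In this sketch the single shared expert would run for every token,
# in addition to the routed experts chosen above.
print(len(weights), round(sum(weights.values()), 6))
print(f"active parameter fraction: {3.8 / 25.2:.1%}")
```

The last line relates the table's 3.8B active parameters to its 25.2B total, which is roughly 15% of the weights in use per token.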

Benchmark Results

Benchmark	Gemma 4 31B	Gemma 4 26B A4B	Gemma 4 E4B	Gemma 4 E2B	Gemma 3 27B (no think)
MMLU Pro	85.2%	82.6%	69.4%	60.0%	67.6%
AIME 2026 no tools	89.2%	88.3%	42.5%	37.5%	20.8%
LiveCodeBench v6	80.0%	77.1%	52.0%	44.0%	29.1%
Codeforces ELO	2150	1718	940	633	110
GPQA Diamond	84.3%	82.3%	58.6%	43.4%	42.4%
Tau2 (average over 3)	76.9%	68.2%	42.2%	24.5%	16.2%
HLE no tools	19.5%	8.7%	-	-	-
HLE with search	26.5%	17.2%	-	-	-
BigBench Extra Hard	74.4%	64.8%	33.1%	21.9%	19.3%
MMMLU	88.4%	86.3%	76.6%	67.4%	70.7%
Vision
MMMU Pro	76.9%	73.8%	52.6%	44.2%	49.7%
OmniDocBench 1.5 (average edit distance, lower is better)	0.131	0.149	0.181	0.290	0.365
MATH-Vision	85.6%	82.4%	59.5%	52.4%	46.0%
MedXPertQA MM	61.3%	58.1%	28.7%	23.5%	-
Audio
CoVoST	-	-	35.54	33.47	-
FLEURS (lower is better)	-	-	0.08	0.09	-
Long Context
MRCR v2 8 needle 128k (average)	66.4%	44.1%	25.4%	19.1%	13.5%
