Nemotron-Cascade-2-30B-A3B
Open 30B MoE model with 3B activated parameters delivering strong reasoning and agentic capabilities.
Introduction
We're excited to introduce Nemotron-Cascade-2-30B-A3B, an open 30B MoE model with 3B activated parameters that delivers strong reasoning and agentic capabilities. It is post-trained from Nemotron-3-Nano-30B-A3B-Base. Nemotron-Cascade-2-30B-A3B achieves gold-medal performance in both the 2025 International Mathematical Olympiad (IMO) and the International Olympiad in Informatics (IOI). It operates in both thinking and instruct (non-thinking) modes.
Quick Start
- Nemotron-Cascade-2-30B-A3B follows the ChatML template and supports both thinking and instruct (non-thinking) modes. Reasoning content is enclosed within `<think>` and `</think>` tags. To activate the instruct (non-thinking) mode, prepend `<think></think>` to the beginning of the assistant's response.
- Nemotron-Cascade-2-30B-A3B does not currently support OpenCode; it primarily supports OpenHands for agentic coding and SWE tasks.
- To reduce context length in multi-turn conversations, when a previous turn used thinking mode, only the final summary of the model's output is added to the conversation history.
- Note that we do not define a separate `tool` role for tool responses; instead, we place them under the `user` role and wrap them with `<tool_response>` and `</tool_response>`.
- We recommend setting the sampling parameters to temperature = 1.0 and top_p = 0.95.
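As a concrete illustration of the conventions above, here is a minimal sketch that assembles a ChatML-style prompt by hand: it wraps a tool response under the `user` role and prepends an empty think block to force instruct (non-thinking) mode. The special tokens (`<|im_start|>`, `<|im_end|>`) and the helper functions are assumptions based on generic ChatML, not the model's verified template; in practice, prefer the tokenizer's built-in chat template (e.g. `tokenizer.apply_chat_template`).

```python
# Sketch of the prompt conventions described above.
# NOTE: the ChatML special tokens here are assumptions; use the
# model's own chat template in real deployments.

def chatml_turn(role: str, content: str) -> str:
    """Render one ChatML message block."""
    return f"<|im_start|>{role}\n{content}<|im_end|>\n"

def wrap_tool_response(result: str) -> str:
    """Tool outputs go under the user role, wrapped in <tool_response> tags."""
    return f"<tool_response>\n{result}\n</tool_response>"

def build_prompt(messages: list[dict], thinking: bool = True) -> str:
    """Join messages and open the assistant turn."""
    prompt = "".join(chatml_turn(m["role"], m["content"]) for m in messages)
    prompt += "<|im_start|>assistant\n"
    if not thinking:
        # Prepending an empty think block activates instruct mode.
        prompt += "<think></think>"
    return prompt

messages = [
    {"role": "user", "content": "What is 2 + 2?"},
    # There is no separate tool role: the tool result is a user message.
    {"role": "user", "content": wrap_tool_response("4")},
]
print(build_prompt(messages, thinking=False))
```

The recommended sampling parameters (temperature = 1.0, top_p = 0.95) are passed to the serving stack separately, e.g. as sampling options in an OpenAI-compatible API request.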
License
Your use of this model is governed by the NVIDIA Open Model License.
Model provider
nvidia
Modalities
- Input: Text
- Output: Text
Supported Functionality
- Serverless Endpoints
- Dedicated Endpoints
- Container