• March 11, 2026
  • 3 min read

Nemotron 3 Super is Live on FriendliAI: Built for Multi-Agent Applications and Specialized Agentic AI Systems

TL;DR
  • FriendliAI has launched Day-0 support for NVIDIA’s Nemotron-3 Super, offering optimized Dedicated Endpoints for production-scale AI.
  • The model’s hybrid Transformer–Mamba MoE architecture delivers high-speed reasoning and massive token throughput with enterprise-grade efficiency.
  • It is purpose-built for multi-agent systems and complex tool-calling, supporting an expansive 1M-token context window.

FriendliAI is launching Day-0 support for NVIDIA Nemotron 3 Super. Our Dedicated Endpoints are optimized to unlock the full efficiency of Nemotron’s hybrid MoE architecture—delivering high-throughput inference, low-latency execution, and enterprise-ready deployment for multi-agent and tool-using AI systems. From evaluation to production, FriendliAI provides the performance infrastructure required to scale Nemotron 3 Super in real-world agentic workloads.

NVIDIA Nemotron 3 Super is NVIDIA’s next-generation open reasoning model, purpose-built for multi-agent systems and specialized agentic AI. Powered by a hybrid Transformer–Mamba MoE architecture, it combines high accuracy with exceptional compute efficiency, unlocking tool-using agents and long-context workflows at production scale.

Provision your endpoint here.

NVIDIA Nemotron 3 Super provides:

Highest Efficiency

  • Hybrid Transformer–Mamba MoE architecture enables faster token generation, allowing the model to reason more efficiently and deliver higher-quality responses
  • MoE routing reduces active compute and meets real-world latency requirements

Leading Accuracy

  • Trained with NVIDIA-curated, high-quality synthetic reasoning data and aligned with reinforcement learning
  • Delivers strong performance across advanced reasoning benchmarks, including GPQA Diamond, AIME 2025, LiveCodeBench, IFBench, and BFCL

Open by Design

  • Open weights under NVIDIA’s open-model license
  • Open datasets for transparency
  • Open recipes and post-training techniques for customization, so enterprises retain full data control and deployment flexibility

Run Anywhere

  • Available across leading inference platforms and packaged as NVIDIA NIM
  • Runs from laptop to cloud
  • Productionized on FriendliAI via Dedicated Endpoints for scalable agentic workloads

What can you build with Nemotron 3 Super on FriendliAI?

Nemotron 3 Super is specifically engineered for agentic intelligence, multi-agent coordination, tool interactions, and long-context reasoning, and FriendliAI’s high-performance hosting unlocks these capabilities at production scale. With Nemotron 3 Super on FriendliAI, developers can build applications that go beyond static generation and into dynamic, actionable AI workflows.

Here are a few powerful categories of next-generation applications you can build:

Multi-Agent Systems

Build collaborative AI systems where multiple agents reason, coordinate, and execute complex workflows within a single application.

  • Optimized for running many collaborating agents per GPU (120B model with 12B active parameters)
  • Excels at structured reasoning and multi-step instruction following
  • Handles long-horizon task decomposition and coordinated execution
  • Efficient MoE routing minimizes compute while maintaining high reasoning depth

Tool-Calling & Agentic Workflows

Create AI applications that interact with tools such as search, databases, APIs, and internal services to automate end-to-end tasks.

  • Designed for tool-aware reasoning and complex orchestration
  • “Thinking budget” token control optimizes cost for multi-step workflows
  • Stable performance across chained tool execution scenarios

Enterprise RAG & Long-Context Reasoning

Deploy retrieval-augmented generation pipelines for document analysis, compliance workflows, and knowledge systems.

  • Supports up to 1M-token context for deep document reasoning
  • Trained with high-quality synthetic reasoning data aligned to human-like logic
  • Maintains strong accuracy across reasoning-heavy benchmarks

Security, Finance & Operational Intelligence

Build domain-specific agents for fraud detection, vulnerability analysis, and operational optimization.

  • Performs structured financial data extraction and analysis
  • Excels at vulnerability triage and multi-step security reasoning
  • Optimized for latency-sensitive enterprise decision systems
  • Open weights and deployment flexibility for secure, private infrastructure
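To make the tool-calling pattern above concrete, here is a minimal sketch of an OpenAI-compatible chat request that exposes a tool to the model. The `lookup_transactions` tool, its parameters, and the endpoint ID are hypothetical placeholders for illustration; only the request shape (the standard `tools` / `tool_choice` schema used by OpenAI-compatible APIs) is assumed.

```python
def build_tool_call_request(user_query: str) -> dict:
    """Build an OpenAI-compatible chat request exposing one tool.

    The tool name and its parameters are illustrative placeholders,
    not part of any real Friendli or NVIDIA API.
    """
    tools = [
        {
            "type": "function",
            "function": {
                "name": "lookup_transactions",  # hypothetical tool
                "description": "Fetch recent transactions for an account.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "account_id": {"type": "string"},
                        "limit": {"type": "integer", "default": 10},
                    },
                    "required": ["account_id"],
                },
            },
        }
    ]
    return {
        "model": "your-dedicated-endpoint-id",  # placeholder endpoint ID
        "messages": [
            {"role": "system", "content": "You are a fraud-analysis agent."},
            {"role": "user", "content": user_query},
        ],
        "tools": tools,
        "tool_choice": "auto",  # let the model decide when to call the tool
    }

payload = build_tool_call_request("Flag suspicious activity on account 42.")
print(payload["tools"][0]["function"]["name"])
# → lookup_transactions
```

When the model decides a tool is needed, the response contains a `tool_calls` entry with the function name and JSON arguments; your application executes the tool and sends the result back as a `tool` role message, which is the loop that agentic workflows chain together.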

Friendli Dedicated Endpoints for NVIDIA Nemotron 3 Super

FriendliAI enables production deployment of NVIDIA Nemotron 3 Super with dedicated GPUs, predictable performance, and 99.99% uptime SLA, so teams can scale agentic workloads without managing infrastructure.

Built for enterprise AI systems, Friendli Dedicated Endpoints support multi-agent workflows, tool calling, and long-context reasoning with optimized MoE inference and reserved GPU capacity.

Key Capabilities

  • Private Nemotron 3 Super deployment on dedicated GPUs for consistent performance
  • High-throughput, low-latency MoE-optimized inference to maximize token throughput
  • Production-ready endpoints with built-in observability and autoscaling
  • Flexible GPU options (B200, H100/H200) based on performance needs
  • Enterprise-grade reliability, security, and full data ownership in a private environment

FriendliAI handles serving, batching, scaling, and GPU orchestration, letting your team focus on building, not infrastructure.

With Friendli Dedicated Endpoints, teams can:

  • Deploy Nemotron 3 Super into production within minutes
  • Scale multi-agent systems without rebuilding infrastructure
  • Run long-context reasoning pipelines with predictable latency
  • Autoscale inference dynamically across GPUs, instantly right-sizing capacity to match demand
  • Support tool-calling workflows at enterprise scale
  • Operate customized Nemotron models securely and privately

FriendliAI removes the operational complexity of serving frontier models — enabling developers to focus on agentic intelligence, not infrastructure.

Get started with Nemotron 3 Super on FriendliAI

To deploy NVIDIA Nemotron 3 Super on Friendli Dedicated Endpoints:

1️⃣ Navigate to the dedicated endpoint creation page.

2️⃣ Choose your desired model, such as "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16".

3️⃣ Click “Create”.

You can send requests to Nemotron 3 Super using any OpenAI-compatible inference API or SDK. For example, using the Friendli Python SDK:

Prerequisites

Before getting started, you need to set up:

  • A FriendliAI account
  • A Friendli Token from the Friendli Suite settings tab

Install the package

```shell
uv pip install friendli
```

Environment Setup

Set up your FriendliAI API key (aka Friendli Token):

```shell
export FRIENDLI_TOKEN="your-token-here"
```

Example Code

```python
import os

from friendli import SyncFriendli

with SyncFriendli(
    token=os.environ["FRIENDLI_TOKEN"],
) as friendli:
    res = friendli.dedicated.chat.stream(
        model="your-dedicated-endpoint-id",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello, Nemotron 3 Super!"},
        ],
    )
    for chunk in res:
        if content := chunk.data.choices[0].delta.content:
            print(content, end="")
```
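If you want the full response as a single string rather than printing deltas as they arrive, a small helper like the following accumulates the stream. This is a sketch that assumes each chunk exposes `chunk.data.choices[0].delta.content` as in the example above; the `fake_chunk` stand-ins below exist only so the helper can be demonstrated without a live endpoint.

```python
from types import SimpleNamespace

def collect_stream(chunks) -> str:
    """Concatenate the text deltas from a streamed chat response.

    Assumes each chunk exposes chunk.data.choices[0].delta.content,
    matching the shape in the streaming example above; None deltas
    (e.g. role-only or finish chunks) are skipped.
    """
    parts = []
    for chunk in chunks:
        content = chunk.data.choices[0].delta.content
        if content:
            parts.append(content)
    return "".join(parts)

# Minimal stand-in chunk objects, for demonstration without a live endpoint.
def fake_chunk(text):
    delta = SimpleNamespace(content=text)
    choice = SimpleNamespace(delta=delta)
    return SimpleNamespace(data=SimpleNamespace(choices=[choice]))

print(collect_stream([fake_chunk("Hello, "), fake_chunk(None), fake_chunk("world!")]))
# → Hello, world!
```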

Run better with FriendliAI

Nemotron 3 Super is redefining what’s possible with open reasoning models, and FriendliAI is your platform for taking it to production. From experimentation to enterprise deployment, FriendliAI delivers the performance, reliability, and control required to scale agentic AI in real-world applications.

👉 Launch your Nemotron 3 Super Dedicated Endpoint today and start building next-generation agentic AI with FriendliAI.


Written by

FriendliAI Tech & Research




General FAQ

What is FriendliAI?

FriendliAI is a GPU-inference platform that lets you deploy, scale, and monitor large language and multimodal models in production, without owning or managing GPU infrastructure. We offer three things for your AI models: unmatched speed, cost efficiency, and operational simplicity. Find out which product is the best fit for you here.

How does FriendliAI help my business?

Our Friendli Inference allows you to squeeze more tokens-per-second out of every GPU. Because you need fewer GPUs to serve the same load, the true metric—tokens per dollar—comes out higher even if the hourly GPU rate looks similar on paper. View pricing

Which models and modalities are supported?

Over 520,000 text, vision, audio, and multi-modal models are deployable out of the box. You can also upload custom models or LoRA adapters. Explore models

Can I deploy models from Hugging Face directly?

Yes. Selecting “Friendli Endpoints” on the Hugging Face Hub gives you one-click deployment, taking you to our model deployment page. The page provides an easy-to-use interface for setting up Friendli Dedicated Endpoints, a managed service for generative AI inference. Learn more about our Hugging Face partnership

Still have questions?

If you want a customized solution for the key issue slowing your growth, email contact@friendli.ai or click Talk to an engineer; our engineers (not a bot) will reply within one business day.


Explore FriendliAI today