- March 11, 2026
- 3 min read
Nemotron 3 Super Is Live on FriendliAI: Built for Multi-Agent Applications and Specialized Agentic AI Systems
- FriendliAI has launched Day-0 support for NVIDIA’s Nemotron 3 Super, offering optimized Dedicated Endpoints for production-scale AI.
- The model’s hybrid Transformer–Mamba MoE architecture delivers high-speed reasoning and massive token throughput with enterprise-grade efficiency.
- It is purpose-built for multi-agent systems and complex tool-calling, supporting an expansive 1M-token context window.

FriendliAI is launching Day-0 support for NVIDIA Nemotron 3 Super. Our Dedicated Endpoints are optimized to unlock the full efficiency of Nemotron’s hybrid MoE architecture—delivering high-throughput inference, low-latency execution, and enterprise-ready deployment for multi-agent and tool-using AI systems. From evaluation to production, FriendliAI provides the performance infrastructure required to scale Nemotron 3 Super in real-world agentic workloads.
NVIDIA Nemotron 3 Super is NVIDIA’s next-generation open reasoning model, purpose-built for multi-agent systems and specialized agentic AI. Powered by a hybrid Transformer–Mamba MoE architecture, it combines high accuracy with exceptional compute efficiency, unlocking tool-using agents and long-context workflows at production scale.
Provision your endpoint here.
NVIDIA Nemotron 3 Super provides:

| Advantage | Details |
|---|---|
| Highest Efficiency | • Hybrid Transformer–Mamba MoE architecture enables faster token generation, allowing the model to reason more efficiently and deliver higher-quality responses <br>• MoE routing reduces active compute and meets real-world latency requirements |
| Leading Accuracy | • Trained with NVIDIA-curated high-quality synthetic reasoning data and aligned with reinforcement learning <br>• Delivers strong performance across advanced reasoning benchmarks, including GPQA Diamond, AIME 2025, LiveCodeBench, IFBench, and BFCL |
| Open by Design | • Open weights under NVIDIA’s open-model license <br>• Open datasets for transparency <br>• Open recipes and post-training techniques for customization <br>• Enterprises retain full data control and deployment flexibility |
| Run Anywhere | • Available across leading inference platforms and packaged as NVIDIA NIM <br>• Runs from laptop to cloud <br>• Productionized on FriendliAI via Dedicated Endpoints for scalable agentic workloads |
What can you build with Nemotron 3 Super on FriendliAI?
Nemotron 3 Super is specifically engineered for agentic intelligence, multi-agent coordination, tool interactions, and long-context reasoning, and FriendliAI’s high-performance hosting unlocks these capabilities at production scale. With Nemotron 3 Super on FriendliAI, developers can build applications that go beyond static generation and into dynamic, actionable AI workflows.
Here are a few powerful categories of next-generation applications you can build:
| Use Case | Description | Key Capabilities |
|---|---|---|
| Multi-Agent Systems | Build collaborative AI systems where multiple agents reason, coordinate, and execute complex workflows within a single application. | 1. Optimized for running many collaborating agents per GPU (120B model with 12B active parameters) <br>2. Excels at structured reasoning and multi-step instruction following <br>3. Handles long-horizon task decomposition and coordinated execution <br>4. Efficient MoE routing minimizes compute while maintaining high reasoning depth |
| Tool-Calling & Agentic Workflows | Create AI applications that interact with tools such as search, databases, APIs, and internal services to automate end-to-end tasks. | 1. Designed for tool-aware reasoning and complex orchestration <br>2. “Thinking budget” token control optimizes cost for multi-step workflows <br>3. Stable performance across chained tool-execution scenarios |
| Enterprise RAG & Long-Context Reasoning | Deploy retrieval-augmented generation pipelines for document analysis, compliance workflows, and knowledge systems. | 1. Supports up to a 1M-token context window for deep document reasoning <br>2. Trained with high-quality synthetic reasoning data aligned to human-like logic <br>3. Maintains strong accuracy across reasoning-heavy benchmarks |
| Security, Finance & Operational Intelligence | Build domain-specific agents for fraud detection, vulnerability analysis, and operational optimization. | 1. Performs structured financial data extraction and analysis <br>2. Excels at vulnerability triage and multi-step security reasoning <br>3. Optimized for latency-sensitive enterprise decision systems <br>4. Open weights and deployment flexibility for secure, private infrastructure |
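To make the tool-calling workflow concrete, here is a minimal sketch against a Dedicated Endpoint’s OpenAI-compatible Chat Completions API. The base URL, the `YOUR_ENDPOINT_ID` model name, and the `get_weather` tool are illustrative placeholders, not part of the Nemotron release; substitute the values shown on your own endpoint page.

```python
# Minimal tool-calling sketch for a Friendli Dedicated Endpoint.
# The base URL, endpoint ID, and get_weather tool are placeholders.
import json
import os

# A tool schema the model can choose to call (hypothetical example tool).
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

messages = [{"role": "user", "content": "What's the weather in Seoul?"}]

# The network call is guarded so the snippet can be read or dry-run
# without credentials; set FRIENDLI_TOKEN to execute it for real.
if os.environ.get("FRIENDLI_TOKEN"):
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.friendli.ai/dedicated/v1",  # assumed base URL
        api_key=os.environ["FRIENDLI_TOKEN"],
    )
    response = client.chat.completions.create(
        model="YOUR_ENDPOINT_ID",  # copy from your endpoint page
        messages=messages,
        tools=tools,
    )
    call = response.choices[0].message.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))
```

When the model decides a tool is needed, the response carries the function name and JSON arguments; your application executes the tool, appends the result as a `tool` message, and calls the endpoint again to continue the chain.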
Friendli Dedicated Endpoints for NVIDIA Nemotron 3 Super
FriendliAI enables production deployment of NVIDIA Nemotron 3 Super with dedicated GPUs, predictable performance, and 99.99% uptime SLA, so teams can scale agentic workloads without managing infrastructure.
Built for enterprise AI systems, Friendli Dedicated Endpoints support multi-agent workflows, tool calling, and long-context reasoning with optimized MoE inference and reserved GPU capacity.
Key Capabilities
- Private Nemotron 3 Super deployment on dedicated GPUs for consistent performance
- High-throughput, low-latency MoE-optimized inference to maximize token throughput
- Production-ready endpoints with built-in observability and autoscaling
- Flexible GPU options (B200, H100/H200) based on performance needs
- Enterprise-grade reliability, security, and full data ownership in a private environment
FriendliAI handles serving, batching, scaling, and GPU orchestration, letting your team focus on building, not infrastructure.
With Friendli Dedicated Endpoints, teams can:
- Deploy Nemotron 3 Super into production within minutes
- Scale multi-agent systems without rebuilding infrastructure
- Run long-context reasoning pipelines with predictable latency
- Autoscale inference dynamically across GPUs, instantly right-sizing capacity to match demand
- Support tool-calling workflows at enterprise scale
- Operate customized Nemotron models securely and privately
FriendliAI removes the operational complexity of serving frontier models — enabling developers to focus on agentic intelligence, not infrastructure.
Get started with Nemotron 3 Super on FriendliAI
To deploy NVIDIA Nemotron 3 Super on Friendli Dedicated Endpoints:
1️⃣ Navigate to the dedicated endpoint creation page.
2️⃣ Choose your desired model, such as "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16".
3️⃣ Click “Create”.
You can send requests to Nemotron 3 Super using any OpenAI-compatible inference API or SDK, such as the OpenAI Python SDK or the Friendli Python SDK:
Prerequisites
Before getting started, you need to set up:
- A FriendliAI account
- A Friendli Token from the Friendli Suite settings tab
Install the package
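Assuming you will call the endpoint through its OpenAI-compatible API, as in the example below, the OpenAI Python SDK is sufficient:

```shell
pip install openai
```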
Environment Setup
Set up your FriendliAI API key (aka Friendli Token):
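Export the token as an environment variable so the SDK can read it (the value below is a placeholder):

```shell
export FRIENDLI_TOKEN="YOUR_FRIENDLI_TOKEN"
```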
Example Code
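A minimal chat-completion sketch, assuming the OpenAI Python SDK and an OpenAI-compatible Dedicated Endpoint; the base URL and `YOUR_ENDPOINT_ID` are placeholders to replace with the values from your endpoint page in Friendli Suite.

```python
# Minimal chat-completion sketch for a Friendli Dedicated Endpoint.
# Base URL and endpoint ID are placeholders; copy the real values
# from your endpoint page in Friendli Suite.
import os

messages = [
    {
        "role": "user",
        "content": "In two sentences, what is a hybrid Transformer-Mamba MoE model?",
    }
]

# Guarded so the snippet can be inspected or dry-run without credentials.
if os.environ.get("FRIENDLI_TOKEN"):
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.friendli.ai/dedicated/v1",  # assumed base URL
        api_key=os.environ["FRIENDLI_TOKEN"],
    )
    response = client.chat.completions.create(
        model="YOUR_ENDPOINT_ID",  # copy from your endpoint page
        messages=messages,
    )
    print(response.choices[0].message.content)
```

Because the endpoint speaks the standard Chat Completions protocol, the same request shape works from any OpenAI-compatible client, including streaming via `stream=True`.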
Run better with FriendliAI
Nemotron 3 Super is redefining what’s possible with open reasoning models, and FriendliAI is your platform for taking it to production. From experimentation to enterprise deployment, FriendliAI delivers the performance, reliability, and control required to scale agentic AI in real-world applications.
👉 Launch your Nemotron 3 Super Dedicated Endpoint today and start building next-generation agentic AI with FriendliAI.
Written by
FriendliAI Tech & Research
General FAQ
What is FriendliAI?
FriendliAI is a GPU-inference platform that lets you deploy, scale, and monitor large language and multimodal models in production, without owning or managing GPU infrastructure. We offer three things for your AI models: unmatched speed, cost efficiency, and operational simplicity. Find out which product is the best fit for you here.
How does FriendliAI help my business?
Our Friendli Inference allows you to squeeze more tokens-per-second out of every GPU. Because you need fewer GPUs to serve the same load, the true metric—tokens per dollar—comes out higher even if the hourly GPU rate looks similar on paper. View pricing
Which models and modalities are supported?
Over 520,000 text, vision, audio, and multi-modal models are deployable out of the box. You can also upload custom models or LoRA adapters. Explore models
Can I deploy models from Hugging Face directly?
Yes. A one-click deploy by selecting “Friendli Endpoints” on the Hugging Face Hub will take you to our model deployment page. The page provides an easy-to-use interface for setting up Friendli Dedicated Endpoints, a managed service for generative AI inference. Learn more about our Hugging Face partnership
Still have questions?
If you want a customized solution for the key issue slowing your growth, email contact@friendli.ai or click Talk to an engineer — our engineers (not a bot) will reply within one business day.

