usermma

Nex-N2-mini

Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

Open Source

In keeping with our commitment to open source, we are releasing both Nex-N2-Pro and Nex-N2-mini as open-source models starting today.

Nex-N2-Pro: Hugging Face | ModelScope
Nex-N2-mini: Hugging Face | ModelScope
Early Access: SiliconFlow

We welcome developers and enterprises to integrate and try Nex-N2 and share their feedback.

Performance

We evaluate Nex-N2 in real agentic workflows along three directions — agentic tasks, coding tasks, and general tasks — covering benchmarks across tool calling, search-based decision-making, software engineering, and terminal execution. Nex-N2-Pro delivers strong performance that keeps pace with top-tier models such as GPT-5.5 and Opus 4.7: it excels at coding (e.g., 75.3 on Terminal-Bench 2.1) and long-horizon tasks (1585 on GDPval), and shows especially strong generalization and competitiveness on newer benchmarks like SWE-Atlas and DeepSWE. On general capability and core reasoning, it stands on par with leading frontier models.

Nex-N2 Benchmark Overview

Nex-N2 ships in two variants, both post-trained on the Qwen3.5 series: Nex-N2-Pro (built on Qwen3.5-397B-A17B) and Nex-N2-mini (built on Qwen3.5-35B-A3B-Base), covering different latency and quality trade-offs. The table below reports their scores alongside leading proprietary and open models across our full evaluation suite.

Table with columns: Benchmark, Nex-N2-mini, Nex-N2-Pro, GPT-5.5, Opus 4.7, Kimi-K2.6, GLM-5.1, MiniMax M3, DeepSeek-V4-Pro
Benchmark	Nex-N2-mini	Nex-N2-Pro	GPT-5.5	Opus 4.7	Kimi-K2.6	GLM-5.1	MiniMax M3	DeepSeek-V4-Pro
Agent
BrowseComp	74.1

Usage

Local Deployment

Note: For the best performance with Nex-series models, we recommend serving them with our customized sglang fork.

First, install our sglang fork:

bash
# Use the customized `sglang` fork
git clone https://github.com/nex-agi/sglang.git
cd sglang

# Install the python packages
pip install --upgrade pip
pip install -e "python"

Nex-N2-Pro

Launch the server (example on two 8× H100 servers with CUDA 13.0):

bash
# Multi-node (2 nodes). Run the same command on every node with:
#   <node-rank> = 0 on the head node, 1 on the other node
#   <node0-ip>  = IP of the head node (reachable from all others)
python -m sglang.launch_server \
  --model-path /path/to/your/model  \
  --tp 16 \
  --nnodes 2 \
  --node-rank <node-rank> \
  --dist-init-addr <node0-ip>:20000 \
  --reasoning-parser qwen3 \
  --tool-call-parser qwen3_coder \
  --mamba-scheduler-strategy extra_buffer

Nex-N2-mini

Launch the server (example on one 2× H100 server with CUDA 13.0):

bash
python -m sglang.launch_server \
  --model-path /path/to/your/model  \
  --tp 2 \
  --reasoning-parser qwen3 \
  --tool-call-parser qwen3_coder \
  --mamba-scheduler-strategy extra_buffer

Docker Deployment

We also provide a prebuilt Docker image with our customized sglang fork preinstalled: nexagi/sglang:v0.5.12. The launch command is the same as above.

Nex-N2-Pro

bash
# Multi-node (2 nodes). Run the same command on every node with:
#   <node-rank> = 0 on the head node, 1 on the other node
#   <node0-ip>  = IP of the head node (reachable from all others)
docker run --gpus all --shm-size 32g --network host \
  -v /path/to/your/model:/model \
  nexagi/sglang:v0.5.12 \
  python3 -m sglang.launch_server \
    --model-path /model \
    --tp 16 \
    --nnodes 2 \
    --node-rank <node-rank> \
    --dist-init-addr <node0-ip>:20000 \
    --host 0.0.0.0 --port 30000 \
    --reasoning-parser qwen3 \
    --tool-call-parser qwen3_coder \
    --mamba-scheduler-strategy extra_buffer

Nex-N2-mini

Single node with 2× H100:

bash
docker run --gpus all --shm-size 32g --ipc=host \
  -p 30000:30000 \
  -v /path/to/your/model:/model \
  nexagi/sglang:v0.5.12 \
  python3 -m sglang.launch_server \
    --model-path /model \
    --tp 2 \
    --host 0.0.0.0 --port 30000 \
    --reasoning-parser qwen3 \
    --tool-call-parser qwen3_coder \
    --mamba-scheduler-strategy extra_buffer

Recommended Sampling Parameters

For the best generation quality, we recommend the following sampling parameters:

temperature: 0.7
top_p: 0.95
top_k: 40

Function Calling

Nex-series models support robust function-calling capabilities. To enable function calling, add the --tool-call-parser qwen3_coder flag when launching the server:

bash
python -m sglang.launch_server --model-path /path/to/your/model --tool-call-parser qwen3_coder

Reasoning Parser

Nex-series models emit explicit reasoning traces. Add the --reasoning-parser qwen3 flag to parse the reasoning content separately from the final response. It can be combined with the function-calling parser above:

bash
python -m sglang.launch_server --model-path /path/to/your/model --tool-call-parser qwen3_coder --reasoning-parser qwen3

Model provider

usermma

Model tree

Base

this model

Modalities

Input

Video, Text, Image

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Model card

Explore FriendliAI today

Get started Talk to an engineer

Open Source

In keeping with our commitment to open source, we are releasing both Nex-N2-Pro and Nex-N2-mini as open-source models starting today.

Nex-N2-Pro: Hugging Face | ModelScope
Nex-N2-mini: Hugging Face | ModelScope
Early Access: SiliconFlow

We welcome developers and enterprises to integrate and try Nex-N2 and share their feedback.

Performance

Nex-N2 Benchmark Overview

Table with columns: Benchmark, Nex-N2-mini, Nex-N2-Pro, GPT-5.5, Opus 4.7, Kimi-K2.6, GLM-5.1, MiniMax M3, DeepSeek-V4-Pro
Benchmark	Nex-N2-mini	Nex-N2-Pro	GPT-5.5	Opus 4.7	Kimi-K2.6	GLM-5.1	MiniMax M3	DeepSeek-V4-Pro
Agent
BrowseComp	74.1

Usage

Local Deployment

Note: For the best performance with Nex-series models, we recommend serving them with our customized sglang fork.

First, install our sglang fork:

bash
# Use the customized `sglang` fork
git clone https://github.com/nex-agi/sglang.git
cd sglang

# Install the python packages
pip install --upgrade pip
pip install -e "python"

Nex-N2-Pro

Launch the server (example on two 8× H100 servers with CUDA 13.0):

bash
# Multi-node (2 nodes). Run the same command on every node with:
#   <node-rank> = 0 on the head node, 1 on the other node
#   <node0-ip>  = IP of the head node (reachable from all others)
python -m sglang.launch_server \
  --model-path /path/to/your/model  \
  --tp 16 \
  --nnodes 2 \
  --node-rank <node-rank> \
  --dist-init-addr <node0-ip>:20000 \
  --reasoning-parser qwen3 \
  --tool-call-parser qwen3_coder \
  --mamba-scheduler-strategy extra_buffer

Nex-N2-mini

Launch the server (example on one 2× H100 server with CUDA 13.0):

bash
python -m sglang.launch_server \
  --model-path /path/to/your/model  \
  --tp 2 \
  --reasoning-parser qwen3 \
  --tool-call-parser qwen3_coder \
  --mamba-scheduler-strategy extra_buffer

Docker Deployment

We also provide a prebuilt Docker image with our customized sglang fork preinstalled: nexagi/sglang:v0.5.12. The launch command is the same as above.

Nex-N2-Pro

bash
# Multi-node (2 nodes). Run the same command on every node with:
#   <node-rank> = 0 on the head node, 1 on the other node
#   <node0-ip>  = IP of the head node (reachable from all others)
docker run --gpus all --shm-size 32g --network host \
  -v /path/to/your/model:/model \
  nexagi/sglang:v0.5.12 \
  python3 -m sglang.launch_server \
    --model-path /model \
    --tp 16 \
    --nnodes 2 \
    --node-rank <node-rank> \
    --dist-init-addr <node0-ip>:20000 \
    --host 0.0.0.0 --port 30000 \
    --reasoning-parser qwen3 \
    --tool-call-parser qwen3_coder \
    --mamba-scheduler-strategy extra_buffer

Nex-N2-mini

Single node with 2× H100:

bash
docker run --gpus all --shm-size 32g --ipc=host \
  -p 30000:30000 \
  -v /path/to/your/model:/model \
  nexagi/sglang:v0.5.12 \
  python3 -m sglang.launch_server \
    --model-path /model \
    --tp 2 \
    --host 0.0.0.0 --port 30000 \
    --reasoning-parser qwen3 \
    --tool-call-parser qwen3_coder \
    --mamba-scheduler-strategy extra_buffer

Recommended Sampling Parameters

For the best generation quality, we recommend the following sampling parameters:

temperature: 0.7
top_p: 0.95
top_k: 40

Function Calling

Nex-series models support robust function-calling capabilities. To enable function calling, add the --tool-call-parser qwen3_coder flag when launching the server:

bash
python -m sglang.launch_server --model-path /path/to/your/model --tool-call-parser qwen3_coder

Reasoning Parser

bash
python -m sglang.launch_server --model-path /path/to/your/model --tool-call-parser qwen3_coder --reasoning-parser qwen3

Nex-N2-mini

Get help setting up a custom Dedicated Endpoints.

README

Open Source

Performance

Usage

Local Deployment

Nex-N2-Pro

Nex-N2-mini

Docker Deployment

Nex-N2-Pro

Nex-N2-mini

Recommended Sampling Parameters

Function Calling

Reasoning Parser

Explore FriendliAI today

README

Open Source

Performance

Usage

Local Deployment

Nex-N2-Pro

Nex-N2-mini

Docker Deployment

Nex-N2-Pro

Nex-N2-mini

Recommended Sampling Parameters

Function Calling

Reasoning Parser