Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

License: apache-2.0

Open Source

In keeping with our commitment to open source, we are releasing both Nex-N2-Pro and Nex-N2-mini as open-source models starting today.

We welcome developers and enterprises to integrate and try Nex-N2 and share their feedback.

Performance

We evaluate Nex-N2 in real agentic workflows along three directions — agentic tasks, coding tasks, and general tasks — covering benchmarks across tool calling, search-based decision-making, software engineering, and terminal execution. Nex-N2-Pro delivers strong performance that keeps pace with top-tier models such as GPT-5.5 and Opus 4.7: it excels at coding (e.g., 75.3 on Terminal-Bench 2.1) and long-horizon tasks (1585 on GDPval), and shows especially strong generalization and competitiveness on newer benchmarks like SWE-Atlas and DeepSWE. On general capability and core reasoning, it stands on par with leading frontier models.

Nex-N2 Benchmark Overview

Nex-N2 ships in two variants, both post-trained on the Qwen3.5 series: Nex-N2-Pro (built on Qwen3.5-397B-A17B) and Nex-N2-mini (built on Qwen3.5-35B-A3B-Base), covering different latency and quality trade-offs. The table below reports their scores alongside leading proprietary and open models across our full evaluation suite.

BenchmarkNex-N2-miniNex-N2-ProGPT-5.5Opus 4.7Kimi-K2.6GLM-5.1MiniMax M3DeepSeek-V4-Pro
Agent
BrowseComp74.183.784.479.883.279.383.583.4
GDPval140215851769175314811535-1554
Toolathlon33.351.955.652.850.040.7-51.8
WildClawBench47.753.558.262.2-48.2-43.7
WideSearch62.075.6--80.8---
TAU365.971.1---70.6--
Coding & SWE
SWE-Bench Pro50.258.858.664.358.658.459.055.4
Terminal-Bench 2.160.775.383.469.7-58.766.072.0
DeepSWE8.033.670542418-8
SWE-Bench Verified74.480.882.987.680.2-80.580.6
SWE Atlas QnA31.537.945.445.2--37.9-
SWE Atlas RF30.032.944.848.6----
SWE Atlas TW23.340.042.638.2--30.8-
General & Reasoning
GPQA Diamond82.690.793.694.290.586.2-90.1
IFEval89.194.0--94.594.5-91.9
Apex9.436.5--24.011.5-38.3

Usage

Local Deployment

Note: For the best performance with Nex-series models, we recommend serving them with our customized sglang fork.

First, install our sglang fork:

bash

# Use the customized `sglang` fork
git clone https://github.com/nex-agi/sglang.git
cd sglang
# Install the python packages
pip install --upgrade pip
pip install -e "python"

Nex-N2-Pro

Launch the server (example on two 8Ɨ H100 servers with CUDA 13.0):

bash

# Multi-node (2 nodes). Run the same command on every node with:
# <node-rank> = 0 on the head node, 1 on the other node
# <node0-ip> = IP of the head node (reachable from all others)
python -m sglang.launch_server \
--model-path /path/to/your/model \
--tp 16 \
--nnodes 2 \
--node-rank <node-rank> \
--dist-init-addr <node0-ip>:20000 \
--reasoning-parser qwen3 \
--tool-call-parser qwen3_coder \
--mamba-scheduler-strategy extra_buffer

Nex-N2-mini

Launch the server (example on one 2Ɨ H100 server with CUDA 13.0):

bash

python -m sglang.launch_server \
--model-path /path/to/your/model \
--tp 2 \
--reasoning-parser qwen3 \
--tool-call-parser qwen3_coder \
--mamba-scheduler-strategy extra_buffer

Docker Deployment

We also provide a prebuilt Docker image with our customized sglang fork preinstalled: nexagi/sglang:v0.5.12. The launch command is the same as above.

Nex-N2-Pro

bash

# Multi-node (2 nodes). Run the same command on every node with:
# <node-rank> = 0 on the head node, 1 on the other node
# <node0-ip> = IP of the head node (reachable from all others)
docker run --gpus all --shm-size 32g --network host \
-v /path/to/your/model:/model \
nexagi/sglang:v0.5.12 \
python3 -m sglang.launch_server \
--model-path /model \
--tp 16 \
--nnodes 2 \
--node-rank <node-rank> \
--dist-init-addr <node0-ip>:20000 \
--host 0.0.0.0 --port 30000 \
--reasoning-parser qwen3 \
--tool-call-parser qwen3_coder \
--mamba-scheduler-strategy extra_buffer

Nex-N2-mini

Single node with 2Ɨ H100:

bash

docker run --gpus all --shm-size 32g --ipc=host \
-p 30000:30000 \
-v /path/to/your/model:/model \
nexagi/sglang:v0.5.12 \
python3 -m sglang.launch_server \
--model-path /model \
--tp 2 \
--host 0.0.0.0 --port 30000 \
--reasoning-parser qwen3 \
--tool-call-parser qwen3_coder \
--mamba-scheduler-strategy extra_buffer

Recommended Sampling Parameters

For the best generation quality, we recommend the following sampling parameters:

  • temperature: 0.7
  • top_p: 0.95
  • top_k: 40

Function Calling

Nex-series models support robust function-calling capabilities. To enable function calling, add the --tool-call-parser qwen3_coder flag when launching the server:

bash

python -m sglang.launch_server --model-path /path/to/your/model --tool-call-parser qwen3_coder

Reasoning Parser

Nex-series models emit explicit reasoning traces. Add the --reasoning-parser qwen3 flag to parse the reasoning content separately from the final response. It can be combined with the function-calling parser above:

bash

python -m sglang.launch_server --model-path /path/to/your/model --tool-call-parser qwen3_coder --reasoning-parser qwen3

Model provider

usermma

Model tree

Base

this model

Modalities

Input

Video, Text, Image

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Explore FriendliAI today