Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

License: apache-2.0

Open Source

In keeping with our commitment to open source, we are releasing both Nex-N2-Pro and Nex-N2-mini as open-source models starting today.

We welcome developers and enterprises to integrate and try Nex-N2 and share their feedback.

Performance

We evaluate Nex-N2 in real agentic workflows along three directions — agentic tasks, coding tasks, and general tasks — covering benchmarks across tool calling, search-based decision-making, software engineering, and terminal execution. Nex-N2-Pro delivers strong performance that keeps pace with top-tier models such as GPT-5.5 and Opus 4.7: it excels at coding (e.g., 75.3 on Terminal-Bench 2.1) and long-horizon tasks (1585 on GDPval), and shows especially strong generalization and competitiveness on newer benchmarks like SWE-Atlas and DeepSWE. On general capability and core reasoning, it stands on par with leading frontier models.

Nex-N2 Benchmark Overview

Nex-N2 ships in two variants, both post-trained on the Qwen3.5 series: Nex-N2-Pro (built on Qwen3.5-397B-A17B) and Nex-N2-mini (built on Qwen3.5-35B-A3B-Base), covering different latency and quality trade-offs. The table below reports their scores alongside leading proprietary and open models across our full evaluation suite.

BenchmarkNex-N2-miniNex-N2-ProGPT-5.5Opus 4.7Kimi-K2.6GLM-5.1MiniMax M3DeepSeek-V4-Pro
Agent
BrowseComp74.183.784.479.883.279.383.583.4
GDPval140215851769175314811535-1554
Toolathlon33.351.955.652.850.040.7-51.8
WildClawBench47.753.558.262.2-48.2-43.7
WideSearch62.075.6--80.8---
TAU365.971.1---70.6--
Coding & SWE
SWE-Bench Pro50.258.858.664.358.658.459.055.4
Terminal-Bench 2.160.775.383.469.7-58.766.072.0
DeepSWE8.033.670542418-8
SWE-Bench Verified74.480.882.987.680.2-80.580.6
SWE Atlas QnA31.537.945.445.2--37.9-
SWE Atlas RF30.032.944.848.6----
SWE Atlas TW23.340.042.638.2--30.8-
General & Reasoning
GPQA Diamond82.690.793.694.290.586.2-90.1
IFEval89.194.0--94.594.5-91.9
Apex9.436.5--24.011.5-38.3

Usage

Local Deployment

Note: For the best performance with Nex-series models, we recommend serving them with our customized sglang fork.

First, install our sglang fork:

bash

# Use the customized `sglang` fork
git clone https://github.com/nex-agi/sglang.git
cd sglang
# Install the python packages
pip install --upgrade pip
pip install -e "python"

Nex-N2-Pro

Launch the server (example on two 8Ɨ H100 servers with CUDA 13.0):

bash

# Multi-node (2 nodes). Run the same command on every node with:
# <node-rank> = 0 on the head node, 1 on the other node
# <node0-ip> = IP of the head node (reachable from all others)
python -m sglang.launch_server \
--model-path /path/to/your/model \
--tp 16 \
--nnodes 2 \
--node-rank <node-rank> \
--dist-init-addr <node0-ip>:20000 \
--reasoning-parser qwen3 \
--tool-call-parser qwen3_coder \
--mamba-scheduler-strategy extra_buffer

Nex-N2-mini

Launch the server (example on one 2Ɨ H100 server with CUDA 13.0):

bash

python -m sglang.launch_server \
--model-path /path/to/your/model \
--tp 2 \
--reasoning-parser qwen3 \
--tool-call-parser qwen3_coder \
--mamba-scheduler-strategy extra_buffer

Docker Deployment

We also provide a prebuilt Docker image with our customized sglang fork preinstalled: nexagi/sglang:v0.5.12. The launch command is the same as above.

Nex-N2-Pro

bash

# Multi-node (2 nodes). Run the same command on every node with:
# <node-rank> = 0 on the head node, 1 on the other node
# <node0-ip> = IP of the head node (reachable from all others)
docker run --gpus all --shm-size 32g --network host \
-v /path/to/your/model:/model \
nexagi/sglang:v0.5.12 \
python3 -m sglang.launch_server \
--model-path /model \
--tp 16 \
--nnodes 2 \
--node-rank <node-rank> \
--dist-init-addr <node0-ip>:20000 \
--host 0.0.0.0 --port 30000 \
--reasoning-parser qwen3 \
--tool-call-parser qwen3_coder \
--mamba-scheduler-strategy extra_buffer

Nex-N2-mini

Single node with 2Ɨ H100:

bash

docker run --gpus all --shm-size 32g --ipc=host \
-p 30000:30000 \
-v /path/to/your/model:/model \
nexagi/sglang:v0.5.12 \
python3 -m sglang.launch_server \
--model-path /model \
--tp 2 \
--host 0.0.0.0 --port 30000 \
--reasoning-parser qwen3 \
--tool-call-parser qwen3_coder \
--mamba-scheduler-strategy extra_buffer

Recommended Sampling Parameters

For the best generation quality, we recommend the following sampling parameters:

  • temperature: 0.7
  • top_p: 0.95
  • top_k: 40

Function Calling

Nex-series models support robust function-calling capabilities. To enable function calling, add the --tool-call-parser qwen3_coder flag when launching the server:

bash

python -m sglang.launch_server --model-path /path/to/your/model --tool-call-parser qwen3_coder

Reasoning Parser

Nex-series models emit explicit reasoning traces. Add the --reasoning-parser qwen3 flag to parse the reasoning content separately from the final response. It can be combined with the function-calling parser above:

bash

python -m sglang.launch_server --model-path /path/to/your/model --tool-call-parser qwen3_coder --reasoning-parser qwen3

Model provider

nex-agi

Model tree

Base

this model

Modalities

Input

Video, Text, Image

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Explore FriendliAI today