Run this model inference on single tenant GPU with unmatched speed and reliability at scale.
Get help setting up a custom Dedicated Endpoints.
Talk with our engineer to get a quote for reserved GPU instances with discounts.
README
License: apache-2.0Open Source
In keeping with our commitment to open source, we are releasing both Nex-N2-Pro and Nex-N2-mini as open-source models starting today.
- Nex-N2-Pro: Hugging Face | ModelScope
- Nex-N2-mini: Hugging Face | ModelScope
- Early Access: SiliconFlow
We welcome developers and enterprises to integrate and try Nex-N2 and share their feedback.
Performance
We evaluate Nex-N2 in real agentic workflows along three directions ā agentic tasks, coding tasks, and general tasks ā covering benchmarks across tool calling, search-based decision-making, software engineering, and terminal execution. Nex-N2-Pro delivers strong performance that keeps pace with top-tier models such as GPT-5.5 and Opus 4.7: it excels at coding (e.g., 75.3 on Terminal-Bench 2.1) and long-horizon tasks (1585 on GDPval), and shows especially strong generalization and competitiveness on newer benchmarks like SWE-Atlas and DeepSWE. On general capability and core reasoning, it stands on par with leading frontier models.

Nex-N2 ships in two variants, both post-trained on the Qwen3.5 series: Nex-N2-Pro (built on Qwen3.5-397B-A17B) and Nex-N2-mini (built on Qwen3.5-35B-A3B-Base), covering different latency and quality trade-offs. The table below reports their scores alongside leading proprietary and open models across our full evaluation suite.
| Benchmark | Nex-N2-mini | Nex-N2-Pro | GPT-5.5 | Opus 4.7 | Kimi-K2.6 | GLM-5.1 | MiniMax M3 | DeepSeek-V4-Pro |
|---|---|---|---|---|---|---|---|---|
| Agent | ||||||||
| BrowseComp | 74.1 | 83.7 | 84.4 | 79.8 | 83.2 | 79.3 | 83.5 | 83.4 |
| GDPval | 1402 | 1585 | 1769 | 1753 | 1481 | 1535 | - | 1554 |
| Toolathlon | 33.3 | 51.9 | 55.6 | 52.8 | 50.0 | 40.7 | - | 51.8 |
| WildClawBench | 47.7 | 53.5 | 58.2 | 62.2 | - | 48.2 | - | 43.7 |
| WideSearch | 62.0 | 75.6 | - | - | 80.8 | - | - | - |
| TAU3 | 65.9 | 71.1 | - | - | - | 70.6 | - | - |
| Coding & SWE | ||||||||
| SWE-Bench Pro | 50.2 | 58.8 | 58.6 | 64.3 | 58.6 | 58.4 | 59.0 | 55.4 |
| Terminal-Bench 2.1 | 60.7 | 75.3 | 83.4 | 69.7 | - | 58.7 | 66.0 | 72.0 |
| DeepSWE | 8.0 | 33.6 | 70 | 54 | 24 | 18 | - | 8 |
| SWE-Bench Verified | 74.4 | 80.8 | 82.9 | 87.6 | 80.2 | - | 80.5 | 80.6 |
| SWE Atlas QnA | 31.5 | 37.9 | 45.4 | 45.2 | - | - | 37.9 | - |
| SWE Atlas RF | 30.0 | 32.9 | 44.8 | 48.6 | - | - | - | - |
| SWE Atlas TW | 23.3 | 40.0 | 42.6 | 38.2 | - | - | 30.8 | - |
| General & Reasoning | ||||||||
| GPQA Diamond | 82.6 | 90.7 | 93.6 | 94.2 | 90.5 | 86.2 | - | 90.1 |
| IFEval | 89.1 | 94.0 | - | - | 94.5 | 94.5 | - | 91.9 |
| Apex | 9.4 | 36.5 | - | - | 24.0 | 11.5 | - | 38.3 |
Usage
Local Deployment
Note: For the best performance with Nex-series models, we recommend serving them with our customized
sglangfork.
First, install our sglang fork:
bash
# Use the customized `sglang` forkgit clone https://github.com/nex-agi/sglang.gitcd sglang# Install the python packagespip install --upgrade pippip install -e "python"
Nex-N2-Pro
Launch the server (example on two 8Ć H100 servers with CUDA 13.0):
bash
# Multi-node (2 nodes). Run the same command on every node with:# <node-rank> = 0 on the head node, 1 on the other node# <node0-ip> = IP of the head node (reachable from all others)python -m sglang.launch_server \--model-path /path/to/your/model \--tp 16 \--nnodes 2 \--node-rank <node-rank> \--dist-init-addr <node0-ip>:20000 \--reasoning-parser qwen3 \--tool-call-parser qwen3_coder \--mamba-scheduler-strategy extra_buffer
Nex-N2-mini
Launch the server (example on one 2Ć H100 server with CUDA 13.0):
bash
python -m sglang.launch_server \--model-path /path/to/your/model \--tp 2 \--reasoning-parser qwen3 \--tool-call-parser qwen3_coder \--mamba-scheduler-strategy extra_buffer
Docker Deployment
We also provide a prebuilt Docker image with our customized sglang fork preinstalled: nexagi/sglang:v0.5.12. The launch command is the same as above.
Nex-N2-Pro
bash
# Multi-node (2 nodes). Run the same command on every node with:# <node-rank> = 0 on the head node, 1 on the other node# <node0-ip> = IP of the head node (reachable from all others)docker run --gpus all --shm-size 32g --network host \-v /path/to/your/model:/model \nexagi/sglang:v0.5.12 \python3 -m sglang.launch_server \--model-path /model \--tp 16 \--nnodes 2 \--node-rank <node-rank> \--dist-init-addr <node0-ip>:20000 \--host 0.0.0.0 --port 30000 \--reasoning-parser qwen3 \--tool-call-parser qwen3_coder \--mamba-scheduler-strategy extra_buffer
Nex-N2-mini
Single node with 2Ć H100:
bash
docker run --gpus all --shm-size 32g --ipc=host \-p 30000:30000 \-v /path/to/your/model:/model \nexagi/sglang:v0.5.12 \python3 -m sglang.launch_server \--model-path /model \--tp 2 \--host 0.0.0.0 --port 30000 \--reasoning-parser qwen3 \--tool-call-parser qwen3_coder \--mamba-scheduler-strategy extra_buffer
Recommended Sampling Parameters
For the best generation quality, we recommend the following sampling parameters:
temperature: 0.7top_p: 0.95top_k: 40
Function Calling
Nex-series models support robust function-calling capabilities. To enable function calling, add the --tool-call-parser qwen3_coder flag when launching the server:
bash
python -m sglang.launch_server --model-path /path/to/your/model --tool-call-parser qwen3_coder
Reasoning Parser
Nex-series models emit explicit reasoning traces. Add the --reasoning-parser qwen3 flag to parse the reasoning content separately from the final response. It can be combined with the function-calling parser above:
bash
python -m sglang.launch_server --model-path /path/to/your/model --tool-call-parser qwen3_coder --reasoning-parser qwen3
Model provider
nex-agi
Model tree
Base
this model
Modalities
Input
Video, Text, Image
Output
Text
Pricing
Dedicated Endpoints
View detailsSupported Functionality
Model APIs
Dedicated Endpoints
Container
More information