Open Source
In keeping with our commitment to open source, we are releasing both Nex-N2-Pro and Nex-N2-mini as open-source models starting today.
We welcome developers and enterprises to integrate and try Nex-N2 and share their feedback.
We evaluate Nex-N2 in real agentic workflows along three directions — agentic tasks, coding tasks, and general tasks — covering benchmarks across tool calling, search-based decision-making, software engineering, and terminal execution. Nex-N2-Pro delivers strong performance that keeps pace with top-tier models such as GPT-5.5 and Opus 4.7: it excels at coding (e.g., 75.3 on Terminal-Bench 2.1) and long-horizon tasks (1585 on GDPval), and shows especially strong generalization and competitiveness on newer benchmarks like SWE-Atlas and DeepSWE. On general capability and core reasoning, it stands on par with leading frontier models.

Nex-N2 ships in two variants, both post-trained on the Qwen3.5 series: Nex-N2-Pro (built on Qwen3.5-397B-A17B) and Nex-N2-mini (built on Qwen3.5-35B-A3B-Base), covering different latency and quality trade-offs. The table below reports their scores alongside leading proprietary and open models across our full evaluation suite.
Table with columns: Benchmark, Nex-N2-mini, Nex-N2-Pro, GPT-5.5, Opus 4.7, Kimi-K2.6, GLM-5.1, MiniMax M3, DeepSeek-V4-Pro| Benchmark | Nex-N2-mini | Nex-N2-Pro | GPT-5.5 | Opus 4.7 | Kimi-K2.6 | GLM-5.1 | MiniMax M3 | DeepSeek-V4-Pro |
|---|
| Agent | | | | | | | | |
| BrowseComp | 74.1 |
Usage
Local Deployment
Note: For the best performance with Nex-series models, we recommend serving them with our customized sglang fork.
First, install our sglang fork:
# Use the customized `sglang` fork
git clone https://github.com/nex-agi/sglang.git
cd sglang
# Install the python packages
pip install --upgrade pip
pip install -e "python"
Nex-N2-Pro
Launch the server (example on two 8× H100 servers with CUDA 13.0):
# Multi-node (2 nodes). Run the same command on every node with:
# <node-rank> = 0 on the head node, 1 on the other node
# <node0-ip> = IP of the head node (reachable from all others)
python -m sglang.launch_server \
--model-path /path/to/your/model \
--tp 16 \
--nnodes 2 \
--node-rank <node-rank> \
--dist-init-addr <node0-ip>:20000 \
--reasoning-parser qwen3 \
--tool-call-parser qwen3_coder \
--mamba-scheduler-strategy extra_buffer
Nex-N2-mini
Launch the server (example on one 2× H100 server with CUDA 13.0):
python -m sglang.launch_server \
--model-path /path/to/your/model \
--tp 2 \
--reasoning-parser qwen3 \
--tool-call-parser qwen3_coder \
--mamba-scheduler-strategy extra_buffer
Docker Deployment
We also provide a prebuilt Docker image with our customized sglang fork preinstalled: nexagi/sglang:v0.5.12. The launch command is the same as above.
Nex-N2-Pro
# Multi-node (2 nodes). Run the same command on every node with:
# <node-rank> = 0 on the head node, 1 on the other node
# <node0-ip> = IP of the head node (reachable from all others)
docker run --gpus all --shm-size 32g --network host \
-v /path/to/your/model:/model \
nexagi/sglang:v0.5.12 \
python3 -m sglang.launch_server \
--model-path /model \
--tp 16 \
--nnodes 2 \
--node-rank <node-rank> \
--dist-init-addr <node0-ip>:20000 \
--host 0.0.0.0 --port 30000 \
--reasoning-parser qwen3 \
--tool-call-parser qwen3_coder \
--mamba-scheduler-strategy extra_buffer
Nex-N2-mini
Single node with 2× H100:
docker run --gpus all --shm-size 32g --ipc=host \
-p 30000:30000 \
-v /path/to/your/model:/model \
nexagi/sglang:v0.5.12 \
python3 -m sglang.launch_server \
--model-path /model \
--tp 2 \
--host 0.0.0.0 --port 30000 \
--reasoning-parser qwen3 \
--tool-call-parser qwen3_coder \
--mamba-scheduler-strategy extra_buffer
Recommended Sampling Parameters
For the best generation quality, we recommend the following sampling parameters:
temperature: 0.7
top_p: 0.95
top_k: 40
Function Calling
Nex-series models support robust function-calling capabilities. To enable function calling, add the --tool-call-parser qwen3_coder flag when launching the server:
python -m sglang.launch_server --model-path /path/to/your/model --tool-call-parser qwen3_coder
Reasoning Parser
Nex-series models emit explicit reasoning traces. Add the --reasoning-parser qwen3 flag to parse the reasoning content separately from the final response. It can be combined with the function-calling parser above:
python -m sglang.launch_server --model-path /path/to/your/model --tool-call-parser qwen3_coder --reasoning-parser qwen3