Qwen
Qwen-AgentWorld-35B-A3B
License: apache-2.0
Highlights
- Seven Unified Domains. A single model covers MCP (tool calling), Search, Terminal, SWE (software engineering), Android, Web, and OS — spanning both text and GUI interaction environments.
- Native World Model. Environment modeling from CPT onward, not post-hoc adaptation on a general-purpose LLM.
- Generalizable, Scalable & Controllable Simulator. Zero-shot generalization to OOD environments (e.g., OpenClaw); controllable perturbations and fictional-world construction surpass real-environment training.
- Agent Foundation Model. LWM RL warm-up on single-turn, non-agentic trajectories transfers to multi-turn, tool-calling agentic tasks across 7 benchmarks, including 3 entirely out-of-domain.
Model Overview
- Type: Causal Language Model (Language World Model)
- Base Model: Qwen3.5-35B-A3B-Base
- Training Stage: Continual Pre-Training (CPT) → Supervised Fine-Tuning (SFT) → Reinforcement Learning (RL, GSPO)
- Number of Parameters: 35B in total and 3B activated
- Hidden Dimension: 2048
- Token Embedding: 248320 (Padded)
- Number of Layers: 40
- Hidden Layout: 10 × (3 × (Gated DeltaNet → MoE) → 1 × (Gated Attention → MoE))
- Gated DeltaNet:
  - Number of Linear Attention Heads: 32 for V and 16 for QK
  - Head Dimension: 128
- Gated Attention:
  - Number of Attention Heads: 16 for Q and 2 for KV
  - Head Dimension: 256
  - Rotary Position Embedding Dimension: 64
- Mixture of Experts:
  - Number of Experts: 256
  - Number of Activated Experts: 8 Routed + 1 Shared
  - Expert Intermediate Dimension: 512
- Context Length: 262,144 tokens
- Disclaimer: No outputs from external API services are included in the training pipeline.
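As a quick sanity check on the Hidden Layout line, the sketch below expands the pattern 10 × (3 × Gated DeltaNet + 1 × Gated Attention) into a per-layer list and counts the layers. The constant and string names are illustrative assumptions, not identifiers from the model's code.

```python
# Illustrative names only; these are not identifiers from the model's code.
NUM_BLOCKS = 10            # outer repeat count from the Hidden Layout line
DELTANET_PER_BLOCK = 3     # Gated DeltaNet -> MoE layers per block
ATTENTION_PER_BLOCK = 1    # Gated Attention -> MoE layers per block


def build_layout():
    """Return the token-mixer type of every layer implied by the layout."""
    block = (["gated_deltanet"] * DELTANET_PER_BLOCK
             + ["gated_attention"] * ATTENTION_PER_BLOCK)
    return block * NUM_BLOCKS


layout = build_layout()
print(len(layout))                      # 40, matching "Number of Layers"
print(layout.count("gated_deltanet"))   # 30
print(layout.count("gated_attention"))  # 10
```

Under this reading, only 10 of the 40 layers use full attention, which is relevant when estimating memory for long contexts.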
Performance
AgentWorldBench (Open-Ended Evaluation)
Each score is the mean of the five rubric dimensions for that domain, normalized to a 0-100 scale.
| Model | MCP | Search | Term. | SWE | Android | Web | OS | Overall |
|---|---|---|---|---|---|---|---|---|
| GPT-5.4 | 70.10 | 37.26 | 53.69 | 66.29 | 60.00 | 51.80 | 68.58 | 58.25 |
| Claude Opus 4.8 | 54.93 | 35.14 | 59.18 | 64.10 | 61.50 | 54.66 | 66.62 | 56.59 |
| Claude Opus 4.6 | 69.90 | 29.30 | 57.51 | 64.55 | 61.74 | 51.42 | 70.20 | 57.80 |
| Gemini 3.1 Pro | 59.07 | 30.21 | 52.47 | 59.07 | 61.40 | 52.83 | 66.92 | 54.57 |
| Claude Sonnet 4.6 | 70.00 | 28.79 | 56.98 | 64.52 | 58.03 | 50.78 | 63.17 | 56.04 |
| DeepSeek-V4-Pro | 63.27 | 27.61 | 51.26 | 59.44 | 55.17 | 50.32 | 63.70 | 52.97 |
| GLM-5.1 | 67.60 | 22.46 | 47.32 | 52.07 | 59.10 | 51.50 | 59.13 | 51.31 |
| Kimi K2.6 | 65.23 | 27.48 | 52.54 | 58.77 | 58.93 | 50.20 | 60.80 | 53.42 |
| MiniMax-M2.7 | 55.82 | 27.30 | 41.62 | 37.44 | 52.40 | 50.52 | 57.73 | 46.12 |
| Qwen3.5-35B-A3B | 57.87 | 25.98 | 46.13 | 47.58 | 53.18 | 47.10 | 56.27 | 47.73 |
| Qwen3.5-397B-A17B | 68.31 | 30.81 | 55.30 | 64.44 | 54.90 | 48.55 | 60.85 | 54.74 |
| Qwen3.6-Plus | 55.28 | 21.94 | 50.58 | 59.08 | 57.65 | 50.78 | 60.33 | 50.81 |
| Qwen-AgentWorld-35B-A3B | 64.79 | 36.69 | 53.96 | 65.63 | 58.17 | 49.55 | 65.92 | 56.39 |
| Qwen-AgentWorld-397B-A17B | 68.24 | 37.82 | 57.73 | 68.49 | 60.20 | 50.98 | 67.89 | 58.71 |
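The Overall column appears consistent with an unweighted mean of the seven domain scores. This is inferred from the numbers rather than a documented formula, and at least one row in the table differs from the plain mean by a few hundredths, so treat the sketch below as an approximation. The two rows are copied from the table; the helper name `overall` is an assumption.

```python
# Domain order: MCP, Search, Term., SWE, Android, Web, OS (copied from the table).
ROWS = {
    "GPT-5.4": [70.10, 37.26, 53.69, 66.29, 60.00, 51.80, 68.58],
    "Qwen-AgentWorld-35B-A3B": [64.79, 36.69, 53.96, 65.63, 58.17, 49.55, 65.92],
}


def overall(scores):
    """Unweighted mean of the per-domain scores, rounded to two decimals."""
    return round(sum(scores) / len(scores), 2)


for name, scores in ROWS.items():
    print(f"{name}: {overall(scores)}")
```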
Quickstart
Deployment
Qwen-AgentWorld-35B-A3B can be served via APIs with popular inference frameworks. In the following, we show example commands to launch OpenAI-compatible API servers.
[!Important] The model has a default context length of 262,144 tokens. If you encounter out-of-memory (OOM) errors, consider reducing the context window. However, because Qwen-AgentWorld leverages extended context for multi-turn environment simulation, we advise maintaining a context length of at least 128K tokens.
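To get a rough sense of how much memory the context window costs, the sketch below estimates the key/value cache for one sequence from the figures in Model Overview. It rests on several assumptions not stated in the source: only the 10 Gated Attention layers keep a per-token KV cache (the Gated DeltaNet layers are assumed to hold a fixed-size state), cache values are stored in 16-bit precision, and framework overhead is ignored. The function name and parameters are illustrative.

```python
def kv_cache_bytes(context_tokens, attn_layers=10, kv_heads=2,
                   head_dim=256, bytes_per_value=2):
    """Estimate KV-cache bytes for one sequence.

    Assumes only the Gated Attention layers cache keys and values per token
    and that values are stored in 16-bit precision (both are assumptions).
    """
    # Factor 2 covers keys plus values.
    per_token = 2 * attn_layers * kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens


for tokens in (262_144, 131_072):
    gib = kv_cache_bytes(tokens) / 2**30
    print(f"{tokens} tokens -> {gib:.2f} GiB per sequence")
```

Under these assumptions the cache comes to about 5 GiB per sequence at the full 262,144-token window and about 2.5 GiB at 128K, so halving the window roughly halves this part of the memory budget.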
SGLang
SGLang is a fast serving framework for large language models.
```bash
python -m sglang.launch_server \
    --model-path Qwen/Qwen-AgentWorld-35B-A3B \
    --port 8000 \
    --tp-size 4 \
    --context-length 262144 \
    --reasoning-parser qwen3
```
An OpenAI-compatible API will be available at http://localhost:8000/v1.
vLLM
vLLM is a high-throughput and memory-efficient inference engine for LLMs.
```bash
vllm serve Qwen/Qwen-AgentWorld-35B-A3B \
    --port 8000 \
    --tensor-parallel-size 4 \
    --max-model-len 262144 \
    --reasoning-parser qwen3 \
    --language-model-only \
    --trust-remote-code
```
The `--language-model-only` flag is required because the model architecture includes visual component definitions but the checkpoint contains only language model weights. Without this flag, vLLM will attempt to initialize visual modules and fail.
An OpenAI-compatible API will be available at http://localhost:8000/v1.
Inference with Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen-AgentWorld-35B-A3B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {
        "role": "system",
        "content": "You are a language world model simulating a Linux terminal environment. "
                   "Given the user's command, predict the terminal output."
    },
    {
        "role": "user",
        "content": "Action: execute_bash\nCommand: ls -la /home/user/project/"
    }
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=2048, temperature=0.6)
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
```
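Because the model reasons in thinking mode by default, text decoded with Transformers may contain a `<think>...</think>` span before the predicted observation. The helper below is a minimal sketch for separating the two, assuming the tags survive decoding as plain text. `split_thinking` is a hypothetical helper, and the demo string is made up for illustration rather than taken from real model output.

```python
def split_thinking(text, close_tag="</think>"):
    """Split decoded text into (reasoning, observation).

    If no closing tag is present, the whole text is treated as the observation.
    """
    reasoning, sep, answer = text.rpartition(close_tag)
    if not sep:
        return "", text.strip()
    # Drop the opening tag if it is present in the decoded text.
    reasoning = reasoning.replace("<think>", "", 1).strip()
    return reasoning, answer.strip()


# Made-up example text, not real model output.
demo = "<think>\nThe directory likely holds a small project.\n</think>\n\ntotal 8\n"
reasoning, observation = split_thinking(demo)
print(repr(reasoning))
print(repr(observation))
```

When serving with `--reasoning-parser qwen3`, the server is expected to do this separation itself, so the helper is mainly relevant to the local Transformers path.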
Using via the Chat Completions API
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",
)

# Terminal domain example
messages = [
    {
        "role": "system",
        "content": "You are a language world model simulating a Linux terminal environment. "
                   "Given the user's command, predict the terminal output."
    },
    {
        "role": "user",
        "content": "Action: execute_bash\nCommand: ls -la /home/user/project/"
    }
]

response = client.chat.completions.create(
    model="Qwen/Qwen-AgentWorld-35B-A3B",
    messages=messages,
    max_tokens=32768,
    temperature=0.6,
)
print(response.choices[0].message.content)
```
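For multi-turn simulation, one plausible pattern is to keep the whole interaction in the message history: each agent action goes in as a user message and each predicted observation is appended as an assistant message. The sketch below assumes that convention, which the source does not confirm, and reuses the `Action: execute_bash` format from the example above. `simulate` and `fake_predict` are hypothetical names, and `fake_predict` stands in for a real API call so the sketch runs without a server.

```python
def simulate(commands, predict, system_prompt):
    """Roll a sequence of commands through a world model, keeping full history.

    `predict` is any callable that maps a message list to an observation string.
    """
    messages = [{"role": "system", "content": system_prompt}]
    observations = []
    for command in commands:
        messages.append({
            "role": "user",
            "content": f"Action: execute_bash\nCommand: {command}",
        })
        observation = predict(messages)
        # Assumption: earlier observations are fed back as assistant turns.
        messages.append({"role": "assistant", "content": observation})
        observations.append(observation)
    return messages, observations


def fake_predict(messages):
    """Stand-in for a real model call; returns a placeholder string."""
    return f"(simulated output for turn {len(messages) // 2})"


history, outputs = simulate(["ls", "pwd"], fake_predict, "You simulate a terminal.")
print(outputs)
```

Against a running server, `predict` could wrap `client.chat.completions.create(...)` and return `response.choices[0].message.content`. Because the history grows every turn, long trajectories are where the large context window matters.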
[!Note] We provide domain-specific world model system prompt templates in `prompts/` of the GitHub repository for all 7 domains. These serve as general-purpose system prompts when using Qwen-AgentWorld as an environment simulator. Each domain folder contains a `system_prompt.txt` (world model system prompt) and a `judge_system_prompt.txt` (evaluation prompt).
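A minimal sketch for loading these templates from a local clone is shown below. The two file names come from the note above; the folder name passed as `domain` (for example `"terminal"`) is a guess, so check the actual directory names in the repository. `load_domain_prompts` is a hypothetical helper.

```python
from pathlib import Path


def load_domain_prompts(prompts_dir, domain):
    """Read the world-model and judge system prompts for one domain folder."""
    folder = Path(prompts_dir) / domain
    system_prompt = (folder / "system_prompt.txt").read_text(encoding="utf-8")
    judge_prompt = (folder / "judge_system_prompt.txt").read_text(encoding="utf-8")
    return system_prompt, judge_prompt


# Usage, assuming a clone at ./Qwen-AgentWorld and a folder named "terminal":
# system_prompt, judge_prompt = load_domain_prompts("Qwen-AgentWorld/prompts", "terminal")
# messages = [{"role": "system", "content": system_prompt}, ...]
```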
Evaluate on AgentWorldBench
AgentWorldBench evaluates language world models by scoring each predicted environment observation on 5 dimensions: Format, Factuality, Consistency, Realism, and Quality.
Setup
```bash
# Clone the evaluation repository
git clone https://github.com/QwenLM/Qwen-AgentWorld.git
cd Qwen-AgentWorld

# Download the benchmark
huggingface-cli download Qwen/AgentWorldBench --repo-type dataset --local-dir ./AgentWorldBench

# Install dependencies
pip install openai
```
Run Evaluation
The evaluation follows a three-step pipeline:
```bash
cd eval

# Step 1: Run world model inference
python eval.py infer \
    --data-dir ../AgentWorldBench \
    --model-base-url http://localhost:8000/v1 \
    --model-name Qwen/Qwen-AgentWorld-35B-A3B \
    --output-dir ./results

# Step 2: Run LLM judge scoring
export OPENAI_API_KEY="your-api-key"
python eval.py judge \
    --predictions ./results/predictions.jsonl \
    --judge-base-url https://api.openai.com/v1 \
    --judge-model gpt-5.2-2025-12-11 \
    --output-dir ./results

# Step 3: Aggregate and display scores
python eval.py score --predictions ./results/judged.jsonl
```
Best Practices
- Sampling Parameters: We recommend `temperature=0.6`, `top_p=0.95`, `top_k=20` for world model inference. The model uses thinking mode by default (`<think>...</think>`) to reason about environment state transitions before producing the predicted observation.
- Adequate Output Length: We recommend an output length of 32,768 tokens for most queries. For long, multi-step trajectories, you may increase the max output length to accommodate detailed environment observations.
- Domain-Specific System Prompts: For optimal simulation fidelity, use the domain-specific system prompts provided in the `prompts/` directory of the GitHub repository.
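The recommended settings can be bundled into one set of keyword arguments for the Chat Completions call. In the sketch below, `top_k` is passed through the OpenAI client's `extra_body` because it is not a standard Chat Completions field. This assumes the serving framework accepts `top_k` as an extra request field, which is worth verifying for your server version. `recommended_sampling` is a hypothetical helper.

```python
def recommended_sampling(max_tokens=32768):
    """Return keyword arguments carrying the recommended sampling settings."""
    return {
        "temperature": 0.6,
        "top_p": 0.95,
        "max_tokens": max_tokens,
        # Assumption: the server reads top_k from the extra request body.
        "extra_body": {"top_k": 20},
    }


params = recommended_sampling()
print(params)

# Usage with an OpenAI-compatible client (requires a running server):
# response = client.chat.completions.create(
#     model="Qwen/Qwen-AgentWorld-35B-A3B",
#     messages=messages,
#     **recommended_sampling(),
# )
```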
Citation
If you find our work helpful, please consider citing it.
```bibtex
@article{zuo2026qwen,
  title={Qwen-agentworld: language world models for general agents},
  author={Zuo, Yuxin and Xiao, Zikai and Sheng, Li and Huang, Fei and Tu, Jianhong and Liu, Yuxuan and Tang, Tianyi and Hu, Xiaomeng and Su, Yang and Lan, Qingfeng and others},
  journal={arXiv preprint arXiv:2606.24597},
  year={2026}
}
```