unsloth

Qwen-AgentWorld-35B-A3B

README

License: apache-2.0

Highlights

  • Seven Unified Domains. A single model covers MCP (tool calling), Search, Terminal, SWE (software engineering), Android, Web, and OS, spanning both text and GUI interaction environments.
  • Native World Model. Environment modeling is built in from continual pre-training (CPT) onward, rather than adapted post hoc onto a general-purpose LLM.
  • Generalizable, Scalable & Controllable Simulator. Zero-shot generalization to out-of-distribution (OOD) environments (e.g., OpenClaw); controllable perturbations and fictional-world construction surpass real-environment training.
  • Agent Foundation Model. Language world model (LWM) RL warm-up on single-turn, non-agentic trajectories transfers to multi-turn, tool-calling agentic tasks across 7 benchmarks, including 3 that are entirely out-of-domain.

Model Overview

  • Type: Causal Language Model (Language World Model)
  • Base Model: Qwen3.5-35B-A3B-Base
  • Training Stage: Continual Pre-Training (CPT) → Supervised Fine-Tuning (SFT) → Reinforcement Learning (RL, GSPO)
  • Number of Parameters: 35B in total and 3B activated
  • Hidden Dimension: 2048
  • Token Embedding: 248320 (Padded)
  • Number of Layers: 40
  • Hidden Layout: 10 × (3 × (Gated DeltaNet → MoE) → 1 × (Gated Attention → MoE))
  • Gated DeltaNet:
    • Number of Linear Attention Heads: 32 for V and 16 for QK
    • Head Dimension: 128
  • Gated Attention:
    • Number of Attention Heads: 16 for Q and 2 for KV
    • Head Dimension: 256
    • Rotary Position Embedding Dimension: 64
  • Mixture of Experts:
    • Number of Experts: 256
    • Number of Activated Experts: 8 Routed + 1 Shared
    • Expert Intermediate Dimension: 512
  • Context Length: 262,144 tokens
  • Disclaimer: No outputs from external API services are included in the training pipeline.
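The hidden layout above can be expanded into a per-layer list to check that it matches the stated 40 layers. This is a minimal sketch; the string labels `gated_deltanet` and `gated_attention` are illustrative names chosen here, not identifiers from the model's configuration.

```python
# Expand the documented layout:
# 10 x (3 x (Gated DeltaNet -> MoE) -> 1 x (Gated Attention -> MoE))
NUM_GROUPS = 10
DELTANET_PER_GROUP = 3
ATTENTION_PER_GROUP = 1


def build_layer_types():
    """Return the token-mixer type of each layer, in order (labels are illustrative)."""
    group = (["gated_deltanet"] * DELTANET_PER_GROUP
             + ["gated_attention"] * ATTENTION_PER_GROUP)
    return group * NUM_GROUPS


layer_types = build_layer_types()
print(len(layer_types))                      # total number of layers
print(layer_types.count("gated_deltanet"))   # linear-attention layers
print(layer_types.count("gated_attention"))  # full-attention layers
```

Under this reading, every fourth layer uses Gated Attention and the remaining three quarters use Gated DeltaNet, with an MoE block following each.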

Performance

AgentWorldBench (Open-Ended Evaluation)

Each domain score is the mean of a five-dimension rubric, normalized to a 0-100 scale.

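| Model | MCP | Search | Term. | SWE | Android | Web | OS | Overall |
|---|---|---|---|---|---|---|---|---|
| GPT-5.4 | 70.10 | 37.26 | 53.69 | 66.29 | 60.00 | 51.80 | 68.58 | 58.25 |
| Claude Opus 4.8 | 54.93 | 35.14 | 59.18 | 64.10 | 61.50 | 54.66 | 66.62 | 56.59 |
| Claude Opus 4.6 | 69.90 | 29.30 | 57.51 | 64.55 | 61.74 | 51.42 | 70.20 | 57.80 |
| Gemini 3.1 Pro | 59.07 | 30.21 | 52.47 | 59.07 | 61.40 | 52.83 | 66.92 | 54.57 |
| Claude Sonnet 4.6 | 70.00 | 28.79 | 56.98 | 64.52 | 58.03 | 50.78 | 63.17 | 56.04 |
| DeepSeek-V4-Pro | 63.27 | 27.61 | 51.26 | 59.44 | 55.17 | 50.32 | 63.70 | 52.97 |
| GLM-5.1 | 67.60 | 22.46 | 47.32 | 52.07 | 59.10 | 51.50 | 59.13 | 51.31 |
| Kimi K2.6 | 65.23 | 27.48 | 52.54 | 58.77 | 58.93 | 50.20 | 60.80 | 53.42 |
| MiniMax-M2.7 | 55.82 | 27.30 | 41.62 | 37.44 | 52.40 | 50.52 | 57.73 | 46.12 |
| Qwen3.5-35B-A3B | 57.87 | 25.98 | 46.13 | 47.58 | 53.18 | 47.10 | 56.27 | 47.73 |
| Qwen3.5-397B-A17B | 68.31 | 30.81 | 55.30 | 64.44 | 54.90 | 48.55 | 60.85 | 54.74 |
| Qwen3.6-Plus | 55.28 | 21.94 | 50.58 | 59.08 | 57.65 | 50.78 | 60.33 | 50.81 |
| Qwen-AgentWorld-35B-A3B | 64.79 | 36.69 | 53.96 | 65.63 | 58.17 | 49.55 | 65.92 | 56.39 |
| Qwen-AgentWorld-397B-A17B | 68.24 | 37.82 | 57.73 | 68.49 | 60.20 | 50.98 | 67.89 | 58.71 |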
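The Overall column appears consistent with an unweighted mean of the seven domain scores. The source does not state this explicitly, so the sketch below treats it as an assumption and checks it against two rows copied from the table above.

```python
from statistics import mean

# Domain scores in table order: MCP, Search, Term., SWE, Android, Web, OS.
domain_scores = {
    "GPT-5.4": [70.10, 37.26, 53.69, 66.29, 60.00, 51.80, 68.58],
    "Qwen-AgentWorld-35B-A3B": [64.79, 36.69, 53.96, 65.63, 58.17, 49.55, 65.92],
}
reported_overall = {
    "GPT-5.4": 58.25,
    "Qwen-AgentWorld-35B-A3B": 56.39,
}


def overall(scores):
    """Unweighted mean of the domain scores, rounded to two decimals (assumed scheme)."""
    return round(mean(scores), 2)


for name, scores in domain_scores.items():
    print(name, overall(scores), reported_overall[name])
```

If the two printed values agree for a row, the reported Overall matches the plain average for that model.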

Quickstart

Deployment

Qwen-AgentWorld-35B-A3B can be served via APIs with popular inference frameworks. In the following, we show example commands to launch OpenAI-compatible API servers.

[!Important] The model has a default context length of 262,144 tokens. If you encounter out-of-memory (OOM) errors, consider reducing the context window. However, because Qwen-AgentWorld leverages extended context for multi-turn environment simulation, we advise maintaining a context length of at least 128K tokens.
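To get a rough sense of how the context window affects memory, the following sketch estimates the per-sequence KV cache of the Gated Attention layers from the figures in Model Overview. It assumes a 2-byte (e.g., bf16) cache, counts only the 10 Gated Attention layers implied by the hidden layout, and ignores model weights, activations, the Gated DeltaNet state, and any framework overhead, so real usage will be higher.

```python
GIB = 1024 ** 3


def kv_cache_bytes(context_len, num_attn_layers=10, num_kv_heads=2,
                   head_dim=256, bytes_per_elem=2):
    """Approximate KV cache size for one sequence.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_elem=2 is an assumption (16-bit cache), not a documented setting.
    """
    return 2 * num_attn_layers * num_kv_heads * head_dim * bytes_per_elem * context_len


for ctx in (262_144, 131_072):
    print(f"{ctx} tokens: {kv_cache_bytes(ctx) / GIB:.2f} GiB")
```

Under these assumptions, halving the context length halves this part of the memory footprint, which is why reducing the window is a reasonable first response to OOM errors.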

SGLang

SGLang is a fast serving framework for large language models.

bash

python -m sglang.launch_server \
    --model-path Qwen/Qwen-AgentWorld-35B-A3B \
    --port 8000 \
    --tp-size 4 \
    --context-length 262144 \
    --reasoning-parser qwen3

An OpenAI-compatible API will be available at http://localhost:8000/v1.

vLLM

vLLM is a high-throughput and memory-efficient inference engine for LLMs.

bash

vllm serve Qwen/Qwen-AgentWorld-35B-A3B \
    --port 8000 \
    --tensor-parallel-size 4 \
    --max-model-len 262144 \
    --reasoning-parser qwen3 \
    --trust-remote-code

An OpenAI-compatible API will be available at http://localhost:8000/v1.

Inference with Transformers

python

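from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen-AgentWorld-35B-A3B"

# Load the tokenizer and model; device_map="auto" places weights on the available devices.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

# Terminal domain example: the system prompt sets the simulated environment,
# and the user message carries the agent's action.
messages = [
    {
        "role": "system",
        "content": "You are a language world model simulating a Linux terminal environment. "
                   "Given the user's command, predict the terminal output."
    },
    {
        "role": "user",
        "content": "Action: execute_bash\nCommand: ls -la /home/user/project/"
    }
]

# Render the chat template, then generate the predicted observation.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=2048,
    do_sample=True,  # enable sampling so temperature/top_p/top_k take effect
    temperature=0.6,
    top_p=0.95,
    top_k=20,
)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True)
print(response)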

Using the Chat Completions API

python

from openai import OpenAI

# Point the client at the local OpenAI-compatible server started above.
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",
)

# Terminal domain example
messages = [
    {
        "role": "system",
        "content": "You are a language world model simulating a Linux terminal environment. "
                   "Given the user's command, predict the terminal output."
    },
    {
        "role": "user",
        "content": "Action: execute_bash\nCommand: ls -la /home/user/project/"
    }
]

response = client.chat.completions.create(
    model="Qwen/Qwen-AgentWorld-35B-A3B",
    messages=messages,
    max_tokens=32768,
    temperature=0.6,
    top_p=0.95,
    extra_body={"top_k": 20},  # top_k is not a standard OpenAI field; pass it via extra_body
)
print(response.choices[0].message.content)
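For multi-turn simulation, one plausible pattern is to append each predicted observation as an assistant message and the agent's next action as a new user message. The sketch below only builds the message list and makes no API call. The helper names `format_action` and `extend_history` are hypothetical, and the `Action: ...\nCommand: ...` layout simply mirrors the single-turn example above rather than a documented schema.

```python
def format_action(action, command):
    """Format an agent action the same way as the single-turn example."""
    return f"Action: {action}\nCommand: {command}"


def extend_history(messages, observation, next_command, action="execute_bash"):
    """Return a new message list with the observation and the next action appended.

    The input list is not modified, so earlier turns can be replayed or branched.
    """
    return messages + [
        {"role": "assistant", "content": observation},
        {"role": "user", "content": format_action(action, next_command)},
    ]


history = [
    {"role": "system", "content": "You are a language world model simulating a Linux terminal environment."},
    {"role": "user", "content": format_action("execute_bash", "ls -la /home/user/project/")},
]
# The observation string here is a placeholder, not real model output.
history = extend_history(history, "<predicted terminal output>", "cat README.md")
print([m["role"] for m in history])
```

The resulting list can be passed as `messages` to the same `client.chat.completions.create` call for the next turn.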

[!Note] We provide domain-specific world model system prompt templates in prompts/ of the GitHub repository for all 7 domains. These serve as general-purpose system prompts when using Qwen-AgentWorld as an environment simulator. Each domain folder contains a system_prompt.txt (world model system prompt) and a judge_system_prompt.txt (evaluation prompt).
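A small helper can load one of these templates and pair it with a user action. This is a sketch under assumptions: the domain folder name (`terminal` in the usage comment) and the local checkout path are guesses, so check the actual directory names in the repository before relying on them.

```python
from pathlib import Path


def load_system_prompt(prompts_dir, domain, filename="system_prompt.txt"):
    """Read <prompts_dir>/<domain>/<filename> and return its text.

    The domain folder names are not listed in this README, so the caller
    must supply whichever name the repository actually uses.
    """
    path = Path(prompts_dir) / domain / filename
    if not path.is_file():
        raise FileNotFoundError(f"No prompt template at {path}")
    return path.read_text(encoding="utf-8").strip()


def build_messages(system_prompt, user_content):
    """Assemble a two-message chat request from a system prompt and a user action."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_content},
    ]


# Example usage (paths are hypothetical):
# prompt = load_system_prompt("Qwen-AgentWorld/prompts", "terminal")
# messages = build_messages(prompt, "Action: execute_bash\nCommand: pwd")
```

Passing `filename="judge_system_prompt.txt"` would load the evaluation prompt from the same folder instead.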

Evaluate on AgentWorldBench

AgentWorldBench evaluates language world models by scoring each predicted environment observation on 5 dimensions: Format, Factuality, Consistency, Realism, and Quality.
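To make the scoring concrete, the sketch below averages five dimension ratings and rescales the result to 0-100. The 1-to-5 rating range, the lowercase dictionary keys, and the linear rescaling are all assumptions made for illustration; the judge prompts and `eval.py` in the repository define the actual scale and aggregation.

```python
DIMENSIONS = ("format", "factuality", "consistency", "realism", "quality")


def normalized_score(ratings, lo=1.0, hi=5.0):
    """Average the five dimension ratings and map [lo, hi] linearly onto [0, 100].

    lo and hi are hypothetical bounds of the judge's rating scale.
    """
    missing = [d for d in DIMENSIONS if d not in ratings]
    if missing:
        raise ValueError(f"Missing dimensions: {missing}")
    raw_mean = sum(ratings[d] for d in DIMENSIONS) / len(DIMENSIONS)
    return 100.0 * (raw_mean - lo) / (hi - lo)


example = {"format": 5, "factuality": 4, "consistency": 4, "realism": 3, "quality": 4}
print(normalized_score(example))
```

A domain score would then be the average of these per-observation values over all samples in that domain, again assuming simple unweighted averaging.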

Setup

bash

# Clone the evaluation repository
git clone https://github.com/QwenLM/Qwen-AgentWorld.git
cd Qwen-AgentWorld
# Download the benchmark
huggingface-cli download Qwen/AgentWorldBench --repo-type dataset --local-dir ./AgentWorldBench
# Install dependencies
pip install openai

Run Evaluation

The evaluation follows a three-step pipeline:

bash

cd eval

# Step 1: Run world model inference
python eval.py infer \
    --data-dir ../AgentWorldBench \
    --model-base-url http://localhost:8000/v1 \
    --model-name Qwen/Qwen-AgentWorld-35B-A3B \
    --output-dir ./results

# Step 2: Run LLM judge scoring
export OPENAI_API_KEY="your-api-key"
python eval.py judge \
    --predictions ./results/predictions.jsonl \
    --judge-base-url https://api.openai.com/v1 \
    --judge-model gpt-5.2-2025-12-11 \
    --output-dir ./results

# Step 3: Aggregate and display scores
python eval.py score --predictions ./results/judged.jsonl

Best Practices

  1. Sampling Parameters: We recommend temperature=0.6, top_p=0.95, top_k=20 for world model inference. The model uses thinking mode by default (<think>...</think>) to reason about environment state transitions before producing the predicted observation.

  2. Adequate Output Length: We recommend an output length of 32,768 tokens for most queries. For long, multi-step trajectories, you may increase the max output length to accommodate detailed environment observations.

  3. Domain-Specific System Prompts: For optimal simulation fidelity, use the domain-specific system prompts provided in the prompts/ directory of the GitHub repository.
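Because the model reasons inside `<think>...</think>` before emitting the observation, raw generations (for example from the Transformers snippet) may need the reasoning separated from the final output. The helper below is a minimal sketch; `split_thinking` is a hypothetical name, and it assumes the closing `</think>` tag appears at most once. It also tolerates a missing opening tag, since some chat templates place `<think>` in the prompt rather than in the generated text.

```python
def split_thinking(text, open_tag="<think>", close_tag="</think>"):
    """Split raw model output into (reasoning, observation).

    If no closing tag is found, the whole text is treated as the observation.
    """
    head, sep, tail = text.partition(close_tag)
    if not sep:
        return "", text.strip()
    reasoning = head.replace(open_tag, "", 1).strip()
    return reasoning, tail.strip()


raw = "<think>The directory holds two files.</think>\ntotal 8\n-rw-r--r-- 1 user user 0 main.py"
reasoning, observation = split_thinking(raw)
print(reasoning)
print(observation)
```

When the server is launched with `--reasoning-parser qwen3`, the reasoning is typically already separated from the message content, so this helper is mainly useful for unparsed text.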

Citation

If you find our work helpful, please cite it.

bibtex

@article{zuo2026qwen,
  title={Qwen-agentworld: language world models for general agents},
  author={Zuo, Yuxin and Xiao, Zikai and Sheng, Li and Huang, Fei and Tu, Jianhong and Liu, Yuxuan and Tang, Tianyi and Hu, Xiaomeng and Su, Yang and Lan, Qingfeng and others},
  journal={arXiv preprint arXiv:2606.24597},
  year={2026}
}

Model provider

unsloth


Model tree

Base

Qwen/Qwen-AgentWorld-35B-A3B

Fine-tuned

this model

Modalities

Input

Video, Text, Image

Output

Text
