insagur

qwen3.5-9b-agentnet-cot-l2-step100

Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

License: apache-2.0

Training format (OpenCUA L2)

markdown

## Thought:
<reasoning>
## Action:
<one-sentence>
## Code:
pyautogui.click(x=0.5, y=0.5)

Coordinates normalized to [0, 1]. The ## markdown headers help the base model emit the schema reliably (vs. the legacy bare Thought: form). See insagur/qwen3.5-9b-agentnet-ubuntu-1epoch for the legacy-format variant.

Training config

  • Hardware: 1 × 8 A100 80GB SXM4
  • Distributed: DeepSpeed ZeRO-2 + bf16
  • Optimizer: AdamW, LR 1e-5 cosine, warmup 200 steps
  • Batch: per_device_bs=1 × grad_accum=16 × 8 GPU = global batch 128
  • Steps: 100 (preempted; 1 epoch = 300 steps)
  • EMA teacher: target=block, decay=0.9995, α=0.5
  • Sequence length: 3072
  • Image tokens: 2048 (≈1.6M pixel cap)
  • Save frequency: every 50 steps

Metrics @ step 100

Table
MetricValue
Train loss0.4601
Train token_acc0.8416
Eval loss0.4718
Eval token_acc0.8387

Already approaches the fully-trained legacy-format model's eval loss (0.4622) at only 33% of training, suggesting the ## format converges faster.

Data

scripts/convert_agentnet_cot.py --cot_level l2 produces this format from AgentNet 5K trajectories with the same quality filter as the legacy converter (alignment≥7, efficiency≥5).

Table
SplitSamples
Train38,317
Val1,866

Inference

python

from transformers import AutoModelForImageTextToText, AutoProcessor
model = AutoModelForImageTextToText.from_pretrained(
"insagur/qwen3.5-9b-agentnet-cot-l2-step100",
torch_dtype="bfloat16",
).to("cuda")
processor = AutoProcessor.from_pretrained("insagur/qwen3.5-9b-agentnet-cot-l2-step100")
system = (
"You are a computer-use agent operating a Linux desktop. "
"Respond using the OpenCUA L2 format:\n"
"## Thought:\n<reasoning>\n\n## Action:\n<one-sentence>\n\n## Code:\n<pyautogui code with normalized [0,1] coords>"
)
# ... see scripts/eval.py in the training repo for full inference loop ...

Recipe

Training code: https://github.com/2bhapby/gui_internal_worldmodel

bash

python scripts/convert_agentnet_cot.py --src ... --images_dir ... --out_dir ./agentnet_l2 --cot_level l2
CONFIG=configs/qwen35_9b_agentnet.yaml RUN_NAME=a100-9b-1ep-cot-l2 \
sbatch --gpus=8 scripts/slurm_train_qwen.sbatch \
data.train_jsonl=./agentnet_l2/train.jsonl \
data.val_jsonl=./agentnet_l2/val.jsonl

Model provider

insagur

Model tree

Base

Qwen/Qwen3.5-9B

Fine-tuned

this model

Modalities

Input

Video, Text, Image

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Explore FriendliAI today