insagur
qwen3.5-9b-agentnet-cot-l2-step100
Run this model inference on single tenant GPU with unmatched speed and reliability at scale.
Get help setting up a custom Dedicated Endpoints.
Talk with our engineer to get a quote for reserved GPU instances with discounts.
README
License: apache-2.0Training format (OpenCUA L2)
markdown
## Thought:<reasoning>## Action:<one-sentence>## Code:pyautogui.click(x=0.5, y=0.5)
Coordinates normalized to [0, 1]. The ## markdown headers help the
base model emit the schema reliably (vs. the legacy bare Thought:
form). See insagur/qwen3.5-9b-agentnet-ubuntu-1epoch for the legacy-format variant.
Training config
- Hardware: 1 × 8 A100 80GB SXM4
- Distributed: DeepSpeed ZeRO-2 + bf16
- Optimizer: AdamW, LR 1e-5 cosine, warmup 200 steps
- Batch: per_device_bs=1 × grad_accum=16 × 8 GPU = global batch 128
- Steps: 100 (preempted; 1 epoch = 300 steps)
- EMA teacher: target=block, decay=0.9995, α=0.5
- Sequence length: 3072
- Image tokens: 2048 (≈1.6M pixel cap)
- Save frequency: every 50 steps
Metrics @ step 100
| Metric | Value |
|---|---|
| Train loss | 0.4601 |
| Train token_acc | 0.8416 |
| Eval loss | 0.4718 |
| Eval token_acc | 0.8387 |
Already approaches the fully-trained legacy-format model's eval loss
(0.4622) at only 33% of training, suggesting the ## format converges
faster.
Data
scripts/convert_agentnet_cot.py --cot_level l2 produces this format
from AgentNet 5K trajectories with the same quality filter as the
legacy converter (alignment≥7, efficiency≥5).
| Split | Samples |
|---|---|
| Train | 38,317 |
| Val | 1,866 |
Inference
python
from transformers import AutoModelForImageTextToText, AutoProcessormodel = AutoModelForImageTextToText.from_pretrained("insagur/qwen3.5-9b-agentnet-cot-l2-step100",torch_dtype="bfloat16",).to("cuda")processor = AutoProcessor.from_pretrained("insagur/qwen3.5-9b-agentnet-cot-l2-step100")system = ("You are a computer-use agent operating a Linux desktop. ""Respond using the OpenCUA L2 format:\n""## Thought:\n<reasoning>\n\n## Action:\n<one-sentence>\n\n## Code:\n<pyautogui code with normalized [0,1] coords>")# ... see scripts/eval.py in the training repo for full inference loop ...
Recipe
Training code: https://github.com/2bhapby/gui_internal_worldmodel
bash
python scripts/convert_agentnet_cot.py --src ... --images_dir ... --out_dir ./agentnet_l2 --cot_level l2CONFIG=configs/qwen35_9b_agentnet.yaml RUN_NAME=a100-9b-1ep-cot-l2 \sbatch --gpus=8 scripts/slurm_train_qwen.sbatch \data.train_jsonl=./agentnet_l2/train.jsonl \data.val_jsonl=./agentnet_l2/val.jsonl
Model provider
insagur
Model tree
Base
Qwen/Qwen3.5-9B
Fine-tuned
this model
Modalities
Input
Video, Text, Image
Output
Text
Pricing
Dedicated Endpoints
View detailsSupported Functionality
Model APIs
Dedicated Endpoints
Container
More information