groundhogLLM

ACC-Qwen3-30B-A3B

License: apache-2.0

Overview

We fine-tuned Qwen3-30B-A3B-Thinking with Agent Context Compilation (ACC), a method that converts multi-turn agent trajectories (Search, SWE, SQL) into long-context QA pairs for direct supervised fine-tuning. Unlike standard agent SFT, which masks tool responses, ACC assembles the evidence scattered across turns into a single context, enabling explicit supervision of long-range dependency modeling.
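To make the idea of compiling a trajectory concrete, here is a minimal sketch. It is not the authors' pipeline: the `Turn` type, the `compile_trajectory` function, the `[Document i]` labels, and the output field names are all hypothetical, and the sketch assumes that the first user turn is the question and that tool responses are the evidence to gather.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str      # "user", "assistant", or "tool" (hypothetical schema)
    content: str


def compile_trajectory(turns: list[Turn], final_answer: str) -> dict[str, str]:
    """Collapse a multi-turn trajectory into one long-context QA pair.

    Illustrative only: the first user turn is treated as the question and
    every tool response is treated as a piece of evidence.
    """
    question = next(t.content for t in turns if t.role == "user")
    evidence = [t.content for t in turns if t.role == "tool"]
    # Concatenate the scattered tool outputs into a single context string.
    context = "\n\n".join(
        f"[Document {i}]\n{doc}" for i, doc in enumerate(evidence, start=1)
    )
    return {"context": context, "question": question, "answer": final_answer}


trajectory = [
    Turn("user", "Which city hosts the project's annual meeting?"),
    Turn("assistant", "search('annual meeting location')"),
    Turn("tool", "The annual meeting is held where the project was founded."),
    Turn("assistant", "search('project founding city')"),
    Turn("tool", "The project was founded in Lyon."),
]
example = compile_trajectory(trajectory, final_answer="Lyon")
print(example["context"])
```

In this toy example the answer depends on two tool responses from different turns, so a model trained on the compiled pair has to connect evidence that sits far apart in a single context.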

Performance Highlights

Benchmark	Score	Δ vs Base
MRCR	68.28	+18.09
GraphWalks	77.51	+7.59
GPQA-Diamond	70.20	+2.49
MMLU-Pro	76.00	+1.50

On MRCR and GraphWalks, results are comparable to Qwen3-235B-A22B despite roughly 8× fewer active parameters. General capabilities are preserved: GPQA-Diamond and MMLU-Pro both improve slightly over the base model.
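The base-model scores are not listed, but they can be recovered from the table under the assumption that "Δ vs Base" is an absolute difference in points (score minus base score). A small sketch of that arithmetic:

```python
# (score, delta vs base) copied from the table above
results = {
    "MRCR": (68.28, 18.09),
    "GraphWalks": (77.51, 7.59),
    "GPQA-Diamond": (70.20, 2.49),
    "MMLU-Pro": (76.00, 1.50),
}

# Assumption: delta = score - base, so base = score - delta.
base_scores = {
    name: round(score - delta, 2) for name, (score, delta) in results.items()
}

for name, base in base_scores.items():
    print(f"{name}: base {base:.2f} -> fine-tuned {results[name][0]:.2f}")
```

Under that assumption the implied base scores are 50.19 (MRCR), 69.92 (GraphWalks), 67.71 (GPQA-Diamond), and 74.50 (MMLU-Pro).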

Training Data

Dataset: groundhogLLM/ACC-dataset
Size: 10,802 compiled trajectories (Search: 3,369; SWE: 4,368; SQL: 3,065)
Context length: 2K – 128K tokens
Training sequence length: 131,072 tokens
Epochs: 4
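As a quick consistency check on the figures above, the per-domain counts can be summed and turned into a domain mix. The variable names are illustrative; the numbers are the ones listed in this section.

```python
# Per-domain trajectory counts as listed above
domain_counts = {"Search": 3369, "SWE": 4368, "SQL": 3065}

total = sum(domain_counts.values())
shares = {
    domain: round(100 * count / total, 1)
    for domain, count in domain_counts.items()
}

# The training sequence length is 128 * 1024 tokens, which matches the
# upper end of the stated 2K to 128K context range.
max_seq_len = 131_072

print(total)    # 10802
print(shares)   # {'Search': 31.2, 'SWE': 40.4, 'SQL': 28.4}
print(max_seq_len == 128 * 1024)  # True
```

The counts add up to the stated 10,802 trajectories, with SWE the largest domain at about 40% of the data.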

Usage

python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "groundhogLLM/ACC-Qwen3-30B-A3B"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Standard Qwen3 chat template applies
messages = [{"role": "user", "content": "Your long-context question here..."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

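# Generate, then decode only the newly generated tokens (not the echoed prompt)
outputs = model.generate(**inputs, max_new_tokens=1024)
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))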
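
Because the base model is a Thinking variant, the decoded output may contain a reasoning trace followed by the final answer. The helper below is a sketch for separating the two. It assumes this fine-tune keeps the Qwen3 convention of closing the reasoning with a `</think>` tag and that the tag survives decoding; `split_thinking` is a hypothetical helper, not part of `transformers`.

```python
def split_thinking(decoded: str, end_tag: str = "</think>") -> tuple[str, str]:
    """Split decoded model output into (reasoning, answer).

    Assumes the reasoning trace, if present, ends with `end_tag`.
    If the tag is absent, the whole string is treated as the answer.
    """
    reasoning, sep, answer = decoded.rpartition(end_tag)
    if not sep:
        return "", decoded.strip()
    # The opening tag may or may not be present, depending on the chat template.
    reasoning = reasoning.strip().removeprefix("<think>").strip()
    return reasoning, answer.strip()


reasoning, answer = split_thinking("<think>Compare both documents.</think>\n\nLyon")
print(answer)
```

If the generation is cut off by `max_new_tokens` before the closing tag appears, the helper returns the full text as the answer, so a larger token budget may be needed for long reasoning traces.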

Citation

If you use this model, please cite:

bibtex
@misc{su2026acccompilingagenttrajectories,
      title={ACC: Compiling Agent Trajectories for Long-Context Training}, 
      author={Qisheng Su and Zhen Fang and Shiting Huang and Yu Zeng and Yiming Zhao and Kou Shi and Ziao Zhang and Lin Chen and Zehui Chen and Lijun Wu and Feng Zhao},
      year={2026},
      eprint={2605.21850},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2605.21850}, 
}

Model Details

Model Provider

groundhogLLM

Input Modalities

Text

Output Modalities