cs-552-2026-aaty

group_model

License: apache-2.0

Model details

Base model: Qwen/Qwen3-1.7B
Post-training: SFT (LoRA adapter, merged back into the base weights), then GRPO seeded from the SFT checkpoint with reward functions for the math and reasoning objectives
Domains: math (free-form answers, evaluated with pass@8); general knowledge, safety, and multilinguality (multiple-choice, evaluated with pass@1)
Format: vLLM-loadable safetensors with config.json, generation_config.json, and a tokenizer chat_template
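
The pass@8 and pass@1 figures above can be made concrete with a small sketch. The card does not say how the evaluation computes pass@k, so this assumes the commonly used unbiased estimator, 1 - C(n-c, k) / C(n, k), for n samples of which c are correct. The function name `pass_at_k` is illustrative only.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n samples, c of which are correct.

    Assumption: the unbiased estimator 1 - C(n-c, k) / C(n, k).
    The actual evaluation may compute the metric differently.
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset has a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# With exactly 8 samples, pass@8 is 1.0 as soon as one sample is correct.
print(pass_at_k(n=8, c=1, k=8))
# With a single sample, pass@1 is plain accuracy on that sample.
print(pass_at_k(n=1, c=0, k=1))
```

When n equals k, as in the two calls above, the estimator reduces to "at least one sample was correct".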

Output contract

The model writes its reasoning and then wraps the final answer in \boxed{...}. The training mix covers both question styles, because the group model is scored on both:

Free-form:

```markdown
Q: What is the smallest prime greater than 100?
A: ...reasoning... \boxed{101}
```

Multiple-choice (2 to 20 options; the boxed content is the option letter):

```markdown
Q: Which of the following is a noble gas?
A) Oxygen
B) Argon
C) Nitrogen
D) Hydrogen
A: ...reasoning... \boxed{B}
```
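
Because both question styles end in \boxed{...}, a single parser can recover the final answer from either. The following is a minimal sketch, not the evaluation's own parser: the helper name `extract_boxed` is hypothetical, and taking the last boxed expression as the final answer is an assumption. It tracks brace depth so that nested LaTeX such as \frac{1}{2} survives.

```python
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in text, or None.

    Hypothetical helper: assumes the last boxed expression is the final answer.
    """
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None  # no boxed answer at all
    depth = 1
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Matching close brace of \boxed{ reached.
                return "".join(chars).strip()
        chars.append(ch)
    return None  # braces never balanced (e.g. truncated output)


print(extract_boxed("...reasoning... \\boxed{101}"))
print(extract_boxed("...reasoning... \\boxed{B}"))
```

A truncated generation with an unclosed brace returns None, so a caller can score it as unanswered instead of guessing.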

Thinking mode

This model runs in thinking mode: it emits a <think>...</think> reasoning block before the final \boxed{...} answer. Thinking is forced on inside the chat template because the evaluation calls only tokenizer.apply_chat_template(messages, add_generation_prompt=True), with no enable_thinking argument. The template default is therefore the only setting that takes effect.

The relevant line in chat_template.jinja:

```jinja
{%- set enable_thinking = true %}
```
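
Since every completion starts with a reasoning block, downstream code usually needs to separate it from the answer text. The sketch below is one way to do that on a decoded string. The helper name `split_thinking` is hypothetical, and it assumes the <think> tags appear literally in the decoded text, which depends on the tokenizer's decode settings.

```python
from typing import Tuple


def split_thinking(completion: str) -> Tuple[str, str]:
    """Split a decoded completion into (reasoning, answer_text).

    Hypothetical helper: assumes literal <think>...</think> tags in the text.
    """
    head, sep, tail = completion.partition("</think>")
    if sep:
        # Normal case: reasoning before the closing tag, answer after it.
        return head.replace("<think>", "", 1).strip(), tail.strip()
    if completion.lstrip().startswith("<think>"):
        # Generation stopped mid-reasoning: there is no answer text yet.
        return completion.replace("<think>", "", 1).strip(), ""
    # No thinking block at all: treat everything as the answer.
    return "", completion.strip()


reasoning, answer = split_thinking("<think>2 + 2 is 4.</think>\n\\boxed{4}")
print(reasoning)
print(answer)
```

The middle branch matters when max_new_tokens is too small for the reasoning to finish: the caller gets an empty answer and can raise the token budget.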

Usage

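```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer (which carries the chat template) and the model weights.
tok = AutoTokenizer.from_pretrained("cs-552-2026-aaty/group_model")
model = AutoModelForCausalLM.from_pretrained("cs-552-2026-aaty/group_model")

# Build the prompt as the evaluation does: no enable_thinking argument.
messages = [{"role": "user", "content": "What is the capital of Australia?"}]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = out[0][inputs["input_ids"].shape[1]:]
print(tok.decode(new_tokens, skip_special_tokens=True))
```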

Training data

SFT: cs-552-2026-aaty/sft_mixture, the chat-formatted mixture built from public QA, knowledge, instruction, and math datasets.
GRPO: cs-552-2026-aaty/grpo_mixture, prompts with verifiable answers used for reward-driven optimization.

See the team data pipeline in code/data/ for the exact sources and filters.
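
For readers unfamiliar with GRPO, the step that turns verifiable-answer rewards into a training signal can be sketched as follows. GRPO samples several completions per prompt and scores each one relative to its own group. The code is an illustration under assumptions, not the team's training code: the function name, the 0/1 reward values, the use of the population standard deviation, and the `eps` constant are all hypothetical choices.

```python
from statistics import fmean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one prompt's group of sampled completions.

    Assumptions: population standard deviation, and eps to avoid dividing
    by zero when every completion in the group gets the same reward.
    """
    mean = fmean(rewards)
    std = pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four completions for one prompt, rewarded 1.0 if the boxed answer matched.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
# A group where every completion is correct yields zero advantage everywhere.
print(group_relative_advantages([1.0, 1.0, 1.0]))
```

The second call shows why prompts that the model always or never solves contribute no gradient signal under this scheme.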

Modalities

Input

Text

Output