cs-552-2026-aaty
group_model
README
License: apache-2.0
Model details
- Base model: Qwen/Qwen3-1.7B
- Post-training: SFT (LoRA adapter, merged back into the base weights), then GRPO seeded from the SFT checkpoint with reward functions for the math and reasoning objectives
- Domains: math (free-form, pass@8) and general knowledge, safety, multilinguality (multiple-choice, pass@1)
- Format: vLLM-loadable safetensors with config.json, generation_config.json, and a tokenizer chat_template
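The pass@8 and pass@1 labels above can be made concrete with a small sketch. It uses the commonly cited unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), for n samples of which c are correct. Whether the course evaluation computes the metric exactly this way is an assumption, and the function name `pass_at_k` is hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n sampled answers, c of which are correct.

    Assumption: this mirrors the usual unbiased estimator; the actual
    evaluation harness for this model may compute the metric differently.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every k-subset contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# With exactly 8 samples, pass@8 reduces to "at least one sample is correct".
print(pass_at_k(n=8, c=1, k=8))
# pass@1 is the fraction of correct samples.
print(pass_at_k(n=8, c=4, k=1))
```

With k equal to the number of samples, a single correct answer among the eight is enough for the problem to count as solved, which is why free-form math tolerates more sampling variance than the pass@1 multiple-choice domains.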
Output contract
The model writes its reasoning and then wraps the final answer in \boxed{...}.
The training mix covers both question styles, because the group model is scored
on both:
Free-form:
markdown
Q: What is the smallest prime greater than 100?
A: ...reasoning... \boxed{101}
Multiple-choice (the boxed content is the option letter, with 2 to 20 options):
markdown
Q: Which of the following is a noble gas?
A) Oxygen
B) Argon
C) Nitrogen
D) Hydrogen
A: ...reasoning... \boxed{B}
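Because both question styles end in \boxed{...}, a downstream script can read the final answer with one parser. The sketch below is an illustration, not the official grader: the helper name `extract_boxed` is hypothetical, and taking the last \boxed{...} in the text is an assumption about how to treat outputs that box intermediate results.

```python
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in text, or None.

    Braces are matched by depth so nested LaTeX such as \\frac{1}{2}
    inside the box is returned whole.
    """
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
    return None  # unbalanced braces, e.g. a truncated generation


print(extract_boxed("The next prime after 100 is 101. \\boxed{101}"))
print(extract_boxed("Argon is a noble gas. \\boxed{B}"))
```

For multiple-choice questions the returned string is the option letter, so it can be compared directly against the gold label.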
Thinking mode
This model runs in thinking mode: it emits a <think>...</think> reasoning
block before the final \boxed{...} answer. Thinking is forced on inside the
chat template, because the evaluation passes only
tokenizer.apply_chat_template(messages, add_generation_prompt=True) with no
enable_thinking argument, so the template default is the only signal honored.
The relevant line in chat_template.jinja:
jinja
{%- set enable_thinking = true %}
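Since every completion starts with a reasoning block, callers that only want the answer need to separate the two parts. The following is a minimal sketch under stated assumptions: the helper name `split_thinking` is hypothetical, and it assumes the decoded text still contains the literal `</think>` tag (the opening tag may or may not be present, depending on how the prompt is sliced off).

```python
from typing import Tuple


def split_thinking(completion: str) -> Tuple[str, str]:
    """Split a decoded completion into (reasoning, answer).

    If the closing tag is missing, the text is treated as truncated
    reasoning when an opening tag is present, and as a plain answer
    otherwise.
    """
    open_tag, close_tag = "<think>", "</think>"
    head, sep, tail = completion.partition(close_tag)
    if not sep:
        if open_tag in completion:
            # Generation stopped before the reasoning block was closed.
            return completion.replace(open_tag, "", 1).strip(), ""
        return "", completion.strip()
    reasoning = head.replace(open_tag, "", 1).strip()
    return reasoning, tail.strip()


reasoning, answer = split_thinking("<think>2 + 2 = 4</think>\n\\boxed{4}")
print(reasoning)
print(answer)
```

An empty answer with non-empty reasoning is a useful signal that the token budget ran out before the model reached its \boxed{...} line.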
Usage
python
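from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer (with its chat template) and the merged model weights.
tok = AutoTokenizer.from_pretrained("cs-552-2026-aaty/group_model")
model = AutoModelForCausalLM.from_pretrained("cs-552-2026-aaty/group_model")

messages = [{"role": "user", "content": "What is the capital of Australia?"}]

# The chat template turns thinking mode on by itself; no enable_thinking argument is needed.
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)

# The token budget covers both the <think> block and the final answer.
out = model.generate(**inputs, max_new_tokens=512)

# out[0] holds the prompt tokens followed by the completion, so both are printed.
print(tok.decode(out[0], skip_special_tokens=True))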
Training data
- SFT: cs-552-2026-aaty/sft_mixture, the chat-formatted mixture built from public QA, knowledge, instruction, and math datasets.
- GRPO: cs-552-2026-aaty/grpo_mixture, prompts with verifiable answers used for reward-driven optimization.
See the team data pipeline in code/data/ for the exact sources and filters.
Model provider
cs-552-2026-aaty
Model tree
Base
Qwen/Qwen3-1.7B
Fine-tuned
this model
Modalities
Input
Text
Output
Text