cs-552-2026-aaty

group_model


README

License: apache-2.0

Model details

  • Base model: Qwen/Qwen3-1.7B
  • Post-training: SFT (LoRA adapter, merged back into the base weights), then GRPO seeded from the SFT checkpoint with reward functions for the math and reasoning objectives
  • Domains: math (free-form, pass@8) and general knowledge, safety, multilinguality (multiple-choice, pass@1)
  • Format: vLLM-loadable safetensors with config.json, generation_config.json, and a tokenizer chat_template
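
The pass@8 and pass@1 labels above can be read through the standard unbiased pass@k estimator. The sketch below is an assumption about how such a score might be computed, not the evaluation's confirmed code; `pass_at_k` and its argument names are illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate.

    Probability that at least one of k samples, drawn without replacement
    from n generations of which c are correct, is correct.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when there are fewer than k incorrect samples,
    # so the estimate is exactly 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With 8 samples per math question and k = 8, this reduces to a simple rule: a question counts as solved if any of the 8 samples is correct.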

Output contract

The model writes its reasoning and then wraps the final answer in \boxed{...}. The training mix covers both question styles, because the group model is scored on both:

Free-form:

markdown

Q: What is the smallest prime greater than 100?
A: ...reasoning... \boxed{101}
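
A consumer of this output contract needs to pull the final answer out of the \boxed{...} wrapper. The helper below is a minimal sketch, assuming the last \boxed{...} in the text is the final answer and that braces inside it are balanced; `extract_boxed` is a hypothetical name, not part of the model or the evaluation code.

```python
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in text, or None."""
    marker = r"\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    content_start = start + len(marker)
    depth = 1  # we are inside the opening brace of \boxed{
    for i in range(content_start, len(text)):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[content_start:i].strip()
    return None  # braces never closed


print(extract_boxed(r"...reasoning... \boxed{101}"))
```

Counting brace depth, rather than matching up to the first `}`, keeps answers such as `\frac{1}{2}` intact.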

Multiple-choice (the boxed content is the option letter, with 2 to 20 options):

markdown

Q: Which of the following is a noble gas?
A) Oxygen
B) Argon
C) Nitrogen
D) Hydrogen
A: ...reasoning... \boxed{B}
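
The multiple-choice layout above can be produced and checked programmatically. This is a sketch under assumptions: the `A) ...` option style is copied from the example, options are lettered A through T to cover the 2 to 20 range, and both function names are hypothetical. The exact prompt text used by the evaluation is not specified here.

```python
import string
from typing import Sequence


def format_multiple_choice(question: str, options: Sequence[str]) -> str:
    """Render a question and its options as 'A) ...' lines."""
    if not 2 <= len(options) <= 20:
        raise ValueError("expected between 2 and 20 options")
    letters = string.ascii_uppercase[: len(options)]
    lines = [question]
    lines += [f"{letter}) {option}" for letter, option in zip(letters, options)]
    return "\n".join(lines)


def is_valid_choice(answer: str, num_options: int) -> bool:
    """Check that a boxed answer is a single letter within the option range."""
    letters = string.ascii_uppercase[:num_options]
    return len(answer) == 1 and answer in letters


print(format_multiple_choice(
    "Which of the following is a noble gas?",
    ["Oxygen", "Argon", "Nitrogen", "Hydrogen"],
))
```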

Thinking mode

This model runs in thinking mode: it emits a <think>...</think> reasoning block before the final \boxed{...} answer. Thinking is forced on inside the chat template. The evaluation calls only tokenizer.apply_chat_template(messages, add_generation_prompt=True), with no enable_thinking argument, so the template default is the only signal that takes effect.

The relevant line in chat_template.jinja:

jinja

{%- set enable_thinking = true %}
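
Because every completion carries a reasoning block, downstream code usually wants to separate it from the answer text. The following is a minimal sketch, assuming the reasoning ends at the first `</think>` tag and that the opening `<think>` tag may or may not appear in the decoded completion; `split_thinking` is a hypothetical helper.

```python
from typing import Tuple


def split_thinking(completion: str) -> Tuple[str, str]:
    """Split a completion into (reasoning, answer_text)."""
    head, sep, tail = completion.partition("</think>")
    if not sep:
        # No closing tag found: treat the whole completion as answer text.
        return "", completion.strip()
    # Drop the opening tag if it is present in the decoded text.
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


reasoning, answer = split_thinking("<think>2 + 2 is 4</think>\n\\boxed{4}")
print(answer)
```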

Usage

python

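from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer (which carries the chat template) and the merged weights.
tok = AutoTokenizer.from_pretrained("cs-552-2026-aaty/group_model")
model = AutoModelForCausalLM.from_pretrained("cs-552-2026-aaty/group_model")

messages = [{"role": "user", "content": "What is the capital of Australia?"}]
# Thinking mode is on by template default; no enable_thinking argument is needed.
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)

# Generation stops at max_new_tokens; raise it if the reasoning is cut off
# before the \boxed{...} answer appears.
out = model.generate(**inputs, max_new_tokens=512)
# out[0] contains the prompt tokens followed by the generated tokens.
print(tok.decode(out[0], skip_special_tokens=True))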

Training data

  • SFT: cs-552-2026-aaty/sft_mixture, a chat-formatted mixture built from public QA, knowledge, instruction, and math datasets.
  • GRPO: cs-552-2026-aaty/grpo_mixture, prompts with verifiable answers used for reward-driven optimization.

See the team data pipeline in code/data/ for the exact sources and filters.
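
To make "prompts with verifiable answers" concrete, the sketch below shows one way a reward could be computed from a completion and a reference answer. It is a hypothetical illustration, not the team's reward functions: it assumes exact string match on the last \boxed{...} content and handles only answers without nested braces.

```python
import re
from typing import Optional


def boxed_answer(completion: str) -> Optional[str]:
    """Return the content of the last brace-free \\boxed{...}, or None."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    return matches[-1].strip() if matches else None


def exact_match_reward(completion: str, reference: str) -> float:
    """Hypothetical reward: 1.0 if the boxed answer equals the reference, else 0.0."""
    predicted = boxed_answer(completion)
    if predicted is None:
        return 0.0  # no parseable final answer
    return 1.0 if predicted == reference.strip() else 0.0


print(exact_match_reward("...reasoning... \\boxed{101}", "101"))
```

A binary reward of this kind works for both question styles in the output contract, since the reference is either a free-form value or an option letter.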

Model provider

cs-552-2026-aaty

Model tree

Base

Qwen/Qwen3-1.7B

Fine-tuned

this model

Modalities

Input

Text

Output

Text
