cs-552-2026-aaty

general_knowledge_model


README

License: apache-2.0

Model details

Base model: Qwen/Qwen3-1.7B
Post-training: supervised fine-tuning (LoRA adapter, merged back into the base weights)
Domain: general knowledge, multiple-choice, scored at pass@1
Format: vLLM-loadable safetensors with config.json, generation_config.json, and a tokenizer chat_template

Output contract

The model writes its reasoning and then wraps the final answer in \boxed{...}. For multiple-choice items the boxed content is the letter of the chosen option, and option counts can range from 2 to 20.

markdown
Q: Which planet is closest to the Sun?
A) Venus
B) Mercury
C) Mars
D) Earth
A: ...reasoning... \boxed{B}
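A grader or downstream script needs to read the letter back out of this contract. The following is a minimal sketch of one way to do that, not the evaluation's actual parser: the function name `extract_boxed_letter` and the choice to take the last `\boxed{...}` occurrence are assumptions made for illustration.

```python
import re
from typing import Optional

# Matches \boxed{...} with no nested braces, which is enough for a single letter.
_BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_boxed_letter(text: str, num_options: int) -> Optional[str]:
    """Return the last boxed option letter, or None if it is missing or out of range.

    Hypothetical helper: the real evaluation may parse answers differently.
    """
    if not 2 <= num_options <= 20:
        raise ValueError("num_options must be between 2 and 20")
    matches = _BOXED.findall(text)
    if not matches:
        return None
    # Take the last box, on the assumption that the final answer comes last.
    letter = matches[-1].strip().upper()
    valid = [chr(ord("A") + i) for i in range(num_options)]
    return letter if letter in valid else None


print(extract_boxed_letter(r"...reasoning... \boxed{B}", num_options=4))
```

Returning `None` for a missing or out-of-range letter lets the caller count the item as wrong instead of crashing on malformed output.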

Thinking mode

This model runs in thinking mode: it emits a <think>...</think> reasoning block before the final \boxed{...} answer. Thinking is forced on inside the chat template because the evaluation calls only tokenizer.apply_chat_template(messages, add_generation_prompt=True), with no enable_thinking argument. The template default is therefore the only setting that takes effect.

The relevant line in chat_template.jinja:

jinja
{%- set enable_thinking = true %}
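Because every completion carries a reasoning block, it can help to separate that block from the answer text before parsing. The sketch below is an illustration under assumptions: `split_thinking` is a hypothetical helper, and it tolerates a missing opening `<think>` tag in case the template, not the model, emits it.

```python
from typing import Tuple


def split_thinking(completion: str) -> Tuple[str, str]:
    """Split a completion into (reasoning, answer_text).

    Hypothetical helper: assumes at most one </think> tag closes the reasoning.
    """
    head, sep, tail = completion.partition("</think>")
    if not sep:
        # No closing tag found: treat the whole completion as answer text.
        return "", completion.strip()
    # Drop the opening tag if the decoded text includes it.
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


reasoning, answer = split_thinking(
    "<think>Mercury orbits nearest the Sun.</think>\n\\boxed{B}"
)
print(answer)
```

If generation stops before `</think>` is produced, the function returns an empty reasoning string and the raw text, so a truncated completion is easy to detect downstream.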

Usage

python
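from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer (with its chat template) and the merged model weights.
tok = AutoTokenizer.from_pretrained("cs-552-2026-aaty/general_knowledge_model")
model = AutoModelForCausalLM.from_pretrained("cs-552-2026-aaty/general_knowledge_model")

messages = [{"role": "user", "content": "What is the capital of Australia?"}]
# The template enables thinking by default, so no enable_thinking argument is passed.
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)
# The <think> block counts toward max_new_tokens; raise the limit if the answer is cut off.
out = model.generate(**inputs, max_new_tokens=512)
# out[0] holds the prompt tokens followed by the generated tokens.
print(tok.decode(out[0], skip_special_tokens=True))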
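For multiple-choice items, the user message can be assembled from a question and a list of options. This is a sketch only: `build_mcq_messages` is a hypothetical helper, and the lettered `A) ...` layout mirrors the example in the output contract, not a documented training prompt format.

```python
from typing import Dict, List, Sequence


def build_mcq_messages(question: str, options: Sequence[str]) -> List[Dict[str, str]]:
    """Build a single-turn chat message list for a lettered multiple-choice item.

    Hypothetical helper: the exact prompt layout used in training is an assumption.
    """
    if not 2 <= len(options) <= 20:
        raise ValueError("expected between 2 and 20 options")
    lines = [question]
    for i, option in enumerate(options):
        # Label options A, B, C, ... in order.
        lines.append(f"{chr(ord('A') + i)}) {option}")
    return [{"role": "user", "content": "\n".join(lines)}]


messages = build_mcq_messages(
    "Which planet is closest to the Sun?",
    ["Venus", "Mercury", "Mars", "Earth"],
)
print(messages[0]["content"])
```

The returned list can be passed to `tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` exactly as in the usage snippet above.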

Training data

Supervised fine-tuning on cs-552-2026-aaty/sft_mixture, the chat-formatted mixture built from public QA and knowledge datasets. See the team data pipeline in code/data/ for the exact sources and filters.

Model provider: cs-552-2026-aaty
Model tree: Qwen/Qwen3-1.7B (base), this model (fine-tuned)
Modalities: text input, text output