cs-552-2026-aaty
general_knowledge_model
README
License: apache-2.0
Model details
- Base model: Qwen/Qwen3-1.7B
- Post-training: supervised fine-tuning (LoRA adapter, merged back into the base weights)
- Domain: general knowledge, multiple-choice, scored at pass@1
- Format: vLLM-loadable safetensors with config.json, generation_config.json, and a tokenizer chat_template
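The card scores the model at pass@1. With a single sampled answer per item, pass@1 reduces to plain accuracy over the items. The sketch below illustrates that computation; the function name pass_at_1 and the case-insensitive comparison are assumptions made for illustration, not the team's actual scoring code.

```python
def pass_at_1(predictions, gold):
    """Fraction of items whose single prediction equals the gold letter.

    A prediction of None (no answer could be parsed) counts as wrong.
    The case-insensitive match is an assumption of this sketch.
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(
        p is not None and p.strip().upper() == g.strip().upper()
        for p, g in zip(predictions, gold)
    )
    return correct / len(gold)


# Two of the four predictions match their gold letters.
score = pass_at_1(["B", "a", None, "D"], ["B", "A", "C", "C"])
print(score)
```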
Output contract
The model writes its reasoning and then wraps the final answer in \boxed{...}.
For multiple-choice items the boxed content is the letter of the chosen option,
and option counts can range from 2 to 20.
```markdown
Q: Which planet is closest to the Sun?
A) Venus
B) Mercury
C) Mars
D) Earth
A: ...reasoning... \boxed{B}
```
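A consumer of this output contract needs to pull the chosen letter out of the \boxed{...} wrapper. The following is a minimal sketch of one way to do that. The helper name extract_boxed_letter is hypothetical, and two choices here are assumptions rather than part of the card: taking the last boxed occurrence (in case the reasoning mentions an earlier candidate) and limiting letters to A through T, which follows from the stated maximum of 20 options.

```python
import re

# Matches \boxed{X} where X is one letter in A-T (20 options at most).
BOXED_RE = re.compile(r"\\boxed\{\s*([A-Ta-t])\s*\}")


def extract_boxed_letter(text):
    """Return the letter from the last boxed answer in text, or None."""
    matches = BOXED_RE.findall(text)
    # Assumption: the final boxed occurrence is the model's committed answer.
    return matches[-1].upper() if matches else None


completion = r"...reasoning... \boxed{B}"
print(extract_boxed_letter(completion))
```

Returning None instead of raising keeps unparseable completions easy to count as incorrect during scoring.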
Thinking mode
This model runs in thinking mode: it emits a <think>...</think> reasoning
block before the final \boxed{...} answer. Thinking is forced on inside the
chat template because the evaluation calls only
tokenizer.apply_chat_template(messages, add_generation_prompt=True), with no
enable_thinking argument, so the template default is the only setting that takes effect.
The relevant line in chat_template.jinja:
```jinja
{%- set enable_thinking = true %}
```
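Because every completion carries a reasoning block, downstream code usually wants to separate the reasoning from the answer text. The sketch below shows one possible approach, assuming the <think> and </think> tags appear literally in the decoded text. The helper name split_thinking is hypothetical, as is the handling of a completion that was cut off before the closing tag.

```python
def split_thinking(text):
    """Split a completion into (reasoning, answer) around the think block."""
    head, sep, tail = text.partition("</think>")
    if sep:
        # Normal case: reasoning sits before the closing tag, answer after it.
        return head.replace("<think>", "", 1).strip(), tail.strip()
    if "<think>" in text:
        # Assumption: the reasoning was cut off (for example by a token limit)
        # before the closing tag, so no answer was produced.
        return text.split("<think>", 1)[1].strip(), ""
    # No tags at all: treat the whole text as the answer.
    return "", text.strip()


reasoning, answer = split_thinking(
    "<think>Mercury orbits nearest the Sun.</think>\n\\boxed{B}"
)
print(reasoning)
print(answer)
```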
Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("cs-552-2026-aaty/general_knowledge_model")
model = AutoModelForCausalLM.from_pretrained("cs-552-2026-aaty/general_knowledge_model")

messages = [{"role": "user", "content": "What is the capital of Australia?"}]
# Render the chat template to a string; thinking mode is on by template default.
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=512)
# Decodes the prompt together with the generated continuation.
print(tok.decode(out[0], skip_special_tokens=True))
```
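For multiple-choice items, the user message has to carry the lettered options. The card does not state the exact prompt layout used in training or evaluation, so the sketch below simply mirrors the Q: / A) / B) layout from the Output contract example as an assumption. The helper name format_question is hypothetical. The 2 to 20 bound comes from the stated option-count range.

```python
import string

MIN_OPTIONS = 2
MAX_OPTIONS = 20  # the card states option counts range from 2 to 20


def format_question(question, options):
    """Render a question and its options as lettered lines (A, B, C, ...)."""
    if not MIN_OPTIONS <= len(options) <= MAX_OPTIONS:
        raise ValueError(
            f"expected {MIN_OPTIONS}-{MAX_OPTIONS} options, got {len(options)}"
        )
    lines = [f"Q: {question}"]
    # zip stops at the shorter input, so only as many letters as options are used.
    for letter, option in zip(string.ascii_uppercase, options):
        lines.append(f"{letter}) {option}")
    return "\n".join(lines)


content = format_question(
    "Which planet is closest to the Sun?",
    ["Venus", "Mercury", "Mars", "Earth"],
)
messages = [{"role": "user", "content": content}]
print(content)
```

The resulting messages list can be passed to tok.apply_chat_template exactly as in the usage snippet.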
Training data
Supervised fine-tuning on cs-552-2026-aaty/sft_mixture, the chat-formatted
mixture built from public QA and knowledge datasets. See the team data pipeline
in code/data/ for the exact sources and filters.
Model provider: cs-552-2026-aaty
Model tree: Qwen/Qwen3-1.7B (base), this model (fine-tuned)
Modalities: text input, text output