cs-552-2026-ma-que
general_knowledge_model
License: apache-2.0
CS-552 Ma Que: General Knowledge Model
General knowledge expert for the EPFL CS-552 (Modern NLP, Spring 2026) group project "Building a robust small language model for edge devices" (Group 11, Ma Que). This is an individual model; the team also maintains a merged group model (https://huggingface.co/cs-552-2026-ma-que/group_model).
What it is
- Base model: Qwen/Qwen3-1.7B-Base (https://huggingface.co/Qwen/Qwen3-1.7B-Base) (1.7B params).
- Post-training: LoRA SFT using nlp_project/sft_thinking.py with the configuration in nlp_project/cfgs_thinking.yml. The run uses LoRA rank 256, alpha 256, dropout 0, and targets q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, and down_proj. Training uses bf16, gradient checkpointing, Liger kernel, assistant-only loss, max sequence length 6144, learning rate 2e-4, cosine scheduler, 5% warmup, batch size 20, gradient accumulation 1, and 2 epochs.
- Task: general knowledge multiple-choice answering with reasoning-style responses. The model reads a question with lettered options and commits to one boxed option letter.
- Output contract: the final answer is emitted once as \boxed{X}, where X is a single capital option letter, so that the course OpenCompass/vLLM parser can extract it. Thinking mode is ON.
- How behaviour is set: the Qwen3 chat template is applied automatically by tokenizer.apply_chat_template(messages, add_generation_prompt=True). Training examples include assistant reasoning traces wrapped in <think>...</think> tags, and the model is intended to reason in thinking mode before producing the final boxed answer.
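The output contract above can be checked with a small parser. The sketch below is not the course OpenCompass/vLLM parser; it is a minimal illustration that assumes the reasoning is delimited by Qwen3-style <think>...</think> tags, and the function name extract_boxed_letter is hypothetical.

```python
import re
from typing import Optional

# Matches \boxed{X} where X is a single capital letter.
BOXED_RE = re.compile(r"\\boxed\{([A-Z])\}")


def extract_boxed_letter(completion: str) -> Optional[str]:
    """Return the boxed option letter that follows the reasoning block, or None.

    Assumption: reasoning is wrapped in <think>...</think>. Anything boxed
    inside the reasoning is ignored, and a reasoning block that never closes
    (for example, a truncated generation) yields None.
    """
    if "</think>" in completion:
        # Keep only the text after the last closing tag.
        completion = completion.rsplit("</think>", 1)[1]
    elif "<think>" in completion:
        # Reasoning started but never finished: no committed answer.
        return None
    matches = BOXED_RE.findall(completion)
    return matches[-1] if matches else None


sample = "<think>Venus? No, \\boxed{A} is wrong.</think>\nThe answer is \\boxed{B}"
print(extract_boxed_letter(sample))  # B
```

Taking the last match after the reasoning block is a design choice made for this sketch; a stricter checker could also reject completions that box more than one letter.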
Inference / decoding
The sampling values below are those in generation_config.json; max_new_tokens is not set there and must be passed at generation time:
┌────────────────┬────────────────────────────────────────────────────────────────┐
│ param          │ value                                                          │
├────────────────┼────────────────────────────────────────────────────────────────┤
│ do_sample      │ true                                                           │
│ temperature    │ 0.6                                                            │
│ top_k          │ 20                                                             │
│ top_p          │ 0.95                                                           │
│ max_new_tokens │ not set in generation_config.json; use 16384 for final grading │
└────────────────┴────────────────────────────────────────────────────────────────┘
Usage:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "cs-552-2026-ma-que/general_knowledge_model"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype="auto",
    device_map="auto",
)

messages = [{"role": "user", "content":
    "Which planet is known as the Red Planet?\n"
    "A) Venus\nB) Mars\nC) Jupiter\nD) Mercury"}]
text = tok.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
out = model.generate(
    **tok(text, return_tensors="pt").to(model.device),
    max_new_tokens=16384,
)
print(tok.decode(out[0], skip_special_tokens=True))  # -> ... \boxed{B}
```
The model is also vLLM-loadable for batched evaluation.
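For batched evaluation, a vLLM run might look like the sketch below. It is not the course evaluation harness: the helper names to_messages and run_batch are hypothetical, and it assumes a recent vLLM release in which LLM.chat accepts a list of conversations. The sampling values are copied from the decoding table, with vLLM's max_tokens standing in for max_new_tokens.

```python
# Sampling values from the decoding table; max_tokens is vLLM's name
# for the generation budget (max_new_tokens in transformers).
SAMPLING_KWARGS = {
    "temperature": 0.6,
    "top_k": 20,
    "top_p": 0.95,
    "max_tokens": 16384,
}


def to_messages(question: str) -> list:
    """Wrap one question as a single-turn conversation."""
    return [{"role": "user", "content": question}]


def run_batch(questions, repo="cs-552-2026-ma-que/general_knowledge_model"):
    """Generate one completion per question (requires vLLM and a GPU)."""
    # Imported lazily so the helpers above work without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(model=repo)
    params = SamplingParams(**SAMPLING_KWARGS)
    # Assumption: LLM.chat applies the model's chat template to each conversation.
    outputs = llm.chat(
        [to_messages(q) for q in questions],
        sampling_params=params,
    )
    return [o.outputs[0].text for o in outputs]
```

Calling run_batch with a list of question strings returns the raw completions in the same order, including the reasoning trace, so the boxed letter still has to be extracted afterwards.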
Training data
The thinking model was trained with sft_thinking.py using the dataset TangYeqing/maque-data, as configured in cfgs_thinking.yml.
The script loads the dataset's train split and performs a local train_test_split(test_size=0.005, seed=42). Each row contains conversational messages; if a row has no system message, the training script prepends:
You are a helpful assistant that provides step-by-step solutions to math problems.
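The system-message fallback described above can be sketched as follows. This is an illustration only, not the code in sft_thinking.py: the function name ensure_system_message is hypothetical, and the real script may detect a missing system message differently (for example, by inspecting only the first message).

```python
DEFAULT_SYSTEM = (
    "You are a helpful assistant that provides step-by-step "
    "solutions to math problems."
)


def ensure_system_message(messages):
    """Return a copy of `messages` with the default system prompt first
    when the conversation has no system message of its own."""
    if any(m["role"] == "system" for m in messages):
        return list(messages)
    return [{"role": "system", "content": DEFAULT_SYSTEM}] + list(messages)


row = [
    {"role": "user", "content": "Which planet is known as the Red Planet?"},
    {"role": "assistant", "content": "\\boxed{B}"},
]
patched = ensure_system_message(row)
print(patched[0]["role"])  # system
```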
The assistant turns already contain literal <think>...</think> reasoning traces, which are preserved during SFT. Training uses assistant-only loss, so the loss is computed only on the assistant response, including the thinking trace and final boxed answer.
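To make assistant-only loss concrete, the sketch below shows the usual label-masking idea on toy token ids. It is a plain-Python illustration, not the training script's implementation: the function name mask_labels and the token ids are made up, and the only borrowed fact is that PyTorch's cross-entropy loss ignores the label value -100 by default.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def mask_labels(token_ids, assistant_mask):
    """Build training labels that keep only assistant tokens.

    token_ids:      token ids of the full templated conversation
    assistant_mask: 1 where the token belongs to an assistant turn, else 0
    """
    if len(token_ids) != len(assistant_mask):
        raise ValueError("token_ids and assistant_mask must have equal length")
    return [
        tok if keep else IGNORE_INDEX
        for tok, keep in zip(token_ids, assistant_mask)
    ]


# Toy example: two prompt tokens followed by three assistant tokens
# (reasoning trace and boxed answer would both fall in the assistant span).
labels = mask_labels([11, 12, 21, 22, 23], [0, 0, 1, 1, 1])
print(labels)  # [-100, -100, 21, 22, 23]
```

Because the reasoning trace sits inside the assistant turn, it stays unmasked and contributes to the loss together with the final boxed answer.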
Intended use & limitations
Research artifact for the CS-552 general knowledge benchmark; produces boxed multiple-choice answers only. It is not an authoritative factual system and may encode outdated, incomplete, or incorrect facts, so outputs should be verified before any real-world or safety-critical use.
Citation
Built on Qwen3-1.7B (Qwen Team, 2025), arXiv:2505.09388 — https://arxiv.org/abs/2505.09388