cs-552-2026-ma-que

general_knowledge_model

README

License: apache-2.0

CS-552 Ma Que — General Knowledge Model

General knowledge expert for the EPFL CS-552 (Modern NLP, Spring 2026) group project "Building a robust small language model for edge devices" (Group 11, Ma Que). This is an individual model; the team also maintains a merged group model (https://huggingface.co/cs-552-2026-ma-que/group_model).

What it is

  • Base model: Qwen/Qwen3-1.7B-Base (https://huggingface.co/Qwen/Qwen3-1.7B-Base) (1.7B params).
  • Post-training: LoRA SFT using nlp_project/sft_thinking.py with the configuration in nlp_project/cfgs_thinking.yml. The run uses LoRA rank 256, alpha 256, dropout 0, and targets q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, and down_proj. Training uses bf16, gradient checkpointing, Liger kernel, assistant-only loss, max sequence length 6144, learning rate 2e-4, cosine scheduler, 5% warmup, batch size 20, gradient accumulation 1, and 2 epochs.
  • Task: general knowledge multiple-choice answering with reasoning-style responses. The model reads a question with lettered options and commits to one boxed option letter.
  • Output contract: the final answer is emitted exactly once as \boxed{X}, where X is a single capital option letter, so that the course OpenCompass/vLLM parser can extract it. Thinking mode is ON.
  • How behaviour is set: the Qwen3 chat template is applied automatically by tokenizer.apply_chat_template(messages, add_generation_prompt=True). Training examples include assistant reasoning traces wrapped in <think>...</think>, and the model is intended to answer in thinking mode before producing the final boxed answer.
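The LoRA settings listed above can be written down as a small configuration sketch. The rank, alpha, dropout, and target modules are taken from this card; the use of peft.LoraConfig, the task_type value, and the helper names are assumptions, since the contents of sft_thinking.py are not reproduced here.

```python
# Sketch only: r, lora_alpha, lora_dropout, and target_modules come from the
# card; task_type and the helper names below are assumptions.
LORA_KWARGS = {
    "r": 256,
    "lora_alpha": 256,
    "lora_dropout": 0.0,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",  # assumption: causal-LM fine-tuning
}


def lora_scale(r: int, lora_alpha: int) -> float:
    """Standard LoRA scaling factor applied to the low-rank update (alpha / r)."""
    return lora_alpha / r


def build_lora_config():
    """Build a peft LoraConfig; peft is imported lazily so the dict works alone."""
    from peft import LoraConfig

    return LoraConfig(**LORA_KWARGS)


if __name__ == "__main__":
    print(lora_scale(LORA_KWARGS["r"], LORA_KWARGS["lora_alpha"]))  # 1.0
```

With rank and alpha both at 256, the scaling factor is 1.0, so the low-rank update is added to the frozen weights without extra rescaling.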

Inference / decoding

The sampling values below are exactly those in generation_config.json; max_new_tokens is not set there and must be passed at generation time:

┌────────────────┬────────────────────────────────────────────────────────────────┐
│ param          │ value                                                          │
├────────────────┼────────────────────────────────────────────────────────────────┤
│ do_sample      │ true                                                           │
│ temperature    │ 0.6                                                            │
│ top_k          │ 20                                                             │
│ top_p          │ 0.95                                                           │
│ max_new_tokens │ not set in generation_config.json; use 16384 for final grading │
└────────────────┴────────────────────────────────────────────────────────────────┘

Usage:

python

from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "cs-552-2026-ma-que/general_knowledge_model"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype="auto",
    device_map="auto",
)

messages = [{"role": "user", "content":
    "Which planet is known as the Red Planet?\n"
    "A) Venus\nB) Mars\nC) Jupiter\nD) Mercury"}]
# Apply the Qwen3 chat template and append the generation prompt.
text = tok.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
# Sampling settings (temperature, top_k, top_p) come from generation_config.json.
out = model.generate(
    **tok(text, return_tensors="pt").to(model.device),
    max_new_tokens=16384,
)
print(tok.decode(out[0], skip_special_tokens=True))  # should end with \boxed{B}
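The decoded text contains the reasoning followed by the boxed answer, so a downstream script usually needs to pull out the letter. The following is a minimal sketch of such an extractor, not the course's OpenCompass/vLLM parser; the function name is hypothetical, and it assumes the reasoning trace is closed by a Qwen3-style </think> tag.

```python
import re
from typing import Optional

# Matches \boxed{X} where X is a single capital letter.
BOXED_RE = re.compile(r"\\boxed\{([A-Z])\}")


def extract_boxed_letter(text: str) -> Optional[str]:
    r"""Return the letter of the last \boxed{X} after the reasoning trace, or None."""
    # Assumption: the trace ends with "</think>"; keep only what follows it.
    # If the tag is absent, rsplit returns the whole text unchanged.
    answer_part = text.rsplit("</think>", 1)[-1]
    matches = BOXED_RE.findall(answer_part)
    return matches[-1] if matches else None
```

Restricting the search to the text after the trace avoids picking up a tentative boxed letter that the model wrote while still reasoning.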

The model is also vLLM-loadable for batched evaluation.
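A batched run with vLLM might look like the sketch below. The sampling values mirror the decoding table and the recommended 16384-token budget; the function name is hypothetical, and the code assumes a recent vLLM release that provides LLM.chat, which applies the model's chat template to each conversation.

```python
# Sampling values from the card's decoding table; max_tokens follows the
# card's recommendation of 16384 for final grading.
SAMPLING_KWARGS = {
    "temperature": 0.6,
    "top_k": 20,
    "top_p": 0.95,
    "max_tokens": 16384,
}

REPO = "cs-552-2026-ma-que/general_knowledge_model"


def run_batch(questions):
    """Generate one response per question with vLLM (imported lazily)."""
    from vllm import LLM, SamplingParams

    llm = LLM(model=REPO)
    params = SamplingParams(**SAMPLING_KWARGS)
    # One single-turn conversation per question.
    conversations = [[{"role": "user", "content": q}] for q in questions]
    outputs = llm.chat(conversations, params)
    # Each request yields one completion; return its generated text.
    return [o.outputs[0].text for o in outputs]
```

Calling run_batch with a list of formatted multiple-choice questions returns the raw completions, reasoning trace included.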

Training data

The thinking model was trained with sft_thinking.py using the dataset TangYeqing/maque-data, as configured in cfgs_thinking.yml.

The script loads the dataset's train split and performs a local train_test_split(test_size=0.005, seed=42). Each row contains conversational messages; if a row has no system message, the training script prepends:

You are a helpful assistant that provides step-by-step solutions to math problems.
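The split and the system-prompt fallback described above can be sketched as follows. The dataset name, split parameters, and prompt text are from this card; the function names are hypothetical, and treating "has no system message" as "no message with role system anywhere in the row" is an assumption about how sft_thinking.py checks it.

```python
DEFAULT_SYSTEM_PROMPT = (
    "You are a helpful assistant that provides step-by-step solutions "
    "to math problems."
)


def ensure_system_message(messages):
    """Return a copy of messages with the default system prompt prepended if none exists."""
    if any(m["role"] == "system" for m in messages):
        return list(messages)
    return [{"role": "system", "content": DEFAULT_SYSTEM_PROMPT}] + list(messages)


def load_splits():
    """Load the train split and hold out 0.5% of it (datasets imported lazily)."""
    from datasets import load_dataset

    train = load_dataset("TangYeqing/maque-data", split="train")
    return train.train_test_split(test_size=0.005, seed=42)
```

For example, a row holding only a user turn and an assistant turn comes back with three messages, the first being the default system prompt.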

The assistant turns already contain literal <think>...</think> reasoning traces, which are preserved during SFT. Training uses assistant-only loss, so the loss is computed on the assistant response, including the thinking trace and final boxed answer.
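Assistant-only loss amounts to masking the labels of every token outside the assistant turns. The sketch below illustrates the idea on plain Python lists; it is not the training script's code, the function name is hypothetical, and -100 is used because it is the default ignore_index of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_labels(input_ids, assistant_mask):
    """Copy input_ids into labels, ignoring positions outside assistant turns."""
    if len(input_ids) != len(assistant_mask):
        raise ValueError("input_ids and assistant_mask must have equal length")
    return [
        tok if is_assistant else IGNORE_INDEX
        for tok, is_assistant in zip(input_ids, assistant_mask)
    ]


if __name__ == "__main__":
    # Two prompt tokens followed by two assistant tokens.
    print(mask_labels([11, 12, 13, 14], [0, 0, 1, 1]))  # [-100, -100, 13, 14]
```

Only the positions flagged as assistant tokens, which here would cover the thinking trace and the boxed answer, contribute to the loss.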

Intended use & limitations

Research artifact for the CS-552 general knowledge benchmark; produces boxed multiple-choice answers only. It is not an authoritative factual system and may encode outdated, incomplete, or incorrect facts, so outputs should be verified before any real-world or safety-critical use.

Citation

Built on Qwen3-1.7B (Qwen Team, 2025), arXiv:2505.09388 — https://arxiv.org/abs/2505.09388

Model provider

cs-552-2026-ma-que

Model tree

Base

Qwen/Qwen3-1.7B-Base

Fine-tuned

this model

Modalities

Input

Text

Output

Text
