README

License: apache-2.0

Model Description

This model is fine-tuned to solve Countdown problems: given a target number and a set of allowed numbers, produce a valid arithmetic equation that equals the target. The model generates chain-of-thought reasoning inside <think> tags and a final answer inside <answer> tags.
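The task format above can be checked mechanically. The following is a minimal sketch of such a checker, not the scoring code used for this model: the function name `check_countdown` is hypothetical, and it assumes that each allowed number must be used exactly once and that only `+`, `-`, `*`, and `/` are permitted.

```python
import ast
import operator
import re
from collections import Counter

# Binary operators assumed to be allowed in a Countdown equation.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node, used):
    """Recursively evaluate an expression tree, recording each integer literal in `used`."""
    if isinstance(node, ast.Constant) and type(node.value) is int:
        used.append(node.value)
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, used)
        right = _evaluate(node.right, used)
        return _OPS[type(node.op)](left, right)
    raise ValueError("unsupported expression")


def check_countdown(completion, numbers, target):
    """Return True if the <answer> expression uses each number once and equals the target."""
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match is None:
        return False
    used = []
    try:
        tree = ast.parse(match.group(1).strip(), mode="eval")
        value = _evaluate(tree.body, used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return False
    # Assumption: every allowed number must appear exactly once.
    return Counter(used) == Counter(numbers) and abs(value - target) < 1e-9
```

Parsing with `ast` instead of calling `eval` keeps the checker from executing arbitrary text produced by the model.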

Training Details

| Hyperparameter | Value |
| --- | --- |
| Base model | Qwen/Qwen2.5-0.5B |
| Dataset | Asap7772/cog_behav_all_strategies |
| Epochs | 6 |
| Learning rate | 5e-5 |
| LR schedule | Cosine with 5% warmup |
| Batch size | 64 (gradient accumulation = 8) |
| Weight decay | 0.01 |
| Gradient clipping | 1.0 |
| Precision | bfloat16 |
| Gradient checkpointing | Enabled |
| Hardware | 1x NVIDIA H100 (Modal) |
| Max prompt length | 512 |
| Max response length | 1024 |
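To make the learning-rate schedule concrete, here is a minimal sketch of a cosine schedule with 5% warmup and a peak of 5e-5. It assumes a linear warmup from zero and a decay to zero, which the table does not specify; the function name `lr_at_step` and the `total_steps` value in the usage note are hypothetical.

```python
import math


def lr_at_step(step, total_steps, peak_lr=5e-5, warmup_frac=0.05):
    """Learning rate at an optimizer step under linear warmup plus cosine decay."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        # Assumption: warmup rises linearly from 0 to the peak rate.
        return peak_lr * step / warmup_steps
    # Fraction of the post-warmup phase completed, from 0.0 to 1.0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Assumption: the cosine curve decays all the way to 0.
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))
```

With a hypothetical `total_steps=1000`, the rate climbs over the first 50 steps, peaks at step 50, and falls back to zero at step 1000.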

Loss is computed only on response tokens (assistant turn), with prompt tokens masked out.
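The masking described above is commonly implemented by setting the labels of prompt positions to -100, the default `ignore_index` of PyTorch's cross-entropy loss. The sketch below illustrates the idea on plain token-ID lists. It is not the authors' training code: the function name `build_sft_example` and the truncation sides are assumptions.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def build_sft_example(prompt_ids, response_ids, max_prompt_len=512, max_response_len=1024):
    """Concatenate prompt and response token IDs and mask the prompt out of the loss."""
    # Assumption: over-long prompts keep their last tokens, responses their first.
    prompt_ids = list(prompt_ids)[-max_prompt_len:]
    response_ids = list(response_ids)[:max_response_len]
    input_ids = prompt_ids + response_ids
    # Prompt positions get IGNORE_INDEX, so only response tokens contribute to the loss.
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids
    return input_ids, labels
```

For example, `build_sft_example([1, 2, 3], [4, 5])` yields inputs `[1, 2, 3, 4, 5]` and labels `[-100, -100, -100, 4, 5]`.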

Evaluation

Evaluated on the test split of asingh15/countdown_tasks_3to4 (40 prompts) using vLLM with temperature 0.6, top-k 20, and top-p 0.95, sampling K=16 responses per prompt.

| Metric | Value |
| --- | --- |
| Average Score | 0.3660 |
| Pass@1 | 0.30 |
| Pass@16 | 0.75 (30/40) |
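The card does not state how the Pass@k values were computed. One common choice is the unbiased pass@k estimator, sketched below under that assumption. With n = 16 samples per prompt, pass@1 reduces to the fraction of correct samples and pass@16 to whether at least one sample is correct, which is consistent with the reported 30/40. The function names and the per-prompt counts in the usage note are hypothetical.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for one prompt with n samples, c of them correct."""
    if n - c < k:
        # Fewer than k incorrect samples: any draw of k must include a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts, n, k):
    """Average pass@k over prompts, given the number of correct samples per prompt."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)
```

As an illustration with made-up counts, 30 prompts with at least one correct sample and 10 prompts with none give `mean_pass_at_k(counts, n=16, k=16) == 0.75`.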

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
model_id = "ba144220/cs224r-default-project-sft"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Format the Countdown prompt with the model's chat template.
messages = [{"role": "user", "content": "Using the numbers [44, 19, 35], create an equation that equals 98."}]
input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Sample with the same settings used in the evaluation.
outputs = model.generate(**inputs, max_new_tokens=1024, temperature=0.6, top_k=20, top_p=0.95, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Example Output

Prompt: Using the numbers [44, 19, 35], create an equation that equals 98.

Model output:

```markdown
<think>
Let me try to find a path to 98.
First, let's look at the numbers: 44, 19, and 35.
Working backwards from 98: 98 is even, so it might be helpful to look for factors.
Let's try some operations:
44 + 19 = 63 (getting closer)
63 + 35 = 98 (this is our target!)
We can do: 19 + 44 = 63, 63 + 35 = 98.
Let's verify: (19 + 44) + 35 = 98 ✓
</think>
<answer> (19 + 44) + 35 </answer>
```

Score: 1.0 (correct)

Limitations

  • Trained and evaluated only on the Countdown arithmetic task; not intended for general-purpose use.
  • Performance degrades on harder problems with more numbers or larger targets.
  • The 0.5B parameter size limits reasoning capacity compared to larger models.

Authors

Yuchi Hsu (yuchihsu@stanford.edu) and Ryan He (ryanhe@stanford.edu), Stanford CS224R Spring 2026.

Model provider

ba144220

Model tree

Base

Qwen/Qwen2.5-0.5B

Fine-tuned

this model

Modalities

Input

Text

Output

Text
