License: MIT
Prompt Format
Use this system prompt:
```text
Please reason step by step, and put your final answer within \boxed{}.
```
Example chat:
```python
messages = [
    {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
    {"role": "user", "content": "Solve the problem here."},
]
```
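Because the system prompt asks for the final answer inside `\boxed{}`, downstream code usually needs to pull that answer out of the generated text. The helper below is a minimal sketch of one way to do it. The function name `extract_boxed` is hypothetical and not part of this model or of any library. It returns the contents of the last `\boxed{...}` and treats unbalanced braces as a malformed answer.

```python
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    r"""Return the contents of the last \boxed{...} in text, or None if absent or malformed."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None  # the model never produced a boxed answer

    i = start + len(marker)
    depth = 1  # we are inside the opening brace of \boxed{
    chars = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Matching close brace found: everything collected so far is the answer.
                return "".join(chars).strip()
        chars.append(ch)
        i += 1
    return None  # braces never balanced: treat the answer as malformed


print(extract_boxed("17 * 23 = 391, so the answer is \\boxed{391}."))
```

Counting brace depth, rather than using a simple regular expression, keeps nested LaTeX such as `\boxed{\frac{1}{2}}` intact.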
Training Details
| Item | Value |
|---|---|
| Base model | deepseek-ai/DeepSeek-R1-Distill-Qwen-14B |
| Dataset | RabotniKuma/Fast-Math-R1-SFT |
| Training type | Full-parameter SFT |
| GPUs used | 6 x NVIDIA H200 |
| Per-device batch size | 1 |
| Gradient accumulation | 8 |
| Effective global batch size | 48 |
| Epochs | 10 |
| Max sequence length | 24,000 tokens |
| Packing | Enabled |
| Learning rate | 1e-5 |
| Scheduler | Cosine |
| Precision | bfloat16 |
| Distributed setup | DeepSpeed ZeRO-3 |
The training recipe is based on the Fast-Math-R1-style training flow from analokmaus/kaggle-aimo2-fast-math-r1, adapted for this model and dataset.
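The effective global batch size in the table follows from the GPU count, per-device batch size, and gradient accumulation steps. The short check below reproduces that arithmetic. The token count per optimizer step is an upper bound derived here from the table values, assuming packing fills every sequence to the maximum length. It is not a figure reported by the training run.

```python
# Values copied from the Training Details table.
num_gpus = 6
per_device_batch_size = 1
gradient_accumulation_steps = 8
max_sequence_length = 24_000

# Sequences contributing to each optimizer step across all GPUs.
effective_global_batch_size = (
    num_gpus * per_device_batch_size * gradient_accumulation_steps
)

# Upper bound on tokens per optimizer step (assumes fully packed sequences).
max_tokens_per_optimizer_step = effective_global_batch_size * max_sequence_length

print(effective_global_batch_size)    # 48
print(max_tokens_per_optimizer_step)  # 1152000
```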
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "zbeeb/deepseek-r1-distill-qwen-14b-fast-math-r1-sft-10ep"

# Load the tokenizer and model. device_map="auto" requires the accelerate package.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
    {"role": "user", "content": "What is 17 * 23?"},
]

# Apply the chat template. return_dict=True returns input_ids together with
# the attention mask, so both can be passed to generate().
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=2048,
    temperature=0.6,
    top_p=0.95,
    do_sample=True,
)

# The decoded text contains the prompt followed by the generated reasoning and answer.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Intended Use
This model is intended for:
- Math reasoning research
- AIMO-style problem solving experiments
- Long-context supervised fine-tuning experiments
- Comparing Fast-Math-R1-style SFT against the base distilled R1 model
Limitations
This model has not been independently benchmarked in this card. It may produce incorrect reasoning, malformed final answers, or answers that look plausible but are wrong. Validate outputs before using them in any setting where correctness matters.
The training run used 24k-token sequences, but practical inference context length depends on the serving stack, GPU memory, and runtime configuration.
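To make the dependence on GPU memory concrete, the sketch below estimates the key/value cache size for a single sequence. The architecture numbers (48 layers, 8 key/value heads, head dimension 128) are assumptions about the base model and should be checked against its `config.json` before relying on the result. The function name `kv_cache_bytes` is hypothetical. The estimate covers only the cache in bfloat16, not model weights, activations, or serving overhead.

```python
def kv_cache_bytes(
    num_tokens: int,
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    bytes_per_value: int = 2,  # bfloat16 stores each value in 2 bytes
) -> int:
    """Estimate KV cache size in bytes for one sequence of num_tokens tokens."""
    # Leading factor 2: each layer stores one key tensor and one value tensor.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value * num_tokens


# ASSUMED architecture values; verify against the base model's config.json.
assumed_config = {"num_layers": 48, "num_kv_heads": 8, "head_dim": 128}

size = kv_cache_bytes(24_000, **assumed_config)
print(f"KV cache for one 24,000-token sequence: {size / 2**30:.2f} GiB")
```

Under these assumptions, the cache grows linearly with both sequence length and the number of concurrent sequences, which is why the usable context length varies between deployments.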
Source Models and Data
- Base model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
- Dataset: RabotniKuma/Fast-Math-R1-SFT
License
The base model is listed on Hugging Face with an MIT license. The training dataset is listed with an Apache-2.0 license. This model card declares MIT for the uploaded fine-tuned model.
Model provider: zbeeb
Model tree: base model deepseek-ai/DeepSeek-R1-Distill-Qwen-14B, fine-tuned into this model
Modalities: text input, text output