sfanm/d24-sft-v1base-mathheavy-3.7B API & Inference Endpoint

Use (chat)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

mid = "sfanm/d24-sft-v1base-mathheavy-3.7B"
tok = AutoTokenizer.from_pretrained(mid)
model = AutoModelForCausalLM.from_pretrained(mid, torch_dtype="bfloat16", device_map="auto")

# Build the chat prompt and append the assistant header so the model answers.
msgs = [{"role": "user", "content": "Natalia sold clips to 48 friends in April and half as many in May. How many total?"}]
ids = tok.apply_chat_template(msgs, add_generation_prompt=True, return_tensors="pt").to(model.device)

# Greedy decoding; stop_strings needs the tokenizer passed to generate().
out = model.generate(ids, max_new_tokens=512, do_sample=False, stop_strings=["<|im_end|>"], tokenizer=tok)

# Decode only the newly generated tokens, not the prompt.
print(tok.decode(out[0, ids.shape[1]:], skip_special_tokens=False))
```
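Because the snippet above decodes with skip_special_tokens=False, the printed text will likely still end with the literal <|im_end|> marker. A minimal sketch of trimming it afterwards, assuming plain string handling is enough; the helper name trim_at_stop and the sample string are hypothetical and not part of the model card or of transformers:

```python
def trim_at_stop(text: str, stops=("<|im_end|>",)) -> str:
    """Return text up to the earliest stop marker, minus trailing whitespace."""
    cut = len(text)
    for marker in stops:
        pos = text.find(marker)
        if pos != -1:
            cut = min(cut, pos)  # keep the earliest marker position
    return text[:cut].rstrip()


# Hand-written sample string, not real model output.
print(trim_at_stop("some decoded reply <|im_end|>"))
```

In the snippet above this would wrap the decode call, e.g. trim_at_stop(tok.decode(...)).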

Without stop_strings=["<|im_end|>"], generation runs on until max_new_tokens: the configured eos_token_id (50256) is GPT-2's end-of-document token, which the model does not emit at the end of a chat turn. With vLLM, pass stop=["<|im_end|>"] instead.
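The vLLM note can be sketched as follows. This is an illustration under assumptions, not a tested recipe: it assumes a recent vLLM release that provides LLM.chat and that vLLM can load this checkpoint's architecture. The helper names sampling_kwargs and chat_once are hypothetical. Greedy settings (temperature 0) mirror do_sample=False from the transformers example.

```python
MODEL_ID = "sfanm/d24-sft-v1base-mathheavy-3.7B"
STOP = ["<|im_end|>"]  # chat turns end with this marker, not with the GPT-2 EOS


def sampling_kwargs(max_tokens: int = 512) -> dict:
    """Keyword arguments for vllm.SamplingParams (greedy, with the chat stop string)."""
    return {"temperature": 0.0, "max_tokens": max_tokens, "stop": list(STOP)}


def chat_once(question: str) -> str:
    """Run one user turn through vLLM and return the generated reply text."""
    # Imported lazily so the helpers above can be used without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(model=MODEL_ID, dtype="bfloat16")
    params = SamplingParams(**sampling_kwargs())
    outputs = llm.chat([{"role": "user", "content": question}], sampling_params=params)
    return outputs[0].outputs[0].text


# Example call (needs a GPU and vLLM installed):
# print(chat_once("Natalia sold clips to 48 friends in April and half as many in May. How many total?"))
```

By default vLLM leaves the matched stop string out of the returned text, so no extra trimming should be needed on this path.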

Research checkpoint from a from-scratch nanochat-d24 replication (pretrain → midtrain → SFT → RL) run on NERSC Perlmutter. Trained on third-party corpora (ClimbMix, FineMath, OpenMath, MetaMath, OpenThoughts, OLMo-3 Dolmino, SmolTalk, …); see those datasets' licenses. Provided as-is for research.
