README

License: apache-2.0

What it is

  • Architecture: Nemotron-H (hybrid Mamba-2 / Transformer), 4B params, BF16
  • Source LoRA: build-small-hackathon/noir-verdict-nemotron-4b-lora
  • Merge method: save_pretrained_merged(..., save_method="merged_16bit") (Unsloth)
  • Trust remote code: yes (Nemotron 3 hybrid uses custom modeling code)

How to use

python

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "build-small-hackathon/noir-verdict-nemotron-4b-merged"

# trust_remote_code=True is required because the hybrid architecture uses custom modeling code.
tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.bfloat16, trust_remote_code=True,
).cuda().eval()  # load in BF16, move to the GPU, switch to inference mode

Chat template

The chat template is the Nemotron 3 chat template, with enable_thinking=False baked in. The system prompt for an active interrogation is built by engine/prompts.py:build_system_prompt(...).

python

messages = [
    {"role": "system", "content": "You are Greta Lindholm, junior continuity writer at WJBK. ..."},
    {"role": "user", "content": "Where were you at the time of the theft?"},
]
text = tok.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
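
The example above shows a single question. For a longer interrogation, earlier turns would presumably be replayed before the new question. The helper below is a minimal sketch of that idea, not code from this repo: `build_messages` and its `turns` argument are hypothetical names, and the system prompt is assumed to come from `build_system_prompt(...)` or be written by hand.

```python
def build_messages(system_prompt, turns, question):
    """Assemble a chat history for tok.apply_chat_template.

    system_prompt: the suspect persona text (hypothetical input).
    turns: list of (user_text, assistant_text) pairs from earlier exchanges.
    question: the new user question to be answered.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in turns:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    # The new question goes last so the generation prompt follows it.
    messages.append({"role": "user", "content": question})
    return messages
```

The returned list can be passed to `tok.apply_chat_template(...)` in place of the hand-written `messages` above.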

Inference tips

  • n_ctx ≥ 4096
  • temperature 0.6–0.7, top_p 0.9–0.95
  • max_new_tokens 180–280 per turn
  • Stop on <|im_end|>
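
One way to apply these tips with `transformers` is sketched below. The specific defaults (0.65, 0.9, 240) are assumptions picked from inside the ranges above, and `generation_kwargs` and `trim_at_stop` are hypothetical helper names, not part of this repo. The context-length tip is not covered here.

```python
STOP_MARKER = "<|im_end|>"

def generation_kwargs(temperature=0.65, top_p=0.9, max_new_tokens=240):
    """Return sampling arguments for model.generate(...).

    Defaults are illustrative values inside the recommended ranges.
    """
    return {
        "do_sample": True,  # temperature and top_p only apply when sampling
        "temperature": temperature,
        "top_p": top_p,
        "max_new_tokens": max_new_tokens,
    }

def trim_at_stop(text, stop=STOP_MARKER):
    """Cut decoded text at the first stop marker, if one is present."""
    return text.split(stop, 1)[0].rstrip()
```

With the model and tokenizer loaded as shown earlier, this would be used roughly as `out = model.generate(**inputs, **generation_kwargs())`, followed by `trim_at_stop(...)` on the decoded continuation.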

How it was built

  • Image: nvidia/cuda:12.8.1-devel-ubuntu22.04 + Python 3.13
  • Fine-tune: Unsloth LoRA on A10G, 240 steps, Nemotron 3 Nano 4B
  • Merge: model.save_pretrained_merged(..., save_method="merged_16bit") in the same Modal job
  • Orchestrator: train/modal_finetune.py

Companion artifacts

License

Apache-2.0. The base Nemotron 3 Nano weights are governed by NVIDIA's model license; the merged checkpoint and training code in this repo are Apache-2.0.

Model provider

build-small-hackathon

Model tree

Base

unsloth/NVIDIA-Nemotron-3-Nano-4B

Fine-tuned

this model

Modalities

Input

Text

Output

Text
