License: apache-2.0
What it is
- Architecture: Nemotron-H (hybrid Mamba-2 / Transformer), 4B params, BF16
- Source LoRA: build-small-hackathon/noir-verdict-nemotron-4b-lora
- Merge method: save_pretrained_merged(..., save_method="merged_16bit") (Unsloth)
- Trust remote code: yes (Nemotron 3 hybrid uses custom modeling code)
How to use
python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "build-small-hackathon/noir-verdict-nemotron-4b-merged"

# trust_remote_code=True is required: the hybrid architecture ships custom modeling code.
tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    trust_remote_code=True,
).cuda().eval()
Chat template
The chat template is the Nemotron 3 chat template, with
enable_thinking=False baked in. The system prompt for an active
interrogation is built by engine/prompts.py:build_system_prompt(...).
python
messages = [
    {"role": "system", "content": "You are Greta Lindholm, junior continuity writer at WJBK. ..."},
    {"role": "user", "content": "Where were you at the time of the theft?"},
]

# Render the conversation to a prompt string that ends with the assistant header.
text = tok.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
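For a multi-turn interrogation, the message list grows with each exchange. A minimal sketch of one way to assemble it, assuming earlier turns are kept as (question, answer) pairs. `build_messages` is a hypothetical helper, not part of engine/prompts.py, and the assistant answer in the example is made up for illustration; in practice the system prompt would come from build_system_prompt(...).

```python
def build_messages(system_prompt, history, question):
    """Assemble a chat-format message list (hypothetical helper).

    system_prompt: string, e.g. the output of build_system_prompt(...)
    history: list of (user_question, assistant_answer) pairs from earlier turns
    question: the new user question for this turn
    """
    messages = [{"role": "system", "content": system_prompt}]
    for asked, answered in history:
        messages.append({"role": "user", "content": asked})
        messages.append({"role": "assistant", "content": answered})
    messages.append({"role": "user", "content": question})
    return messages


# Made-up earlier turn, for illustration only.
history = [
    ("Where were you at the time of the theft?", "At my desk, typing up the late bulletin."),
]
messages = build_messages(
    "You are Greta Lindholm, junior continuity writer at WJBK. ...",
    history,
    "Can anyone confirm that?",
)
```

The resulting list can be passed to tok.apply_chat_template(...) with the same arguments as the single-turn case.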
Inference tips
- n_ctx ≥ 4096
- temperature 0.6–0.7, top_p 0.9–0.95
- max_new_tokens 180–280 per turn
- Stop on <|im_end|>
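These settings might be wired into a Transformers generate() call roughly as follows. This is a sketch under assumptions, not the app's actual inference code: `sampling_kwargs`, `trim_at_stop`, and `generate_reply` are hypothetical helpers, the defaults are picked from inside the ranges listed here, and `<|im_end|>` is assumed to be a single token in the tokenizer's vocabulary.

```python
STOP_TOKEN = "<|im_end|>"


def sampling_kwargs(temperature=0.7, top_p=0.9, max_new_tokens=256):
    """Keyword arguments for model.generate(), within the suggested ranges."""
    return {
        "do_sample": True,  # temperature/top_p only take effect when sampling
        "temperature": temperature,
        "top_p": top_p,
        "max_new_tokens": max_new_tokens,
    }


def trim_at_stop(text, stop=STOP_TOKEN):
    """Cut decoded text at the first stop marker, if one is present."""
    return text.split(stop, 1)[0].strip()


def generate_reply(model, tok, prompt_text):
    """Generate one turn from a prompt rendered by apply_chat_template.

    `model` and `tok` are assumed to be loaded as in "How to use".
    """
    # The rendered prompt already contains the template's special tokens.
    inputs = tok(prompt_text, return_tensors="pt", add_special_tokens=False)
    inputs = inputs.to(model.device)
    output_ids = model.generate(
        **inputs,
        eos_token_id=tok.convert_tokens_to_ids(STOP_TOKEN),
        **sampling_kwargs(),
    )
    # Decode only the newly generated tokens, not the prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return trim_at_stop(tok.decode(new_tokens, skip_special_tokens=False))
```

n_ctx is the context-size parameter name used by llama.cpp-based runtimes, so it presumably applies when running the GGUF build; the sketch above has no direct equivalent and simply relies on the prompt fitting in the model's context.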
How it was built
- Image: nvidia/cuda:12.8.1-devel-ubuntu22.04 + Python 3.13
- Fine-tune: Unsloth LoRA on A10G, 240 steps, Nemotron 3 Nano 4B
- Merge: model.save_pretrained_merged(..., save_method="merged_16bit") in the same Modal job
- Orchestrator: train/modal_finetune.py
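Merging folds the LoRA update into the base weights, so the adapter is no longer needed at inference time. As a toy illustration of the standard LoRA merge rule, W' = W + (alpha / r) * B @ A, the sketch below uses tiny matrices as plain Python lists. The values, alpha, and r are made up, and this is not Unsloth's implementation, only the arithmetic a merge of this kind is generally understood to perform.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def add(x, y):
    """Element-wise sum of two same-shaped matrices."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def scale(x, s):
    """Multiply every entry of a matrix by the scalar s."""
    return [[a * s for a in row] for row in x]


def merge_lora(w, a, b, alpha, r):
    """Return W + (alpha / r) * B @ A, the standard LoRA merge rule."""
    return add(w, scale(matmul(b, a), alpha / r))


# Toy shapes: W is 2x2 and the rank is r = 1, so B is 2x1 and A is 1x2.
w = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0], [2.0]]
a = [[0.5, -0.5]]
alpha, r = 2, 1

merged = merge_lora(w, a, b, alpha, r)

# Applying the merged matrix matches the base output plus the scaled adapter output.
x = [[3.0], [1.0]]  # column vector
base_plus_adapter = add(matmul(w, x), scale(matmul(b, matmul(a, x)), alpha / r))
merged_out = matmul(merged, x)
```

Because the two outputs agree, a merged checkpoint can be loaded like any ordinary model, without the adapter files.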
Companion artifacts
- LoRA: build-small-hackathon/noir-verdict-nemotron-4b-lora (40.5 MB)
- Q4_K_M GGUF: build-small-hackathon/noir-verdict-nemotron-4b-gguf (2.84 GB)
- App: build-small-hackathon/noir-verdict
License
Apache-2.0. The base Nemotron 3 Nano weights are governed by NVIDIA's model license; the merged checkpoint and training code in this repo are Apache-2.0.
Model provider
- build-small-hackathon
Model tree
- Base: unsloth/NVIDIA-Nemotron-3-Nano-4B
- Fine-tuned: this model
Modalities
- Input: Text
- Output: Text