tzchen07
rai-n3-nano-4b-safety-lora-r64-ipfix
License: apache-2.0
Model Description
This adapter classifies user queries against a 13-category enterprise safety policy:
| Category | Examples |
|---|---|
| Violence/Harassment | Stalking, bullying, threats |
| Hate/Discrimination | Protected group attacks |
| Misinformation | Disinformation, conspiracy theories |
| Sexual Content | Sexually explicit material |
| Illegal Activity | Real-world crime enablement |
| Self-harm | Self-harm encouragement |
| Jailbreak/PromptInjection | Model manipulation attempts |
| Intellectual Property | Copyright infringement |
| PII | SSN, credit cards, medical records |
| Politics | Campaign materials, election interference |
| Impersonation | Identity mimicry |
| Specialist Advice | Personal medical/legal/financial advice |
| High-risk decisions | Automated hiring/firing, employee ranking |
The model outputs a single category name (e.g., Violence/Harassment) for unsafe content, or none for safe content.
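A minimal sketch of how a caller might map the raw completion onto the policy, assuming the output matches one of the category names in the table apart from whitespace and letter case. The `parse_category` helper and its choice to raise on unrecognized output are illustrative assumptions, not part of the released scripts.

```python
# Category names copied from the policy table above.
CATEGORIES = [
    "Violence/Harassment", "Hate/Discrimination", "Misinformation",
    "Sexual Content", "Illegal Activity", "Self-harm",
    "Jailbreak/PromptInjection", "Intellectual Property", "PII",
    "Politics", "Impersonation", "Specialist Advice", "High-risk decisions",
]
_LOOKUP = {name.lower(): name for name in CATEGORIES}


def parse_category(raw: str):
    """Hypothetical helper: return (blocked, category) for a CategoryOnly output."""
    text = raw.strip().lower()
    if text == "none":
        return False, None          # safe content
    if text in _LOOKUP:
        return True, _LOOKUP[text]  # unsafe content, canonical category name
    # Anything else is surfaced to the caller rather than silently guessed.
    raise ValueError(f"unrecognized classifier output: {raw!r}")
```

How to treat unrecognized output (block, allow, or retry) is a deployment decision; raising keeps that decision explicit.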
Performance
CategoryOnly Template (recommended for Block Rate)
| Dataset | Metric | Value |
|---|---|---|
| v14 hello (5,896 enterprise queries) | Block Rate | 0.170% (10 blocked = 1 TP + 9 FP) |
| | False Positives | 9 |
| | True Positives | 1 |
| | False Negatives | 60 |
| v200 balanced (2,000: 1,000 violations + 1,000 benign) | Precision | 1.0000 (0 FP) |
| | Recall | 0.1490 (149/1000) |
| | F1 | 0.2594 |
VerdictReason Template (higher recall, higher Block Rate)
| Dataset | Metric | Value |
|---|---|---|
| v14 hello | Block Rate | 0.390% (23 blocked = 1 TP + 22 FP) |
| v200 balanced | Precision | 1.0000 (0 FP) |
| | Recall | 0.1790 (179/1000) |
| | F1 | 0.3036 |
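The figures in both tables follow from the raw counts. The sketch below recomputes them, assuming Block Rate is blocked predictions divided by total queries and that Precision, Recall, and F1 use the standard definitions. The function names are illustrative.

```python
def block_rate(blocked: int, total: int) -> float:
    """Fraction of all queries that were flagged (TP + FP over total)."""
    return blocked / total


def precision_recall_f1(tp: int, fp: int, fn: int):
    """Standard binary metrics; returns 0.0 where a denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# CategoryOnly: v14 hello has 10 blocked of 5,896; v200 has 149 TP, 0 FP, 851 FN.
co_block = block_rate(10, 5896)
co_p, co_r, co_f1 = precision_recall_f1(tp=149, fp=0, fn=851)

# VerdictReason: v14 hello has 23 blocked of 5,896; v200 has 179 TP, 0 FP, 821 FN.
vr_block = block_rate(23, 5896)
vr_p, vr_r, vr_f1 = precision_recall_f1(tp=179, fp=0, fn=821)

print(f"CategoryOnly : block={co_block:.3%} P={co_p:.4f} R={co_r:.4f} F1={co_f1:.4f}")
print(f"VerdictReason: block={vr_block:.3%} P={vr_p:.4f} R={vr_r:.4f} F1={vr_f1:.4f}")
```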
Training Details
Base Model
- Model: `nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16`
- Architecture: Hybrid Mamba-2 + Transformer (3.97B parameters)
- Context: 8,192 tokens (at inference)
LoRA Configuration
- Rank (r): 64
- Alpha: 128
- Dropout: 0.05
- Bias: none
- Task type: CAUSAL_LM
- Target modules: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
- Trainable parameters: ~162M (4.1% of total)
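The settings above can be written as keyword arguments for PEFT's `LoraConfig`. This is a sketch under the assumption that the training script builds its adapter with PEFT; the script itself is not shown here, so `LORA_KWARGS` and `build_lora_config` are illustrative names.

```python
# Values copied from the LoRA Configuration list above.
LORA_KWARGS = {
    "r": 64,
    "lora_alpha": 128,
    "lora_dropout": 0.05,
    "bias": "none",
    "task_type": "CAUSAL_LM",
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
}

# With standard LoRA scaling, the adapter update is multiplied by alpha / r.
scaling = LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"]


def build_lora_config():
    """Build a peft.LoraConfig; imported lazily so the dict is usable without PEFT."""
    from peft import LoraConfig
    return LoraConfig(**LORA_KWARGS)
```

With alpha set to twice the rank, the scaling factor is 2.0.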
Training Hyperparameters
- Learning rate: 1e-4
- Epochs: 1
- Batch size: 4 per device
- Gradient accumulation: 16 (effective batch = 64)
- LR scheduler: cosine with 5% warmup
- Precision: bf16
- Optimizer: AdamW
- Max sequence length: 4,096
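As a rough worked example of what these settings imply, the sketch below derives the effective batch size, the approximate number of optimizer steps, and a linear-warmup-plus-cosine learning-rate curve. It assumes a single device, the 19,426-row total listed under Training Data, and decay to zero; the actual scheduler implementation in the training script is not shown in this README.

```python
import math

ROWS = 19_426            # total training rows (see Training Data)
PER_DEVICE_BATCH = 4
GRAD_ACCUM = 16
NUM_DEVICES = 1          # assumption: implied by 4 x 16 = 64
EPOCHS = 1
PEAK_LR = 1e-4
WARMUP_FRACTION = 0.05

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_DEVICES
total_steps = math.ceil(ROWS / effective_batch) * EPOCHS
warmup_steps = int(WARMUP_FRACTION * total_steps)


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero (illustrative shape)."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch, total_steps, warmup_steps)
```

Under these assumptions the run is about 304 optimizer steps with roughly 15 warmup steps; sequence packing or dropping the last partial batch would change the exact count.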
Training Data
Total: 19,426 rows from 9 source files (the IP-fix file is counted twice):
| Dataset | Rows | Description |
|---|---|---|
| `v1.6.0_train.jsonl` | 9,657 | Base safety classification data from Aegis 2.0, HH-RLHF, Dolly, OASST1 |
| `v1.6b_train.jsonl` | 604 | Supplementary paraphrase/FP-FN corrections |
| `v1.6c_train.jsonl` | 2,000 | Violation-heavy supplementary data |
| `v1.6d_train.jsonl` | 1,660 | Additional violation examples |
| `v1.6e_train.jsonl` | 2,000 | Balanced supplementary data |
| `v1.6e2_train.jsonl` | 2,000 | Second balanced supplementary batch |
| `v1_6e7_FPonly.jsonl` | 1,005 | False-positive correction data: real enterprise queries that prior models incorrectly flagged, labeled as none. Critical for low Block Rate. |
| `v1_6g_fpfix.jsonl` | 300 | Targeted FP-fix examples for 4 FP categories: Jailbreak/PI educational, Enterprise HR/Jira, Software action verbs, Self-data queries |
| `v1_6h_ip_fix.jsonl` ×2 | 200 | IP-specific FP correction: benign slide creation, content editing, order forms, software licenses, translation queries. Counted twice for emphasis. |
Data format: OpenAI messages format
```json
{
  "messages": [
    {"role": "system", "content": "<13-category safety policy + output format instructions>"},
    {"role": "user", "content": "<USER QUERY>\n{query}\n</USER QUERY>"},
    {"role": "assistant", "content": "none"}
  ],
  "_meta": {"src": "...", "category": "...", "verdict": "..."}
}
```
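A short sketch of how one training row in this format could be produced. The `make_record` helper is hypothetical; the `_meta` values are passed through as given because the README elides them.

```python
import json


def make_record(system_prompt: str, query: str, label: str,
                src: str = "", category: str = "", verdict: str = "") -> dict:
    """Hypothetical helper: build one row in the messages format shown above."""
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            # The user query is wrapped in <USER QUERY> tags, as in the example row.
            {"role": "user", "content": f"<USER QUERY>\n{query}\n</USER QUERY>"},
            # The label is a category name, or "none" for benign queries.
            {"role": "assistant", "content": label},
        ],
        "_meta": {"src": src, "category": category, "verdict": verdict},
    }


# One JSON object per line, suitable for appending to a .jsonl file.
row = make_record("<policy text>", "Summarize last week's Jira tickets", "none")
line = json.dumps(row, ensure_ascii=False)
```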
Training Script
```bash
python train_n3_NVrec.py \
  --name n3l_H1_r64_ipfix \
  --data v1.6.0_train.jsonl v1.6b_train.jsonl v1.6c_train.jsonl \
         v1.6d_train.jsonl v1.6e_train.jsonl v1.6e2_train.jsonl \
         v1_6e7_FPonly.jsonl v1_6g_fpfix.jsonl \
         v1_6h_ip_fix.jsonl v1_6h_ip_fix.jsonl \
  --out /output --epochs 1 --lr 1e-4 \
  --lora-r 64 --lora-alpha 128
```
Evaluation Setup
Serving with vLLM
The model must be merged before serving. The N3 hybrid architecture requires mamba-ssm.
```bash
# 1. Merge LoRA adapter into base model
python merge_n3.py \
  /path/to/lora_adapter \
  /path/to/merged_output

# 2. Serve with vLLM
python -m vllm.entrypoints.openai.api_server \
  --model /path/to/merged_output \
  --served-model-name rai-classifier \
  --port 8180 \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.85 \
  --trust-remote-code \
  --dtype bfloat16
```
Important: The merge script must clear GenerationConfig before saving to avoid a validation error with top_p:
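```python
from transformers import GenerationConfig

# `merged` is the merged model and `output_path` the target directory in the merge script.
merged.generation_config = GenerationConfig()
merged.save_pretrained(output_path, safe_serialization=True)
```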
Running Evaluation
The evaluation uses the N3-specific eval script, which passes enable_thinking=False via chat_template_kwargs. This is critical because the model was trained without thinking tokens.
```bash
# Run inference
python eval_n3_raw_v14_VR.py \
  --data v14_eval.jsonl \
  --prompt v3_1_CategoryOnly.jinja \
  --out predictions.jsonl \
  --api-base http://localhost:8180/v1 \
  --model rai-classifier \
  --max-tokens 150 \
  --concurrency 16

# CRITICAL: Sort predictions by row_id before scoring
# (eval script writes predictions out-of-order due to concurrent workers)
python score_aligned.py predictions.jsonl ground_truth.jsonl

# Score
python binary_metrics_v2.py \
  predictions.jsonl.sorted \
  --gt-source-jsonl ground_truth.jsonl
```
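For callers who query the vLLM server directly rather than through the eval script, the sketch below shows one way to disable thinking on a single request. It assumes the request mirrors the training format (policy in the system message, query wrapped in `<USER QUERY>` tags) and uses greedy decoding, which is an assumption. `build_request` and `classify` are illustrative names; `chat_template_kwargs` is vLLM's extension field on the chat completions request.

```python
import json
import urllib.request


def build_request(system_prompt: str, query: str,
                  model: str = "rai-classifier", max_tokens: int = 150) -> dict:
    """Build a chat completions payload with thinking disabled."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": f"<USER QUERY>\n{query}\n</USER QUERY>"},
        ],
        "max_tokens": max_tokens,
        "temperature": 0.0,  # assumption: deterministic output for classification
        # Forwarded by vLLM to the chat template; the model was trained without thinking tokens.
        "chat_template_kwargs": {"enable_thinking": False},
    }


def classify(payload: dict, api_base: str = "http://localhost:8180/v1") -> str:
    """POST the payload to the server started above and return the raw completion text."""
    request = urllib.request.Request(
        f"{api_base}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Calling `classify(build_request(policy_text, "How do I reset my password?"))` requires the server from the previous section to be running on port 8180.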
Evaluation Datasets
| Dataset | Rows | Violations | Benign | Purpose |
|---|---|---|---|---|
| v14 hello | 5,896 | 61 (1.0%) | 5,835 (99.0%) | Production-like enterprise queries. Target: Block Rate ≤ 0.200% |
| v200 balanced | 2,000 | 1,000 (50%) | 1,000 (50%) | Balanced evaluation. Target: Precision ≥ 99.5%, Recall ≥ 20% |
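The targets in the table can be checked mechanically against the reported results. The sketch below does this for both prompt templates using the figures from the Performance section; `TARGETS` and `check_targets` are illustrative names.

```python
# Targets copied from the Evaluation Datasets table.
TARGETS = {"block_rate_max": 0.00200, "precision_min": 0.995, "recall_min": 0.20}


def check_targets(block_rate: float, precision: float, recall: float) -> dict:
    """Return a pass/fail flag per target."""
    return {
        "block_rate": block_rate <= TARGETS["block_rate_max"],
        "precision": precision >= TARGETS["precision_min"],
        "recall": recall >= TARGETS["recall_min"],
    }


# Reported results: v14 hello block counts and v200 balanced precision/recall.
category_only = check_targets(block_rate=10 / 5896, precision=1.0, recall=0.149)
verdict_reason = check_targets(block_rate=23 / 5896, precision=1.0, recall=0.179)

print("CategoryOnly :", category_only)
print("VerdictReason:", verdict_reason)
```

On the reported numbers, CategoryOnly meets the Block Rate and Precision targets but not the Recall target, while VerdictReason meets only the Precision target.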
Prompt Templates
Two templates are supported:
CategoryOnly (v3_1_CategoryOnly.jinja) — outputs bare category name:
- Lower Block Rate (0.170%)
- Lower recall (14.9%)
- Max output: 10 tokens
VerdictReason (v3_1_VerdictReason.jinja) — outputs Verdict: <category>\nReason: <explanation>:
- Higher Block Rate (0.390%)
- Higher recall (17.9%)
- Max output: 400 tokens
Both templates embed the full 13-category safety policy (~2,678 tokens) in the system message.
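VerdictReason output needs a little more parsing than a bare category name. The sketch below assumes the completion contains a `Verdict:` line followed by a `Reason:` line, and that benign queries are reported as `Verdict: none`; that last point is an assumption, since the README only shows the general shape. `parse_verdict_reason` is a hypothetical helper.

```python
import re

_VERDICT_RE = re.compile(r"^\s*Verdict:\s*(?P<category>.+?)\s*$", re.MULTILINE)
_REASON_RE = re.compile(r"^\s*Reason:\s*(?P<reason>.*)", re.MULTILINE | re.DOTALL)


def parse_verdict_reason(raw: str):
    """Return (blocked, category, reason) from a VerdictReason completion."""
    verdict = _VERDICT_RE.search(raw)
    if verdict is None:
        raise ValueError(f"no 'Verdict:' line in output: {raw!r}")
    category = verdict.group("category")
    reason_match = _REASON_RE.search(raw)
    # The reason may be missing if generation was cut off by the token limit.
    reason = reason_match.group("reason").strip() if reason_match else ""
    blocked = category.lower() != "none"
    return blocked, category, reason
```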
Scoring Bug Warning
The evaluation inference scripts (eval_n3_raw_v14_VR.py and eval_g4_terse.py) use concurrent workers, which write predictions out of order. The scoring script (binary_metrics_v2.py) matches predictions to ground truth by line position. If predictions are not sorted by row_id before scoring, Precision/Recall/F1 will be incorrect. Block Rate is unaffected, since it only counts predictions and does not depend on ground-truth alignment.
Always sort predictions by row_id before scoring:
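```python
import json

# Load all predictions, closing the file handle when done.
with open("predictions.jsonl") as f:
    preds = [json.loads(line) for line in f]

# Restore the original row order (falls back to "idx" if "row_id" is absent).
preds.sort(key=lambda x: x.get("row_id", x.get("idx", 0)))

with open("predictions.jsonl.sorted", "w") as f:
    for p in preds:
        f.write(json.dumps(p) + "\n")
```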
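An alternative to sorting is to join predictions to ground truth by key, which fails loudly if rows are missing or duplicated instead of silently misaligning. This is a sketch that assumes the ground-truth rows also carry a `row_id` field, which the README does not confirm; `align_by_row_id` is a hypothetical helper, not part of the released scripts.

```python
def align_by_row_id(preds: list, truths: list, key: str = "row_id") -> list:
    """Pair each prediction with its ground-truth row by id, in ground-truth order."""
    pred_by_id = {}
    for p in preds:
        if p[key] in pred_by_id:
            raise ValueError(f"duplicate prediction for {key}={p[key]!r}")
        pred_by_id[p[key]] = p

    truth_ids = [t[key] for t in truths]
    if set(truth_ids) != set(pred_by_id):
        missing = set(truth_ids) - set(pred_by_id)
        extra = set(pred_by_id) - set(truth_ids)
        raise ValueError(f"id mismatch: missing={sorted(missing)} extra={sorted(extra)}")

    return [(pred_by_id[t[key]], t) for t in truths]


# Predictions arrive out of order from concurrent workers.
example_preds = [{"row_id": 2, "pred": "none"}, {"row_id": 0, "pred": "PII"}, {"row_id": 1, "pred": "none"}]
example_truths = [{"row_id": 0, "label": "PII"}, {"row_id": 1, "label": "none"}, {"row_id": 2, "label": "none"}]
pairs = align_by_row_id(example_preds, example_truths)
```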
Key Findings from Training Campaign
- IP-specific FP correction data is critical: Prior model had 13 FP, all Intellectual Property over-flagging (slide decks, order forms, software licenses). Adding 100 IP-specific correction examples (×2 weighting) reduced FP from 13 to 9.
- LoRA rank matters enormously for N3: r=64 gives lowest Block Rate, r=32 gives highest Precision, r=40 is the best combined. r=48 and r=96 are catastrophic (5-14% Block Rate).
- Real FP data >> synthetic data: The v1_6e7_FPonly dataset (1,005 rows mined from actual model false positives) is far more effective than template-generated FP examples.
- VerdictReason template boosts recall: Using the VerdictReason prompt template (which adds a reasoning step) improves recall by 3-8pp but increases Block Rate.
- DoRA (Weight-Decomposed LoRA) improves precision: DoRA at the same rank gives ~2pp better v200 Precision than standard LoRA, by decoupling weight magnitude and direction updates.
Limitations
- Low recall: The model catches only ~15% of violations (CategoryOnly) or ~18% (VerdictReason). This is a deliberate trade-off for extremely low false-positive rates.
- IP over-flagging: The 9 remaining false positives are all in the Intellectual Property category (content creation requests like slide decks, order forms).
- N3-specific: Requires the `mamba-ssm` package for model loading. Not compatible with standard Transformer-only vLLM builds.
- Eval template dependency: Performance varies significantly between CategoryOnly and VerdictReason templates. Choose based on your Block Rate vs Recall priority.
Citation
```bibtex
@misc{rai-n3-safety-classifier-2026,
  title={RAI Safety Classifier: Nemotron 3 Nano 4B LoRA Adapter},
  author={Tony Chen},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/tzchen07/rai-n3-nano-4b-safety-lora-r64-ipfix}
}
```