tzchen07

rai-n3-nano-4b-safety-lora-r64-ipfix


License: apache-2.0

Model Description

This adapter classifies user queries against a 13-category enterprise safety policy:

| Category | Examples |
| --- | --- |
| Violence/Harassment | Stalking, bullying, threats |
| Hate/Discrimination | Protected group attacks |
| Misinformation | Disinformation, conspiracy theories |
| Sexual Content | Sexually explicit material |
| Illegal Activity | Real-world crime enablement |
| Self-harm | Self-harm encouragement |
| Jailbreak/PromptInjection | Model manipulation attempts |
| Intellectual Property | Copyright infringement |
| PII | SSN, credit cards, medical records |
| Politics | Campaign materials, election interference |
| Impersonation | Identity mimicry |
| Specialist Advice | Personal medical/legal/financial advice |
| High-risk decisions | Automated hiring/firing, employee ranking |

The model outputs a single category name (e.g., Violence/Harassment) for unsafe content, or none for safe content.
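A downstream service usually has to turn this raw string into an allow/block decision. The sketch below shows one way to do that, assuming the model emits the category names exactly as they appear in the table above (casing included). The `CATEGORIES` set and the `parse_category` helper are illustrative names, not part of the released code.

```python
# Category names copied from the policy table; exact output casing is an assumption.
CATEGORIES = {
    "Violence/Harassment", "Hate/Discrimination", "Misinformation",
    "Sexual Content", "Illegal Activity", "Self-harm",
    "Jailbreak/PromptInjection", "Intellectual Property", "PII",
    "Politics", "Impersonation", "Specialist Advice", "High-risk decisions",
}

def parse_category(raw: str) -> tuple[bool, str]:
    """Return (blocked, label) for a single-category model output."""
    text = raw.strip()
    if text.lower() == "none":
        return False, "none"
    for name in CATEGORIES:
        # Case-insensitive match so minor casing drift does not hide a violation.
        if text.lower() == name.lower():
            return True, name
    # Unrecognized output: report it rather than guessing a category.
    return False, "unparsed"
```

Treating unparsed output as not blocked is a choice made for this sketch; a stricter deployment might log or block such responses instead.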

Performance

| Dataset | Metric | Value |
| --- | --- | --- |
| v14 hello (5,896 enterprise queries) | Block Rate | 0.170% (10 blocked = 1 TP + 9 FP) |
| | False Positives | 9 |
| | True Positives | 1 |
| | False Negatives | 60 |
| v200 balanced (2,000: 1,000 violations + 1,000 benign) | Precision | 1.0000 (0 FP) |
| | Recall | 0.1490 (149/1000) |
| | F1 | 0.2594 |

VerdictReason Template (higher recall, higher Block Rate)

| Dataset | Metric | Value |
| --- | --- | --- |
| v14 hello | Block Rate | 0.390% (23 blocked = 1 TP + 22 FP) |
| v200 balanced | Precision | 1.0000 (0 FP) |
| | Recall | 0.1790 (179/1000) |
| | F1 | 0.3036 |
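The figures in the two performance tables follow from the raw counts using the standard definitions of Block Rate, Precision, Recall, and F1. The sketch below reproduces them; the function names are illustrative, and the actual scoring script (binary_metrics_v2.py) may compute them differently.

```python
def block_rate(tp: int, fp: int, total: int) -> float:
    """Fraction of all queries that were blocked, correctly or not."""
    return (tp + fp) / total

def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard binary metrics, returning 0.0 when a denominator is empty."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# v14 hello: 5,896 queries
print(f"{block_rate(1, 9, 5896):.3%}")    # 0.170%
print(f"{block_rate(1, 22, 5896):.3%}")   # 0.390% (VerdictReason)

# v200 balanced: 1,000 violations, 0 false positives
print([round(m, 4) for m in precision_recall_f1(149, 0, 851)])  # [1.0, 0.149, 0.2594]
print([round(m, 4) for m in precision_recall_f1(179, 0, 821)])  # [1.0, 0.179, 0.3036]
```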

Training Details

Base Model

  • Model: nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16
  • Architecture: Hybrid Mamba-2 + Transformer (3.97B parameters)
  • Context: 8,192 tokens (at inference)

LoRA Configuration

  • Rank (r): 64
  • Alpha: 128
  • Dropout: 0.05
  • Bias: none
  • Task type: CAUSAL_LM
  • Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Trainable parameters: ~162M (4.1% of total)
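For readers who use the Hugging Face PEFT library, the settings above map directly onto `peft.LoraConfig`. This is a sketch under the assumption that the training script uses PEFT, which the card does not state; `LORA_KWARGS` and `build_lora_config` are illustrative names.

```python
# LoRA settings copied from the list above.
LORA_KWARGS = dict(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)

def build_lora_config():
    """Build a peft.LoraConfig from the listed settings (requires `peft`)."""
    from peft import LoraConfig  # imported lazily so the dict is usable without peft
    return LoraConfig(**LORA_KWARGS)

# Standard LoRA scales the adapter update by alpha / r.
SCALING = LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"]
print(SCALING)  # 2.0
```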

Training Hyperparameters

  • Learning rate: 1e-4
  • Epochs: 1
  • Batch size: 4 per device
  • Gradient accumulation: 16 (effective batch = 64)
  • LR scheduler: cosine with 5% warmup
  • Precision: bf16
  • Optimizer: AdamW
  • Max sequence length: 4,096
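The effective batch size and warmup length follow from these settings by simple arithmetic. The sketch below assumes a single device, that all 19,426 training rows are used, and that warmup steps are rounded up; the actual step count depends on how the training script handles the last partial batch and over-length rows. `schedule_summary` is an illustrative helper.

```python
import math

def schedule_summary(num_rows: int, per_device_batch: int, grad_accum: int,
                     num_devices: int = 1, epochs: int = 1,
                     warmup_ratio: float = 0.05) -> tuple[int, int, int]:
    """Return (effective_batch, optimizer_steps, warmup_steps)."""
    effective_batch = per_device_batch * grad_accum * num_devices
    # One optimizer step per effective batch; the last partial batch still counts.
    steps = math.ceil(num_rows / effective_batch) * epochs
    warmup = math.ceil(steps * warmup_ratio)
    return effective_batch, steps, warmup

print(schedule_summary(19426, per_device_batch=4, grad_accum=16))  # (64, 304, 16)
```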

Training Data

Total: 19,426 rows from 9 source files (the IP-fix file is counted twice):

| Dataset | Rows | Description |
| --- | --- | --- |
| v1.6.0_train.jsonl | 9,657 | Base safety classification data from Aegis 2.0, HH-RLHF, Dolly, OASST1 |
| v1.6b_train.jsonl | 604 | Supplementary paraphrase/FP-FN corrections |
| v1.6c_train.jsonl | 2,000 | Violation-heavy supplementary data |
| v1.6d_train.jsonl | 1,660 | Additional violation examples |
| v1.6e_train.jsonl | 2,000 | Balanced supplementary data |
| v1.6e2_train.jsonl | 2,000 | Second balanced supplementary batch |
| v1_6e7_FPonly.jsonl | 1,005 | False-positive correction data: real enterprise queries that prior models incorrectly flagged, labeled as none. Critical for low Block Rate. |
| v1_6g_fpfix.jsonl | 300 | Targeted FP-fix examples for 4 FP categories: Jailbreak/PI educational, Enterprise HR/Jira, Software action verbs, Self-data queries |
| v1_6h_ip_fix.jsonl ×2 | 200 | IP-specific FP correction: benign slide creation, content editing, order forms, software licenses, translation queries. The 100-row file is passed twice for emphasis, giving 200 counted rows. |
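The row total can be checked against the per-file counts. In the sketch below, the 100-row size of v1_6h_ip_fix.jsonl is taken from the Key Findings section, and the file list mirrors the `--data` arguments of the training command, where that file appears twice. `ROWS` and `DATA_ARGS` are illustrative names.

```python
# Rows per distinct source file, from the table above.
ROWS = {
    "v1.6.0_train.jsonl": 9657,
    "v1.6b_train.jsonl": 604,
    "v1.6c_train.jsonl": 2000,
    "v1.6d_train.jsonl": 1660,
    "v1.6e_train.jsonl": 2000,
    "v1.6e2_train.jsonl": 2000,
    "v1_6e7_FPonly.jsonl": 1005,
    "v1_6g_fpfix.jsonl": 300,
    "v1_6h_ip_fix.jsonl": 100,
}

# Files in the order they are passed to training; repeating a path upweights it.
DATA_ARGS = list(ROWS) + ["v1_6h_ip_fix.jsonl"]

total_rows = sum(ROWS[path] for path in DATA_ARGS)
print(len(set(DATA_ARGS)), total_rows)  # 9 19426
```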

Data format: OpenAI messages format

json

{
  "messages": [
    {"role": "system", "content": "<13-category safety policy + output format instructions>"},
    {"role": "user", "content": "<USER QUERY>\n{query}\n</USER QUERY>"},
    {"role": "assistant", "content": "none"}
  ],
  "_meta": {"src": "...", "category": "...", "verdict": "..."}
}
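A record in this format can be assembled programmatically. The sketch below is a minimal example of building one training row and serializing it as a JSONL line; `make_record` and `to_jsonl_line` are hypothetical helpers, and the `_meta` values are whatever bookkeeping the dataset author chose to store.

```python
import json

def make_record(system_prompt: str, query: str, label: str,
                src: str = "", category: str = "", verdict: str = "") -> dict:
    """Build one training row in the messages format shown above."""
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            # The query is wrapped in <USER QUERY> tags, as in the example row.
            {"role": "user", "content": f"<USER QUERY>\n{query}\n</USER QUERY>"},
            {"role": "assistant", "content": label},
        ],
        "_meta": {"src": src, "category": category, "verdict": verdict},
    }

def to_jsonl_line(record: dict) -> str:
    """Serialize a record as a single JSONL line (no trailing newline)."""
    return json.dumps(record, ensure_ascii=False)

row = make_record("<policy text>", "Summarize last week's Jira tickets", "none")
print(to_jsonl_line(row))
```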

Training Script

bash

python train_n3_NVrec.py \
    --name n3l_H1_r64_ipfix \
    --data v1.6.0_train.jsonl v1.6b_train.jsonl v1.6c_train.jsonl \
           v1.6d_train.jsonl v1.6e_train.jsonl v1.6e2_train.jsonl \
           v1_6e7_FPonly.jsonl v1_6g_fpfix.jsonl \
           v1_6h_ip_fix.jsonl v1_6h_ip_fix.jsonl \
    --out /output --epochs 1 --lr 1e-4 \
    --lora-r 64 --lora-alpha 128

Evaluation Setup

Serving with vLLM

The LoRA adapter must be merged into the base model before serving. The N3 hybrid architecture requires the mamba-ssm package.

bash

# 1. Merge LoRA adapter into base model
python merge_n3.py \
    /path/to/lora_adapter \
    /path/to/merged_output

# 2. Serve with vLLM
python -m vllm.entrypoints.openai.api_server \
    --model /path/to/merged_output \
    --served-model-name rai-classifier \
    --port 8180 \
    --max-model-len 8192 \
    --gpu-memory-utilization 0.85 \
    --trust-remote-code \
    --dtype bfloat16

Important: The merge script must clear GenerationConfig before saving to avoid a validation error with top_p:

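python

from transformers import GenerationConfig

# `merged` is the model with the LoRA weights already merged in.
# Reset the generation config so the top_p validation error is not triggered on save.
merged.generation_config = GenerationConfig()
merged.save_pretrained(output_path, safe_serialization=True)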

Running Evaluation

The evaluation uses the N3-specific eval script, which passes enable_thinking=False via chat_template_kwargs. This is critical because the model was trained without thinking tokens.

bash

# Run inference
python eval_n3_raw_v14_VR.py \
    --data v14_eval.jsonl \
    --prompt v3_1_CategoryOnly.jinja \
    --out predictions.jsonl \
    --api-base http://localhost:8180/v1 \
    --model rai-classifier \
    --max-tokens 150 \
    --concurrency 16

# CRITICAL: Sort predictions by row_id before scoring
# (eval script writes predictions out-of-order due to concurrent workers)
python score_aligned.py predictions.jsonl ground_truth.jsonl

# Score
python binary_metrics_v2.py \
    predictions.jsonl.sorted \
    --gt-source-jsonl ground_truth.jsonl
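The eval script itself is not shown here, but the enable_thinking=False setting mentioned above can be illustrated with a direct request to the vLLM server started earlier. vLLM's OpenAI-compatible chat endpoint accepts a `chat_template_kwargs` field in the request body. The sketch assumes the served model name and port from the serving command and assumes the query is wrapped in the same `<USER QUERY>` tags as the training data; `build_request` and `classify` are illustrative helpers.

```python
import json
import urllib.request

def build_request(system_prompt: str, query: str,
                  model: str = "rai-classifier", max_tokens: int = 150) -> dict:
    """Build a chat-completions payload with thinking disabled."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            # Assumption: same user-message wrapping as the training format.
            {"role": "user", "content": f"<USER QUERY>\n{query}\n</USER QUERY>"},
        ],
        "max_tokens": max_tokens,
        # Forwarded to the chat template; the model was trained without thinking tokens.
        "chat_template_kwargs": {"enable_thinking": False},
    }

def classify(payload: dict, api_base: str = "http://localhost:8180/v1") -> str:
    """POST the payload to the server and return the raw model output."""
    request = urllib.request.Request(
        f"{api_base}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `classify(build_request(policy_text, "How do I reset my password?"))` would return the raw category string, where `policy_text` stands for the full safety policy prompt.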

Evaluation Datasets

| Dataset | Rows | Violations | Benign | Purpose |
| --- | --- | --- | --- | --- |
| v14 hello | 5,896 | 61 (1.0%) | 5,835 (99.0%) | Production-like enterprise queries. Target: Block Rate ≤ 0.200% |
| v200 balanced | 2,000 | 1,000 (50%) | 1,000 (50%) | Balanced evaluation. Target: Precision ≥ 99.5%, Recall ≥ 20% |

Prompt Templates

Two templates are supported:

CategoryOnly (v3_1_CategoryOnly.jinja) — outputs bare category name:

  • Lower Block Rate (0.170%)
  • Lower recall (14.9%)
  • Max output: 10 tokens

VerdictReason (v3_1_VerdictReason.jinja) — outputs Verdict: <category>\nReason: <explanation>:

  • Higher Block Rate (0.390%)
  • Higher recall (17.9%)
  • Max output: 400 tokens

Both templates embed the full 13-category safety policy (~2,678 tokens) in the system message.
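Output from the VerdictReason template needs a small parsing step before it can be scored. The sketch below assumes the model follows the `Verdict: <category>` / `Reason: <explanation>` layout described above; `parse_verdict_reason` is an illustrative helper and not the parser used by the eval scripts.

```python
import re

# One "Verdict:" line, then everything after "Reason:" as the explanation.
_VERDICT_RE = re.compile(r"^\s*Verdict:\s*(?P<verdict>.+?)\s*$", re.MULTILINE)
_REASON_RE = re.compile(r"^\s*Reason:\s*(?P<reason>.*)", re.MULTILINE | re.DOTALL)

def parse_verdict_reason(raw: str):
    """Return a dict with verdict, reason, and blocked, or None if no verdict is found."""
    verdict_match = _VERDICT_RE.search(raw)
    if verdict_match is None:
        return None
    reason_match = _REASON_RE.search(raw)
    verdict = verdict_match.group("verdict")
    return {
        "verdict": verdict,
        "reason": reason_match.group("reason").strip() if reason_match else "",
        # Anything other than "none" counts as a block.
        "blocked": verdict.lower() != "none",
    }

print(parse_verdict_reason("Verdict: none\nReason: Benign scheduling request."))
```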

Scoring Bug Warning

The evaluation inference scripts (eval_n3_raw_v14_VR.py and eval_g4_terse.py) use concurrent workers that write predictions out of order. The scoring script (binary_metrics_v2.py) matches predictions to ground truth by line position, so if predictions are not sorted by row_id before scoring, Precision/Recall/F1 will be incorrect. Block Rate is unaffected because it only counts predictions and does not depend on ground-truth alignment.

Always sort predictions by row_id before scoring:

python

import json

# Load predictions, restore row order, and write the sorted copy used for scoring.
with open("predictions.jsonl") as f:
    preds = [json.loads(line) for line in f]
preds.sort(key=lambda x: x.get("row_id", x.get("idx", 0)))
with open("predictions.jsonl.sorted", "w") as f:
    for p in preds:
        f.write(json.dumps(p) + "\n")
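Sorting fixes the order, but positional matching still breaks silently if a row is missing or duplicated. A more defensive alternative is to join predictions to ground truth by key. The sketch below assumes both files carry a `row_id` field, which is confirmed only for predictions; `align_by_row_id` and the `label`/`pred` field names in the usage lines are hypothetical.

```python
def align_by_row_id(preds: list, truth: list, key: str = "row_id") -> list:
    """Pair each ground-truth row with its prediction, failing loudly on mismatches."""
    truth_by_id = {row[key]: row for row in truth}
    if len(truth_by_id) != len(truth):
        raise ValueError("duplicate ids in ground truth")
    seen = set()
    pairs = []
    for pred in preds:
        row_id = pred[key]
        if row_id not in truth_by_id:
            raise KeyError(f"prediction {row_id!r} has no ground-truth row")
        if row_id in seen:
            raise ValueError(f"duplicate prediction for {row_id!r}")
        seen.add(row_id)
        pairs.append((truth_by_id[row_id], pred))
    missing = set(truth_by_id) - seen
    if missing:
        raise ValueError(f"{len(missing)} ground-truth rows have no prediction")
    # Return pairs in row order so downstream reports are stable.
    pairs.sort(key=lambda pair: pair[0][key])
    return pairs

preds = [{"row_id": 2, "pred": "none"}, {"row_id": 1, "pred": "PII"}]
truth = [{"row_id": 1, "label": "PII"}, {"row_id": 2, "label": "none"}]
print([(t["label"], p["pred"]) for t, p in align_by_row_id(preds, truth)])
# [('PII', 'PII'), ('none', 'none')]
```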

Key Findings from Training Campaign

  1. IP-specific FP correction data is critical: The prior model had 13 FPs, all from Intellectual Property over-flagging (slide decks, order forms, software licenses). Adding 100 IP-specific correction examples (×2 weighting) reduced FPs from 13 to 9.

  2. LoRA rank strongly affects N3 results: r=64 gives the lowest Block Rate, r=32 gives the highest Precision, and r=40 gives the best combined result. r=48 and r=96 fail badly (5-14% Block Rate).

  3. Real FP data >> synthetic data: The v1_6e7_FPonly dataset (1,005 rows mined from actual model false positives) is far more effective than template-generated FP examples.

  4. VerdictReason template boosts recall: Using the VerdictReason prompt template (which adds a reasoning step) improves recall by 3-8pp but increases Block Rate.

  5. DoRA (Weight-Decomposed LoRA) improves precision: DoRA at the same rank gives ~2pp better v200 Precision than standard LoRA, by decoupling weight magnitude and direction updates.

Limitations

  • Low recall: The model catches only ~15% of violations (CategoryOnly) or ~18% (VerdictReason). This is a deliberate trade-off for extremely low false-positive rates.
  • IP over-flagging: The 9 remaining false positives are all in the Intellectual Property category (content creation requests like slide decks, order forms).
  • N3-specific: Requires mamba-ssm package for model loading. Not compatible with standard Transformer-only vLLM builds.
  • Eval template dependency: Performance varies significantly between CategoryOnly and VerdictReason templates. Choose based on your Block Rate vs Recall priority.

Citation

bibtex

@misc{rai-n3-safety-classifier-2026,
  title={RAI Safety Classifier: Nemotron 3 Nano 4B LoRA Adapter},
  author={Tony Chen},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/tzchen07/rai-n3-nano-4b-safety-lora-r64-ipfix}
}
