tzchen07/rai-n3-nano-4b-safety-lora-r64-ipfix API & Inference Endpoint

Model Description

This adapter classifies user queries against a 13-category enterprise safety policy:

Table
Category	Examples
Violence/Harassment	Stalking, bullying, threats
Hate/Discrimination	Protected group attacks
Misinformation	Disinformation, conspiracy theories
Sexual Content	Sexually explicit material
Illegal Activity	Real-world crime enablement
Self-harm	Self-harm encouragement
Jailbreak/PromptInjection	Model manipulation attempts
Intellectual Property	Copyright infringement
PII	SSN, credit cards, medical records
Politics	Campaign materials, election interference
Impersonation	Identity mimicry
Specialist Advice	Personal medical/legal/financial advice
High-risk decisions	Automated hiring/firing, employee ranking

The model outputs a single category name (e.g., Violence/Harassment) for unsafe content, or none for safe content.

Performance

CategoryOnly Template (recommended for Block Rate)

Table
Dataset	Metric	Value
v14 hello (5,896 enterprise queries)	Block Rate	0.170% (10 blocked = 1 TP + 9 FP)
	False Positives	9
	True Positives	1
	False Negatives	60
v200 balanced (2,000: 1,000 violations + 1,000 benign)	Precision	1.0000 (0 FP)
	Recall	0.1490 (149/1000)
	F1	0.2594

VerdictReason Template (higher recall, higher Block Rate)

Table
Dataset	Metric	Value
v14 hello	Block Rate	0.390% (23 blocked = 1 TP + 22 FP)
v200 balanced	Precision	1.0000 (0 FP)
	Recall	0.1790 (179/1000)
	F1	0.3036

Training Details

Base Model

Model: nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16
Architecture: Hybrid Mamba-2 + Transformer (3.97B parameters)
Context: 8,192 tokens (at inference)

LoRA Configuration

Rank (r): 64
Alpha: 128
Dropout: 0.05
Bias: none
Task type: CAUSAL_LM
Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Trainable parameters: ~162M (4.1% of total)

Training Hyperparameters

Learning rate: 1e-4
Epochs: 1
Batch size: 4 per device
Gradient accumulation: 16 (effective batch = 64)
LR scheduler: cosine with 5% warmup
Precision: bf16
Optimizer: AdamW
Max sequence length: 4,096

Training Data

Total: ~19,426 rows from 9 source files:

Table
Dataset	Rows	Description
`v1.6.0_train.jsonl`	9,657	Base safety classification data from Aegis 2.0, HH-RLHF, Dolly, OASST1
`v1.6b_train.jsonl`	604	Supplementary paraphrase/FP-FN corrections
`v1.6c_train.jsonl`	2,000	Violation-heavy supplementary data
`v1.6d_train.jsonl`	1,660	Additional violation examples
`v1.6e_train.jsonl`	2,000	Balanced supplementary data
`v1.6e2_train.jsonl`	2,000	Second balanced supplementary batch
`v1_6e7_FPonly.jsonl`	1,005	False-positive correction data — real enterprise queries that prior models incorrectly flagged, labeled as `none`. Critical for low Block Rate.
`v1_6g_fpfix.jsonl`	300	Targeted FP-fix examples for 4 FP categories: Jailbreak/PI educational, Enterprise HR/Jira, Software action verbs, Self-data queries
`v1_6h_ip_fix.jsonl` ×2	200	IP-specific FP correction — benign slide creation, content editing, order forms, software licenses, translation queries. Counted twice for emphasis.

Data format: OpenAI messages format

json
{
  "messages": [
    {"role": "system", "content": "<13-category safety policy + output format instructions>"},
    {"role": "user", "content": "<USER QUERY>\n{query}\n</USER QUERY>"},
    {"role": "assistant", "content": "none"}
  ],
  "_meta": {"src": "...", "category": "...", "verdict": "..."}
}

Training Script

bash
python train_n3_NVrec.py \
  --name n3l_H1_r64_ipfix \
  --data v1.6.0_train.jsonl v1.6b_train.jsonl v1.6c_train.jsonl \
         v1.6d_train.jsonl v1.6e_train.jsonl v1.6e2_train.jsonl \
         v1_6e7_FPonly.jsonl v1_6g_fpfix.jsonl \
         v1_6h_ip_fix.jsonl v1_6h_ip_fix.jsonl \
  --out /output --epochs 1 --lr 1e-4 \
  --lora-r 64 --lora-alpha 128

Evaluation Setup

Serving with vLLM

The model must be merged before serving. The N3 hybrid architecture requires mamba-ssm.

bash
# 1. Merge LoRA adapter into base model
python merge_n3.py \
  /path/to/lora_adapter \
  /path/to/merged_output

# 2. Serve with vLLM
python -m vllm.entrypoints.openai.api_server \
  --model /path/to/merged_output \
  --served-model-name rai-classifier \
  --port 8180 \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.85 \
  --trust-remote-code \
  --dtype bfloat16

Important: The merge script must clear GenerationConfig before saving to avoid a validation error with top_p:

python
from transformers import GenerationConfig
merged.generation_config = GenerationConfig()
merged.save_pretrained(output_path, safe_serialization=True)

Running Evaluation

The evaluation uses the N3-specific eval script which passes enable_thinking=False via chat_template_kwargs — this is critical because the model was trained without thinking tokens.

bash
# Run inference
python eval_n3_raw_v14_VR.py \
  --data v14_eval.jsonl \
  --prompt v3_1_CategoryOnly.jinja \
  --out predictions.jsonl \
  --api-base http://localhost:8180/v1 \
  --model rai-classifier \
  --max-tokens 150 \
  --concurrency 16

# CRITICAL: Sort predictions by row_id before scoring
# (eval script writes predictions out-of-order due to concurrent workers)
python score_aligned.py predictions.jsonl ground_truth.jsonl

# Score
python binary_metrics_v2.py \
  predictions.jsonl.sorted \
  --gt-source-jsonl ground_truth.jsonl

Evaluation Datasets

Table
Dataset	Rows	Violations	Benign	Purpose
v14 hello	5,896	61 (1.0%)	5,835 (99.0%)	Production-like enterprise queries. Target: Block Rate ≤ 0.200%
v200 balanced	2,000	1,000 (50%)	1,000 (50%)	Balanced evaluation. Target: Precision ≥ 99.5%, Recall ≥ 20%

Prompt Templates

Two templates are supported:

CategoryOnly (v3_1_CategoryOnly.jinja) — outputs bare category name:

Lower Block Rate (0.170%)
Lower recall (14.9%)
Max output: 10 tokens

VerdictReason (v3_1_VerdictReason.jinja) — outputs Verdict: <category>\nReason: <explanation>:

Higher Block Rate (0.390%)
Higher recall (17.9%)
Max output: 400 tokens

Both templates embed the full 13-category safety policy (~2,678 tokens) in the system message.

Scoring Bug Warning

The evaluation inference script (eval_n3_raw_v14_VR.py and eval_g4_terse.py) uses concurrent workers which write predictions out of order. The scoring script (binary_metrics_v2.py) matches predictions to ground truth by line position. If predictions are not sorted by row_id before scoring, Precision/Recall/F1 will be incorrect (Block Rate is unaffected since it only counts predictions, not ground truth alignment).

Always sort predictions by row_id before scoring:

python
import json
preds = [json.loads(l) for l in open("predictions.jsonl")]
preds.sort(key=lambda x: x.get("row_id", x.get("idx", 0)))
with open("predictions.jsonl.sorted", "w") as f:
    for p in preds:
        f.write(json.dumps(p) + "\n")

Key Findings from Training Campaign

IP-specific FP correction data is critical: Prior model had 13 FP, all Intellectual Property over-flagging (slide decks, order forms, software licenses). Adding 100 IP-specific correction examples (×2 weighting) reduced FP from 13 to 9.
LoRA rank matters enormously for N3: r=64 gives lowest Block Rate, r=32 gives highest Precision, r=40 is the best combined. r=48 and r=96 are catastrophic (5-14% Block Rate).
Real FP data >> synthetic data: The v1_6e7_FPonly dataset (1,005 rows mined from actual model false positives) is far more effective than template-generated FP examples.
VerdictReason template boosts recall: Using the VerdictReason prompt template (which adds a reasoning step) improves recall by 3-8pp but increases Block Rate.
DoRA (Weight-Decomposed LoRA) improves precision: DoRA at the same rank gives ~2pp better v200 Precision than standard LoRA, by decoupling weight magnitude and direction updates.

Limitations

Low recall: The model catches only ~15% of violations (CategoryOnly) or ~18% (VerdictReason). This is a deliberate trade-off for extremely low false-positive rates.
IP over-flagging: The 9 remaining false positives are all in the Intellectual Property category (content creation requests like slide decks, order forms).
N3-specific: Requires mamba-ssm package for model loading. Not compatible with standard Transformer-only vLLM builds.
Eval template dependency: Performance varies significantly between CategoryOnly and VerdictReason templates. Choose based on your Block Rate vs Recall priority.

Citation

markdown
@misc{rai-n3-safety-classifier-2026,
  title={RAI Safety Classifier: Nemotron 3 Nano 4B LoRA Adapter},
  author={Tony Chen},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/tzchen07/rai-n3-nano-4b-safety-lora-r64-ipfix}
}

rai-n3-nano-4b-safety-lora-r64-ipfix

Get help setting up a custom Dedicated Endpoints.

README