naazimsnh02
fraudsentinel-qwen3-14b-lora
License: apache-2.0
Capabilities
The model is trained to act as an enterprise fraud and AML investigation assistant across six task types:
- Structured JSON risk scoring: calibrated risk score (0.0–1.0), risk level (LOW / MEDIUM / HIGH / CRITICAL), typology, key signals, feature importance, recommended action, and SAR rationale
- Explainable alerts: evidence-grounded, investigator-facing natural-language explanations tied to actual transaction features
- Typology classification: primary and secondary fraud/laundering pattern identification (card-not-present, account takeover, fan-out, gather-scatter, structuring, etc.)
- 6-level recommended action: AUTO_APPROVE → APPROVE_WITH_MONITORING → STEP_UP_AUTH → TEMPORARY_HOLD → AUTO_BLOCK → SAR_REVIEW
- SAR drafting: FinCEN-aligned Suspicious Activity Report narrative generation for human review and filing
- Multi-turn HITL dialogue: investigator follow-ups ("Why this risk level?", "What else should I check?", "Customer confirmed legit, what next?")
- Deep Analysis mode: optional Chain-of-Thought reasoning via Qwen3's thinking tokens for complex multi-account cases
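The six recommended actions are listed as a ladder, so downstream code can treat them as an ordered type. The sketch below assumes the listed order runs from least to most restrictive. `RecommendedAction`, `parse_action`, and `requires_human_review` are hypothetical names that do not come from the model or its dataset, and the review threshold is only illustrative.

```python
from enum import IntEnum


class RecommendedAction(IntEnum):
    """Action ladder in the order listed above (ASSUMED least to most restrictive)."""
    AUTO_APPROVE = 0
    APPROVE_WITH_MONITORING = 1
    STEP_UP_AUTH = 2
    TEMPORARY_HOLD = 3
    AUTO_BLOCK = 4
    SAR_REVIEW = 5


def parse_action(label: str) -> RecommendedAction:
    """Map a string label to the enum; raises KeyError for unknown labels."""
    return RecommendedAction[label.strip().upper()]


def requires_human_review(
    action: RecommendedAction,
    threshold: RecommendedAction = RecommendedAction.TEMPORARY_HOLD,
) -> bool:
    """Illustrative policy only: actions at or above `threshold` go to an investigator."""
    return action >= threshold


action = parse_action("AUTO_BLOCK")
print(action.name, requires_human_review(action))  # prints: AUTO_BLOCK True
```

Using an `IntEnum` keeps comparisons such as `action >= threshold` explicit, and an unknown label fails loudly instead of being passed downstream.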
Training Details
| Property | Value |
|---|---|
| Base model | unsloth/Qwen3-14B (Apache-2.0) |
| Method | Supervised Fine-Tuning (SFT) + LoRA |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj (all-linear) |
| LoRA dropout | 0 (Unsloth-optimized) |
| Trainable parameters | 64,225,280 (0.433% of 14.83B total) |
| Dataset | naazimsnh02/fraud-financial-crime-qwen3-sft-v2 |
| Training examples | 11,016 (train split) |
| Epochs | 2 |
| Total steps | 1,378 |
| Batch size (per device) | 2 |
| Gradient accumulation | 8 (effective batch size 16) |
| Learning rate | 1e-4 |
| LR scheduler | Cosine |
| Warmup ratio | 0.05 |
| Optimizer | AdamW 8-bit |
| Precision | bfloat16 (no quantization) |
| Weight decay | 0.001 |
| Max sequence length | 4,096 |
| Packing | Disabled (padding-free mode enabled) |
| Hardware | AMD MI300X (192 GB VRAM) |
| Framework | Unsloth 2026.6.1, TRL 0.22.2, PEFT 0.19.1, Transformers 4.56.2 |
| ROCm / PyTorch | ROCm 7.0, PyTorch 2.10.0+rocm7.0 |
| Train loss (final) | 0.2467 |
| Training time | 4,230 s (70.5 min) |
| Peak VRAM | 39.8 GB (20.8% of 192 GB) |
| LoRA VRAM overhead | 12.0 GB (6.3% of max) |
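Two figures in the table can be re-derived from the others. The sketch below recomputes the total step count from the dataset size and batch settings, and the trainable-parameter count from the LoRA rank and target modules. It assumes a single training device and the usual Qwen3-14B layer shapes (hidden size 5120, MLP size 17408, 40 layers, 40 query heads and 8 key/value heads of dimension 128). Those shapes are not stated in the table.

```python
import math

# Figures taken from the training table.
train_examples = 11_016
per_device_batch = 2
grad_accum = 8
epochs = 2
lora_rank = 16

# One optimizer step per effective batch; the last partial batch still counts.
# ASSUMPTION: a single device, so effective batch = per-device batch * accumulation.
effective_batch = per_device_batch * grad_accum
steps_per_epoch = math.ceil(train_examples / effective_batch)
total_steps = steps_per_epoch * epochs

# ASSUMPTION: Qwen3-14B layer shapes (not given in the table).
hidden, intermediate, layers = 5120, 17408, 40
q_out = 40 * 128   # 40 query heads x head_dim 128
kv_out = 8 * 128   # 8 key/value heads x head_dim 128

# (in_features, out_features) of each targeted linear layer in one decoder block.
targets = {
    "q_proj": (hidden, q_out),
    "k_proj": (hidden, kv_out),
    "v_proj": (hidden, kv_out),
    "o_proj": (q_out, hidden),
    "gate_proj": (hidden, intermediate),
    "up_proj": (hidden, intermediate),
    "down_proj": (intermediate, hidden),
}

# LoRA adds A (r x in) and B (out x r) to each layer: r * (in + out) parameters.
lora_params = layers * sum(lora_rank * (i + o) for i, o in targets.values())

print(effective_batch, total_steps, lora_params)  # prints: 16 1378 64225280
```

Under these assumptions the results match the table's effective batch size (16), total steps (1,378), and trainable parameters (64,225,280).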
Usage
Load with Unsloth (recommended)
```python
from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="naazimsnh02/fraudsentinel-qwen3-14b-lora",
    max_seq_length=4096,
    dtype=torch.bfloat16,
    load_in_4bit=False,
)
FastLanguageModel.for_inference(model)
```
Load with PEFT + Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-14B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "naazimsnh02/fraudsentinel-qwen3-14b-lora")
tokenizer = AutoTokenizer.from_pretrained("naazimsnh02/fraudsentinel-qwen3-14b-lora")
```
Inference Example
```python
messages = [
    {"role": "system", "content": "You are FraudSentinel, an expert fraud detection and AML investigation assistant."},
    {"role": "user", "content": (
        "Analyze this card transaction and return a structured JSON risk assessment.\n\n"
        "Transaction: amount=$828.62, category=misc_net, hour=2, "
        "amount_vs_category_p95=2.16x, tx_24h=4, geo_km=1847, is_fraud=True"
    )},
]

# Thinking mode OFF (fast mode, the default for Tier-2 triage)
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.1,
        do_sample=True,
    )

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
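When many transactions are scored, the `messages` list can be built from a feature dictionary rather than written by hand. The sketch below is one way to do that. `build_messages` is a hypothetical helper, not part of the model's tooling, and it assumes values are already formatted as they should appear in the prompt (for example `"$828.62"` and `"2.16x"`).

```python
SYSTEM_PROMPT = (
    "You are FraudSentinel, an expert fraud detection and AML investigation assistant."
)
TASK_PROMPT = "Analyze this card transaction and return a structured JSON risk assessment."


def build_messages(transaction: dict) -> list:
    """Hypothetical helper: render features as 'key=value' pairs in insertion order."""
    features = ", ".join(f"{key}={value}" for key, value in transaction.items())
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"{TASK_PROMPT}\n\nTransaction: {features}"},
    ]


# Same fields as the inference example, with values pre-formatted as strings where needed.
messages = build_messages({
    "amount": "$828.62",
    "category": "misc_net",
    "hour": 2,
    "amount_vs_category_p95": "2.16x",
    "tx_24h": 4,
    "geo_km": 1847,
    "is_fraud": True,
})
print(messages[1]["content"])
```

The resulting list can be passed to `tokenizer.apply_chat_template` exactly as in the inference example.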
Deep Analysis mode (Chain-of-Thought for complex cases):
```python
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # activates Qwen3 thinking tokens
)
```
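With thinking enabled, Qwen3 emits its reasoning before the final answer, wrapped in `<think>` and `</think>`. The sketch below separates the two parts, assuming the decoded string keeps those tags as literal text. `split_thinking` is a hypothetical helper, and the sample string is made up for illustration rather than taken from real model output.

```python
def split_thinking(decoded: str) -> tuple:
    """Return (reasoning, answer) from decoded model output.

    If no closing </think> tag is found, the whole text is treated as the answer.
    """
    reasoning, sep, answer = decoded.partition("</think>")
    if not sep:
        return "", decoded.strip()
    # Drop the opening tag if the model emitted one.
    reasoning = reasoning.replace("<think>", "", 1).strip()
    return reasoning, answer.strip()


# Made-up sample for illustration only.
sample = '<think>\nThree accounts fan out to one beneficiary.\n</think>\n\n{"risk_level": "HIGH"}'
reasoning, answer = split_thinking(sample)
print(answer)  # prints: {"risk_level": "HIGH"}
```

Only the answer part would then be handed to a JSON parser; the reasoning can be logged separately for investigators.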
Output Schema (Structured Task)
```json
{
  "risk_score": 0.84,
  "risk_level": "HIGH",
  "conclusion": "FRAUDULENT",
  "primary_typology": "card-not-present account takeover / stolen-card online cash-out",
  "secondary_typology": "account_takeover",
  "key_signals": [
    "amount_exceeds_category_p95",
    "high_risk_merchant_category",
    "unusual_hour_activity"
  ],
  "explanation": "Transaction amount $828.62 exceeds the 95th-percentile for misc_net purchases...",
  "feature_importance": {
    "amount_exceeds_category_p95": 0.46,
    "high_risk_merchant_category": 0.28,
    "unusual_hour_activity": 0.26
  },
  "recommended_action": "AUTO_BLOCK",
  "sar_required": false,
  "sar_rationale": null
}
```
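Because the output is generated text, it is worth checking it against the documented schema before anything downstream acts on it. The sketch below validates a few fields using only the value ranges and labels listed in this README. `validate_assessment` is a hypothetical helper, and the choice of which fields to check is an assumption, not a specification from the author.

```python
import json

# Allowed labels as listed under Capabilities.
RISK_LEVELS = ("LOW", "MEDIUM", "HIGH", "CRITICAL")
ACTIONS = (
    "AUTO_APPROVE", "APPROVE_WITH_MONITORING", "STEP_UP_AUTH",
    "TEMPORARY_HOLD", "AUTO_BLOCK", "SAR_REVIEW",
)


def validate_assessment(raw: str) -> list:
    """Return a list of problems found in a JSON risk assessment (empty if none)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    problems = []
    score = data.get("risk_score")
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)) or not 0.0 <= score <= 1.0:
        problems.append("risk_score must be a number in [0.0, 1.0]")
    if data.get("risk_level") not in RISK_LEVELS:
        problems.append("risk_level is not one of the documented levels")
    if data.get("recommended_action") not in ACTIONS:
        problems.append("recommended_action is not one of the documented actions")
    if not isinstance(data.get("sar_required"), bool):
        problems.append("sar_required must be a boolean")
    return problems


example = (
    '{"risk_score": 0.84, "risk_level": "HIGH", '
    '"recommended_action": "AUTO_BLOCK", "sar_required": false}'
)
print(validate_assessment(example))  # prints: []
```

A non-empty result is a reasonable trigger for a retry or for routing the case to a human reviewer.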
Limitations
- Prototype/research use. Source data is synthetic/semi-synthetic. Do not use for real customer adjudication without independent validation, bias review, and human-in-the-loop controls.
- AI-generated SAR drafts require human review and edit before filing.
- The model was trained with thinking mode OFF (`enable_thinking=False`). Enabling thinking mode at inference activates Qwen3's CoT capabilities but adds latency (3–5 s per response).
- Feature importance values are deterministic heuristics from the training data generation pipeline, not SHAP or model-derived importances.
License
Apache-2.0 (base model and adapter).
Model provider: naazimsnh02
Model tree: base Qwen/Qwen3-14B; adapter: this model
Modalities: text input, text output