naazimsnh02

fraudsentinel-qwen3-14b-lora


README

License: apache-2.0

Capabilities

The model is trained to act as an enterprise fraud and AML investigation assistant across six task types, plus an optional Deep Analysis mode:

  • Structured JSON risk scoring: calibrated risk score (0.0–1.0), risk level (LOW / MEDIUM / HIGH / CRITICAL), typology, key signals, feature importance, recommended action, and SAR rationale
  • Explainable alerts: evidence-grounded, investigator-facing natural-language explanations tied to actual transaction features
  • Typology classification: primary and secondary fraud/laundering pattern identification (card-not-present, account takeover, fan-out, gather-scatter, structuring, etc.)
  • 6-level recommended action: AUTO_APPROVE → APPROVE_WITH_MONITORING → STEP_UP_AUTH → TEMPORARY_HOLD → AUTO_BLOCK → SAR_REVIEW
  • SAR drafting: FinCEN-aligned Suspicious Activity Report narrative generation for human review and filing
  • Multi-turn human-in-the-loop (HITL) dialogue: investigator follow-ups ("Why this risk level?", "What else should I check?", "Customer confirmed legit — what next?")
  • Deep Analysis mode: optional Chain-of-Thought reasoning via Qwen3's thinking tokens for complex multi-account cases
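For the multi-turn follow-ups, one plausible approach is to keep the whole case conversation as a list of role/content messages (the same format used by `messages` in the inference example under Usage) and re-render it with `tokenizer.apply_chat_template` on every turn. A minimal sketch under that assumption; `append_followup` is a hypothetical helper, and the assistant replies shown are placeholders rather than real model output:

```python
from typing import Dict, List

Message = Dict[str, str]

SYSTEM_PROMPT = (
    "You are FraudSentinel, an expert fraud detection and AML investigation assistant."
)


def append_followup(history: List[Message], assistant_reply: str, question: str) -> List[Message]:
    """Return a new history with the model's last reply and the investigator's next question.

    The input list is left unchanged, so earlier states of the case can be audited or replayed.
    """
    return history + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": question},
    ]


# Placeholder contents; in practice these come from the transaction prompt and model.generate().
opening: List[Message] = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Analyze this card transaction and return a structured JSON risk assessment."},
]
turn_2 = append_followup(opening, '{"risk_level": "HIGH"}', "Why this risk level?")
turn_3 = append_followup(turn_2, "Amount and geography are both anomalous.", "What else should I check?")

print([m["role"] for m in turn_3])
```

Each `turn_N` list would then be passed to `apply_chat_template(..., add_generation_prompt=True)` in the same way as `messages` in the inference example. Returning a new list instead of mutating keeps every intermediate state available for review.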

Training Details

| Property | Value |
|---|---|
| Base model | unsloth/Qwen3-14B (Apache-2.0) |
| Method | Supervised Fine-Tuning (SFT) + LoRA |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj (all-linear) |
| LoRA dropout | 0 (Unsloth-optimized) |
| Trainable parameters | 64,225,280 (0.433% of 14.83B total) |
| Dataset | naazimsnh02/fraud-financial-crime-qwen3-sft-v2 |
| Training examples | 11,016 (train split) |
| Epochs | 2 |
| Total steps | 1,378 |
| Batch size (per device) | 2 |
| Gradient accumulation | 8 (effective batch size 16) |
| Learning rate | 1e-4 |
| LR scheduler | Cosine |
| Warmup ratio | 0.05 |
| Optimizer | AdamW 8-bit |
| Precision | bfloat16 (no quantization) |
| Weight decay | 0.001 |
| Max sequence length | 4,096 tokens |
| Packing | Disabled (padding-free mode enabled) |
| Hardware | AMD MI300X (192 GB VRAM) |
| Framework | Unsloth 2026.6.1, TRL 0.22.2, PEFT 0.19.1, Transformers 4.56.2 |
| ROCm / PyTorch | ROCm 7.0, PyTorch 2.10.0+rocm7.0 |
| Train loss (final) | 0.2467 |
| Training time | 4,230 s (70.5 min) |
| Peak VRAM | 39.8 GB (20.8% of 192 GB) |
| LoRA VRAM overhead | 12.0 GB (6.3% of 192 GB) |
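The step count and trainable-parameter figures in the table can be cross-checked with a little arithmetic. A sketch, assuming the trainer counts a final partial gradient-accumulation window as a full optimizer step, and assuming Qwen3-14B's published layer shapes (hidden size 5120, MLP size 17408, 40 layers, 8 key/value heads of dimension 128), which this README does not state:

```python
import math

# Values copied from the training table above.
train_examples = 11_016
per_device_batch = 2
grad_accum = 8
epochs = 2

effective_batch = per_device_batch * grad_accum                          # 16
# One optimizer step per `grad_accum` micro-batches; a partial final window still counts.
micro_batches_per_epoch = math.ceil(train_examples / per_device_batch)  # 5,508
steps_per_epoch = math.ceil(micro_batches_per_epoch / grad_accum)       # 689
total_steps = steps_per_epoch * epochs                                  # 1,378

# LoRA adds r * (d_in + d_out) parameters per adapted linear layer
# (A is r x d_in, B is d_out x r; no bias terms).
# ASSUMPTION: Qwen3-14B shapes from its public config, not from this README.
rank = 16
hidden, intermediate, layers = 5120, 17408, 40
kv_dim = 8 * 128  # 8 KV heads x head_dim 128

proj_shapes = {  # (d_in, d_out) for each target module
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, kv_dim),
    "v_proj": (hidden, kv_dim),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, intermediate),
    "up_proj": (hidden, intermediate),
    "down_proj": (intermediate, hidden),
}
per_layer = sum(rank * (d_in + d_out) for d_in, d_out in proj_shapes.values())
lora_params = per_layer * layers
share_pct = round(lora_params / 14.83e9 * 100, 3)

print(effective_batch, total_steps, lora_params, share_pct)  # prints: 16 1378 64225280 0.433
```

Both results match the table, which suggests the adapter was applied to all seven projection types in every layer, as listed under Target modules.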

Usage

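Load with Unsloth

python

from unsloth import FastLanguageModel
import torch

# Load the LoRA adapter; Unsloth resolves the base model from the adapter config.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="naazimsnh02/fraudsentinel-qwen3-14b-lora",
    max_seq_length=4096,
    dtype=torch.bfloat16,
    load_in_4bit=False,  # full bf16 weights, matching the training precision
)
FastLanguageModel.for_inference(model)  # switch Unsloth to its inference mode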

Load with PEFT + Transformers

python

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model in bf16, then attach the FraudSentinel LoRA adapter on top.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-14B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "naazimsnh02/fraudsentinel-qwen3-14b-lora")
tokenizer = AutoTokenizer.from_pretrained("naazimsnh02/fraudsentinel-qwen3-14b-lora")

Inference Example

python

messages = [
    {"role": "system", "content": "You are FraudSentinel, an expert fraud detection and AML investigation assistant."},
    {"role": "user", "content": (
        "Analyze this card transaction and return a structured JSON risk assessment.\n\n"
        "Transaction: amount=$828.62, category=misc_net, hour=2, "
        "amount_vs_category_p95=2.16x, tx_24h=4, geo_km=1847, is_fraud=True"
    )},
]

# Thinking mode OFF (fast mode, the default for Tier-2 triage)
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.1,
        do_sample=True,
    )

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
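The example prompt serializes transaction features as comma-separated `key=value` pairs after a fixed instruction. A minimal sketch of a helper that rebuilds that message list from a feature dictionary, assuming the model expects exactly this flat formatting (the README shows it only by example); `build_card_messages` is hypothetical:

```python
from typing import Dict, List, Union

SYSTEM_PROMPT = (
    "You are FraudSentinel, an expert fraud detection and AML investigation assistant."
)
CARD_INSTRUCTION = (
    "Analyze this card transaction and return a structured JSON risk assessment."
)

FeatureValue = Union[str, int, float, bool]


def build_card_messages(features: Dict[str, FeatureValue]) -> List[Dict[str, str]]:
    """Render features as 'key=value' pairs in insertion order, mirroring the example prompt."""
    fields = ", ".join(f"{name}={value}" for name, value in features.items())
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"{CARD_INSTRUCTION}\n\nTransaction: {fields}"},
    ]


# Reproduces the feature set from the inference example above.
messages = build_card_messages({
    "amount": "$828.62",
    "category": "misc_net",
    "hour": 2,
    "amount_vs_category_p95": "2.16x",
    "tx_24h": 4,
    "geo_km": 1847,
    "is_fraud": True,
})
print(messages[1]["content"])
```

Because Python dictionaries preserve insertion order, the rendered string matches the hand-written prompt field for field, so it can be dropped into the `apply_chat_template` call unchanged.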

Deep Analysis mode (Chain-of-Thought for complex cases):

python

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # activates Qwen3 thinking tokens
)
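In Deep Analysis mode, Qwen3 emits its reasoning before the final answer, delimited by `<think>` and `</think>` tags. A minimal sketch for separating the two after decoding, assuming the tags are still present in the decoded text; if your tokenizer strips them, split on the `</think>` token id instead, as Qwen's own examples do. `split_thinking` is a hypothetical helper and the sample text is a placeholder, not real model output:

```python
from typing import Tuple

THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


def split_thinking(decoded: str) -> Tuple[str, str]:
    """Split decoded Qwen3 output into (reasoning, answer).

    Everything after the last </think> is treated as the answer. If no closing tag
    is found (e.g. thinking mode was off), the whole text is returned as the answer.
    """
    head, sep, tail = decoded.rpartition(THINK_CLOSE)
    if not sep:
        return "", decoded.strip()
    reasoning = head.replace(THINK_OPEN, "", 1).strip()
    return reasoning, tail.strip()


# Placeholder decoded output in the expected shape.
sample = (
    "<think>\nLarge amount at 2 a.m., 1,847 km from the usual location.\n</think>\n\n"
    '{"risk_level": "HIGH"}'
)
reasoning, answer = split_thinking(sample)
print(reasoning)
print(answer)
```

The answer part can then be parsed exactly like fast-mode output, while the reasoning part can be logged for the investigator.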

Output Schema (Structured Task)

json

{
  "risk_score": 0.84,
  "risk_level": "HIGH",
  "conclusion": "FRAUDULENT",
  "primary_typology": "card-not-present account takeover / stolen-card online cash-out",
  "secondary_typology": "account_takeover",
  "key_signals": [
    "amount_exceeds_category_p95",
    "high_risk_merchant_category",
    "unusual_hour_activity"
  ],
  "explanation": "Transaction amount $828.62 exceeds the 95th-percentile for misc_net purchases...",
  "feature_importance": {
    "amount_exceeds_category_p95": 0.46,
    "high_risk_merchant_category": 0.28,
    "unusual_hour_activity": 0.26
  },
  "recommended_action": "AUTO_BLOCK",
  "sar_required": false,
  "sar_rationale": null
}
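Because downstream case-management code acts on fields such as `recommended_action`, a light consistency check on each reply before use can be helpful. A minimal sketch, assuming the field names and enumerations shown in this README are the full contract (no formal schema is published); `validate_assessment` and the specific checks are hypothetical:

```python
import json
from typing import Any, Dict, List

# Enumerations taken from this README.
RISK_LEVELS = {"LOW", "MEDIUM", "HIGH", "CRITICAL"}
ACTIONS = {
    "AUTO_APPROVE", "APPROVE_WITH_MONITORING", "STEP_UP_AUTH",
    "TEMPORARY_HOLD", "AUTO_BLOCK", "SAR_REVIEW",
}
# ASSUMPTION: every field in the example output is required.
REQUIRED_KEYS = {
    "risk_score", "risk_level", "conclusion", "primary_typology", "secondary_typology",
    "key_signals", "explanation", "feature_importance", "recommended_action",
    "sar_required", "sar_rationale",
}


def validate_assessment(data: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means these checks passed."""
    problems: List[str] = []
    missing = REQUIRED_KEYS - set(data)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    score = data.get("risk_score")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)) or not 0.0 <= score <= 1.0:
        problems.append("risk_score must be a number in [0.0, 1.0]")
    if data.get("risk_level") not in RISK_LEVELS:
        problems.append(f"unknown risk_level: {data.get('risk_level')!r}")
    if data.get("recommended_action") not in ACTIONS:
        problems.append(f"unknown recommended_action: {data.get('recommended_action')!r}")
    if data.get("sar_required") is True and not data.get("sar_rationale"):
        problems.append("sar_required is true but sar_rationale is empty")
    return problems


# The example output above, with the explanation truncated as in the README.
reply = """{"risk_score": 0.84, "risk_level": "HIGH", "conclusion": "FRAUDULENT",
 "primary_typology": "card-not-present account takeover / stolen-card online cash-out",
 "secondary_typology": "account_takeover",
 "key_signals": ["amount_exceeds_category_p95", "high_risk_merchant_category", "unusual_hour_activity"],
 "explanation": "Transaction amount $828.62 exceeds the 95th-percentile for misc_net purchases...",
 "feature_importance": {"amount_exceeds_category_p95": 0.46, "high_risk_merchant_category": 0.28,
                        "unusual_hour_activity": 0.26},
 "recommended_action": "AUTO_BLOCK", "sar_required": false, "sar_rationale": null}"""

print(validate_assessment(json.loads(reply)))  # prints: []
print(validate_assessment({"risk_score": 1.4, "risk_level": "SEVERE"}))
```

Returning a list of problems rather than raising on the first one lets a triage pipeline log every defect in a malformed reply and route it to a human reviewer.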

Limitations

  • Prototype/research use. Source data is synthetic/semi-synthetic. Do not use for real customer adjudication without independent validation, bias review, and human-in-the-loop controls.
  • AI-generated SAR drafts require human review and edit before filing.
  • The model was trained with thinking mode OFF (enable_thinking=False). Enabling thinking mode at inference activates Qwen3's CoT capabilities but adds latency (3–5 s per response).
  • Feature importance values are deterministic heuristics from the training data generation pipeline, not SHAP or model-derived importances.

License

Apache-2.0 (base model and adapter).

Model provider

naazimsnh02

Model tree

Base

Qwen/Qwen3-14B

Adapter

this model

Modalities

Input

Text

Output

Text


Supported Functionality

Model APIs

Dedicated Endpoints

Container
