Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

License: apache-2.0

Categories

name, email, phone_number, address, date, national_id, passport_number, drivers_license, tax_id, card_number, bank_account, credentials, ip_address, username

Evaluation (transformers)

  • test rows: 200 (held-out, from test_dataset_pii.csv)
  • is_valid accuracy: 1.0000
  • category key-set accuracy: 0.9350
  • category value-set accuracy: 0.8300
  • binary F1 (is_valid): 1.0000 (P=1.000 R=1.000)
  • macro F1 over categories (key-presence): 0.9791
  • macro F1 over categories (value-set): 0.9529
  • parse errors: 0/200

Binary confusion matrix (positive = "contains PII"):

predicted PIIpredicted clean
actual PII1770
actual clean023

Per-category KEY-presence (did the model emit this category at all?):

CategorySupportPrecisionRecallF1
address790.9870.9870.987
bank_account121.0001.0001.000
card_number251.0001.0001.000
credentials101.0001.0001.000
date951.0001.0001.000
drivers_license270.9570.8150.880
email760.9871.0000.993
ip_address91.0001.0001.000
name1071.0000.9910.995
national_id520.9110.9810.944
passport_number210.9551.0000.977
phone_number631.0000.9840.992
tax_id240.9200.9580.939
username91.0001.0001.000

Per-category VALUE-set (did the exact strings match within the category?):

CategorySupport (string-spans)PrecisionRecallF1
address790.9240.9240.924
bank_account121.0001.0001.000
card_number261.0001.0001.000
credentials101.0001.0001.000
date1231.0001.0001.000
drivers_license270.9570.8150.880
email820.9881.0000.994
ip_address91.0001.0001.000
name2420.8630.8350.849
national_id590.8690.8980.883
passport_number210.9551.0000.977
phone_number650.9840.9690.977
tax_id240.8400.8750.857
username91.0001.0001.000

Latency (transformers, single-prompt, greedy decoding):

meanmedianp95max
3.15s2.77s6.45s9.82s

Quick start

python

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
base_id = "Qwen/Qwen3.5-2B"
tok = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto", trust_remote_code=True)
model = PeftModel.from_pretrained(base, "Accuknoxtechnologies/PII-Qwen3.5-2B-adapter-v8")

Evaluation — vLLM serving (merged model, text-only)

Same 200 held-out prompts, served through vLLM 0.21.0 instead of the transformers .generate() loop. Greedy decoding, dtype bf16, enable_prefix_caching=True, enable_chunked_prefill=True. This reflects production serving accuracy + latency.

  • JSON parse errors: 0/200 (0.0%)

Accuracy (vLLM)

MetricValue
is_valid accuracy1.0000
category key-set accuracy0.9350
category value-set accuracy0.8300
Binary F1 (positive = contains PII)1.0000
Binary precision1.0000
Binary recall1.0000
Macro F1 (key-presence)0.9791
Macro F1 (value-set)0.9529

Confusion matrix — binary is_valid (vLLM)

predicted PIIpredicted clean
actual PIITP = 177FN = 0
actual cleanFP = 0TN = 23

Per-category key-presence (vLLM)

CategorySupportPrecisionRecallF1
address790.9870.9870.987
bank_account121.0001.0001.000
card_number251.0001.0001.000
credentials101.0001.0001.000
date951.0001.0001.000
drivers_license270.9570.8150.880
email760.9871.0000.993
ip_address91.0001.0001.000
name1071.0000.9910.995
national_id520.9110.9810.944
passport_number210.9551.0000.977
phone_number631.0000.9840.992
tax_id240.9200.9580.939
username91.0001.0001.000

vLLM inference latency (single-stream, batch = 1)

Statms / prompt
Mean576.0
Median511.6
p951151.7
p991440.7
Max3209.3
Under 1 s89.0%

vLLM throughput (single batched submit)

  • Prompts/sec: 27.73
  • Output tokens/sec: 1569.0
  • Input tokens/sec: 35596.5
  • Batched wall time for all 200 prompts: 7.21 s

Card generated at 2026-05-31 07:39 UTC. Adapter weights: Accuknoxtechnologies/PII-Qwen3.5-2B-adapter-v8.

Model provider

Accuknoxtechnologies

Model tree

Base

Qwen/Qwen3.5-2B

Adapter

this model

Modalities

Input

Video, Text, Image

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Explore FriendliAI today