Accuknoxtechnologies

CodeLanguage-Qwen3.5-2B-v5

README

License: apache-2.0

Quick start

python

import json
import re

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
BASE = "Qwen/Qwen3.5-2B"
ADAPTER = "Accuknoxtechnologies/PromptInjection-Qwen3.5-2B-v5"
SYSTEM_MSG = """You are a code language identifier. For the given user prompt, decide whether it contains any embedded source code (program source or recognizable code-like configuration). Output exactly one JSON object and nothing else: {"is_valid": <true|false>, "category": {"<Lang>": true, ...}}.
No preamble. No explanation. No <think> tags. No markdown code fences. No trailing prose.
Rules:
- is_valid is TRUE when the prompt contains at least one code/config snippet, FALSE when the prompt is plain natural-language only.
- category contains ONLY the languages that appear, each mapped to true. If no code is present, category is the empty object {}.
- When multiple languages appear, list every distinct one (still only true).
Allowed language keys (use these exact spellings):
Python, JavaScript, Java, C, C++, C#, Go, Rust, Kotlin, Swift, Ruby, R, Scala, Perl, Lua, Bash, PowerShell, Batch, SQL, Dockerfile, YAML, Makefile, Terraform, AWK, jq
Examples:
Input: What's the weather forecast today?
Output: {"is_valid": false, "category": {}}
Input: Run this for me: print('hello world')
Output: {"is_valid": true, "category": {"Python": true}}
Input: Compare these — SELECT * FROM users vs the snippet: console.log(users)
Output: {"is_valid": true, "category": {"SQL": true, "JavaScript": true}}"""
tokenizer = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True,
)
# Attach the LoRA adapter to the base model and switch to inference mode.
model = PeftModel.from_pretrained(model, ADAPTER)
model.eval()


def langid(prompt: str) -> dict:
    """Return the parsed JSON verdict for a single user prompt."""
    chat = tokenizer.apply_chat_template(
        [{"role": "system", "content": SYSTEM_MSG},
         {"role": "user", "content": prompt}],
        tokenize=False, add_generation_prompt=True, enable_thinking=False)
    inputs = tokenizer(chat, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=220, do_sample=False)
    # Decode only the newly generated tokens, not the echoed prompt.
    text = tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    match = re.search(r'\{.*\}', text, re.DOTALL)
    if match is None:
        raise ValueError(f"no JSON object found in model output: {text!r}")
    return json.loads(match.group(0))

System prompt

The model was trained with the exact system prompt below. Pass it verbatim at inference time, because the output schema depends on it.

text

You are a code language identifier. For the given user prompt, decide whether it contains any embedded source code (program source or recognizable code-like configuration). Output exactly one JSON object and nothing else: {"is_valid": <true|false>, "category": {"<Lang>": true, ...}}.
No preamble. No explanation. No <think> tags. No markdown code fences. No trailing prose.
Rules:
- is_valid is TRUE when the prompt contains at least one code/config snippet, FALSE when the prompt is plain natural-language only.
- category contains ONLY the languages that appear, each mapped to true. If no code is present, category is the empty object {}.
- When multiple languages appear, list every distinct one (still only true).
Allowed language keys (use these exact spellings):
Python, JavaScript, Java, C, C++, C#, Go, Rust, Kotlin, Swift, Ruby, R, Scala, Perl, Lua, Bash, PowerShell, Batch, SQL, Dockerfile, YAML, Makefile, Terraform, AWK, jq
Examples:
Input: What's the weather forecast today?
Output: {"is_valid": false, "category": {}}
Input: Run this for me: print('hello world')
Output: {"is_valid": true, "category": {"Python": true}}
Input: Compare these — SELECT * FROM users vs the snippet: console.log(users)
Output: {"is_valid": true, "category": {"SQL": true, "JavaScript": true}}
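
Because downstream code relies on this schema, it can help to check each reply before using it. The following is a minimal sketch of such a check, not something shipped with the adapter: `check_verdict` and `ALLOWED_LANGS` are hypothetical names, and the rule that `is_valid` must agree with a non-empty `category` is an assumption inferred from the prompt's rules.

```python
import json

# Allowed keys, copied from the system prompt above.
ALLOWED_LANGS = {
    "Python", "JavaScript", "Java", "C", "C++", "C#", "Go", "Rust", "Kotlin",
    "Swift", "Ruby", "R", "Scala", "Perl", "Lua", "Bash", "PowerShell",
    "Batch", "SQL", "Dockerfile", "YAML", "Makefile", "Terraform", "AWK", "jq",
}


def check_verdict(raw: str) -> dict:
    """Parse a model reply and raise ValueError if it breaks the schema.

    Hypothetical helper for illustration only.
    """
    verdict = json.loads(raw)  # JSONDecodeError is a subclass of ValueError
    if not isinstance(verdict, dict) or set(verdict) != {"is_valid", "category"}:
        raise ValueError("expected exactly the keys is_valid and category")
    is_valid, category = verdict["is_valid"], verdict["category"]
    if not isinstance(is_valid, bool) or not isinstance(category, dict):
        raise ValueError("is_valid must be a bool and category an object")
    unknown = set(category) - ALLOWED_LANGS
    if unknown:
        raise ValueError(f"unknown language keys: {sorted(unknown)}")
    if any(value is not True for value in category.values()):
        raise ValueError("every listed language must map to true")
    # Assumption: a code-bearing prompt always names at least one language.
    if is_valid != bool(category):
        raise ValueError("is_valid disagrees with category")
    return verdict
```

A reply that fails this check could be retried or logged rather than passed on to downstream code.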

Evaluation

Evaluated on 200 held-out prompts drawn from test_dataset_langid.csv (the same mix of single-language, multi-language, and benign prompts as the training data).

  • Evaluation timestamp: 2026-05-22 00:42 UTC
  • GPU: NVIDIA A10G
  • Source adapter: Accuknoxtechnologies/PromptInjection-Qwen3.5-2B-v5
  • JSON parse errors: 0/200 (0.0%)

Top-level metrics

| Metric | Value |
|---|---|
| is_valid accuracy | 1.0000 |
| Language-set exact match | 0.9600 |
| Binary F1 (positive = contains code) | 1.0000 |
| Binary precision | 1.0000 |
| Binary recall | 1.0000 |
| Macro F1 across languages | 0.9696 |

Confusion matrix — binary is_valid decision

Positive class = the prompt contains code (is_valid=True).

| | predicted contains-code | predicted no-code |
|---|---|---|
| actual contains-code | TP = 181 | FN = 0 |
| actual no-code | FP = 0 | TN = 19 |
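
The binary metrics reported earlier follow directly from these four counts. As a quick sanity check, the sketch below recomputes them using the standard definitions; `binary_metrics` is a hypothetical helper, and returning 0.0 when a denominator is zero is an assumed convention.

```python
def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fn + fp + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Counts taken from the confusion matrix above.
metrics = binary_metrics(tp=181, fn=0, fp=0, tn=19)
print(metrics)
```

With no false positives or false negatives, all four values come out as 1.0, matching the top-level metrics.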

Per-language metrics

Only languages that appear in either the actual or predicted labels are listed.

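| Language | support | precision | recall | F1 |
|---|---|---|---|---|
| Python | 14 | 1.000 | 1.000 | 1.000 |
| Terraform | 14 | 1.000 | 1.000 | 1.000 |
| Java | 12 | 1.000 | 1.000 | 1.000 |
| C | 12 | 1.000 | 1.000 | 1.000 |
| Rust | 12 | 1.000 | 1.000 | 1.000 |
| AWK | 12 | 1.000 | 0.917 | 0.957 |
| Ruby | 11 | 0.917 | 1.000 | 0.957 |
| R | 11 | 1.000 | 1.000 | 1.000 |
| Go | 10 | 1.000 | 0.900 | 0.947 |
| Swift | 10 | 1.000 | 0.900 | 0.947 |
| Scala | 10 | 1.000 | 0.800 | 0.889 |
| SQL | 10 | 1.000 | 1.000 | 1.000 |
| jq | 10 | 0.909 | 1.000 | 0.952 |
| JavaScript | 9 | 0.900 | 1.000 | 0.947 |
| Kotlin | 9 | 1.000 | 1.000 | 1.000 |
| Perl | 9 | 1.000 | 1.000 | 1.000 |
| PowerShell | 9 | 1.000 | 1.000 | 1.000 |
| Batch | 9 | 1.000 | 1.000 | 1.000 |
| YAML | 9 | 1.000 | 0.889 | 0.941 |
| C++ | 7 | 1.000 | 0.857 | 0.923 |
| C# | 7 | 0.875 | 1.000 | 0.933 |
| Lua | 7 | 1.000 | 0.857 | 0.923 |
| Bash | 7 | 1.000 | 1.000 | 1.000 |
| Dockerfile | 6 | 0.857 | 1.000 | 0.923 |
| Makefile | 6 | 1.000 | 1.000 | 1.000 |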
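
For readers who want to reproduce numbers of this kind on their own data, here is one way per-language precision, recall, F1, macro F1, and language-set exact match could be computed from sets of labels. This is a sketch under assumptions: the evaluation script itself is not shown in this card, `language_metrics` is a hypothetical name, and the four example rows are made up.

```python
from collections import Counter


def language_metrics(actual: list[set[str]], predicted: list[set[str]]) -> dict:
    """Per-language P/R/F1, macro F1 and exact-match rate over label sets."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(actual, predicted):
        tp.update(truth & guess)   # language correctly predicted
        fp.update(guess - truth)   # predicted but not present
        fn.update(truth - guess)   # present but missed
    rows = {}
    # Only languages seen in either the actual or predicted labels get a row.
    for lang in sorted(set(tp) | set(fp) | set(fn)):
        p = tp[lang] / (tp[lang] + fp[lang]) if tp[lang] + fp[lang] else 0.0
        r = tp[lang] / (tp[lang] + fn[lang]) if tp[lang] + fn[lang] else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        rows[lang] = {"support": tp[lang] + fn[lang],
                      "precision": p, "recall": r, "f1": f1}
    macro_f1 = sum(row["f1"] for row in rows.values()) / len(rows) if rows else 0.0
    exact = sum(t == g for t, g in zip(actual, predicted)) / len(actual)
    return {"per_language": rows, "macro_f1": macro_f1, "exact_match": exact}


# Made-up example: four prompts, one of them benign (empty label set).
actual = [{"Python"}, {"SQL", "JavaScript"}, set(), {"Go"}]
predicted = [{"Python"}, {"SQL"}, set(), {"Go", "JavaScript"}]
report = language_metrics(actual, predicted)
print(report["macro_f1"], report["exact_match"])
```

In this toy example JavaScript is missed once and wrongly predicted once, so its F1 is 0 while the other three languages score 1.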

Inference latency

  • Mean: 0.99 s/prompt
  • Median: 0.94 s/prompt
  • p95: 1.35 s/prompt
  • Max: 1.63 s/prompt
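
Summary statistics like these can be derived from a list of per-prompt wall-clock timings. The sketch below uses the nearest-rank definition of p95, which is an assumption, since the card does not say which percentile method was used. The timings are invented for illustration and `latency_summary` is a hypothetical helper.

```python
import statistics


def latency_summary(seconds: list[float]) -> dict:
    """Mean, median, nearest-rank p95 and max of a non-empty timing list."""
    ordered = sorted(seconds)
    # Nearest rank: ceil(0.95 * n), done in integer arithmetic to avoid
    # floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
        "max": ordered[-1],
    }


# Invented timings in seconds, not the measurements reported above.
timings = [0.8, 0.9, 1.0, 1.1, 1.2, 0.9, 1.0, 1.4, 0.9, 1.6]
summary = latency_summary(timings)
print(summary)
```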

Training setup

  • Base model: Qwen/Qwen3.5-2B, loaded unquantized in 16-bit precision (bf16 or fp16; no bitsandbytes quantization)
  • LoRA: r=16, alpha=32, dropout=0.05, target modules = {q,k,v,o,gate,up,down}_proj
  • Optimizer: adamw_torch, lr=1e-4, cosine schedule, warmup 5%
  • Precision: bf16 if available, else fp16
  • Effective batch size: 8 (per-device batch size 1 × 8 gradient-accumulation steps), gradient checkpointing on
  • Max sequence length: 3200 tokens
  • Training data: 10,000 rows (7,000 single-language + 2,000 multi-language + 1,000 benign)
  • Languages: 25 (programming + config formats)
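
The hyperparameters above map onto keyword arguments of `peft.LoraConfig` and `transformers.TrainingArguments`. The sketch below collects them as plain dictionaries so it runs without either library installed. The training script itself is not part of this card, so `task_type` and the single-GPU setting are assumptions, and the dictionary names are hypothetical.

```python
# Keyword arguments one could pass to peft.LoraConfig.
LORA_KWARGS = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",  # assumption: not stated in the card
}

# Keyword arguments one could pass to transformers.TrainingArguments.
TRAINING_KWARGS = {
    "optim": "adamw_torch",
    "learning_rate": 1e-4,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.05,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 8,
    "gradient_checkpointing": True,
}

NUM_DEVICES = 1  # assumption: single-GPU training

# Effective batch size = per-device batch * accumulation steps * devices.
effective_batch_size = (
    TRAINING_KWARGS["per_device_train_batch_size"]
    * TRAINING_KWARGS["gradient_accumulation_steps"]
    * NUM_DEVICES
)
# Standard LoRA scales the low-rank update by alpha / r.
lora_scale = LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"]

print(effective_batch_size, lora_scale)
```

With both libraries installed, `LoraConfig(**LORA_KWARGS)` and `TrainingArguments(output_dir=..., **TRAINING_KWARGS)` would be the natural next step, with the bf16/fp16 choice made according to hardware support.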

Supported languages

The model emits one or more of these keys in the category map of its JSON output:

markdown

Python, JavaScript, Java, C, C++, C#, Go, Rust, Kotlin, Swift, Ruby, R, Scala, Perl, Lua, Bash, PowerShell, Batch, SQL, Dockerfile, YAML, Makefile, Terraform, AWK, jq

Model card generated automatically by eval_and_push_card.py on 2026-05-22 00:42 UTC.

Model provider

Accuknoxtechnologies

Model tree

Base

Qwen/Qwen3.5-2B

Adapter

this model

Modalities

Input

Video, Text, Image

Output

Text
