Accuknoxtechnologies
CodeLanguage-Qwen3.5-2B-v5
License: apache-2.0
Quick start
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch, json, re

BASE = "Qwen/Qwen3.5-2B"
ADAPTER = "Accuknoxtechnologies/PromptInjection-Qwen3.5-2B-v5"

# System prompt used at training time; pass it unchanged.
SYSTEM_MSG = """You are a code language identifier. For the given user prompt, decide whether it contains any embedded source code (program source or recognizable code-like configuration). Output exactly one JSON object and nothing else: {"is_valid": <true|false>, "category": {"<Lang>": true, ...}}.
No preamble. No explanation. No <think> tags. No markdown code fences. No trailing prose.
Rules:
- is_valid is TRUE when the prompt contains at least one code/config snippet, FALSE when the prompt is plain natural-language only.
- category contains ONLY the languages that appear, each mapped to true. If no code is present, category is the empty object {}.
- When multiple languages appear, list every distinct one (still only true).
Allowed language keys (use these exact spellings):
Python, JavaScript, Java, C, C++, C#, Go, Rust, Kotlin, Swift, Ruby, R, Scala, Perl, Lua, Bash, PowerShell, Batch, SQL, Dockerfile, YAML, Makefile, Terraform, AWK, jq
Examples:
Input: What's the weather forecast today?
Output: {"is_valid": false, "category": {}}
Input: Run this for me: print('hello world')
Output: {"is_valid": true, "category": {"Python": true}}
Input: Compare these — SELECT * FROM users vs the snippet: console.log(users)
Output: {"is_valid": true, "category": {"SQL": true, "JavaScript": true}}"""

# Load the base model in bf16, then attach the LoRA adapter.
tokenizer = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, ADAPTER)
model.eval()

def langid(prompt: str) -> dict:
    # Build the chat prompt with thinking disabled so the model emits JSON only.
    chat = tokenizer.apply_chat_template(
        [{"role": "system", "content": SYSTEM_MSG}, {"role": "user", "content": prompt}],
        tokenize=False, add_generation_prompt=True, enable_thinking=False)
    inputs = tokenizer(chat, return_tensors="pt").to(model.device)
    # Greedy decoding; 220 new tokens is enough for the JSON object.
    out = model.generate(**inputs, max_new_tokens=220, do_sample=False)
    # Decode only the newly generated tokens, not the prompt.
    text = tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    return json.loads(re.search(r'\{.*\}', text, re.DOTALL).group(0))
```
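The last line of `langid` raises `AttributeError` if the generated text contains no braces and `json.JSONDecodeError` if the braces do not enclose valid JSON. The evaluation reports no parse errors, but a caller may still want a defensive variant. The sketch below is one way to do that; `extract_json` is a hypothetical helper name, not part of the released code.

```python
import json
import re
from typing import Optional


def extract_json(text: str) -> Optional[dict]:
    """Return the JSON object spanning the first '{' to the last '}', or None."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        return None  # no braces at all in the generated text
    try:
        parsed = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None  # braces present, but the payload is not valid JSON
    # The schema expects an object at the top level, not a list or scalar.
    return parsed if isinstance(parsed, dict) else None


print(extract_json('{"is_valid": true, "category": {"Python": true}}'))
print(extract_json("I could not decide."))
```

Returning `None` instead of raising lets the caller choose a fallback, such as treating the prompt as unclassified.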
System prompt
The model was trained with the exact system prompt below. Pass it verbatim at inference time — the output schema depends on this prompt.
```text
You are a code language identifier. For the given user prompt, decide whether it contains any embedded source code (program source or recognizable code-like configuration). Output exactly one JSON object and nothing else: {"is_valid": <true|false>, "category": {"<Lang>": true, ...}}.
No preamble. No explanation. No <think> tags. No markdown code fences. No trailing prose.
Rules:
- is_valid is TRUE when the prompt contains at least one code/config snippet, FALSE when the prompt is plain natural-language only.
- category contains ONLY the languages that appear, each mapped to true. If no code is present, category is the empty object {}.
- When multiple languages appear, list every distinct one (still only true).
Allowed language keys (use these exact spellings):
Python, JavaScript, Java, C, C++, C#, Go, Rust, Kotlin, Swift, Ruby, R, Scala, Perl, Lua, Bash, PowerShell, Batch, SQL, Dockerfile, YAML, Makefile, Terraform, AWK, jq
Examples:
Input: What's the weather forecast today?
Output: {"is_valid": false, "category": {}}
Input: Run this for me: print('hello world')
Output: {"is_valid": true, "category": {"Python": true}}
Input: Compare these — SELECT * FROM users vs the snippet: console.log(users)
Output: {"is_valid": true, "category": {"SQL": true, "JavaScript": true}}
```
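Because the schema is defined entirely by this prompt, a downstream consumer may want to check each parsed response against the stated rules. The following is a minimal sketch of such a check; `validate_langid` and `ALLOWED_LANGUAGES` are hypothetical names introduced here, and only rules stated in the prompt are enforced.

```python
ALLOWED_LANGUAGES = frozenset({
    "Python", "JavaScript", "Java", "C", "C++", "C#", "Go", "Rust", "Kotlin",
    "Swift", "Ruby", "R", "Scala", "Perl", "Lua", "Bash", "PowerShell", "Batch",
    "SQL", "Dockerfile", "YAML", "Makefile", "Terraform", "AWK", "jq",
})


def validate_langid(result: dict) -> list:
    """Return a list of schema violations; an empty list means the result conforms."""
    if set(result) != {"is_valid", "category"}:
        return ["top-level keys must be exactly is_valid and category"]
    errors = []
    is_valid, category = result["is_valid"], result["category"]
    if not isinstance(is_valid, bool):
        errors.append("is_valid must be a JSON boolean")
    if not isinstance(category, dict):
        errors.append("category must be a JSON object")
        return errors
    unknown = sorted(set(category) - ALLOWED_LANGUAGES)
    if unknown:
        errors.append(f"unknown language keys: {unknown}")
    if any(value is not True for value in category.values()):
        errors.append("every category value must be true")
    # The prompt says category is {} when no code is present.
    if is_valid is False and category:
        errors.append("category must be empty when is_valid is false")
    return errors


print(validate_langid({"is_valid": True, "category": {"SQL": True, "JavaScript": True}}))
print(validate_langid({"is_valid": True, "category": {"Javascript": True}}))
```

The check is case-sensitive on purpose, since the prompt asks for exact spellings.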
Evaluation
Evaluated on 200 held-out prompts drawn from test_dataset_langid.csv (same single + multi + benign composition as training).
- Evaluation timestamp: 2026-05-22 00:42 UTC
- GPU: NVIDIA A10G
- Source adapter: Accuknoxtechnologies/PromptInjection-Qwen3.5-2B-v5
- JSON parse errors: 0/200 (0.0%)
Top-level metrics
| Metric | Value |
|---|---|
| is_valid accuracy | 1.0000 |
| Language-set exact match | 0.9600 |
| Binary F1 (positive = contains code) | 1.0000 |
| Binary precision | 1.0000 |
| Binary recall | 1.0000 |
| Macro F1 across languages | 0.9696 |
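The card does not spell out how the first two metrics are computed. A plausible reading is that `is_valid` accuracy compares the boolean per prompt, and language-set exact match compares the full set of `category` keys per prompt. The sketch below implements that reading on four made-up records; the function name and the data are illustrative only.

```python
def top_level_metrics(gold, pred):
    """Compute is_valid accuracy and language-set exact match over paired records."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and the same length")
    n = len(gold)
    valid_hits = sum(g["is_valid"] == p["is_valid"] for g, p in zip(gold, pred))
    # Exact match: the predicted set of language keys equals the gold set.
    set_hits = sum(set(g["category"]) == set(p["category"]) for g, p in zip(gold, pred))
    return {
        "is_valid_accuracy": valid_hits / n,
        "language_set_exact_match": set_hits / n,
    }


# Hypothetical records, not taken from test_dataset_langid.csv.
gold = [
    {"is_valid": False, "category": {}},
    {"is_valid": True, "category": {"Python": True}},
    {"is_valid": True, "category": {"SQL": True, "JavaScript": True}},
    {"is_valid": True, "category": {"Scala": True}},
]
pred = [
    {"is_valid": False, "category": {}},
    {"is_valid": True, "category": {"Python": True}},
    {"is_valid": True, "category": {"JavaScript": True, "SQL": True}},
    {"is_valid": True, "category": {"Java": True}},  # wrong language, still is_valid
]
print(top_level_metrics(gold, pred))
```

Under this definition a prompt can count as correct for `is_valid` and still miss on exact match, which is consistent with the reported gap between 1.0000 and 0.9600.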
Confusion matrix — binary is_valid decision
Positive class = the prompt contains code (is_valid=True).
| | predicted contains-code | predicted no-code |
|---|---|---|
| actual contains-code | TP = 181 | FN = 0 |
| actual no-code | FP = 0 | TN = 19 |
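The binary precision, recall, F1 and accuracy reported above follow directly from these four counts. A short check using the standard definitions:

```python
def binary_metrics(tp, fn, fp, tn):
    """Derive precision, recall, F1 and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Counts from the confusion matrix: 181 code prompts and 19 benign prompts.
print(binary_metrics(tp=181, fn=0, fp=0, tn=19))
```

With no false positives and no false negatives, all four values come out to 1.0.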
Per-language metrics
Only languages that appear in either the actual or predicted labels are listed.
| Language | support | precision | recall | F1 |
|---|---|---|---|---|
| Python | 14 | 1.000 | 1.000 | 1.000 |
| Terraform | 14 | 1.000 | 1.000 | 1.000 |
| Java | 12 | 1.000 | 1.000 | 1.000 |
| C | 12 | 1.000 | 1.000 | 1.000 |
| Rust | 12 | 1.000 | 1.000 | 1.000 |
| AWK | 12 | 1.000 | 0.917 | 0.957 |
| Ruby | 11 | 0.917 | 1.000 | 0.957 |
| R | 11 | 1.000 | 1.000 | 1.000 |
| Go | 10 | 1.000 | 0.900 | 0.947 |
| Swift | 10 | 1.000 | 0.900 | 0.947 |
| Scala | 10 | 1.000 | 0.800 | 0.889 |
| SQL | 10 | 1.000 | 1.000 | 1.000 |
| jq | 10 | 0.909 | 1.000 | 0.952 |
| JavaScript | 9 | 0.900 | 1.000 | 0.947 |
| Kotlin | 9 | 1.000 | 1.000 | 1.000 |
| Perl | 9 | 1.000 | 1.000 | 1.000 |
| PowerShell | 9 | 1.000 | 1.000 | 1.000 |
| Batch | 9 | 1.000 | 1.000 | 1.000 |
| YAML | 9 | 1.000 | 0.889 | 0.941 |
| C++ | 7 | 1.000 | 0.857 | 0.923 |
| C# | 7 | 0.875 | 1.000 | 0.933 |
| Lua | 7 | 1.000 | 0.857 | 0.923 |
| Bash | 7 | 1.000 | 1.000 | 1.000 |
| Dockerfile | 6 | 0.857 | 1.000 | 0.923 |
| Makefile | 6 | 1.000 | 1.000 | 1.000 |
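As a consistency check, the macro F1 in the top-level metrics should equal the unweighted mean of the F1 column above. The sketch below recomputes it from the rounded table values, so agreement is only expected to within rounding. The AWK line assumes 11 of 12 AWK prompts were recovered, which is inferred from the 0.917 recall and not stated in the card.

```python
from statistics import mean


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0


# F1 column copied from the per-language table (rounded to three decimals).
f1_by_language = {
    "Python": 1.000, "Terraform": 1.000, "Java": 1.000, "C": 1.000, "Rust": 1.000,
    "AWK": 0.957, "Ruby": 0.957, "R": 1.000, "Go": 0.947, "Swift": 0.947,
    "Scala": 0.889, "SQL": 1.000, "jq": 0.952, "JavaScript": 0.947, "Kotlin": 1.000,
    "Perl": 1.000, "PowerShell": 1.000, "Batch": 1.000, "YAML": 0.941, "C++": 0.923,
    "C#": 0.933, "Lua": 0.923, "Bash": 1.000, "Dockerfile": 0.923, "Makefile": 1.000,
}

# Macro F1 weights every language equally, regardless of support.
macro_f1 = mean(f1_by_language.values())
print(f"macro F1 = {macro_f1:.4f}")

# Single-row check: AWK with precision 1.0 and an inferred recall of 11/12.
print(f"AWK F1 = {f1_score(1.0, 11 / 12):.3f}")
```

Because the average is unweighted, a low-support language such as Dockerfile (6 prompts) moves the macro score as much as Python (14 prompts).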
Inference latency
- Mean: 0.99 s/prompt
- Median: 0.94 s/prompt
- p95: 1.35 s/prompt
- Max: 1.63 s/prompt
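Summary statistics like these can be reproduced from a list of per-prompt wall-clock times. The card does not say which percentile method was used; the sketch below assumes the nearest-rank definition, and the sample timings are invented for illustration.

```python
import math
import statistics


def percentile(values, q):
    """Nearest-rank percentile: smallest value with at least q% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(seconds):
    """Mean, median, p95 and max of per-prompt latencies, in seconds."""
    return {
        "mean": statistics.mean(seconds),
        "median": statistics.median(seconds),
        "p95": percentile(seconds, 95),
        "max": max(seconds),
    }


# Hypothetical timings; in practice, wrap each langid() call with time.perf_counter().
sample = [0.90, 0.94, 0.98, 1.10, 1.63]
print(latency_summary(sample))
```

On a GPU, timings are only meaningful if the measurement includes the full `generate` call and decoding, since CUDA kernels run asynchronously.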
Training setup
- Base model: Qwen/Qwen3.5-2B (loaded unquantized in bf16 / fp16, no bitsandbytes quantization)
- LoRA: r=16, alpha=32, dropout=0.05, target modules = {q,k,v,o,gate,up,down}_proj
- Optimizer: adamw_torch, lr=1e-4, cosine schedule, warmup 5%
- Precision: bf16 if available, else fp16
- Effective batch size: 8 (per-device batch 1 × gradient accumulation 8), gradient checkpointing on
- Max sequence length: 3200 tokens
- Training data: 10,000 rows (7,000 single-language + 2,000 multi-language + 1,000 benign)
- Languages: 25 (programming + config formats)
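To make the optimizer settings concrete, the sketch below shows the usual shape of a cosine schedule with linear warmup and the step count implied by the batch configuration. The number of epochs is not given in the card, so a single epoch is assumed purely for illustration, and `lr_at` is a hypothetical helper that mirrors the common schedule shape, not the training script itself.

```python
import math


def lr_at(step, total_steps, base_lr=1e-4, warmup_ratio=0.05):
    """Learning rate at an optimizer step: linear warmup, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


rows = 10_000
effective_batch = 1 * 8                    # per-device batch x gradient accumulation
steps_per_epoch = rows // effective_batch  # optimizer steps in one pass over the data

total_steps = steps_per_epoch * 1          # ASSUMPTION: one epoch; the card does not say
for step in (0, 63, total_steps // 2, total_steps):
    print(step, f"{lr_at(step, total_steps):.2e}")
```

With these numbers, warmup covers the first 63 optimizer steps, after which the rate decays from 1e-4 toward zero.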
Supported languages
The model emits one or more of these keys in the category map of its JSON output:
```markdown
Python, JavaScript, Java, C, C++, C#, Go, Rust, Kotlin, Swift, Ruby, R, Scala, Perl, Lua, Bash, PowerShell, Batch, SQL, Dockerfile, YAML, Makefile, Terraform, AWK, jq
```
Model card generated automatically by eval_and_push_card.py on 2026-05-22 00:42 UTC.
Model provider
Accuknoxtechnologies
Model tree
Base
Qwen/Qwen3.5-2B
Adapter
this model
Modalities
Input
Video, Text, Image
Output
Text