CodeLanguage-Qwen3.5-2B-v5 API & Inference Endpoint

Quick start

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch, json, re

BASE = "Qwen/Qwen3.5-2B"
ADAPTER = "Accuknoxtechnologies/PromptInjection-Qwen3.5-2B-v5"

# The exact system prompt used in training; pass it verbatim.
SYSTEM_MSG = """You are a code language identifier. For the given user prompt, decide whether it contains any embedded source code (program source or recognizable code-like configuration). Output exactly one JSON object and nothing else: {"is_valid": <true|false>, "category": {"<Lang>": true, ...}}.
No preamble. No explanation. No <think> tags. No markdown code fences. No trailing prose.
Rules:
  - is_valid is TRUE when the prompt contains at least one code/config snippet, FALSE when the prompt is plain natural-language only.
  - category contains ONLY the languages that appear, each mapped to true. If no code is present, category is the empty object {}.
  - When multiple languages appear, list every distinct one (still only true).
Allowed language keys (use these exact spellings):
  Python, JavaScript, Java, C, C++, C#, Go, Rust, Kotlin, Swift, Ruby, R, Scala, Perl, Lua, Bash, PowerShell, Batch, SQL, Dockerfile, YAML, Makefile, Terraform, AWK, jq

Examples:

Input: What's the weather forecast today?
Output: {"is_valid": false, "category": {}}

Input: Run this for me: print('hello world')
Output: {"is_valid": true, "category": {"Python": true}}

Input: Compare these — SELECT * FROM users vs the snippet: console.log(users)
Output: {"is_valid": true, "category": {"SQL": true, "JavaScript": true}}"""

# Load the base model in bf16, then attach the LoRA adapter.
tokenizer = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, ADAPTER)
model.eval()

def langid(prompt: str) -> dict:
    # Build the chat prompt; enable_thinking=False is forwarded to the chat
    # template to disable Qwen's thinking mode.
    chat = tokenizer.apply_chat_template(
        [{"role": "system", "content": SYSTEM_MSG},
         {"role": "user", "content": prompt}],
        tokenize=False, add_generation_prompt=True, enable_thinking=False)
    inputs = tokenizer(chat, return_tensors="pt").to(model.device)
    # Greedy decoding, capped at 220 new tokens.
    out = model.generate(**inputs, max_new_tokens=220, do_sample=False)
    # Decode only the newly generated tokens, not the echoed prompt.
    text = tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    # Pull out the JSON object; fail clearly if the model emitted none.
    match = re.search(r'\{.*\}', text, re.DOTALL)
    if match is None:
        raise ValueError(f"no JSON object in model output: {text!r}")
    return json.loads(match.group(0))
```
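The regex in `langid` matches greedily from the first `{` to the last `}`, so a stray brace in any trailing text would make `json.loads` fail. A slightly more defensive sketch, assuming you want the first complete object and nothing else, decodes it with the standard library's `json.JSONDecoder.raw_decode`. The helper name `extract_first_json` is hypothetical and not part of the released code.

```python
import json


def extract_first_json(text: str) -> dict:
    """Return the first complete JSON object embedded in `text`.

    Tries to decode an object at each '{' in turn, so anything after the
    first valid object (stray prose, extra braces) is ignored.
    """
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value from the start of the string
            # and ignores whatever follows it.
            obj, _end = decoder.raw_decode(text[start:])
            return obj  # decoding from '{' always yields a dict
        except json.JSONDecodeError:
            # No valid object starts here; move on to the next '{'.
            start = text.find("{", start + 1)
    raise ValueError(f"no JSON object found in: {text!r}")


if __name__ == "__main__":
    raw = 'Sure! {"is_valid": true, "category": {"Python": true}} (done {ok})'
    print(extract_first_json(raw))
    # prints {'is_valid': True, 'category': {'Python': True}}
```

In `langid`, the last three lines could be swapped for `return extract_first_json(text)`; the greedy regex would have captured `{ok}` as well and failed on this input.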

System prompt

The model was trained with the exact system prompt below. Pass it verbatim at inference time, because the output schema depends on this prompt.

```text
You are a code language identifier. For the given user prompt, decide whether it contains any embedded source code (program source or recognizable code-like configuration). Output exactly one JSON object and nothing else: {"is_valid": <true|false>, "category": {"<Lang>": true, ...}}.
No preamble. No explanation. No <think> tags. No markdown code fences. No trailing prose.
Rules:
  - is_valid is TRUE when the prompt contains at least one code/config snippet, FALSE when the prompt is plain natural-language only.
  - category contains ONLY the languages that appear, each mapped to true. If no code is present, category is the empty object {}.
  - When multiple languages appear, list every distinct one (still only true).
Allowed language keys (use these exact spellings):
  Python, JavaScript, Java, C, C++, C#, Go, Rust, Kotlin, Swift, Ruby, R, Scala, Perl, Lua, Bash, PowerShell, Batch, SQL, Dockerfile, YAML, Makefile, Terraform, AWK, jq

Examples:

Input: What's the weather forecast today?
Output: {"is_valid": false, "category": {}}

Input: Run this for me: print('hello world')
Output: {"is_valid": true, "category": {"Python": true}}

Input: Compare these — SELECT * FROM users vs the snippet: console.log(users)
Output: {"is_valid": true, "category": {"SQL": true, "JavaScript": true}}
```
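Because the output schema is fixed by this prompt, it can be worth checking each decoded response against the rules before using it downstream. The sketch below is one possible check, not something shipped with the model: `ALLOWED_LANGS` and `validate_langid` are hypothetical names, and it only enforces what the prompt states (exact key spellings, every value `true`, and an empty `category` when `is_valid` is false).

```python
# Allowed language keys, copied from the system prompt.
ALLOWED_LANGS = frozenset({
    "Python", "JavaScript", "Java", "C", "C++", "C#", "Go", "Rust", "Kotlin",
    "Swift", "Ruby", "R", "Scala", "Perl", "Lua", "Bash", "PowerShell", "Batch",
    "SQL", "Dockerfile", "YAML", "Makefile", "Terraform", "AWK", "jq",
})


def validate_langid(result: dict) -> list[str]:
    """Return a list of rule violations (empty if the output looks well-formed)."""
    problems = []
    if set(result) != {"is_valid", "category"}:
        problems.append(f"unexpected top-level keys: {sorted(result)}")
    is_valid = result.get("is_valid")
    category = result.get("category")
    if not isinstance(is_valid, bool):
        problems.append("is_valid must be a boolean")
    if not isinstance(category, dict):
        problems.append("category must be an object")
        return problems
    for lang, flag in category.items():
        # Keys must use the exact spellings from the prompt.
        if lang not in ALLOWED_LANGS:
            problems.append(f"unknown language key: {lang!r}")
        # Languages are only ever mapped to true.
        if flag is not True:
            problems.append(f"{lang!r} must map to true")
    # Plain natural-language prompts get an empty category map.
    if is_valid is False and category:
        problems.append("is_valid is false but category is non-empty")
    return problems
```

Returning a list rather than raising lets you log every violation for a response at once; `validate_langid(langid(prompt))` should return `[]` for well-formed output.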

Evaluation

Evaluated on 200 held-out prompts drawn from `test_dataset_langid.csv`, built with the same single-language, multi-language, and benign (no-code) composition as the training data.

Evaluation timestamp: 2026-05-22 00:42 UTC
GPU: NVIDIA A10G
Source adapter: Accuknoxtechnologies/PromptInjection-Qwen3.5-2B-v5
JSON parse errors: 0/200 (0.0%)

Top-level metrics

| Metric | Value |
|---|---|
| `is_valid` accuracy | 1.0000 |
| Language-set exact match | 0.9600 |
| Binary F1 (positive = contains code) | 1.0000 |
| Binary precision | 1.0000 |
| Binary recall | 1.0000 |
| Macro F1 across languages | 0.9696 |
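The card does not include the evaluation script, so the exact metric definitions are not shown. A plausible reading, assumed here, is that `is_valid` accuracy compares the boolean flags, while language-set exact match requires the set of predicted `category` keys to equal the gold set exactly. Under that reading, 0.9600 on 200 prompts means 192 exact matches: 8 prompts had a language missing or extra even though every `is_valid` flag was correct. A minimal sketch with hypothetical helper names:

```python
def language_set(result: dict) -> frozenset:
    """Languages mapped to true in a decoded output's category object."""
    return frozenset(k for k, v in result.get("category", {}).items() if v)


def top_level_metrics(gold: list[dict], pred: list[dict]) -> dict:
    """is_valid accuracy and language-set exact match over paired outputs."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("need equal-length, non-empty lists")
    n = len(gold)
    valid_acc = sum(g["is_valid"] == p["is_valid"] for g, p in zip(gold, pred)) / n
    # Set comparison ignores key order but penalizes missing or extra languages.
    exact = sum(language_set(g) == language_set(p) for g, p in zip(gold, pred)) / n
    return {"is_valid_accuracy": valid_acc, "language_set_exact_match": exact}


if __name__ == "__main__":
    gold = [
        {"is_valid": True, "category": {"SQL": True, "JavaScript": True}},
        {"is_valid": True, "category": {"Python": True}},
        {"is_valid": False, "category": {}},
        {"is_valid": True, "category": {"Bash": True}},
    ]
    pred = [
        {"is_valid": True, "category": {"JavaScript": True, "SQL": True}},  # order ignored
        {"is_valid": True, "category": {"Python": True, "Bash": True}},     # extra language
        {"is_valid": False, "category": {}},
        {"is_valid": True, "category": {"Bash": True}},
    ]
    print(top_level_metrics(gold, pred))
    # prints {'is_valid_accuracy': 1.0, 'language_set_exact_match': 0.75}
```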

Confusion matrix — binary `is_valid` decision

Positive class = the prompt contains code (is_valid=True).

| | predicted contains-code | predicted no-code |
|---|---|---|
| actual contains-code | TP = 181 | FN = 0 |
| actual no-code | FP = 0 | TN = 19 |
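The binary precision, recall, F1, and `is_valid` accuracy in the top-level table follow directly from these four counts. The sketch below uses the standard formulas; returning 0.0 when a denominator is zero is an assumed convention (it never triggers for these counts).

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Precision, recall, F1 and accuracy for the positive class (contains code)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


if __name__ == "__main__":
    # Counts from the confusion matrix above.
    print(binary_metrics(tp=181, fp=0, fn=0, tn=19))
    # prints {'precision': 1.0, 'recall': 1.0, 'f1': 1.0, 'accuracy': 1.0}
```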

Per-language metrics

Only languages that appear in either the actual or predicted labels are listed.

| Language | support | precision | recall | F1 |
|---|---|---|---|---|
| `Python` | 14 | 1.000 | 1.000 | 1.000 |
| `Terraform` | 14 | 1.000 | 1.000 | 1.000 |
| `Java` | 12 | | | |
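The card does not say exactly how the per-language scores and macro F1 were computed. A common multi-label convention, assumed here, counts a true positive for each language present in both the gold and predicted `category` maps, and takes macro F1 as the unweighted mean of per-language F1 over every language that appears in either label set (consistent with the note above the table). `per_language_metrics` is a hypothetical helper:

```python
from collections import Counter


def per_language_metrics(gold: list[dict], pred: list[dict]) -> tuple[dict, float]:
    """Per-language precision/recall/F1 (multi-label) plus unweighted mean F1."""
    tp, fp, fn, support = Counter(), Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        gs = {k for k, v in g.get("category", {}).items() if v}
        ps = {k for k, v in p.get("category", {}).items() if v}
        for lang in gs:
            support[lang] += 1  # support = number of gold occurrences
        for lang in gs & ps:
            tp[lang] += 1
        for lang in ps - gs:
            fp[lang] += 1
        for lang in gs - ps:
            fn[lang] += 1
    # Report only languages seen in the gold or predicted labels.
    langs = sorted(set(support) | set(fp))
    rows = {}
    for lang in langs:
        prec = tp[lang] / (tp[lang] + fp[lang]) if tp[lang] + fp[lang] else 0.0
        rec = tp[lang] / (tp[lang] + fn[lang]) if tp[lang] + fn[lang] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        rows[lang] = {"support": support[lang], "precision": prec, "recall": rec, "f1": f1}
    macro_f1 = sum(r["f1"] for r in rows.values()) / len(rows) if rows else 0.0
    return rows, macro_f1


if __name__ == "__main__":
    gold = [{"category": {"Python": True}}, {"category": {"SQL": True, "Bash": True}}, {"category": {}}]
    pred = [{"category": {"Python": True}}, {"category": {"SQL": True}}, {"category": {"Bash": True}}]
    rows, macro = per_language_metrics(gold, pred)
    print(round(macro, 4))  # prints 0.6667 (Bash gets F1 = 0)
```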

Inference latency

Mean: 0.99 s/prompt
Median: 0.94 s/prompt
p95: 1.35 s/prompt
Max: 1.63 s/prompt
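The card does not describe how latency was measured. The sketch below assumes sequential single-prompt calls timed with `time.perf_counter`, and a p95 computed by linear interpolation with the minimum and maximum treated as the 0th and 100th percentiles (`statistics.quantiles(..., method="inclusive")`, which should match NumPy's default `percentile`). `time_calls` and `summarize_latency` are illustrative names.

```python
import statistics
import time
from typing import Callable, Iterable


def time_calls(fn: Callable[[str], object], prompts: Iterable[str]) -> list[float]:
    """Wall-clock seconds for each sequential call fn(prompt)."""
    timings = []
    for prompt in prompts:
        start = time.perf_counter()
        fn(prompt)
        timings.append(time.perf_counter() - start)
    return timings


def summarize_latency(seconds: list[float]) -> dict:
    """Mean, median, p95 (linear interpolation) and max of per-prompt latencies."""
    if len(seconds) < 2:
        raise ValueError("need at least two timings")
    # 99 cut points for n=100; index 94 is the 95th percentile.
    p95 = statistics.quantiles(seconds, n=100, method="inclusive")[94]
    return {
        "mean": statistics.fmean(seconds),
        "median": statistics.median(seconds),
        "p95": p95,
        "max": max(seconds),
    }


if __name__ == "__main__":
    # Stand-in workload; with the Quick start code loaded, pass langid instead.
    timings = time_calls(lambda p: sum(range(100_000)), ["prompt"] * 20)
    print(summarize_latency(timings))
```

With the model loaded, `summarize_latency(time_calls(langid, prompts))` gives the same four statistics for your own hardware; the first call often includes one-off warm-up cost, so you may want to discard it.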

Training setup

Base model: Qwen/Qwen3.5-2B, loaded unquantized in bf16/fp16 (no bitsandbytes quantization)
LoRA: r=16, alpha=32, dropout=0.05, target modules = {q,k,v,o,gate,up,down}_proj
Optimizer: adamw_torch, lr=1e-4, cosine schedule, warmup 5%
Precision: bf16 if available, else fp16
Effective batch size: 8 (per-device batch 1 × gradient accumulation 8), with gradient checkpointing on
Max sequence length: 3200 tokens
Training data: 10,000 rows (7,000 single-language + 2,000 multi-language + 1,000 benign)
Languages: 25 (programming + config formats)
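The batch and schedule settings above reduce to a little arithmetic. The sketch assumes the usual Hugging Face convention of linear warmup from 0 with the warmup length rounded up, followed by a half-cosine decay to 0 (the shape of `transformers.get_cosine_schedule_with_warmup`). It also assumes a single epoch on one device, which the card does not state. `steps_per_epoch` and `cosine_lr` are illustrative names.

```python
import math


def steps_per_epoch(n_rows: int, per_device: int, grad_accum: int, n_devices: int = 1) -> int:
    """Optimizer steps per epoch for the resulting effective batch size."""
    effective_batch = per_device * grad_accum * n_devices
    return math.ceil(n_rows / effective_batch)


def cosine_lr(step: int, total_steps: int, peak_lr: float, warmup_frac: float) -> float:
    """Linear warmup to peak_lr, then cosine decay to 0 over the remaining steps."""
    warmup_steps = math.ceil(total_steps * warmup_frac)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    # 10,000 rows at effective batch 1 x 8; one epoch is an assumption.
    total = steps_per_epoch(10_000, per_device=1, grad_accum=8)
    print("steps:", total)  # prints steps: 1250
    for s in (0, 63, 656, total):
        print(s, f"{cosine_lr(s, total, peak_lr=1e-4, warmup_frac=0.05):.2e}")
```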

Supported languages

The `category` map in the model's JSON output uses only these keys (it is empty when the prompt contains no code):

```markdown
Python, JavaScript, Java, C, C++, C#, Go, Rust, Kotlin, Swift, Ruby, R, Scala, Perl, Lua, Bash, PowerShell, Batch, SQL, Dockerfile, YAML, Makefile, Terraform, AWK, jq
```

Model card generated automatically by eval_and_push_card.py on 2026-05-22 00:42 UTC.
