License: apache-2.0
Results (held-out test split, 6,802 rows)
| Metric | Score |
|---|---|
| Exact-match | 0.676 |
| Micro-F1 | 0.702 |
| Macro-F1 | 0.511 |
By difficulty (does the description name the weakness, or must it be inferred?):
| Stratum | n | Exact-match | Micro-F1 |
|---|---|---|---|
| Easy (weakness named) | 2,046 | 0.841 | 0.870 |
| Hard (must infer) | 4,756 | 0.605 | 0.628 |
The macro-F1 reflects a dataset that caps majority CWEs (e.g. CWE-79) so rare weaknesses are learned rather than drowned out.
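For readers who want to reproduce this style of scoring on their own predictions, here is a minimal sketch of exact-match, micro-F1, and macro-F1 for multi-label CWE outputs, with macro-F1 averaged over the union of gold and predicted labels. It is an illustration only, not the evaluation script behind the table; the function names `f1` and `score` and the sample labels are hypothetical.

```python
from typing import Dict, List, Set


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def score(gold: List[Set[str]], pred: List[Set[str]]) -> Dict[str, float]:
    """Exact-match, micro-F1 and macro-F1 for multi-label predictions."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and the same length")
    counts: Dict[str, List[int]] = {}  # label -> [tp, fp, fn]
    exact = 0
    for g, p in zip(gold, pred):
        exact += int(g == p)
        # Iterating over g | p means labels the model invented are scored too.
        for label in g | p:
            c = counts.setdefault(label, [0, 0, 0])
            if label in g and label in p:
                c[0] += 1
            elif label in p:
                c[1] += 1
            else:
                c[2] += 1
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    macro = sum(f1(*c) for c in counts.values()) / len(counts) if counts else 0.0
    return {
        "exact_match": exact / len(gold),
        "micro_f1": f1(tp, fp, fn),
        "macro_f1": macro,
    }


# Illustrative labels only; "CWE-999" stands in for an out-of-label prediction.
gold = [{"CWE-79"}, {"CWE-89"}, {"CWE-79", "CWE-80"}]
pred = [{"CWE-79"}, {"CWE-999"}, {"CWE-79"}]
result = score(gold, pred)
print(result)
```

In this toy example the single out-of-label prediction adds a fourth class with F1 of 0, so macro-F1 is 0.25 rather than the 1/3 it would be over gold labels alone.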
Reading the numbers:
- Macro-F1 is computed over the union of gold and predicted labels (125 = 117 gold + ~8 the model predicted outside the gold set). Those out-of-label predictions score ~0 and pull macro down, so 0.511 is a conservative figure.
- Exact-match has an inherent ceiling of ~98.3%: ~1.74% of the test set (273 groups / 1,205 rows) are identical descriptions mapped to different CWEs (e.g. a bare "Windows Kernel Elevation of Privilege Vulnerability"), which a description-only model cannot disambiguate.
- Scores are on the capped/balanced test split (~30% "easy" rows), so they are not directly comparable to metrics measured on a different (e.g. natural-distribution) split.
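The exact-match ceiling mentioned above can be estimated on any labelled split. The sketch below assumes a deterministic description-only model, which can at best return the most frequent label set for each distinct description. How the card's own figure was derived is not stated, so treat this as one plausible way to compute such a bound; `exact_match_ceiling` and the sample rows (including their CWE labels) are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, FrozenSet, Iterable, Tuple


def exact_match_ceiling(rows: Iterable[Tuple[str, FrozenSet[str]]]) -> float:
    """Best exact-match a deterministic description-only model could reach."""
    by_text: Dict[str, Counter] = defaultdict(Counter)
    total = 0
    for description, labels in rows:
        by_text[description][frozenset(labels)] += 1
        total += 1
    if total == 0:
        raise ValueError("no rows")
    # For each distinct description, only the most common label set can be right.
    best = sum(counter.most_common(1)[0][1] for counter in by_text.values())
    return best / total


# Illustrative rows; the label assignments are made up for the example.
ambiguous = "Windows Kernel Elevation of Privilege Vulnerability"
rows = [
    (ambiguous, frozenset({"CWE-269"})),
    (ambiguous, frozenset({"CWE-269"})),
    (ambiguous, frozenset({"CWE-416"})),
    ("SQL injection in the login endpoint", frozenset({"CWE-89"})),
]
ceiling = exact_match_ceiling(rows)
print(ceiling)
```

Here one of the three identical descriptions is unavoidably wrong, so the bound is 3 of 4 rows.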
Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

mid = "exploitintel/cve-cwe-qwen3-8b"
tok = AutoTokenizer.from_pretrained(mid)
model = AutoModelForCausalLM.from_pretrained(mid, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a vulnerability analyst. Given a CVE description, "
                                  "reply with only the CWE ID(s) it maps to, comma-separated."},
    {"role": "user", "content": "A SQL injection vulnerability in the login endpoint allows an "
                                "unauthenticated attacker to execute arbitrary SQL via the username parameter."},
]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=32, do_sample=False)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
# -> CWE-89
```
GGUF / Ollama
A Q4_K_M GGUF is included in this repo for local runners:
```bash
ollama run hf.co/exploitintel/cve-cwe-qwen3-8b:Q4_K_M
```
Set the same system prompt (/set system You are a vulnerability analyst...) so it returns bare CWE IDs.
Note: This Ollama command has not been verified end-to-end. This is a standard `qwen3` model, so the embedded template should apply normally. If `ollama run` ignores the system prompt and produces rambling text instead of a bare CWE ID, supply an explicit ChatML Modelfile `TEMPLATE` as shown in the Qwen3.5-4B card.
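For scripted use, the system prompt can be sent with every request through Ollama's local HTTP chat endpoint instead of `/set system`. The sketch below assumes a default Ollama server on `localhost:11434` with the model already pulled. Like the command above, it has not been verified against this model, and the helper names `build_chat_payload` and `classify` are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict

SYSTEM_PROMPT = (
    "You are a vulnerability analyst. Given a CVE description, "
    "reply with only the CWE ID(s) it maps to, comma-separated."
)
MODEL = "hf.co/exploitintel/cve-cwe-qwen3-8b:Q4_K_M"


def build_chat_payload(description: str, model: str = MODEL) -> Dict[str, Any]:
    """Request body for Ollama's /api/chat with the fixed system prompt."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": description},
        ],
        "stream": False,  # ask for one JSON object instead of a stream
        "options": {"temperature": 0},  # keep decoding as deterministic as possible
    }


def classify(description: str, host: str = "http://localhost:11434") -> str:
    """Send one description to a running Ollama server and return the reply text."""
    req = urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(build_chat_payload(description)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["message"]["content"].strip()


payload = build_chat_payload("A SQL injection vulnerability in the login endpoint.")
```

Calling `classify(...)` requires the server to be running; `build_chat_payload` alone can be used to inspect what would be sent.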
Training
- Base: `Qwen/Qwen3-8B` (trained 4-bit via `unsloth/qwen3-8b-unsloth-bnb-4bit`)
- Method: QLoRA (4-bit) with Unsloth, merged to 16-bit · released checkpoint: checkpoint-960 (final; eval loss declined monotonically through training)
- Dataset: `exploitintel/cve-cwe-consensus`, 69,386 rows (55,810 / 6,774 / 6,802), majority CWEs capped at 2,500
- Settings: 2 epochs · context 512 · LR 2e-4 · AdamW 8-bit · linear schedule · packing on · train-on-completions-only off · seed 3407
- LoRA fine-tune, adapter merged into the base. Exact per-run LoRA rank/alpha, batch size, and weight decay were not logged to the repo.
Prompt format
ChatML (Qwen3 standard). System prompt fixed; the description is the only user input — never feed the label or CVE-ID.
- system: `You are a vulnerability analyst. Given a CVE description, reply with only the CWE ID(s) it maps to, comma-separated.`
- user: the CVE description
- assistant: `CWE-79, CWE-80`
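One way to enforce this format in calling code is to build the messages in a single place and strip CVE identifiers from the description first. The sketch below is an assumption about how a caller might do that, not part of the released tooling. The helper name `build_messages` and the regex-based removal are hypothetical, and the removal can leave slightly awkward wording when an ID appears mid-sentence.

```python
import re
from typing import Dict, List

SYSTEM_PROMPT = (
    "You are a vulnerability analyst. Given a CVE description, "
    "reply with only the CWE ID(s) it maps to, comma-separated."
)
# CVE IDs look like CVE-<year>-<4+ digits>.
CVE_ID = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)


def build_messages(description: str) -> List[Dict[str, str]]:
    """Fixed system prompt plus a user turn holding only the description text."""
    cleaned = CVE_ID.sub("", description)
    # Collapse whitespace left behind and trim stray separators at the edges.
    cleaned = re.sub(r"\s+", " ", cleaned).strip(" :;,-")
    if not cleaned:
        raise ValueError("description is empty after removing CVE IDs")
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": cleaned},
    ]


messages = build_messages("CVE-2024-12345: SQL injection in the login endpoint.")
print(messages[1]["content"])
```

The resulting list can be passed to `tok.apply_chat_template(...)` exactly as in the Usage example.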
Limitations
- CWEs below the dataset's 50-example floor are not in the label space and won't be predicted.
- Outputs CWE IDs as text and can occasionally emit a malformed/non-existent ID — validate against the official CWE list.
- English-only; descriptions only (no code, CVSS, or references).
- A triage/assist aid, not an authoritative CWE assignment — human-review before acting.
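Since the model emits IDs as free text, a small post-processing step can separate usable predictions from malformed or unknown ones before they reach a reviewer. The sketch below assumes the caller supplies `known_ids`, a set of valid IDs loaded from the official CWE list. The function name `parse_cwe_output` and the sample set are hypothetical.

```python
import re
from typing import List, Set, Tuple

CWE_PATTERN = re.compile(r"CWE-\d+")


def parse_cwe_output(text: str, known_ids: Set[str]) -> Tuple[List[str], List[str]]:
    """Split a comma-separated model reply into (accepted, rejected) tokens."""
    accepted: List[str] = []
    rejected: List[str] = []
    for raw in text.split(","):
        token = raw.strip().upper()
        if not token or token in accepted:
            continue  # skip blanks and duplicates
        if CWE_PATTERN.fullmatch(token) and token in known_ids:
            accepted.append(token)
        else:
            rejected.append(raw.strip())  # keep the original text for review
    return accepted, rejected


# Tiny stand-in for the official list; a real run would load every valid ID.
known = {"CWE-79", "CWE-89"}
accepted, rejected = parse_cwe_output("CWE-89, cwe-79, CWE-99999, CWE-8x", known)
print(accepted, rejected)
```

Rejected tokens are returned rather than silently dropped so that a human reviewer can see what the model actually produced.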
License
Apache-2.0 (inherited from Qwen3-8B). Dataset derives from public upstreams (NVD, MITRE CVE/CWE).