exploitintel

cve-cwe-qwen3-32b

Results (held-out test split, 6,802 rows)

| Metric | This model (32B) | 8B variant |
|---|---|---|
| Exact-match | 0.707 | 0.676 |
| Micro-F1 | 0.729 | 0.702 |
| Macro-F1 | 0.595 | 0.511 |

By difficulty (does the description name the weakness, or must it be inferred?):

| Stratum | n | Exact-match | Micro-F1 |
|---|---|---|---|
| Easy (weakness named) | 2,046 | 0.871 | 0.893 |
| Hard (must infer) | 4,756 | 0.636 | 0.657 |

Both models are scored with the same procedure; the 32B model's gains are largest on macro-F1 (rare/long-tail CWEs) and on the hard (must-infer) split.
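As a quick consistency check on the tables above: exact-match is a per-row average, so the overall figure should equal the row-weighted mean of the two strata. The short sketch below uses only numbers copied from the tables; the variable names are illustrative.

```python
# Stratum sizes and exact-match scores copied from the tables above.
strata = {
    "easy": {"n": 2046, "exact_match": 0.871},
    "hard": {"n": 4756, "exact_match": 0.636},
}

total_rows = sum(s["n"] for s in strata.values())

# Exact-match is averaged per row, so the overall score is the
# row-weighted mean of the per-stratum scores.
overall_exact_match = (
    sum(s["n"] * s["exact_match"] for s in strata.values()) / total_rows
)
easy_share = strata["easy"]["n"] / total_rows

print(total_rows)                     # 6802
print(round(overall_exact_match, 3))  # 0.707
print(round(easy_share, 3))           # 0.301
```

The weighted mean reproduces the reported 0.707, and the easy stratum makes up roughly 30% of the split.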

Reading the numbers:

Macro-F1 is averaged over the union of gold and predicted labels (118 = 117 gold + ~1 the model predicted outside the gold set). A label predicted outside the gold set contributes an F1 of 0 to that average, so 0.595 is a conservative figure. The low out-of-set count also indicates that the model rarely hallucinates non-existent CWEs.
Exact-match has an inherent ceiling of ~98.3%: ~1.74% of rows (273 groups / 1,205 rows) are identical descriptions mapped to different CWEs (e.g. a bare "Windows Kernel Elevation of Privilege Vulnerability"), which a description-only model cannot disambiguate. The percentage matches 1,205 of the dataset's 69,386 rows, not of the 6,802-row test split alone.
Scores are on the capped/balanced test split (~30% "easy" rows), so they are not directly comparable to metrics measured on a different (e.g. natural-distribution) split.
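To make the three metrics concrete, here is a minimal sketch of how exact-match, micro-F1, and macro-F1 over the union of gold and predicted labels can be computed for multi-label CWE sets. It is an illustration under the definitions stated above, not the evaluation script used for this model; the function name `score` and the toy rows are hypothetical.

```python
def score(gold, pred):
    """gold, pred: lists of sets of CWE ID strings, one set per row."""
    assert len(gold) == len(pred)

    # Exact-match: the predicted set must equal the gold set for the row.
    exact = sum(g == p for g, p in zip(gold, pred)) / len(gold)

    # Macro-F1 averages over every label seen in gold OR predictions.
    labels = set().union(*gold, *pred)
    tp = dict.fromkeys(labels, 0)
    fp = dict.fromkeys(labels, 0)
    fn = dict.fromkeys(labels, 0)
    for g, p in zip(gold, pred):
        for label in g & p:
            tp[label] += 1
        for label in p - g:
            fp[label] += 1
        for label in g - p:
            fn[label] += 1

    def f1(t, f_pos, f_neg):
        denom = 2 * t + f_pos + f_neg
        return 2 * t / denom if denom else 0.0

    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    return exact, micro, macro


# Toy data (hypothetical): the last prediction is outside the gold label set.
gold = [{"CWE-79"}, {"CWE-89"}, {"CWE-79", "CWE-80"}, {"CWE-22"}]
pred = [{"CWE-79"}, {"CWE-89"}, {"CWE-79"}, {"CWE-999"}]

exact, micro, macro = score(gold, pred)
print(round(exact, 3), round(micro, 3), round(macro, 3))  # 0.5 0.667 0.4
```

In the toy data the out-of-set prediction adds a fifth label with F1 = 0, pulling macro-F1 from 0.5 (gold labels only) down to 0.4. This is the sense in which averaging over the union is conservative.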

Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

mid = "exploitintel/cve-cwe-qwen3-32b"
tok = AutoTokenizer.from_pretrained(mid)
model = AutoModelForCausalLM.from_pretrained(mid, torch_dtype="auto", device_map="auto")

# Fixed system prompt; the CVE description is the only user input.
messages = [
    {"role": "system", "content": "You are a vulnerability analyst. Given a CVE description, "
                                   "reply with only the CWE ID(s) it maps to, comma-separated."},
    {"role": "user", "content": "A SQL injection vulnerability in the login endpoint allows an "
                                "unauthenticated attacker to execute arbitrary SQL via the username parameter."},
]

# return_dict=True yields input_ids and attention_mask regardless of library defaults.
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)

# Greedy decoding; CWE ID lists are short, so 32 new tokens is ample.
out = model.generate(**inputs, max_new_tokens=32, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs["input_ids"].shape[-1]
print(tok.decode(out[0][prompt_len:], skip_special_tokens=True))
# -> CWE-89
```

GGUF / Ollama

A Q4_K_M GGUF (~20 GB) is included in this repo for local runners; it needs ~24 GB VRAM:

```bash
ollama run hf.co/exploitintel/cve-cwe-qwen3-32b:Q4_K_M
```

Set the same system prompt (/set system You are a vulnerability analyst...) so it returns bare CWE IDs.

Note: This Ollama command has not been verified end-to-end. This is a standard Qwen3 model, so the embedded template should apply normally. If ollama run ignores the system prompt and produces rambling text instead of a bare CWE ID, supply an explicit ChatML Modelfile TEMPLATE as shown in the Qwen3.5-4B card.
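For scripted use, the same system prompt can be sent through Ollama's local REST API instead of the interactive prompt. The sketch below assumes an Ollama server on its default address (`http://localhost:11434`) with the model already pulled. Like the command above, it has not been verified end-to-end against this model, and the helper names `build_chat_payload` and `classify` are hypothetical.

```python
import json
import urllib.request

SYSTEM_PROMPT = (
    "You are a vulnerability analyst. Given a CVE description, "
    "reply with only the CWE ID(s) it maps to, comma-separated."
)
MODEL = "hf.co/exploitintel/cve-cwe-qwen3-32b:Q4_K_M"


def build_chat_payload(description):
    """Build the JSON body for Ollama's /api/chat endpoint."""
    return {
        "model": MODEL,
        "stream": False,                # one JSON response instead of a stream
        "options": {"temperature": 0},  # deterministic-leaning decoding
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": description},
        ],
    }


def classify(description, host="http://localhost:11434"):
    """Send one description to a running Ollama server and return the reply text."""
    request = urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(build_chat_payload(description)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["message"]["content"].strip()


# Example (requires a running Ollama server, so it is left commented out):
# print(classify("A SQL injection vulnerability in the login endpoint ..."))
```

Passing the system prompt in the request body avoids relying on an interactive `/set system` step.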

Training

Base: Qwen/Qwen3-32B (trained 4-bit via unsloth/Qwen3-32B-unsloth-bnb-4bit)
Method: QLoRA (4-bit) with Unsloth, merged to 16-bit · released checkpoint: checkpoint-960 (final; eval loss declined monotonically through training)
Dataset: exploitintel/cve-cwe-consensus — 69,386 rows (55,810 / 6,774 / 6,802), majority CWEs capped at 2,500
Settings: 2 epochs · context 512 · LR 2e-4 · AdamW 8-bit · linear schedule · packing on · train-on-completions-only off · seed 3407
LoRA fine-tune, rank 16 (confirmed); adapter merged into the base. Exact LoRA alpha, batch size, and weight decay were not logged to the repo.
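The "majority CWEs capped at 2,500" step can be pictured as per-label downsampling. The sketch below is only an illustration of that idea: it assumes one CWE label per row and uniform random sampling, neither of which is confirmed by this card, and the function name, seed, and toy data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def cap_per_label(rows, cap=2500, seed=0):
    """rows: list of (description, cwe_id) pairs. Keep at most `cap` rows per cwe_id."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[1]].append(row)

    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    kept = []
    for label in sorted(by_label):
        group = by_label[label]
        if len(group) > cap:
            # Assumption: over-represented labels are downsampled uniformly.
            group = rng.sample(group, cap)
        kept.extend(group)
    return kept


# Toy data (hypothetical): 10 rows of one label, 3 of another, capped at 4.
rows = [(f"xss description {i}", "CWE-79") for i in range(10)]
rows += [(f"traversal description {i}", "CWE-22") for i in range(3)]

capped = cap_per_label(rows, cap=4)
print(Counter(label for _, label in capped))
```

Labels at or below the cap pass through unchanged, so only the most frequent CWEs shrink.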

Prompt format

ChatML (Qwen3 standard). Fixed system prompt; the description is the only user input.

system: You are a vulnerability analyst. Given a CVE description, reply with only the CWE ID(s) it maps to, comma-separated.
user: the CVE description
assistant: CWE-79, CWE-80
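Because the assistant turn is a comma-separated string, downstream code usually needs it as a list. A minimal parsing sketch follows; the name `parse_cwe_ids` is hypothetical, and the tolerant regex (case-insensitive, leading zeros stripped) is a design choice of this example rather than something the model requires.

```python
import re

CWE_PATTERN = re.compile(r"CWE-(\d+)", re.IGNORECASE)


def parse_cwe_ids(reply):
    """Extract CWE IDs from a model reply, normalised and de-duplicated in order."""
    ids = []
    for number in CWE_PATTERN.findall(reply):
        cwe = f"CWE-{int(number)}"  # int() drops leading zeros, e.g. 089 -> 89
        if cwe not in ids:
            ids.append(cwe)
    return ids


print(parse_cwe_ids("CWE-79, CWE-80"))    # ['CWE-79', 'CWE-80']
print(parse_cwe_ids("cwe-089,CWE-89\n"))  # ['CWE-89']
print(parse_cwe_ids("no match"))          # []
```

An empty list is a useful signal that the reply did not follow the expected format and should be inspected by hand.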

Limitations

CWEs below the dataset's 50-example floor are not in the label space and won't be predicted.
Outputs CWE IDs as text; validate against the official CWE list.
English-only; descriptions only (no code, CVSS, or references).
A triage/assist aid, not an authoritative CWE assignment; have a human review the output before acting on it.
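The validation step mentioned above can be as simple as a set-membership check before a prediction reaches a reviewer. In the sketch below, `KNOWN_CWES` is a tiny hypothetical stand-in; in practice it would be loaded from the official CWE list, and the function name `split_by_validity` is likewise an assumption.

```python
# Hypothetical stand-in for the official CWE list; load the real list in practice.
KNOWN_CWES = {"CWE-22", "CWE-79", "CWE-89"}


def split_by_validity(predicted, known=KNOWN_CWES):
    """Separate predicted IDs into recognised ones and ones to flag for review."""
    recognised = [cwe for cwe in predicted if cwe in known]
    flagged = [cwe for cwe in predicted if cwe not in known]
    return recognised, flagged


recognised, flagged = split_by_validity(["CWE-79", "CWE-99999"])
print(recognised)  # ['CWE-79']
print(flagged)     # ['CWE-99999']
```

Recognised IDs still warrant human review; the check only filters out IDs that cannot be valid.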

License

Apache-2.0 (inherited from Qwen3-32B). Dataset derives from public upstreams (NVD, MITRE CVE/CWE).
