exploitintel

cve-cwe-qwen35-9b

Task

Input: a CVE description (plain text).
Output: one or more CWE IDs, comma-separated and numerically sorted (e.g. CWE-79 or CWE-79, CWE-352).
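
The output contract above (comma-separated, numerically sorted) can be checked or enforced on the caller's side. The following is a minimal sketch, not part of the model or this repo; the helper name `normalize_cwe_output` is hypothetical, and it assumes every label has the exact form `CWE-<digits>`.

```python
import re

# Assumption: a well-formed label is exactly "CWE-" followed by digits.
CWE_RE = re.compile(r"CWE-(\d+)")

def normalize_cwe_output(raw: str) -> str:
    """Return the CWE IDs deduplicated, numerically sorted, comma-separated."""
    numbers = set()
    for part in raw.split(","):
        part = part.strip()
        match = CWE_RE.fullmatch(part)
        if match is None:
            raise ValueError(f"not a well-formed CWE ID: {part!r}")
        numbers.add(int(match.group(1)))
    # Sort on the integer value so CWE-79 precedes CWE-352
    # (a plain string sort would order them the other way round).
    return ", ".join(f"CWE-{n}" for n in sorted(numbers))

print(normalize_cwe_output("CWE-352, CWE-79"))
```

Sorting on the integer rather than the string is the point of the sketch: the task's own example, CWE-79, CWE-352, is only reproduced with a numeric sort.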

Evaluation

On 100 held-out test rows (strict exact-match, including multi-label order):

| Variant | Exact-match | Well-formed | `<think>` leak |
|---|---|---|---|
| Merged 16-bit (Transformers) | 75.0% | 100% | 0 |
| Q4_K_M GGUF (Ollama) | 70.0% | 100% | 0 |

Strict exact-match penalizes near-misses (e.g. a correct primary CWE plus one extra label), so practical usefulness is likely higher than the headline number suggests. The 5-point gap between the two variants (75.0% vs. 70.0%, i.e. 5 of the 100 rows) is the expected cost of Q4 quantization.
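
To make the near-miss point concrete, here is a rough sketch of strict exact-match next to a more lenient score. It is not the evaluation script used for the table: the function names are hypothetical, the whitespace handling is an assumption, and the lenient "covered" metric (every gold label predicted, extras tolerated) is an illustration rather than a reported number.

```python
def parse_labels(text: str) -> list[str]:
    """Split a comma-separated label string into a list, dropping blanks."""
    return [p.strip() for p in text.split(",") if p.strip()]

def strict_match(pred: str, gold: str) -> bool:
    """Exact match on the ordered label list (order and count must agree)."""
    return parse_labels(pred) == parse_labels(gold)

def gold_covered(pred: str, gold: str) -> bool:
    """Lenient: every gold label was predicted; extra labels are tolerated."""
    return set(parse_labels(gold)) <= set(parse_labels(pred))

def score(pairs: list[tuple[str, str]]) -> dict[str, float]:
    """Fraction of (prediction, gold) pairs passing each criterion."""
    n = len(pairs)
    return {
        "strict": sum(strict_match(p, g) for p, g in pairs) / n,
        "covered": sum(gold_covered(p, g) for p, g in pairs) / n,
    }

# Made-up rows for illustration only, not taken from the test set.
pairs = [
    ("CWE-79", "CWE-79"),           # exact hit
    ("CWE-79, CWE-352", "CWE-79"),  # near-miss: correct primary plus an extra
    ("CWE-89", "CWE-79"),           # wrong
]
print(score(pairs))
```

On these three made-up rows the strict score counts one hit and the lenient score counts two, which is the kind of gap the paragraph above describes.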

Usage (Transformers / Unsloth)

[!IMPORTANT] Disable thinking mode. This model is trained for terse, structured output. Run with enable_thinking=False; otherwise Qwen3.5's default <think> block pollutes the output. Import unsloth before transformers so the qwen3_5 architecture is registered.

```python
import unsloth  # registers qwen3_5; must come first
from unsloth import FastModel

model, tok = FastModel.from_pretrained("exploitintel/cve-cwe-qwen35-9b", load_in_4bit=False)
FastModel.for_inference(model)
# Some loaders return a processor wrapping the tokenizer; unwrap it if so.
ttok = getattr(tok, "tokenizer", tok)

SYSTEM = ("You are a vulnerability analyst. Given a CVE description, "
          "reply with only the CWE ID(s) it maps to, comma-separated.")
msgs = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": "A reflected cross-site scripting issue lets a remote "
                                 "attacker inject arbitrary script via the q parameter."},
]
text = ttok.apply_chat_template(msgs, add_generation_prompt=True,
                               enable_thinking=False, tokenize=False)
ids = ttok(text, return_tensors="pt", add_special_tokens=False).to(model.device)
out = model.generate(**ids, max_new_tokens=48, do_sample=False)
# Decode only the newly generated tokens, not the prompt.
print(ttok.decode(out[0][ids["input_ids"].shape[-1]:], skip_special_tokens=True))  # -> CWE-79
```
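
If thinking mode is left on by accident, a downstream consumer may want to detect or strip a `<think>` block before parsing the answer. The sketch below is a defensive post-processing step under that assumption; `strip_think` and `has_think_leak` are hypothetical helpers, not part of this repo, and they are no substitute for passing enable_thinking=False.

```python
import re

# Matches a complete <think>...</think> block, including newlines inside it.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)

def has_think_leak(text: str) -> bool:
    """True if the decoded output contains any thinking-mode tag."""
    return "<think>" in text or "</think>" in text

def strip_think(text: str) -> str:
    """Drop complete <think> blocks and trim surrounding whitespace."""
    return THINK_BLOCK.sub("", text).strip()

leaky = "<think>\nThe q parameter is reflected unescaped.\n</think>\n\nCWE-79"
print(has_think_leak(leaky), strip_think(leaky))
```

Counting `has_think_leak` over a batch of outputs gives a tally comparable in spirit to the `<think>` leak column in the evaluation table, though the exact check used there is not documented here.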

Usage (Ollama / GGUF)

A Q4_K_M GGUF is included in this repo. It is converted without the MTP head (convert_hf_to_gguf.py --no-mtp). This step is required: with the MTP head present, llama.cpp/Ollama fails to load the model with the error qwen3next: layer 32 missing attn_qkv/attn_gate projections. The bundled Modelfile pins thinking mode off and sets the correct stop token (<|im_end|>), so the output is a clean CWE-... string.

```bash
ollama run hf.co/exploitintel/cve-cwe-qwen35-9b:Q4_K_M
```
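
For programmatic use, the same model can be queried through Ollama's local HTTP API instead of the interactive CLI. This is a sketch under several assumptions: an Ollama server is running on the default port 11434, the model has already been pulled under the tag shown above, and the system prompt from the Transformers example is reused (the bundled Modelfile may make it unnecessary). Setting temperature to 0 is likewise an assumption, chosen to mirror do_sample=False.

```python
import json
import urllib.request

MODEL = "hf.co/exploitintel/cve-cwe-qwen35-9b:Q4_K_M"
# Assumption: same system prompt as in the Transformers example.
SYSTEM = ("You are a vulnerability analyst. Given a CVE description, "
          "reply with only the CWE ID(s) it maps to, comma-separated.")

def build_payload(description: str) -> dict:
    """Request body for Ollama's /api/chat endpoint (non-streaming)."""
    return {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": description},
        ],
        "stream": False,
        "options": {"temperature": 0},
    }

def classify(description: str, host: str = "http://localhost:11434") -> str:
    """POST one CVE description to a local Ollama server; return the reply text."""
    req = urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(build_payload(description)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    return body["message"]["content"].strip()

# Example call (needs a running Ollama server, so it is left commented out):
# print(classify("A reflected cross-site scripting issue lets a remote "
#                "attacker inject arbitrary script via the q parameter."))
```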

Notes

Architecture: qwen3_5 (hybrid linear/full attention + MTP). Requires Unsloth or a transformers build that registers qwen3_5 (≥ 5.2.0).
Base modality: the base is vision-capable; this fine-tune and the GGUF target text-only CVE→CWE mapping.
License: inherits the license of the Qwen3.5-9B base model.
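
The transformers version requirement in the notes above can be checked before loading the model. The following is a small sketch using only the standard library; the helper names are hypothetical, and it deliberately treats pre-releases such as 5.2.0.dev0 as meeting the minimum, which may or may not hold for a given build.

```python
from importlib.metadata import PackageNotFoundError, version

MIN_TRANSFORMERS = (5, 2, 0)  # minimum stated in the notes above

def parse_version(text: str) -> tuple[int, ...]:
    """Leading numeric components, padded to three: '5.2.0.dev0' -> (5, 2, 0)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(digits))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)

def transformers_ok() -> bool:
    """True if an installed transformers meets the stated minimum version."""
    try:
        return parse_version(version("transformers")) >= MIN_TRANSFORMERS
    except PackageNotFoundError:
        return False

print("transformers >= 5.2.0:", transformers_ok())
```

A False result does not rule out the Unsloth route, which the notes list as an alternative way to get qwen3_5 registered.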

Model Details

Model provider: exploitintel
Model tree: base unsloth/Qwen3.5-9B; this model is listed under Quantized.
Input modalities: Text, Image, Video
Output modalities: Text
