README
Task
- Input: a CVE description (plain text).
- Output: one or more CWE IDs, comma-separated and numerically sorted (e.g. `CWE-79` or `CWE-79, CWE-352`).
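The output convention above can be sketched as a small post-processing helper. This is an illustrative sketch, not part of the released model or its tooling: the function name `normalize_cwe_labels` and the regular expression are assumptions. It shows why the sort has to be numeric rather than lexicographic (`CWE-79` precedes `CWE-352`).

```python
import re

# Hypothetical helper: matches "CWE-<digits>" anywhere in a string.
CWE_RE = re.compile(r"CWE-(\d+)", re.IGNORECASE)


def normalize_cwe_labels(raw: str) -> str:
    """Return the CWE IDs found in `raw`, deduplicated, sorted by number,
    and joined with ", " (the format used in the Task description)."""
    numbers = sorted({int(n) for n in CWE_RE.findall(raw)})
    return ", ".join(f"CWE-{n}" for n in numbers)


print(normalize_cwe_labels("CWE-352, CWE-79"))  # CWE-79, CWE-352
```

A helper like this could be applied to both model output and reference labels before comparison, so that ordering differences alone do not count as mismatches.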
Evaluation
On 100 held-out test rows (strict exact-match, including multi-label order):
| Variant | Exact-match | Well-formed | <think> leak |
|---|---|---|---|
| Merged 16-bit (Transformers) | 75.0% | 100% | 0 |
| Q4_K_M GGUF (Ollama) | 70.0% | 100% | 0 |
Strict exact-match penalizes near-misses (e.g. correct primary CWE plus one extra label), so practical usefulness is higher than the headline number. The ~5-point gap is the expected Q4 quantization cost.
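To make the three table columns concrete, here is a minimal sketch of how such metrics could be computed. The actual evaluation script is not part of this README, so the `score` function, the well-formedness pattern, and the toy data are all assumptions.

```python
import re

# Assumed definition of "well-formed": one or more CWE IDs separated by ", ".
WELL_FORMED = re.compile(r"CWE-\d+(, CWE-\d+)*")


def score(predictions, references):
    """Compute strict exact-match %, well-formed %, and <think> leak count."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    n = len(references)
    # Strict: the whole string must match, including multi-label order.
    exact = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    well_formed = sum(bool(WELL_FORMED.fullmatch(p.strip())) for p in predictions)
    think_leaks = sum("<think>" in p for p in predictions)
    return {
        "exact_match": 100 * exact / n,
        "well_formed": 100 * well_formed / n,
        "think_leak": think_leaks,
    }


# Toy data (made up for illustration, not taken from the test set).
preds = ["CWE-79", "CWE-79, CWE-352", "CWE-89, CWE-20", "<think>hmm</think>CWE-22"]
refs = ["CWE-79", "CWE-79", "CWE-20, CWE-89", "CWE-22"]
print(score(preds, refs))
# {'exact_match': 25.0, 'well_formed': 75.0, 'think_leak': 1}
```

In the toy data, the second prediction is the kind of near-miss described above (correct primary CWE plus one extra label), and the third has the right labels in the wrong order. Both count as failures under strict exact-match.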
Usage (Transformers / Unsloth)
[!IMPORTANT] Disable thinking mode. This model is trained for terse, structured output. Run with `enable_thinking=False`; otherwise Qwen3.5's default `<think>` block pollutes the output. Import `unsloth` before `transformers` so the `qwen3_5` architecture is registered.
```python
import unsloth  # registers qwen3_5; must come first
from unsloth import FastModel

model, tok = FastModel.from_pretrained("exploitintel/cve-cwe-qwen35-9b", load_in_4bit=False)
FastModel.for_inference(model)
# The loader may return a processor that wraps the tokenizer; unwrap it if so.
ttok = getattr(tok, "tokenizer", tok)

SYSTEM = ("You are a vulnerability analyst. Given a CVE description, "
          "reply with only the CWE ID(s) it maps to, comma-separated.")
msgs = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": "A reflected cross-site scripting issue lets a remote "
                                "attacker inject arbitrary script via the q parameter."},
]
# enable_thinking=False keeps the <think> block out of the prompt template.
text = ttok.apply_chat_template(msgs, add_generation_prompt=True,
                                enable_thinking=False, tokenize=False)
ids = ttok(text, return_tensors="pt", add_special_tokens=False).to(model.device)
out = model.generate(**ids, max_new_tokens=48, do_sample=False)
# Decode only the newly generated tokens, not the prompt.
print(ttok.decode(out[0][ids["input_ids"].shape[-1]:], skip_special_tokens=True))  # -> CWE-79
```
Usage (Ollama / GGUF)
A Q4_K_M GGUF is included in this repo. It is converted without the MTP head (`convert_hf_to_gguf.py --no-mtp`). This is required; otherwise llama.cpp/Ollama fails to load with `qwen3next: layer 32 missing attn_qkv/attn_gate projections`. The bundled Modelfile pins thinking mode off and sets the correct stop token (`<|im_end|>`), so the output is a clean `CWE-...`.
```bash
ollama run hf.co/exploitintel/cve-cwe-qwen35-9b:Q4_K_M
```
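For programmatic use, the same model could be queried through Ollama's local HTTP chat endpoint. The following is a sketch under several assumptions: an Ollama server is running at its default address, the model has already been pulled under the tag shown above, and the system prompt from the Transformers example is appropriate here too (the bundled Modelfile may already set its own). The helper names `build_chat_payload` and `classify_cve` are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local address
MODEL = "hf.co/exploitintel/cve-cwe-qwen35-9b:Q4_K_M"
# Assumption: reuse the system prompt from the Transformers example.
SYSTEM = ("You are a vulnerability analyst. Given a CVE description, "
          "reply with only the CWE ID(s) it maps to, comma-separated.")


def build_chat_payload(description: str) -> dict:
    """Build a non-streaming chat request for a single CVE description."""
    return {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": description},
        ],
        "stream": False,
        # Mirrors do_sample=False and max_new_tokens=48 from the Transformers example.
        "options": {"temperature": 0, "num_predict": 48},
    }


def classify_cve(description: str, url: str = OLLAMA_URL) -> str:
    """Send the request to a running Ollama server and return the reply text."""
    body = json.dumps(build_chat_payload(description)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        reply = json.load(resp)
    return reply["message"]["content"].strip()
```

With the server running, `classify_cve("A reflected cross-site scripting issue ...")` would return the model's reply as a string. Nothing is sent until that function is called.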
Notes
- Architecture: `qwen3_5` (hybrid linear/full attention + MTP). Requires Unsloth or a `transformers` build that registers `qwen3_5` (≥ 5.2.0).
- Base modality: the base is vision-capable; this fine-tune and the GGUF target text-only CVE→CWE mapping.
- License: inherits the license of the `Qwen3.5-9B` base model.
Model provider
exploitintel
Model tree
Base
unsloth/Qwen3.5-9B
Quantized
this model
Modalities
Input
Video, Text, Image
Output
Text