oyildirim

CyberStrike-OffSec-35B

License: apache-2.0

What is CyberStrike?

CyberStrike-OffSec-35B is a domain-specialized large language model built for offensive security professionals, penetration testers, and security researchers. Fine-tuned from Qwen3.6-35B-A3B with a two-stage pipeline (SFT followed by DPO), it is trained to cover the offensive security lifecycle:

Vulnerability Discovery — SQL injection, XSS, SSRF, deserialization, business logic flaws
MITRE ATT&CK Operations — Technique identification, kill chain analysis, threat mapping
Exploit Development — PoC creation, payload crafting, evasion techniques
Cloud & Infrastructure — AWS/Azure/GCP misconfigurations, container escapes, IAM abuse
Red Team Operations — C2 setup, lateral movement, persistence, EDR evasion
Compliance & Standards — NIST, OWASP ASVS, CIS benchmarks, CVSS scoring

Model Format: This is the full-precision BF16 model (67 GB, 26 safetensors shards). For quantized versions, see below.

Available Versions

Repo	Format	Size	Use Case
oyildirim/CyberStrike-OffSec-35B	BF16 (full precision)	67 GB	Transformers, vLLM, fine-tuning
oyildirim/CyberStrike-OffSec-35B-GGUF	GGUF Q8_0	36 GB	llama.cpp, Ollama, LM Studio
oyildirim/CyberStrike-OffSec-35B-GGUF	GGUF Q6_K	27 GB	llama.cpp, Ollama, LM Studio
oyildirim/CyberStrike-OffSec-35B-GGUF	GGUF Q5_K_M
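
As a rough cross-check on the sizes listed above, a file size can be estimated from the parameter count and the bits stored per weight. The sketch below assumes a nominal 35e9 parameters and uniform quantization across every tensor. Real GGUF files usually keep some tensors at higher precision, so treat the numbers as ballpark figures. The helper name `estimate_size_gb` is hypothetical.

```python
# Bits per weight for each format. Q8_0 stores 32 int8 weights plus one
# fp16 scale per block (34 bytes / 32 weights); Q6_K uses 210 bytes per
# 256-weight super-block.
BITS_PER_WEIGHT = {
    "BF16": 16.0,
    "Q8_0": 8.5,
    "Q6_K": 6.5625,
}


def estimate_size_gb(n_params: float, fmt: str) -> float:
    """Estimate on-disk size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BITS_PER_WEIGHT[fmt] / 8 / 1e9


if __name__ == "__main__":
    n_params = 35e9  # assumption: nominal total parameter count
    for fmt in BITS_PER_WEIGHT:
        print(f"{fmt}: ~{estimate_size_gb(n_params, fmt):.1f} GB")
```

The estimates come out a few percent above the listed sizes, which is plausible if the true parameter count is slightly below the nominal 35B or if sizes are reported in a different unit convention (GB versus GiB).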

Benchmark Results

CyberStrike ranks first on SecEval (ahead of GPT-4-turbo) and on the SECURE MAET and CWET tasks (ahead of GPT-4), and sixth of 25 models on CyberMetric-10000.

SecEval — #1 on Leaderboard

Outperforms GPT-4-turbo by +2.32 points across 9 cybersecurity domains, 2,189 questions.

Rank	Model	Overall	Network Sec	Web Sec	PenTest	Cryptography
#1	CyberStrike-OffSec-35B	81.39%	85.09%	85.34%	82.26%	75.00%
#2	GPT-4-turbo	79.07%	75.65%	82.15%

Domain	CyberStrike	GPT-4-turbo	Delta
Network Security	85.09%	75.65%	+9.44
Web Security	85.34%	82.15%	+3.19
Vulnerability	83.33%	76.05%	+7.28
Application Security	82.29%	75.25%	+7.04

CyberStrike leads GPT-4-turbo in all 9 domains. The largest improvements are in Cryptography (+10.71) and Network Security (+9.44).
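
The deltas are differences in percentage points, not relative improvements. A minimal sketch that recomputes them from the scores in the tables above follows. The dictionary covers only the rows shown here, and the function name `deltas` is illustrative.

```python
# Scores (percent) copied from the SecEval tables above:
# (CyberStrike, GPT-4-turbo).
SECEVAL = {
    "Overall": (81.39, 79.07),
    "Network Security": (85.09, 75.65),
    "Web Security": (85.34, 82.15),
    "Vulnerability": (83.33, 76.05),
    "Application Security": (82.29, 75.25),
}


def deltas(scores):
    """Return {domain: CyberStrike minus GPT-4-turbo}, rounded to 2 places."""
    return {name: round(ours - theirs, 2) for name, (ours, theirs) in scores.items()}


if __name__ == "__main__":
    for name, delta in deltas(SECEVAL).items():
        print(f"{name}: {delta:+.2f}")
```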

SECURE — #1 on MITRE ATT&CK & CWE Tasks

Outperforms GPT-4 by +5.34 points on MITRE ATT&CK extraction. Evaluated on ICS cybersecurity scenarios.

Task	CyberStrike	GPT-4	Llama3-70B	Gemini-Pro
MAET (MITRE ATT&CK)	93.94%	88.6%	86.3%	86.2%
CWET (CWE Knowledge)	93.05%	89.6%	90.4%	87.8%

CyberMetric-10000 — #6 out of 25 Models

9,189 expert-validated multiple-choice cybersecurity questions drawn from NIST, RFC, and industry standards.

Rank	Model	Score
#1	GPT-4o	88.89%
#2	GPT-4-turbo	88.50%
#3	GEMINI-pro 1.0	87.50%
#4	Mixtral-8x7B-Instruct	87.00%
#5	Falcon-180B-Chat	87.00%
#6	CyberStrike-OffSec-35B

Benchmark	Score
MMLU (overall)	76.94%
MMLU — Social Sciences	86.81%
MMLU — Computer Security	86.00%
MMLU — Other	81.43%
MMLU — Security Studies	80.00%
MMLU — STEM	73.87%
MMLU — Humanities	69.59%
HellaSwag (acc_norm)	79.61%
ARC Easy	81.86%

Note: General benchmarks were run 0-shot. Few-shot scores are expected to be higher.

Quick Start

Ollama (Easiest)

bash
# Download and run the Q4_K_M quantized version
ollama run hf.co/oyildirim/CyberStrike-OffSec-35B-GGUF:Q4_K_M

llama.cpp

bash
# Download the GGUF file from https://huggingface.co/oyildirim/CyberStrike-OffSec-35B-GGUF
./llama-cli -m CyberStrike-OffSec-35B-Q4_K_M.gguf \
  -p "Explain SSRF exploitation in cloud environments" \
  -n 512 --temp 0.7

Transformers

python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained(
    "oyildirim/CyberStrike-OffSec-35B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "oyildirim/CyberStrike-OffSec-35B",
    trust_remote_code=True,
)

messages = [
    {"role": "user", "content": "Explain SSRF exploitation in cloud environments with AWS metadata service abuse."}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
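outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens so the prompt is not echoed back
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))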

vLLM (Recommended for Production)

bash
pip install vllm

vllm serve oyildirim/CyberStrike-OffSec-35B \
  --dtype bfloat16 \
  --max-model-len 4096 \
  --trust-remote-code \
  --served-model-name CyberStrike-OffSec-35B

Then use the OpenAI-compatible API:

python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="CyberStrike-OffSec-35B",
    messages=[{"role": "user", "content": "How to exploit deserialization vulnerabilities in Java applications?"}],
    max_tokens=2048,
)
print(response.choices[0].message.content)
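
With `--max-model-len 4096` as in the server command above, the prompt and `max_tokens` share one 4,096-token window, so a request for 2,048 new tokens leaves at most 2,048 tokens for the prompt. vLLM generally rejects requests that exceed the window. The sketch below shows the budget arithmetic; `prompt_budget` is a hypothetical helper, not part of vLLM or the OpenAI client.

```python
def prompt_budget(max_model_len: int, max_tokens: int) -> int:
    """Return how many prompt tokens fit alongside max_tokens of output."""
    if max_tokens >= max_model_len:
        raise ValueError("max_tokens leaves no room for the prompt")
    return max_model_len - max_tokens


if __name__ == "__main__":
    # Values from the serve command and client call above.
    print(prompt_budget(max_model_len=4096, max_tokens=2048))
```

For longer prompts, either raise `--max-model-len` (memory permitting) or lower `max_tokens`.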

Model Details

Property	Value
Base Model	Qwen3.6-35B-A3B
Type	Mixture-of-Experts (MoE)
Total Parameters	35 Billion
Active Parameters	~3 Billion per token
Precision	BF16 (bfloat16, Brain Floating Point)
Model Size	67 GB (26 safetensors shards)
Context Length	8,192 tokens (training) / 262,144 max (architecture)

Training Pipeline

CyberStrike was trained using a two-stage alignment pipeline:

Stage 1: Supervised Fine-Tuning (SFT)

The base Qwen3.6-35B-A3B model was fine-tuned on a curated dataset of offensive security scenarios covering 10 categories:

web_app, cloud, post_exploitation, edr_evasion, malware_dev, network, social_engineering, full_kill_chain, lateral_movement, persistence

Method: QLoRA (4-bit NF4 quantization)
LoRA Config: r=64, alpha=128, dropout=0
Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
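
To make the LoRA settings concrete: each targeted projection gets two low-rank matrices, A (r x d_in) and B (d_out x r), and their product is scaled by alpha / r before being added to the frozen weight. The sketch below assumes the standard alpha / r scaling and uses a made-up 2048 x 2048 projection, since the per-layer dimensions are not stated in this README. Both function names are illustrative.

```python
def lora_scaling(r: int, alpha: int) -> float:
    """Scaling applied to the low-rank update (standard LoRA: alpha / r)."""
    return alpha / r


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added to one d_in x d_out projection."""
    return r * (d_in + d_out)


if __name__ == "__main__":
    # Stage 1 (SFT) settings from this README.
    print("SFT scaling:", lora_scaling(r=64, alpha=128))
    # Stage 2 (DPO) settings from this README.
    print("DPO scaling:", lora_scaling(r=32, alpha=64))
    # Hypothetical layer shape, for illustration only.
    print("Adapter params for a 2048x2048 layer:", lora_params(2048, 2048, r=64))
```

Both stages use the same alpha / r ratio, so halving the rank for DPO reduces adapter capacity without changing the update scale.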

Stage 2: Direct Preference Optimization (DPO)

The SFT model was then aligned on 115,250 preference pairs spanning 12 axes, each teaching the model to prefer expert-level responses over superficial ones:

Axis	Description	Examples
MITRE ATT&CK Depth	Deep technique analysis over surface-level summaries	T1059 sub-technique breakdowns
CVE Analysis	Detailed vulnerability analysis with CVSS scoring	CVE-2024-* exploit chains
OWASP Methodology	Structured testing methodology	ASVS compliance checks
Cloud Security	Provider-specific attack paths	AWS IAM, Azure AD, GCP abuse
Tool Usage	Proper tool invocation patterns	Nmap, Burp, sqlmap workflows

Method: QLoRA, LoRA r=32, alpha=64
DPO Beta: 0.1
Learning Rate: 5e-6 with cosine schedule
Effective Batch Size: 8
Training Steps: 9,142
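
For reference, the standard DPO objective for a single preference pair can be sketched as below, with beta = 0.1 as listed above. This is the textbook formulation rather than the exact training code used for this model. The inputs are summed sequence log-probabilities under the policy and a frozen reference model, and all argument names are illustrative.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one pair, given summed sequence log-probs."""
    # How much more the policy favors the chosen response than the reference does.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    # -log(sigmoid(beta * margin)) == softplus(-beta * margin), computed stably.
    x = -beta * margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


if __name__ == "__main__":
    # Made-up log-probs: the policy prefers the chosen answer more than the reference.
    print(dpo_loss(-12.0, -15.0, -13.0, -14.0))
```

When the policy matches the reference exactly, the margin is zero and the loss equals log 2. The loss falls below that as the policy shifts probability toward the chosen response.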

Architecture

markdown
Qwen3.6-35B-A3B (Mixture-of-Experts)
├── 35B total parameters
├── ~3B active parameters per token
├── 256 experts, top-8 routing + 1 shared expert
├── Grouped Query Attention (GQA)
├── RoPE positional encoding (theta=10M)
├── Max position embeddings: 262,144
└── BF16 precision (67 GB on disk)

The MoE architecture activates only about 3B of the 35B parameters for each token, so per-token compute is comparable to a 3B dense model while the full 35B parameters remain available as knowledge capacity. All 35B parameters must still be loaded into memory.
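
A minimal sketch of top-k expert routing, in plain Python, shows how only 8 of the 256 routed experts are active for a given token. It is illustrative only and not the Qwen implementation: renormalizing the selected weights is an assumption, the shared expert (which would run for every token) is omitted, and `route_top_k` is a hypothetical name.

```python
import math
import random


def route_top_k(router_logits, k=8):
    """Pick the k highest-scoring experts and renormalize their softmax weights."""
    # Softmax over all experts (subtract the max for numerical stability).
    m = max(router_logits)
    exps = [math.exp(x - m) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Indices of the k largest probabilities.
    top = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:k]
    # Renormalize so the selected experts' weights sum to 1.
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [rng.gauss(0.0, 1.0) for _ in range(256)]  # stand-in router scores
    weights = route_top_k(logits, k=8)
    print(f"{len(weights)} of {len(logits)} routed experts active")
```

The token's output would then be the weighted sum of the selected experts' outputs, which is why compute scales with the active parameters rather than the total.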

Use Cases

CyberStrike is designed for professionals conducting authorized security assessments:

Penetration Testing — Web app, network, cloud, and API security testing
Red Team Operations — Full kill chain simulation, C2 operations, evasion
Vulnerability Research — CVE analysis, exploit development, PoC creation
CTF Competitions — Challenge solving, reverse engineering, cryptography
Security Education — Training material generation, exam preparation
Threat Intelligence — MITRE ATT&CK mapping, threat actor TTPs
Compliance Assessment — NIST, OWASP, CIS benchmark evaluation

Ethical Use & Disclaimer

This model is intended exclusively for authorized security testing, education, and research purposes. Users must:

Obtain proper written authorization before testing any systems
Comply with all applicable laws and regulations
Follow responsible disclosure practices
Never use this model for unauthorized access or malicious activities

The authors are not responsible for any misuse of this model.

Citation

bibtex
@misc{cyberstrike2025,
  title={CyberStrike-OffSec-35B: A Domain-Specialized LLM for Offensive Security},
  author={Orhan Yildirim},
  year={2025},
  url={https://huggingface.co/oyildirim/CyberStrike-OffSec-35B}
}

Built with purpose. Benchmarked with rigor. Designed for professionals.

