athulkrishnan

BountyHound-Coder-14B

Model information

Developer	`athulkrishnan` (independent)
Model type	Auto-regressive transformer (decoder-only), instruction-tuned
Base model	`Qwen/Qwen2.5-Coder-14B-Instruct` (~14.7B params, 48 layers)
Fine-tune method	QLoRA SFT (4-bit NF4 base, LoRA r=32) via Unsloth + TRL
Specialisation	Bug-bounty finding triage/validation · recon attack-surface ranking
Language	English
Context length	32,768 native (up to 131K with YaRN); trained at 2,048
Precision / formats	Merged BF16 safetensors · Q4_K_M GGUF in `gguf/`
License	Apache-2.0 (inherited from the Qwen base)
Status	Static, offline fine-tune · v1 (see Versions)
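
The context-length row can be made concrete with a short sketch. The `rope_scaling` keys and the factor of 4 follow the YaRN settings documented for Qwen2.5 models. Since this fine-tune only saw 2,048-token sequences, how well it behaves at extended lengths is an open assumption, not something the card reports.

```python
# Sketch: YaRN rope-scaling override of the kind documented for Qwen2.5 models.
NATIVE_CONTEXT = 32_768  # native window from the table above
YARN_FACTOR = 4.0        # assumption: the factor that yields the ~131K figure


def yarn_rope_scaling(factor: float = YARN_FACTOR) -> dict:
    """Return a rope_scaling entry that could be merged into config.json."""
    return {
        "type": "yarn",
        "factor": factor,
        "original_max_position_embeddings": NATIVE_CONTEXT,
    }


def extended_context(factor: float = YARN_FACTOR) -> int:
    """Context window implied by scaling the native window by `factor`."""
    return int(NATIVE_CONTEXT * factor)


if __name__ == "__main__":
    print(yarn_rope_scaling())
    print(extended_context())  # 131072
```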

Intended use

Intended use cases

Finding triage & validation — decide submit vs. kill, sanity-check severity, reason about real-world impact, and cut duplicate / informational / out-of-scope noise before a human writes a report.
Recon prioritisation — turn a fingerprinted tech stack or attack surface into a ranked hit-list of vulnerability classes worth testing first, with one-line rationale.
Methodology assistant — explain bug classes, CWE mappings, and report framing to support authorized learning and assessment work.

Downstream use

A local triage/ranking step inside an authorized bug-bounty or pentest workflow (human-in-the-loop), e.g. pre-filtering scanner output or drafting impact statements.
A base for further domain fine-tuning or for pairing with retrieval (RAG) over fresh CVEs / current program scope.
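
A minimal sketch of the human-in-the-loop pre-filter described above. `ask_model` is a hypothetical callable that sends one prompt to the model and returns its reply, and the assumption that a reply starts with "Submit" or "Kill" is illustrative, not a documented output format. Every queue, including KILL, is meant for human review; nothing is submitted automatically.

```python
from typing import Callable


def parse_verdict(reply: str) -> str:
    """Map a free-text reply to SUBMIT, KILL, or REVIEW (anything unclear)."""
    words = reply.strip().split()
    first = words[0].strip(".:,;-*").upper() if words else ""
    return first if first in {"SUBMIT", "KILL"} else "REVIEW"


def prefilter(findings: list[str], ask_model: Callable[[str], str]) -> dict[str, list[str]]:
    """Sort raw findings into queues for a human to work through."""
    queues: dict[str, list[str]] = {"SUBMIT": [], "KILL": [], "REVIEW": []}
    for finding in findings:
        reply = ask_model(f"Triage: {finding} Submit or kill? One line + why.")
        queues[parse_verdict(reply)].append(finding)
    return queues
```

Routing unparsable replies to a REVIEW queue keeps an ambiguous answer from being silently treated as a kill.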

Out-of-scope and prohibited use

Testing, scanning, or exploiting systems you are not explicitly authorized to assess.
Autonomous attack execution without human review — BountyHound is a co-pilot, not an agent.
Generating malware, phishing, or weaponised exploit payloads for unauthorized use.
Treating outputs as ground truth, or as legal/compliance advice. Always validate.
Any use that violates applicable law or platform/program rules.

How to get started

Requirements

transformers >= 4.40 (developed on 4.56.2), torch >= 2.3, and accelerate. The merged model is BF16 (~29 GB); for a single 16 GB GPU use the Q4_K_M GGUF with llama.cpp / Ollama, or load in 4-bit with bitsandbytes.
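
The memory figures above follow from simple arithmetic on the parameter count. The sketch below reproduces them; the 4.85 bits-per-weight value for Q4_K_M is an approximate average assumed here, and the results cover weights only, not the KV cache or activations.

```python
PARAMS = 14.7e9  # approximate parameter count from the model table


def weights_gb(bits_per_weight: float, params: float = PARAMS) -> float:
    """Weight storage in decimal gigabytes (excludes KV cache and activations)."""
    return params * bits_per_weight / 8 / 1e9


bf16_gb = weights_gb(16)   # merged BF16 checkpoint
q4_gb = weights_gb(4.85)   # assumption: approximate average bits/weight for Q4_K_M
nf4_gb = weights_gb(4)     # 4-bit NF4 before quantisation constants are added

if __name__ == "__main__":
    print(f"BF16:   {bf16_gb:.1f} GB")
    print(f"Q4_K_M: {q4_gb:.1f} GB")
    print(f"NF4:    {nf4_gb:.2f} GB")
```

Both 4-bit options leave headroom on a 16 GB card for the KV cache, which is why the BF16 checkpoint is the one that does not fit.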

Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "athulkrishnan/BountyHound-Coder-14B"
tok = AutoTokenizer.from_pretrained(repo)
# device_map="auto" relies on accelerate; the BF16 weights take roughly 29 GB.
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype="auto", device_map="auto")

SYSTEM = (
    "You are a bug-bounty co-pilot for an authorized security researcher. You assist ONLY "
    "with testing that is in-scope and authorized on bug-bounty programs. You are sharp, "
    "terse, and impact-first: you kill weak findings, prove real exploitation, and never pad "
    "reports with 'could potentially'. Your specialties are finding triage/validation and "
    "recon attack-surface ranking."
)
messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content":
        "Triage: reflected XSS on a marketing page, unauthenticated, no session context. "
        "Submit or kill? One line + why."},
]
# Render the ChatML prompt and end it with the assistant header.
ids = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
# do_sample=True is set explicitly so that temperature and top_p take effect.
out = model.generate(ids, max_new_tokens=256, do_sample=True, temperature=0.3, top_p=0.9)
# Decode only the newly generated tokens, not the prompt.
print(tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True))
```

Ollama / llama.cpp (GGUF)

Download gguf/BountyHound-Coder-14B-Q4_K_M.gguf, then create a Modelfile:

```dockerfile
FROM ./BountyHound-Coder-14B-Q4_K_M.gguf
TEMPLATE """{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{- range .Messages }}<|im_start|>{{ .Role }}
{{ .Content }}<|im_end|>
{{ end }}<|im_start|>assistant
"""
SYSTEM """You are a bug-bounty co-pilot for an authorized security researcher. You assist ONLY with testing that is in-scope and authorized on bug-bounty programs. You are sharp, terse, and impact-first: you kill weak findings, prove real exploitation, and never pad reports with 'could potentially'. Your specialties are finding triage/validation and recon attack-surface ranking."""
PARAMETER temperature 0.3
PARAMETER top_p 0.9
PARAMETER repeat_penalty 1.05
PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
```

```bash
ollama create bountyhound -f Modelfile
ollama run bountyhound "Rank the attack surface for a Spring Boot + GraphQL + S3 stack."
```
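
For scripted use, the same model can be reached through Ollama's local HTTP API rather than the CLI. The sketch below assumes a default Ollama install listening on port 11434 and the `bountyhound` model name created above; the helper names are illustrative. The system prompt is not repeated because the Modelfile already supplies it.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # assumption: default local endpoint


def build_chat_payload(prompt: str, model: str = "bountyhound") -> dict:
    """Build a non-streaming chat request with the recommended decoding settings."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "options": {"temperature": 0.3, "top_p": 0.9, "repeat_penalty": 1.05},
    }


def chat(prompt: str, url: str = OLLAMA_URL) -> str:
    """Send one prompt to a running Ollama server and return the reply text."""
    data = json.dumps(build_chat_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(url, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)["message"]["content"]
```

Calling `chat("Rank the attack surface for a Spring Boot + GraphQL + S3 stack.")` mirrors the `ollama run` command above and requires the server to be running.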

Prompt format

Qwen2.5 ChatML (<|im_start|>role … <|im_end|>) with the security system prompt above. Recommended decoding: temperature 0.3, top_p 0.9, repeat_penalty 1.05.
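
To show what the ChatML layout looks like on the wire, here is a simplified renderer. It is a sketch, not the tokenizer's actual template: it ignores the default-system-prompt and tool-calling branches of the real Qwen2.5 template, so `tok.apply_chat_template` remains the safer choice in practice. The function name is an assumption.

```python
def render_chatml(messages: list[dict], add_generation_prompt: bool = True) -> str:
    """Render role/content messages into a plain ChatML prompt string."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    demo = [
        {"role": "system", "content": "You are a bug-bounty co-pilot."},
        {"role": "user", "content": "Triage: open redirect on logout. Submit or kill?"},
    ]
    print(render_chatml(demo))
```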

Training

Training data

A weighted instruction mix biased toward the two target skills (≈6.2K curated conversations):

Source	Purpose
HackerOne disclosed reports (public)	finding disposition + severity-triage signal
Curated bug-bounty methodology & triage heuristics	submit/kill discipline, validation gates, anti-patterns
Recon playbook / attack-surface examples	tech-stack to ranked vulnerability classes
Public detection-template patterns	low-false-positive authoring style
General-security instruction data (~13%)	rehearsal to limit catastrophic forgetting

No customer data, private program scope, credentials, or other non-public material is included in the training set. Only public or self-authored content was used.

Training procedure

QLoRA supervised fine-tuning, loss computed on assistant turns only.
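
Computing loss on assistant turns only usually means masking every other token's label so the loss function skips it. The sketch below illustrates the idea on plain lists; the per-token `roles` list is a hypothetical input, and the card does not say which helper produced the mask in the actual training run.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores by default


def mask_labels(token_ids: list[int], roles: list[str]) -> list[int]:
    """Keep labels for assistant tokens; mask system and user tokens."""
    if len(token_ids) != len(roles):
        raise ValueError("token_ids and roles must have the same length")
    return [
        tid if role == "assistant" else IGNORE_INDEX
        for tid, role in zip(token_ids, roles)
    ]


if __name__ == "__main__":
    ids = [11, 12, 13, 14, 15]
    roles = ["system", "user", "user", "assistant", "assistant"]
    print(mask_labels(ids, roles))  # [-100, -100, -100, 14, 15]
```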

Hyperparameter	Value
Quantisation	4-bit NF4 (base), BF16 compute
LoRA	r=32, α=32, dropout=0, all linear projections
Optimiser	paged AdamW 8-bit, weight decay 0.01
LR / schedule	2e-4, cosine, 3% warmup
Epochs / eff. batch	2 / 8 (micro-batch 1 × grad-accum 8)
Max sequence length	2,048
Hardware	1× NVIDIA RTX 4070 Ti SUPER (16 GB)
Frameworks	Unsloth · TRL 0.22 · Transformers 4.56 · PyTorch 2.9
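
The hyperparameters above imply a rough step budget and learning-rate curve. This sketch works through the arithmetic under two assumptions the card does not confirm: exactly 6,200 conversations, and one conversation per training sample with no sequence packing. The schedule is a standard linear-warmup cosine decay to zero.

```python
import math

CONVERSATIONS = 6_200    # assumption: the card says roughly 6.2K
EFFECTIVE_BATCH = 1 * 8  # micro-batch 1 x grad-accum 8
EPOCHS = 2
PEAK_LR = 2e-4
WARMUP_RATIO = 0.03

steps_per_epoch = math.ceil(CONVERSATIONS / EFFECTIVE_BATCH)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at an optimiser step: linear warmup, then cosine decay."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(steps_per_epoch, total_steps, warmup_steps)  # 775 1550 47
    for step in (0, warmup_steps, total_steps // 2, total_steps):
        print(step, f"{lr_at(step):.2e}")
```

With r=32 and α=32 the LoRA scaling factor α/r is 1, so the adapter update is applied unscaled.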

Evaluation

v1 is scored with a deterministic, rubric-based harness on held-out items, with no LLM judge. Every item is either decision-scorable or rubric-scorable and falls into one of three groups: triage (submit/kill accuracy), recon ranking (expected-class recall), and rubric categories (report, nuclei, payload, coding). Each score is compared against the Qwen2.5-Coder-14B base. The ship gate requires improvement on the two priority skills, triage and ranking, with no material regression on general coding, which guards against catastrophic forgetting. A full quantitative scorecard will be published alongside v2; treat v1 as a capable assistant, not a benchmarked SOTA system.
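
The two decision-scorable metrics named above can be computed without any model in the loop. The following is a sketch of how such scoring might look, not the author's harness; the function names, the case-insensitive matching, and the top-`k` cutoff are all assumptions.

```python
def triage_accuracy(predicted: list[str], expected: list[str]) -> float:
    """Fraction of items where the submit/kill decision matches the label."""
    if not expected or len(predicted) != len(expected):
        raise ValueError("need equal-length, non-empty lists")
    hits = sum(p.strip().lower() == e.strip().lower() for p, e in zip(predicted, expected))
    return hits / len(expected)


def class_recall(ranked: list[str], expected: set[str], k: int = 5) -> float:
    """Share of expected vulnerability classes found in the top-k ranking."""
    if not expected:
        raise ValueError("expected must not be empty")
    top = {c.lower() for c in ranked[:k]}
    return sum(1 for c in expected if c.lower() in top) / len(expected)


if __name__ == "__main__":
    print(triage_accuracy(["submit", "kill", "kill"], ["submit", "kill", "submit"]))
    print(class_recall(["SSRF", "IDOR", "XSS"], {"ssrf", "sqli"}, k=3))
```

Because both functions are pure and deterministic, the same inputs always give the same score, which is the property a judge-free harness depends on.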

Bias, risks, and limitations

Not a vulnerability discoverer. A 14B local model assists triage and prioritisation; it does not autonomously find or weaponise novel bugs, and can miss context a human or a larger system would catch.
Can be confidently wrong. It may over- or under-rate severity, hallucinate a CWE/CVE, or mis-scope a finding. Every output must be validated before acting or reporting.
Frozen knowledge. Trained on a static snapshot — it will not know the newest CVEs, techniques, or your current program scope. Pair with retrieval for facts.
Domain bias. Trained heavily on web-app / HackerOne-style findings; it is weaker on niche stacks, hardware, embedded, and non-web targets.
Dual-use. Security knowledge can be misused. The model is gated and authorization-scoped for this reason, but gating cannot prevent all misuse — see the Disclaimer.
Inherited base behaviour. Limitations and biases of Qwen2.5-Coder-14B-Instruct carry over.

Recommendations

Keep a human in the loop; use BountyHound as an assistive triage/ranking layer, not an oracle.
Validate every finding through your own impact gate before submitting; never paste output into a report unchecked.
Supplement with retrieval (CVE feeds, current scope) for anything time-sensitive.
Operate only within written authorization and your program's rules; follow responsible disclosure.
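
One way to act on the retrieval recommendation is to place retrieved snippets ahead of the question in the user message. The sketch below assumes the snippets (CVE entries, scope text) have already been fetched by some other component; the function name, the prompt wording, and the character budget are illustrative choices, the last one motivated by the 2,048-token training length.

```python
def build_grounded_prompt(question: str, snippets: list[str], max_chars: int = 4000) -> str:
    """Prefix a question with numbered reference snippets, within a size budget."""
    context_lines = []
    used = 0
    for i, snippet in enumerate(snippets, start=1):
        line = f"[{i}] {snippet.strip()}"
        if used + len(line) > max_chars:
            break  # stop before the context outgrows the budget
        context_lines.append(line)
        used += len(line)
    if not context_lines:
        return question
    context = "\n".join(context_lines)
    return (
        "Reference material (verify before relying on it):\n"
        f"{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    print(build_grounded_prompt(
        "Is api.example.com in scope?",
        ["Program scope: *.example.com, excluding status.example.com."],
    ))
```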

Disclaimer

This model is provided "as is" and "as available", without warranty of any kind, express or implied, including merchantability, fitness for a particular purpose, and non-infringement. By accessing or using BountyHound you acknowledge that you are solely responsible for your use of the model and its outputs, and you agree to indemnify and hold harmless the author and any affiliated parties from any claims, liabilities, damages, or costs arising from that use. Use is at your own risk and discretion. You are responsible for ensuring your use complies with all applicable laws, regulations, and the rules of any program or system you test. The author does not endorse or condone any unauthorized or unlawful use.

License and attribution

Weights are derived from Qwen/Qwen2.5-Coder-14B-Instruct and released under Apache-2.0, the base model's license.
Built with Unsloth and TRL.

Versions

v1 (this release) — core triage + recon co-pilot (≈6.2K-conversation mix).
v2 (in training) — adds a large, defanged CVE/CWE/vuln-class breadth layer derived from public exploit metadata; published with a head-to-head v1-vs-v2-vs-base scorecard.

Citation

```bibtex
@misc{bountyhound2026,
  title        = {BountyHound-Coder-14B: a gated bug-bounty triage and recon co-pilot},
  author       = {athulkrishnan},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/athulkrishnan/BountyHound-Coder-14B}},
  note         = {QLoRA SFT of Qwen2.5-Coder-14B-Instruct}
}
```
