Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more
Container

Run this model inference with full control and performance in your environment.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

License: apache-2.0

Results: base vs v1 vs v2 (real-defect eval, n=32)

16 real CVE-grade defects (advisory fix commits, inverted so the diff reintroduces the vuln; objective ground truth) + 16 matched clean fixes. Same base weights, LoRA hot-swapped, temperature 0.

MetricBasev1v2
Verdict accuracy71.9%59.4%78.1%
Positive recall (caught the real defect)87.5% (14/16)18.8% (3/16)56.2% (9/16)
Negative specificity (quiet on clean)56.2%100%100%
Category match56.2%43.8%
Invalid JSON0/320/320/32

Honest read: v2 roughly tripled v1's real-defect recall without giving back specificity, and has the best overall verdict accuracy. It is not strictly better than base — base still out-recalls it (14/16 vs 9/16) on subtle logic bypasses, and v2's category labelling regressed. But base false-alarms on 7 of 16 clean fixes (specificity 56%), where v1 and v2 raise zero. Pick v2 for a low-false-positive pipeline; pick base if you want maximum recall and will triage the noise. Full report with verbatim side-by-side outputs (wins and losses) ships in the project repo under docs/eval/.

Training data

v1's 400 pairs + 38 real security positives (inverted SA-CORE fix commits, objective category/severity from the advisory) + matched clean negatives + 11 low-severity contrastive pairs (e.g. O(n²) array_merge-in-loop with a near-miss clean form). 498 train rows; the real-defect eval set was held out by advisory ID. Teacher for the synthetic half: Claude Opus 4.x.

Usage (with the base model)

python

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
base = "Qwen/Qwen3-Coder-30B-A3B-Instruct"
tok = AutoTokenizer.from_pretrained(base)
m = AutoModelForCausalLM.from_pretrained(base, device_map="auto", torch_dtype="bfloat16")
m = PeftModel.from_pretrained(m, "bartek-flp/qwen3coder-30b-dcr-lora-v2")

Prompt with the DCR system message (review a diff, output JSON findings only).

Limitations

QLoRA on attention projections only (q/k/v/o, r=16). Real-defect recall is 56%, with the remaining gap mostly subtle logic-level access bypasses that the base model catches but v2 does not. Category labelling is weaker than base. The eval is small (n=32) and security-skewed. Always keep a human in the loop for security findings.

Model provider

bartek-flp

Model tree

Base

Qwen/Qwen3-Coder-30B-A3B-Instruct

Adapter

this model

Modalities

Input

Text

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Explore FriendliAI today