bartek-flp

qwen3coder-30b-dcr-lora

Deploy Dedicated

Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Container

Run this model inference with full control and performance in your environment.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

Why

The base model under-reports issues on real Drupal merge requests (high precision, low recall). This adapter is trained on a hybrid distillation set so the model both catches real Drupal anti-patterns (synthetic positives) and stays quiet on clean code (real merged-MR negatives).

Training data

400 teacher-labeled pairs (distillation_v1): 251 positive / 149 negative.

Positives — 243 synthetic across 26 Drupal anti-patterns (SQLi, XSS sinks, CSRF-on-GET, broken DI / missing create(), accessCheck() omissions, recursion in presave, deprecated APIs, etc.) + 7 real merge-request bugs.
Negatives — 149 clean, merged MRs from webform, paragraphs, drupal core, pathauto, commerce, search_api (teacher-verified clean). Teacher: Claude Opus 4.x. Each pair is (diff → JSON verdict + findings).

Usage (with the base model)

python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
base = "Qwen/Qwen3-Coder-30B-A3B-Instruct"
tok = AutoTokenizer.from_pretrained(base)
m = AutoModelForCausalLM.from_pretrained(base, device_map="auto", torch_dtype="bfloat16")
m = PeftModel.from_pretrained(m, "bartek-flp/qwen3coder-30b-dcr-lora")

Prompt with the DCR system message (review a diff, output JSON findings only).

Results (A/B vs base, held-out val)

48 held-out pairs the adapter never saw (27 with a defect, 21 clean), temperature 0, served as the same base weights with the LoRA hot-swapped, so only the training differs.

Table with columns: Metric (n=48), Base, Tuned
Metric (n=48)	Base	Tuned
Verdict accuracy	83.3% (40/48)	95.8% (46/48)
Positive recall	81.5% (22/27)	92.6% (25/27)
Negative specificity	85.7% (18/21)	100% (21/21)
Category match	40.7% (11/27)	63.0% (17/27)
Invalid JSON	4/48	0/48

Honest read: fine-tuning mostly bought reliability and calibration, not raw bug-finding. The base model already detects most issues, but on 4 positives it emitted unparseable JSON (often a stray \Drupal\ backslash) and on 3 clean diffs it raised false alarms. The adapter always returns valid JSON, holds 100% specificity, and names categories better. The cost: it missed two low-severity O(n²) array_merge-in-loop bugs the base model caught. A full report with verbatim side-by-side outputs, covering both the wins and the losses, ships in the project repo under docs/eval/.

Limitations

QLoRA on attention projections only; tuned for diff review, not general chat. The synthetic positives teach patterns, not every real-world manifestation. Always keep a human in the loop for security findings.

Model provider

bartek-flp

Model tree

Base

Qwen/Qwen3-Coder-30B-A3B-Instruct

Adapter

this model

Modalities

Input

Text

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Model card

Explore FriendliAI today

Get started Talk to an engineer

Why

Training data

400 teacher-labeled pairs (distillation_v1): 251 positive / 149 negative.

Positives — 243 synthetic across 26 Drupal anti-patterns (SQLi, XSS sinks, CSRF-on-GET, broken DI / missing create(), accessCheck() omissions, recursion in presave, deprecated APIs, etc.) + 7 real merge-request bugs.
Negatives — 149 clean, merged MRs from webform, paragraphs, drupal core, pathauto, commerce, search_api (teacher-verified clean). Teacher: Claude Opus 4.x. Each pair is (diff → JSON verdict + findings).

Usage (with the base model)

python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
base = "Qwen/Qwen3-Coder-30B-A3B-Instruct"
tok = AutoTokenizer.from_pretrained(base)
m = AutoModelForCausalLM.from_pretrained(base, device_map="auto", torch_dtype="bfloat16")
m = PeftModel.from_pretrained(m, "bartek-flp/qwen3coder-30b-dcr-lora")

Prompt with the DCR system message (review a diff, output JSON findings only).

Results (A/B vs base, held-out val)

48 held-out pairs the adapter never saw (27 with a defect, 21 clean), temperature 0, served as the same base weights with the LoRA hot-swapped, so only the training differs.

Table with columns: Metric (n=48), Base, Tuned
Metric (n=48)	Base	Tuned
Verdict accuracy	83.3% (40/48)	95.8% (46/48)
Positive recall	81.5% (22/27)	92.6% (25/27)
Negative specificity	85.7% (18/21)	100% (21/21)
Category match	40.7% (11/27)	63.0% (17/27)
Invalid JSON	4/48	0/48

qwen3coder-30b-dcr-lora

Get help setting up a custom Dedicated Endpoints.

README

Why

Training data

Usage (with the base model)

Results (A/B vs base, held-out val)

Limitations

Explore FriendliAI today

README

Why

Training data

Usage (with the base model)

Results (A/B vs base, held-out val)

Limitations