Run this model inference on single tenant GPU with unmatched speed and reliability at scale.
Run this model inference with full control and performance in your environment.
Get help setting up a custom Dedicated Endpoints.
Talk with our engineer to get a quote for reserved GPU instances with discounts.
README
License: apache-2.0Why
The base model under-reports issues on real Drupal merge requests (high precision, low recall). This adapter is trained on a hybrid distillation set so the model both catches real Drupal anti-patterns (synthetic positives) and stays quiet on clean code (real merged-MR negatives).
Training data
400 teacher-labeled pairs (distillation_v1): 251 positive / 149 negative.
- Positives — 243 synthetic across 26 Drupal anti-patterns (SQLi, XSS sinks,
CSRF-on-GET, broken DI / missing
create(),accessCheck()omissions, recursion inpresave, deprecated APIs, etc.) + 7 real merge-request bugs. - Negatives — 149 clean, merged MRs from webform, paragraphs, drupal core,
pathauto, commerce, search_api (teacher-verified clean).
Teacher: Claude Opus 4.x. Each pair is
(diff → JSON verdict + findings).
Usage (with the base model)
python
from transformers import AutoModelForCausalLM, AutoTokenizerfrom peft import PeftModelbase = "Qwen/Qwen3-Coder-30B-A3B-Instruct"tok = AutoTokenizer.from_pretrained(base)m = AutoModelForCausalLM.from_pretrained(base, device_map="auto", torch_dtype="bfloat16")m = PeftModel.from_pretrained(m, "bartek-flp/qwen3coder-30b-dcr-lora")
Prompt with the DCR system message (review a diff, output JSON findings only).
Results (A/B vs base, held-out val)
48 held-out pairs the adapter never saw (27 with a defect, 21 clean), temperature 0, served as the same base weights with the LoRA hot-swapped, so only the training differs.
| Metric (n=48) | Base | Tuned |
|---|---|---|
| Verdict accuracy | 83.3% (40/48) | 95.8% (46/48) |
| Positive recall | 81.5% (22/27) | 92.6% (25/27) |
| Negative specificity | 85.7% (18/21) | 100% (21/21) |
| Category match | 40.7% (11/27) | 63.0% (17/27) |
| Invalid JSON | 4/48 | 0/48 |
Honest read: fine-tuning mostly bought reliability and calibration, not raw bug-finding. The base model already detects most issues, but on 4 positives it emitted unparseable JSON (often a stray \Drupal\ backslash) and on 3 clean diffs it raised false alarms. The adapter always returns valid JSON, holds 100% specificity, and names categories better. The cost: it missed two low-severity O(n²) array_merge-in-loop bugs the base model caught. A full report with verbatim side-by-side outputs, covering both the wins and the losses, ships in the project repo under docs/eval/.
Limitations
QLoRA on attention projections only; tuned for diff review, not general chat. The synthetic positives teach patterns, not every real-world manifestation. Always keep a human in the loop for security findings.
Model provider
bartek-flp
Model tree
Base
Qwen/Qwen3-Coder-30B-A3B-Instruct
Adapter
this model
Modalities
Input
Text
Output
Text
Pricing
Dedicated Endpoints
View detailsSupported Functionality
Model APIs
Dedicated Endpoints
Container
More information