What it catches
- Drupal idioms that actually break things: missing dependency injection where it matters, absent cache metadata, a procedural hook that should be
#[Hook], a forgotten static create().
- Logic defects: null dereferences, an inverted or dropped guard, off-by-one access checks, foreach-by-reference traps, an
array_merge on something that isn't an array.
- Security: XSS, SQL injection, access bypass, unsafe deserialization, IDOR.
One finding per issue, output kept to clean JSON, which is what a CI step or a git hook wants.
How v6 did
We mined 100 paired examples from Drupal contrib (a buggy pre-fix diff plus its matched merged fix) that none of v4/v5/v5.1/v6 saw in training. Same base, adapters hot-swapped in vLLM, scored pair-wise: a pair counts only when the model flags the buggy half and stays quiet on the clean one.
Table with columns: model, pair-correct, positive recall, negative specificity, category match| model | pair-correct | positive recall | negative specificity | category match |
|---|
| base (no adapter) | 0.30 | 0.38 | 0.90 | — |
| v4 | 0.53 | 0.55 | 0.96 | 0.69 |
| v5.1 | 0.67 | 0.72 | 0.93 | 0.65 |
| v6 | 0.73 | 0.78 | 0.94 | 0.80 |
v6 is ahead on every metric, and the line is monotonic — each version beat the one before it. Against v5.1: recall 0.72 → 0.78, pair-correct 0.67 → 0.73, specificity held, and category match jumped 0.65 → 0.80. The category gain is the headline; v6 is much better at naming which kind of defect it found, which used to be its weak spot.
Straight talk on significance. A paired sign-test on the disagreements (v5.1 vs v6) lands at 11 wins for v6, 5 for v5.1, two-sided p ≈ 0.21. So this is a real, consistent improvement, best on all four metrics, but not a statistically significant pair-level win on 100 pairs. v5.1-over-v4 cleared the bar (p ≈ 0.039); this gap is directionally clear but would need a bigger board to lock down. v6 ships because it is the best available adapter, not because the gap is proven.
How it was trained
v6 is a warm start from v5.1: we loaded the v5.1 adapter as trainable and continued training it on a larger clean set, so it keeps what v5.1 learned and grows from there.
Training data was 732 rows / 392 positives — v5.1's set plus 51 new teacher-verified bug-fix pairs mined across about 45 contrib projects. The teacher protocol stayed strict: a candidate became a positive only as a real, isolated, reproducible defect, never a style nit or a deprecation. No synthetic positives.
QLoRA at rank 16 on the attention projections (q/k/v/o), learning rate 1e-4, two epochs, gradient checkpointing. One A100-SXM4-80GB, 92 steps in 68 minutes.
Run it
Serve the base with the adapter hot-loaded (vLLM):
vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct \
--enable-lora --max-lora-rank 16 \
--lora-modules dcr-v6=bartek-flp/qwen3coder-30b-dcr-lora-v6 \
--max-model-len 8192 --port 8000
Then point your reviewer at --model dcr-v6. The full system (CLI, HTTP API, web UI, RAG over Drupal docs) lives in the Drupal CodeReviewer project.
Base and lineage
Base: Qwen/Qwen3-Coder-30B-A3B-Instruct (MoE, 3.3B active / 30B total, Apache 2.0). Warm-start chain: v4 → v5.1 → v6, each continuing the previous adapter. Earlier rounds on HF: …-lora-v51, …-lora-v4, -v3, -v2.
License
Adapter weights for research and evaluation. Commercial use of the surrounding Drupal CodeReviewer system needs a license: filipiuk.bartek@gmail.com.
Framework versions