README

License: apache-2.0

Results (this session, base + v3 + v4 scored together, temperature 0)

Security set (n=32, 16 pos / 16 neg)

| Metric | Base | v3 | v4 |
|---|---|---|---|
| Verdict accuracy | 71.9% (23/32) | 84.4% (27/32) | 90.6% (29/32) |
| Positive recall | 87.5% (14/16) | 75.0% (12/16) | 81.2% (13/16) |
| Negative specificity | 56.2% (9/16) | 93.8% (15/16) | 100% (16/16) |
| Category match | 56.2% | 50.0% | 56.2% |
| Invalid JSON | 0/32 | 0/32 | 0/32 |

Non-security set (n=26, 13 pos / 13 neg)

| Metric | Base | v3 | v4 |
|---|---|---|---|
| Verdict accuracy | 65.4% (17/26) | 61.5% (16/26) | 76.9% (20/26) |
| Positive recall | 69.2% (9/13) | 30.8% (4/13) | 61.5% (8/13) |
| Negative specificity | 61.5% (8/13) | 92.3% (12/13) | 92.3% (12/13) |
| Category match | 53.8% | 23.1% | 46.2% |
| Invalid JSON | 0/26 | 0/26 | 0/26 |

The non-security recall jump (v3 4/13 → v4 8/13) and the non-security verdict gain (16/26 → 20/26) are each four-case moves, larger than the run-to-run noise. The security specificity and verdict-accuracy leads over v3 are smaller margins (15/16 → 16/16 and 27/32 → 29/32), but nothing regressed relative to v3 and they point the same way.
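For readers who want to check the arithmetic, the headline metrics follow directly from the per-class counts in the tables. The sketch below is only an illustration: the `Counts` class and its field names are hypothetical and are not taken from the author's evaluation script.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Per-model counts on one evaluation set (hypothetical helper)."""
    true_pos: int   # defective samples correctly flagged
    n_pos: int      # all defective samples
    true_neg: int   # clean samples correctly passed
    n_neg: int      # all clean samples

    @property
    def recall(self) -> float:
        return self.true_pos / self.n_pos

    @property
    def specificity(self) -> float:
        return self.true_neg / self.n_neg

    @property
    def correct(self) -> int:
        return self.true_pos + self.true_neg

    @property
    def accuracy(self) -> float:
        return self.correct / (self.n_pos + self.n_neg)


# Counts copied from the tables above.
v3_nonsec = Counts(true_pos=4, n_pos=13, true_neg=12, n_neg=13)
v4_nonsec = Counts(true_pos=8, n_pos=13, true_neg=12, n_neg=13)
v4_sec = Counts(true_pos=13, n_pos=16, true_neg=16, n_neg=16)

# The "four-case moves" on the non-security set, v3 to v4.
recall_gain_cases = v4_nonsec.true_pos - v3_nonsec.true_pos
verdict_gain_cases = v4_nonsec.correct - v3_nonsec.correct

print(f"v4 security accuracy: {v4_sec.accuracy:.3f}")
print(f"v4 non-security recall: {v4_nonsec.recall:.3f}")
print(f"non-security gains (cases): {recall_gain_cases}, {verdict_gain_cases}")
```

With sets this small, a single case moves recall by about 6 to 8 percentage points, which is why the text reports differences in cases rather than only in percentages.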

Training data

v3's 526 rows plus 41 real bug-fix positive/negative rows (mined from merged Drupal MRs, inverted, teacher-labeled). QLoRA r=16 on q/k/v/o, batch size 4 with gradient accumulation 4 and gradient checkpointing, MAX_LEN=2048, 3 epochs, lr 2e-4. Trained on one H100 for 114 steps in ~106 min.
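The hyperparameters above can be gathered in one place as a plain configuration object. This is a minimal sketch, not the author's training script: the `TrainConfig` class and all of its field names are hypothetical, and only the values come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters as stated in the model card (field names are assumptions)."""
    lora_rank: int = 16
    target_modules: tuple = ("q", "k", "v", "o")  # attention projections
    per_device_batch: int = 4
    grad_accum_steps: int = 4
    gradient_checkpointing: bool = True
    max_len: int = 2048
    epochs: int = 3
    learning_rate: float = 2e-4

    @property
    def effective_batch(self) -> int:
        # Sequences per optimizer step on a single device.
        return self.per_device_batch * self.grad_accum_steps


config = TrainConfig()
print(f"effective batch size: {config.effective_batch}")
```

On a single GPU, batch size 4 with 4 accumulation steps means each optimizer step sees 16 sequences.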

Limitations

Real-defect recall is still ~60% on non-security and ~80% on security, so roughly two in five non-security bugs slip through. Category match is mediocre (46–56%): the model is better at saying "something is wrong" than at naming the kind of defect. Raw recall is higher on the untuned base, but base flags roughly 40% of clean code (specificity 56–62%), which is why v4 is the better tool: it trades a little recall for usable specificity. Keep a human in the loop; this adapter is one component of a hybrid pipeline (static analyzers + RAG + the model).
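One way to act on these limitations is to treat the model's output as a routing signal rather than a final decision. The sketch below assumes the model returns a JSON object with `verdict` and `category` fields and that the verdict labels are `"defect"` and `"clean"`. Those names, and the `triage` function itself, are hypothetical; the actual output schema is not documented here.

```python
import json

# Hypothetical verdict labels; the real schema may differ.
KNOWN_VERDICTS = {"defect", "clean"}


def triage(raw_output: str) -> str:
    """Route one model response to the next stage of a hybrid pipeline."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        # Unparseable output is never trusted.
        return "human_review"

    if not isinstance(parsed, dict) or parsed.get("verdict") not in KNOWN_VERDICTS:
        return "human_review"

    if parsed["verdict"] == "defect":
        # Specificity is high, so flagged code deserves a reviewer's time.
        return "human_review"

    # Recall is well below 100%, so "clean" is not proof of clean code:
    # the change still goes through the static analyzers.
    return "static_analysis_only"


print(triage('{"verdict": "defect", "category": "sql_injection"}'))
print(triage('{"verdict": "clean", "category": null}'))
print(triage("not json"))
```

The asymmetry is deliberate: a flag is treated as worth a human look, while a pass only skips the model's own escalation and never the other checks.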

Model provider

bartek-flp

Model tree

Base

Qwen/Qwen3-Coder-30B-A3B-Instruct

Adapter

this model

Modalities

Input

Text

Output

Text
