• April 23, 2026
  • 2 min read

Vulnerability Discovery with Open-Weight GLM-5: Frontier Quality at 1/7 the Cost of Closed Models

TL;DR
  • Traditional fuzzing struggles to discover deep vulnerabilities because random mutation cannot generate the structured inputs they require.
  • By combining fuzzing with LLMs, researchers achieved significantly better results, with GLM-5 reaching near-proprietary performance.
  • Running on FriendliAI, these results come at roughly one-seventh the cost, making advanced vulnerability discovery more accessible.

Finding security flaws before attackers strike is critical and difficult. Traditional fuzzers generate huge volumes of random inputs to test program behavior, but this brute-force approach surfaces only shallow bugs while deeper vulnerabilities go undetected. Triggering those requires structurally valid inputs, such as precise XML structures or specific path patterns, that random mutation alone is practically unable to produce. This is where large language models can help.

In this post, we show how LLM-guided fuzzing dramatically improves vulnerability discovery. Through collaborations with leading security research groups, we demonstrate that open-weight models like GLM-5 running on FriendliAI can match closed-model results at roughly 1/7 of the cost.

Where fuzzing falls short and how LLMs help

Coverage-guided fuzzers such as Jazzer, a widely used fuzzer for the JVM, mutate inputs randomly and track which branches each input reaches. Team Atlanta, the 2025 DARPA AI Cyber Challenge (AIxCC) winner, evaluated this approach on a 54-vulnerability benchmark. Even when scaling compute from a small setup to a much larger one, both runs found exactly 8 bugs; within minutes, additional compute stopped producing new results.

The limitation was not compute but input structure, as many vulnerabilities require structured inputs that random mutation cannot produce.

To overcome this, the research team built Gondar, which combines Jazzer with large language models. Instead of relying only on random mutation, it uses LLMs to analyze code and generate well-structured test inputs. This combination lets the system reach deeper execution paths and uncover bugs that fuzzing alone would miss.
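To make the division of labor concrete, here is a minimal, self-contained sketch of an LLM-guided coverage fuzzing loop. This is not Gondar's actual implementation: the toy target, the `llm_propose` stub, and the scheduling are all illustrative stand-ins. The point is only to show why a structured, model-suggested input reaches a branch that byte-level mutation cannot.

```python
import json
import random

def run_target(data: bytes) -> set[str]:
    """Toy target: the deep bug fires only on a structurally valid JSON
    object with a specific field value, which byte-flip mutation of a
    short seed cannot produce."""
    covered = {"entry"}
    try:
        obj = json.loads(data)
    except ValueError:  # also catches JSONDecodeError and UnicodeDecodeError
        return covered
    covered.add("parsed")
    if isinstance(obj, dict) and obj.get("cmd") == "deploy":
        raise RuntimeError("deep bug reached")
    return covered

def llm_propose() -> bytes:
    # Stand-in for an LLM call: a real system would show the model the
    # target's source and ask it to propose a structured input.
    return json.dumps({"cmd": "deploy"}).encode()

def mutate(seed: bytes) -> bytes:
    # Classic byte-flip mutation: XOR roughly 10% of bytes at random.
    return bytes(b ^ random.randrange(256) if random.random() < 0.1 else b
                 for b in seed)

def fuzz(seed: bytes, rounds: int, use_llm: bool) -> bool:
    corpus, coverage = [seed], set()
    for i in range(rounds):
        if use_llm and i % 10 == 0:
            data = llm_propose()          # structured, LLM-suggested input
        else:
            data = mutate(random.choice(corpus))
        try:
            new = run_target(data)
        except RuntimeError:
            return True                   # vulnerability triggered
        if not new <= coverage:           # keep inputs that add coverage
            coverage |= new
            corpus.append(data)
    return False

random.seed(0)
print(fuzz(b'{"cmd": "x"}', 500, use_llm=False))  # prints False
print(fuzz(b'{"cmd": "x"}', 500, use_llm=True))   # prints True
```

Byte flips preserve input length, so the 12-byte seed can never become a JSON object whose `cmd` value is the 6-character string `"deploy"`; the LLM stub produces exactly that input on its first turn. That asymmetry is the intuition behind the plateau in the pure-fuzzing runs above.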

GLM-5: More Bugs Found at a Fraction of the Cost

Adding LLM guidance significantly improves coverage. Gondar with the Gemini-2.5-Pro model found 41 vulnerabilities, but closed models are expensive, costing around $2.4K–$3.1K per run. Swapping in GLM-5, an open-weight model served by FriendliAI, yielded 35 vulnerabilities (85% of the Gemini result) for just $392. By comparison, pure fuzzing burned through $3,264 in compute to surface only 8 bugs. On a cost-per-bug basis, GLM-5 delivered results at roughly one-seventh the cost of closed models.
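The arithmetic behind that comparison can be checked directly from the figures reported above:

```python
# Cost-per-bug comparison using the run costs and bug counts cited above.
runs = {
    "fuzzing only":            (3264, 8),
    "Gondar + Gemini (low)":   (2400, 41),
    "Gondar + Gemini (high)":  (3100, 41),
    "Gondar + GLM-5":          (392, 35),
}
for name, (cost, bugs) in runs.items():
    print(f"{name}: ${cost / bugs:.0f} per bug")
```

GLM-5 comes out around $11 per bug, the Gemini runs at roughly $59–$76, and pure fuzzing at about $408, which is where the rough one-seventh figure comes from.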

Full research and findings: https://team-atlanta.github.io/blog/post-sinkfuzz-glm/

Getting started

You can experiment with GLM-5 in minutes:

  1. Create an API key – Sign up at friendli.ai and generate your key
  2. Select GLM-5 – Choose the model among the Serverless Models and test it on the Playground
  3. Integrate – Use the provided code snippets and monitor performance in the dashboard
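For step 3, here is a minimal integration sketch in Python using only the standard library. The endpoint URL, the model identifier `zai-org/GLM-5`, and the `FRIENDLI_TOKEN` environment variable are assumptions based on FriendliAI's OpenAI-compatible serverless API; use the exact values shown in the code snippets on your dashboard.

```python
import json
import os
import urllib.request

# Assumed OpenAI-compatible chat-completions endpoint; verify on the dashboard.
API_URL = "https://api.friendli.ai/serverless/v1/chat/completions"

def build_request(prompt: str, model: str = "zai-org/GLM-5") -> urllib.request.Request:
    """Build (but do not send) a chat-completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ.get('FRIENDLI_TOKEN', '')}",
            "Content-Type": "application/json",
        },
    )

req = build_request("Propose a structurally valid XML input that stresses a SAX parser.")
# With FRIENDLI_TOKEN set, urllib.request.urlopen(req) sends the request
# and returns the completion as JSON.
```

Because the API is OpenAI-compatible, existing OpenAI SDK clients can also be pointed at the same base URL instead of hand-rolling requests.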

The bigger picture

Our collaboration with leading research groups shows that open-weight models can deliver competitive results on complex security tasks. With better input generation, GLM-5 uncovered far more vulnerabilities than pure fuzzing, while costing about one-seventh as much as closed models.

FriendliAI makes this practical at scale. Its serverless inference platform is built for high-speed research, delivering the low latency and high throughput that large models like GLM-5 demand without any infrastructure setup. Researchers can spin up experiments instantly, push more queries through in a given time window, and iterate on hypotheses rapidly.

FriendliAI bridges open-weight models and production-grade serverless inference, making state-of-the-art vulnerability discovery accessible to every security team.


Written by

FriendliAI Tech & Research



General FAQ

What is FriendliAI?

FriendliAI is a GPU-inference platform that lets you deploy, scale, and monitor large language and multimodal models in production, without owning or managing GPU infrastructure. We offer three things for your AI models: unmatched speed, cost efficiency, and operational simplicity. Find out which product is the best fit for you here.

How does FriendliAI help my business?

Our Friendli Inference allows you to squeeze more tokens per second out of every GPU. Because you need fewer GPUs to serve the same load, the metric that matters, tokens per dollar, comes out higher even if the hourly GPU rate looks similar on paper. View pricing

Which models and modalities are supported?

Over 540,000 text, vision, audio, and multi-modal models are deployable out of the box. You can also upload custom models or LoRA adapters. Explore models

Can I deploy models from Hugging Face directly?

Yes. Selecting “Friendli Endpoints” on the Hugging Face Hub takes you to our one-click model deployment page, which provides an easy-to-use interface for setting up Friendli Dedicated Endpoints, a managed service for generative AI inference. Learn more about our Hugging Face partnership

Still have questions?

If you want a customized solution for the key issue that is slowing your growth, email contact@friendli.ai or click Talk to an engineer; our engineers (not a bot) will reply within one business day.


Explore FriendliAI today