• April 23, 2026
  • 2 min read

Vulnerability Discovery with Open-Weight GLM-5: Frontier Quality at 1/7 the Cost of Closed Models

TL;DR
  • Traditional fuzzing struggles to discover deep vulnerabilities because random mutation cannot generate the structured inputs they require.
  • By combining fuzzing with LLMs, researchers achieved significantly better results, with GLM-5 reaching near-proprietary performance.
  • Running on FriendliAI, these results come at roughly one-seventh the cost, making advanced vulnerability discovery more accessible.

Finding security flaws before attackers strike is critical and difficult. Traditional fuzzers generate huge volumes of random inputs to test program behavior, but this brute-force approach surfaces only shallow bugs while deeper vulnerabilities go undetected. Triggering those requires structurally valid inputs, such as precise XML structures or specific path patterns, that random mutation alone is practically unable to produce. This is where large language models can help.

In this post, we show how LLM-guided fuzzing dramatically improves vulnerability discovery. Through collaborations with leading security research groups, we demonstrate that open-weight models like GLM-5 running on FriendliAI can match closed-model results at roughly 1/7 of the cost.

Where fuzzing falls short and how LLMs help

Coverage-guided fuzzers such as Jazzer, a widely used fuzzer for the JVM, mutate inputs randomly and track which branches each input reaches. Team Atlanta, the 2025 DARPA AI Cyber Challenge (AIxCC) winner, evaluated this approach on a 54-vulnerability benchmark. Even when scaling compute from a small setup to a much larger one, both runs found exactly 8 bugs; within minutes, additional compute stopped producing new results.

The limitation was not compute but input structure, as many vulnerabilities require structured inputs that random mutation cannot produce.

To overcome this, the research team built Gondar, which combines Jazzer with large language models. Instead of relying only on random mutation, it uses LLMs to analyze code and generate well-structured test inputs. This combination lets the system reach deeper execution paths and uncover bugs that fuzzing alone would miss.
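To make the division of labor concrete, here is a minimal, self-contained sketch of an LLM-guided coverage fuzzing loop. This is not Gondar's actual implementation: the toy target, the `llm_propose` stub, and the scheduling are all illustrative stand-ins. The point is only to show why a structured, model-suggested input reaches a branch that byte-level mutation cannot.

```python
import json
import random

def run_target(data: bytes) -> set[str]:
    """Toy target: the deep bug fires only on a structurally valid JSON
    object with a specific field value, which byte-flip mutation of a
    short seed cannot produce."""
    covered = {"entry"}
    try:
        obj = json.loads(data)
    except ValueError:  # also catches JSONDecodeError and UnicodeDecodeError
        return covered
    covered.add("parsed")
    if isinstance(obj, dict) and obj.get("cmd") == "deploy":
        raise RuntimeError("deep bug reached")
    return covered

def llm_propose() -> bytes:
    # Stand-in for an LLM call: a real system would show the model the
    # target's source and ask it to propose a structured input.
    return json.dumps({"cmd": "deploy"}).encode()

def mutate(seed: bytes) -> bytes:
    # Classic byte-flip mutation: XOR roughly 10% of bytes at random.
    return bytes(b ^ random.randrange(256) if random.random() < 0.1 else b
                 for b in seed)

def fuzz(seed: bytes, rounds: int, use_llm: bool) -> bool:
    corpus, coverage = [seed], set()
    for i in range(rounds):
        if use_llm and i % 10 == 0:
            data = llm_propose()          # structured, LLM-suggested input
        else:
            data = mutate(random.choice(corpus))
        try:
            new = run_target(data)
        except RuntimeError:
            return True                   # vulnerability triggered
        if not new <= coverage:           # keep inputs that add coverage
            coverage |= new
            corpus.append(data)
    return False

random.seed(0)
print(fuzz(b'{"cmd": "x"}', 500, use_llm=False))  # prints False
print(fuzz(b'{"cmd": "x"}', 500, use_llm=True))   # prints True
```

Byte flips preserve input length, so the 12-byte seed can never become a JSON object whose `cmd` value is the 6-character string `"deploy"`; the LLM stub produces exactly that input on its first turn. That asymmetry is the intuition behind the plateau in the pure-fuzzing runs above.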

GLM-5: More Bugs Found at a Fraction of the Cost

Adding LLM guidance significantly improves coverage. Gondar with the Gemini-2.5-Pro model found 41 vulnerabilities, but closed models are expensive, costing around $2.4K–$3.1K per run. Swapping in GLM-5, an open-weight model served by FriendliAI, yielded 35 vulnerabilities (85% of the Gemini result) for just $392. By comparison, pure fuzzing burned through $3,264 in compute to surface only 8 bugs. On a cost-per-bug basis, GLM-5 delivered results at roughly one-seventh the cost of closed models.
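The arithmetic behind that comparison can be checked directly from the figures reported above:

```python
# Cost-per-bug comparison using the run costs and bug counts cited above.
runs = {
    "fuzzing only":            (3264, 8),
    "Gondar + Gemini (low)":   (2400, 41),
    "Gondar + Gemini (high)":  (3100, 41),
    "Gondar + GLM-5":          (392, 35),
}
for name, (cost, bugs) in runs.items():
    print(f"{name}: ${cost / bugs:.0f} per bug")
```

GLM-5 comes out around $11 per bug, the Gemini runs at roughly $59–$76, and pure fuzzing at about $408, which is where the rough one-seventh figure comes from.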

Full research and findings: https://team-atlanta.github.io/blog/post-sinkfuzz-glm/

Getting started

You can experiment with GLM-5 in minutes:

  1. Create an API key – Sign up at friendli.ai and generate your key
  2. Select GLM-5 – Choose the model among the Serverless Models and test it on the Playground
  3. Integrate – Use the provided code snippets and monitor performance in the dashboard
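For step 3, here is a minimal integration sketch in Python using only the standard library. The endpoint URL, the model identifier `zai-org/GLM-5`, and the `FRIENDLI_TOKEN` environment variable are assumptions based on FriendliAI's OpenAI-compatible serverless API; use the exact values shown in the code snippets on your dashboard.

```python
import json
import os
import urllib.request

# Assumed OpenAI-compatible chat-completions endpoint; verify on the dashboard.
API_URL = "https://api.friendli.ai/serverless/v1/chat/completions"

def build_request(prompt: str, model: str = "zai-org/GLM-5") -> urllib.request.Request:
    """Build (but do not send) a chat-completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ.get('FRIENDLI_TOKEN', '')}",
            "Content-Type": "application/json",
        },
    )

req = build_request("Propose a structurally valid XML input that stresses a SAX parser.")
# With FRIENDLI_TOKEN set, urllib.request.urlopen(req) sends the request
# and returns the completion as JSON.
```

Because the API is OpenAI-compatible, existing OpenAI SDK clients can also be pointed at the same base URL instead of hand-rolling requests.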

The bigger picture

Our collaboration with leading research groups shows that open-weight models can deliver competitive results on complex security tasks. With better input generation, GLM-5 uncovered far more vulnerabilities than pure fuzzing, while costing about one-seventh as much as closed models.

FriendliAI makes this practical at scale. Its serverless inference platform is built for high-speed research, delivering the low latency and high throughput that large models like GLM-5 demand without any infrastructure setup. Researchers can spin up experiments instantly, push more queries through in a given time window, and iterate on hypotheses rapidly.

FriendliAI bridges open-weight models and production-grade serverless inference, making state-of-the-art vulnerability discovery accessible to every security team.


Written by

FriendliAI Tech & Research



General FAQ

What is FriendliAI?

FriendliAI is a GPU-inference platform that lets you deploy, scale, and monitor large language and multimodal models in production, without owning or managing GPU infrastructure. We offer three things for your AI models: unmatched speed, cost efficiency, and operational simplicity. Find out which product is the best fit for you here.

How does FriendliAI help my business?

Our Friendli Inference allows you to squeeze more tokens per second out of every GPU. Because you need fewer GPUs to serve the same load, the metric that matters, tokens per dollar, comes out higher even if the hourly GPU rate looks similar on paper. View pricing

Which models and modalities are supported?

Over 540,000 text, vision, audio, and multi-modal models are deployable out of the box. You can also upload custom models or LoRA adapters. Explore models

Can I deploy models from Hugging Face directly?

Yes. Selecting “Friendli Endpoints” on the Hugging Face Hub takes you to our one-click model deployment page, which provides an easy-to-use interface for setting up Friendli Dedicated Endpoints, a managed service for generative AI inference. Learn more about our Hugging Face partnership

Still have questions?

If you want a customized solution for the key issue that is slowing your growth, email contact@friendli.ai or click Talk to an engineer; our engineers (not a bot) will reply within one business day.


Explore FriendliAI today