Scatter Lab scales Zeta to 1 billion monthly interactions with FriendliAI

Overview
Scatter Lab is the company behind Zeta, one of the conversational AI services with the highest user engagement in both Japan and Korea. As Zeta expanded its user base across multiple markets and evolved to run significantly larger models, Scatter Lab needed a production inference solution capable of sustaining real-time responsiveness at massive scale, without sacrificing cost efficiency.
They chose Friendli Container as their self-hosted inference layer to serve Zeta in production.

Challenges
Zeta's rapid growth across Korea, Japan, and the U.S. created a set of compounding infrastructure demands that baseline serving solutions could not meet. Scatter Lab faced:
- Explosive interaction volume: an inference stack capable of handling real-time responses at billion-scale monthly throughput
- Larger, more capable models: Zeta's evolution to models with significantly more parameters than its predecessors put new pressure on latency and GPU efficiency
- Cost sustainability as a critical threshold: reaching breakeven required an inference platform that could deliver performance without runaway serving costs
- Multi-market reliability: users across Korea, Japan, and the U.S. expect fast, consistent conversational experiences at all hours

Why FriendliAI
Scatter Lab selected Friendli Container for its ability to serve large generative AI models in production with industry-leading speed and cost-effectiveness, eliminating the need for custom serving optimization work.
Friendli Container enabled fully self-hosted, production-grade inference for Scatter Lab's own models, delivering exceptional inference speed that maintained real-time responsiveness even as model size scaled up significantly. That speed, combined with the cost efficiency of the platform, made it possible to serve a billion monthly interactions sustainably, a threshold that directly enabled Scatter Lab to reach breakeven.
Stability at scale was equally critical. With sustained high-volume traffic across multiple markets, Scatter Lab needed an inference layer that could hold up consistently without degradation. Friendli Container delivered on that front without requiring any internal serving optimization work. By eliminating that overhead entirely, Friendli Engine freed Scatter Lab's team to stay focused on what drives the product forward: model development and user experience, rather than the mechanics of serving infrastructure.

The Solution
Scatter Lab deployed Zeta's production inference stack on Friendli Container, a self-hosted solution that gave them full control over their serving environment while delivering the performance required for real-time conversational AI at scale.
At the core of the deployment, Friendli Container provided optimized inference for large-parameter generative models within Scatter Lab's own infrastructure, combining the control of a self-hosted environment with the performance typically associated with managed cloud services. Production-grade speed and stability were maintained consistently across more than one billion monthly interactions, a volume that demands an inference layer with virtually no tolerance for degradation or inconsistency.
Cost-efficient GPU utilization was equally central to the deployment's success. Serving at that scale is only sustainable if the economics work, and FriendliAI's optimized resource utilization gave Scatter Lab the cost structure needed to reach a financially sustainable operating model, a meaningful milestone for a consumer AI product operating across multiple markets.
Perhaps most valuable to Scatter Lab's team was what Friendli Container eliminated: the need for custom serving optimization. The stack was deployable and production-ready out of the box, requiring no internal tuning or infrastructure engineering to get to full performance. That meant engineering resources stayed focused on the model and product, not the plumbing beneath it.
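For context, the sketch below shows how an application such as a conversational chat backend might call a self-hosted Friendli Container deployment, assuming the container exposes an OpenAI-compatible chat completions API. The base URL, port, model name, and prompts are illustrative placeholders, not Scatter Lab's actual configuration.

```python
# Minimal sketch: calling a self-hosted Friendli Container endpoint from an
# application, assuming an OpenAI-compatible /v1/chat/completions API is
# served on localhost:8000. All names and values below are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # self-hosted endpoint (placeholder)
    api_key="EMPTY",                      # self-hosted deployments typically need no key
)

response = client.chat.completions.create(
    model="zeta-chat",  # hypothetical model identifier, for illustration only
    messages=[
        {"role": "system", "content": "You are a friendly conversational companion."},
        {"role": "user", "content": "Hey, how was your day?"},
    ],
    max_tokens=256,
    stream=True,  # stream tokens to keep perceived latency low in real-time chat
)

for chunk in response:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```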

Results
With Friendli Container, Scatter Lab transformed Zeta into one of the highest-scale conversational AI services in Asia and beyond:
| Metric | Result |
| --- | --- |
| Monthly Interactions | 1 billion+ handled reliably in production |
| Markets Served | Korea, Japan, and the U.S. |
| Financial Milestone | Breakeven reached, enabled by FriendliAI's cost efficiency |
| Serving Optimization | Internal serving optimization work eliminated entirely |
“Friendli Container has been instrumental in scaling Zeta as it grew across Korea, Japan, and the U.S. While handling more than 1 billion monthly interactions, it gave us the speed, stability, and cost efficiency, and helped us achieve breakeven. Friendli Engine is an irreplaceable solution for generative AI serving, both in terms of speed and cost-effectiveness.”
Jongyoun Kim, CEO, Scatter Lab

Deploy Your Models with FriendliAI
FriendliAI helps enterprise teams and AI organizations turn foundation models into production systems, with optimized inference, flexible deployment options, and the reliability that enterprise workloads demand.
Serve high-performance inference with FriendliAI.
Start Building Faster
Related Customer Stories
Learn how leading companies achieve unmatched performance, scalability, and reliability with FriendliAI
Our custom model API went live in about a day with enterprise-grade monitoring built in.
Rock-solid reliability with ultra-low tail latency.
Scale to trillions of tokens with 50% fewer GPUs, thanks to FriendliAI.
Fluctuating traffic is no longer a concern because autoscaling just works.