• December 31, 2025
  • 2 min read

K-EXAONE Is Now Available on Friendli Serverless Endpoints


We’re excited to announce that K-EXAONE is now available on Friendli Serverless Endpoints, enabling developers and organizations to deploy and scale LG AI Research’s new large-scale Hybrid Attention Mixture-of-Experts (MoE) model with ease.

Continuing our partnership with LG AI Research, FriendliAI is once again bringing a new model to production-ready environments, with optimized inference and seamless APIs from day zero. Access is free for the first month, so anyone can try K-EXAONE and start building right away.

About K-EXAONE

LGAI-EXAONE/K-EXAONE-236B-A23B is a 236B-parameter Hybrid Attention Mixture-of-Experts (MoE) model built for advanced reasoning, long-context understanding, and complex generative tasks. By routing each token to a small set of specialized experts and combining this with selective full-attention layers, it delivers high efficiency without sacrificing performance.

The model excels at tasks requiring sustained reasoning and large-context comprehension, making it ideal for enterprise knowledge systems, multi-step workflows, and other applications that demand both depth and adaptability. Its mixture-of-experts design activates only the most relevant experts per token, while selective full-attention layers ensure key global context is captured.
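
To make the routing idea concrete, below is a minimal sketch of top-k expert gating in PyTorch. It illustrates the general MoE pattern rather than K-EXAONE’s actual implementation: the expert count, `top_k`, and layer shapes here are toy values and simplified assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Toy MoE layer: each token is processed only by its top-k experts.

    Illustrative only; sizes and routing details are simplified assumptions,
    not K-EXAONE's actual architecture.
    """

    def __init__(self, hidden: int = 64, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(hidden, num_experts)  # gating scores per expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden, 4 * hidden), nn.GELU(),
                          nn.Linear(4 * hidden, hidden))
            for _ in range(num_experts)
        )
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, hidden). Choose the top-k experts for every token.
        scores = self.router(x)                         # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # per-token expert picks
        weights = F.softmax(weights, dim=-1)            # normalize the k gates
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                # tokens routed to expert e
                if mask.any():
                    w = weights[mask, slot].unsqueeze(-1)   # (n, 1) gate values
                    out[mask] += w * expert(x[mask])
        return out

tokens = torch.randn(10, 64)
print(TopKMoE()(tokens).shape)  # torch.Size([10, 64])
```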

In benchmark tests, K-EXAONE outperforms Qwen/Qwen3-235B-A22B-Thinking-2507 and openai/gpt-oss-120b, showing notable gains in reasoning and long-context tasks.

You can check out its detailed architecture specifications at https://huggingface.co/LGAI-EXAONE/K-EXAONE-236B-A23B. To highlight what sets K-EXAONE apart, the table below compares its key architectural characteristics with Qwen3-235B:

| Parameter | K-EXAONE | Qwen3 235B |
| --- | --- | --- |
| num_hidden_layers | 48 | 94 |
| num_key_value_heads | 8 | 4 |
| sliding_window | O (with full attention at every 4th layer) | X |
| num_experts | 128 + 1 | 128 |
| num_experts_per_tok | 8 + 1 | 8 |
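
If you want to check these fields yourself, you can inspect the published configuration without downloading any weights. A minimal sketch, assuming the `transformers` library and that the Hub config exposes these attribute names (exact names may differ for a custom architecture):

```python
from transformers import AutoConfig

# Fetch only the model configuration from the Hugging Face Hub (no weights).
# trust_remote_code may be required if the model ships a custom config class.
config = AutoConfig.from_pretrained("LGAI-EXAONE/K-EXAONE-236B-A23B",
                                    trust_remote_code=True)

# Print the fields compared in the table above; attribute names are assumptions.
for field in ("num_hidden_layers", "num_key_value_heads",
              "sliding_window", "num_experts", "num_experts_per_tok"):
    print(f"{field} = {getattr(config, field, '<not present>')}")
```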

Access K-EXAONE on Friendli Serverless Endpoints

With K-EXAONE on Friendli Serverless Endpoints, users can access the model through a fully managed, API-first experience:

  • Zero infrastructure management, no GPU provisioning or tuning
  • Automatic scaling, built for real-world traffic patterns
  • Optimized inference, tuned for performance and cost efficiency
  • OpenAI-compatible APIs, integrate quickly with existing stacks

Friendli Serverless Endpoints make it easy to take K-EXAONE from evaluation to production without operational overhead. Click here for instant access, or start from the quick-start sketch below.
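
As a quick start, here is a minimal sketch of calling K-EXAONE through the OpenAI-compatible API with the official `openai` Python client. The base URL and the `FRIENDLI_TOKEN` environment variable are assumptions based on common Friendli Suite conventions; confirm the exact values in the Friendli docs.

```python
import os
from openai import OpenAI

# Point the OpenAI client at Friendli Serverless Endpoints.
# Base URL and token variable are assumptions; confirm them in the Friendli docs.
client = OpenAI(
    base_url="https://api.friendli.ai/serverless/v1",
    api_key=os.environ["FRIENDLI_TOKEN"],
)

response = client.chat.completions.create(
    model="LGAI-EXAONE/K-EXAONE-236B-A23B",
    messages=[
        {"role": "user",
         "content": "Summarize the trade-offs of hybrid attention MoE models."},
    ],
)
print(response.choices[0].message.content)
```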

Day-0 Support & Free Event

As part of this launch, FriendliAI will serve as the exclusive Day-0 support provider for K-EXAONE on Friendli Serverless Endpoints.

To help teams get started smoothly:

  • Serverless Endpoint access to K-EXAONE will be provided free of charge for the first month, through January 28, 2026 (PT)
  • Our team will assist with onboarding, deployment guidance, and performance optimization

This ensures developers can explore K-EXAONE’s capabilities confidently while building production-grade applications.

Start Building with K-EXAONE Today

K-EXAONE is now accessible via Friendli Serverless Endpoints, offering a streamlined path to one of the latest Hybrid Attention MoE models.

If you’re looking to evaluate, prototype, or scale K-EXAONE in production, FriendliAI provides the infrastructure and support to help you move fast without compromise.

Ready to try it out for free? Click here (limited offer until Jan 28 PT)!


Written by

FriendliAI Tech & Research




General FAQ

What is FriendliAI?

FriendliAI is a GPU-inference platform that lets you deploy, scale, and monitor large language and multimodal models in production, without owning or managing GPU infrastructure. We offer three things for your AI models: unmatched speed, cost efficiency, and operational simplicity. Find out which product is the best fit for you here.

How does FriendliAI help my business?

Our Friendli Inference allows you to squeeze more tokens per second out of every GPU. Because you need fewer GPUs to serve the same load, the metric that actually matters, tokens per dollar, comes out higher even if the hourly GPU rate looks similar on paper, as the rough arithmetic below illustrates. View pricing
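
As a back-of-the-envelope illustration of why tokens per dollar beats the hourly rate as a comparison metric, consider two stacks with hypothetical throughput and price figures (the numbers below are made up for the arithmetic, not measured benchmarks):

```python
# Hypothetical numbers, for illustration only -- not measured benchmarks.
baseline = {"tokens_per_sec": 1_000, "gpu_cost_per_hour": 4.00}
optimized = {"tokens_per_sec": 2_500, "gpu_cost_per_hour": 4.40}  # faster, slightly pricier

def tokens_per_dollar(stack: dict) -> float:
    # Tokens produced in one hour divided by the cost of that hour.
    return stack["tokens_per_sec"] * 3_600 / stack["gpu_cost_per_hour"]

print(f"baseline:  {tokens_per_dollar(baseline):,.0f} tokens/$")   # 900,000 tokens/$
print(f"optimized: {tokens_per_dollar(optimized):,.0f} tokens/$")  # ~2,045,455 tokens/$
```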

Which models and modalities are supported?

Over 380,000 text, vision, audio, and multi-modal models are deployable out of the box. You can also upload custom models or LoRA adapters. Explore models

Can I deploy models from Hugging Face directly?

Yes. Selecting “Friendli Endpoints” on the Hugging Face Hub gives you a one-click deploy that takes you straight to our model deployment page. The page provides an easy-to-use interface for setting up Friendli Dedicated Endpoints, a managed service for generative AI inference. Learn more about our Hugging Face partnership

Still have questions?

If you need a customized solution for the issue that is slowing your growth, email contact@friendli.ai or click Talk to an engineer. Our engineers (not a bot) will reply within one business day.


Explore FriendliAI today