Z.ai Model Added
The following Model APIs model is now available: zai-org/GLM-5.2
DeepSeek Model Deprecated
The following Model APIs model is no longer available: deepseek-ai/DeepSeek-V3.1
Qwen Model Deprecated
The following Model APIs model is no longer available: Qwen/Qwen3-30B-A3B
Z.ai Model Deprecated
The following Model APIs model is no longer available: zai-org/GLM-4.7
MiniMax Model Deprecated
The following Model APIs model is no longer available: MiniMaxAI/MiniMax-M2.1
Z.ai Model Added
The following Model APIs model is now available: zai-org/GLM-5.1
Cohere Labs Model Family Added
The Dedicated Endpoints CohereAsrForConditionalGeneration model family is now available. For example, you can now deploy endpoints for the following model: CohereLabs/cohere-transcribe-03-2026
Google Model Family Added
The Dedicated Endpoints Gemma4ForConditionalGeneration model family is now available. For example, you can now deploy endpoints for the following model: google/gemma-4-31B-it
DeepSeek and Qwen Model Pricing Changed
The pricing model for the following Model APIs models is now token-based:
- deepseek-ai/DeepSeek-V3.1
- Qwen/Qwen3-30B-A3B
OpenAI Model Added
The following Model APIs model is now available: openai/whisper-large-v3
LG AI Research Model Added
The following Model APIs model is now available: LGAI-EXAONE/K-EXAONE-236B-A23B
LG AI Research Model Pricing Changed
Cached input pricing is now available for the following Model APIs model: LGAI-EXAONE/K-EXAONE-236B-A23B
DeepSeek Model Added
The following Model APIs model is now available: deepseek-ai/DeepSeek-V3.2
MiniMax and Z.ai Model Pricing Changed
Cached input pricing is now available for the following Model APIs models:
- MiniMaxAI/MiniMax-M2.1
- zai-org/GLM-5
Host KV Cache Added
Host KV Cache is now available for Dedicated Endpoints. It offloads the KV cache to host memory, extending capacity beyond GPU memory limits so that your endpoint retains more tokens during inference. To learn more, see Endpoints > Host KV Cache.
Draft-Model Speculative Decoding Added
Draft-model speculative decoding is now available for Dedicated Endpoints, for a curated list of target models. You can pair a target model with a draft model that proposes multiple tokens, which the target model then verifies in parallel. This improves throughput and latency. To learn more, see Speculative Decoding > Draft-Model Method.
MiniMax Model Pricing Changed
Cached input pricing is now available for the following Model APIs model: MiniMaxAI/MiniMax-M2.5
LG AI Research Model Deprecated
The following Model APIs model is no longer available: LGAI-EXAONE/EXAONE-4.0.1-32B
LG AI Research Model Deprecated
The following Model APIs model is no longer available: LGAI-EXAONE/K-EXAONE-236B-A23B
MiniMax Model Added
The following Model APIs model is now available: MiniMaxAI/MiniMax-M2.5
Z.ai Model Added
The following Model APIs model is now available: zai-org/GLM-5
Z.ai Model Family Added
The Dedicated Endpoints Glm4MoeLiteForCausalLM model family is now available. For example, you can now deploy endpoints for the following model: zai-org/GLM-4.7-Flash
Z.ai Model Added
The following Model APIs model is now available: zai-org/GLM-4.7
MiniMax Model Pricing Changed
The pricing model for the following Model APIs model is now token-based: MiniMaxAI/MiniMax-M2.1
DeepSeek Model Deprecated
The following Model APIs model is no longer available: deepseek-ai/DeepSeek-R1-0528
MiniMax Model Added
The following Model APIs model is now available: MiniMaxAI/MiniMax-M2.1
LG AI Research Model Family Added
The Dedicated Endpoints ExaoneMoEForCausalLM model family is now available. For example, you can now deploy endpoints for the following model: LGAI-EXAONE/K-EXAONE-236B-A23B
LG AI Research Model Added
The following Model APIs model is now available: LGAI-EXAONE/K-EXAONE-236B-A23B
Tencent Model Family Added
The Dedicated Endpoints HunYuanVLForConditionalGeneration model family is now available. For example, you can now deploy endpoints for the following model: tencent/HunyuanOCR
MiniMax Model Family Added
The Dedicated Endpoints MiniMaxM2ForCausalLM model family is now available. For example, you can now deploy endpoints for the following model: MiniMaxAI/MiniMax-M2
Google Model Family Added
The Dedicated Endpoints Gemma3TextModel model family is now available. For example, you can now deploy endpoints for the following model: google/embeddinggemma-300m
Microsoft Model Family Added
The Dedicated Endpoints Phi4MMForCausalLM model family is now available. For example, you can now deploy endpoints for the following model: microsoft/Phi-4-multimodal-instruct
DeepSeek Model Added
The following Model APIs model is now available: deepseek-ai/DeepSeek-V3.1
Basic Plan Expanded
The Dedicated Endpoints Basic plan now includes the following features:
- Scale replicas by queued and in-flight requests with a request count scaling policy.
- Serve multiple LoRA adapters on a single endpoint.
- Monitor endpoint performance with metrics.
- Inspect endpoint activity in real time with logs.
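A request count scaling policy can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the platform's actual algorithm: the function name `desired_replicas`, the `target_per_replica` parameter, and the replica bounds are all hypothetical and chosen only to illustrate scaling on queued plus in-flight requests.

```python
import math


def desired_replicas(queued, in_flight, target_per_replica,
                     min_replicas=1, max_replicas=8):
    """Return a replica count sized to the current request load.

    All parameters are illustrative assumptions; the real policy's
    inputs and bounds may differ.
    """
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    # Total load is everything waiting plus everything being served.
    load = queued + in_flight
    # Enough replicas that each handles at most the target request count.
    wanted = math.ceil(load / target_per_replica)
    # Clamp to the configured replica range.
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # 12 queued + 8 in flight, 5 requests per replica.
    print(desired_replicas(queued=12, in_flight=8, target_per_replica=5))
```

With no load at all, the sketch falls back to `min_replicas`, and a burst larger than `max_replicas * target_per_replica` is capped at `max_replicas`.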
Black Forest Labs Model Family Added
The Dedicated Endpoints FluxKontextPipeline model family is now available. For example, you can now deploy endpoints for the following model: black-forest-labs/FLUX.1-Kontext-dev
Ai2 Model Family Added
The Dedicated Endpoints Olmo3ForCausalLM model family is now available. For example, you can now deploy endpoints for the following model: allenai/Olmo-3-32B-Think
LightOn Model Family Added
The Dedicated Endpoints LightOnOCRForConditionalGeneration model family is now available. For example, you can now deploy endpoints for the following model: lightonai/LightOnOCR-1B-1025
PaddlePaddle Model Family Added
The Dedicated Endpoints PaddleOCRVLForConditionalGeneration model family is now available. For example, you can now deploy endpoints for the following model: PaddlePaddle/PaddleOCR-VL
DeepSeek Model Family Added
The Dedicated Endpoints DeepseekOCRForCausalLM model family is now available. For example, you can now deploy endpoints for the following model: deepseek-ai/DeepSeek-OCR
Qwen Model Families Added
The Dedicated Endpoints Qwen3VLForConditionalGeneration and Qwen3VLMoeForConditionalGeneration model families are now available. For example, you can now deploy endpoints for the following models:
- Qwen/Qwen3-VL-4B-Instruct
- Qwen/Qwen3-VL-30B-A3B-Instruct
IBM Granite Model Family Added
The Dedicated Endpoints GraniteMoeHybridForCausalLM model family is now available. For example, you can now deploy endpoints for the following model: ibm-granite/granite-4.0-h-small
rednote-hilab Model Family Added
The Dedicated Endpoints DotsOCRForCausalLM model family is now available. For example, you can now deploy endpoints for the following model: rednote-hilab/dots.ocr
Qwen Model Pricing Changed
The pricing model for the following Model APIs model is now token-based: Qwen/Qwen3-235B-A22B-Instruct-2507
Qwen Model Family Added
The Dedicated Endpoints Qwen3NextForCausalLM model family is now available. For example, you can now deploy endpoints for the following model: Qwen/Qwen3-Next-80B-A3B-Instruct
Tencent Model Family Added
The Dedicated Endpoints HunYuanDenseV1ForCausalLM model family is now available. For example, you can now deploy endpoints for the following model: tencent/Hunyuan-MT-7B
Swiss AI Initiative Model Family Added
The Dedicated Endpoints ApertusForCausalLM model family is now available. For example, you can now deploy endpoints for the following model: swiss-ai/Apertus-8B-Instruct-2509
ByteDance Seed Model Family Added
The Dedicated Endpoints SeedOssForCausalLM model family is now available. For example, you can now deploy endpoints for the following model: ByteDance-Seed/Seed-OSS-36B-Instruct
Custom Chat Templates Added
Custom chat templates are now available for Dedicated Endpoints. You can paste or upload a Jinja template when you create an endpoint. To learn more, see Endpoints > Custom Chat Templates.
4-Bit Online Quantization Added
Online quantization with 4-bit precision is now available for Dedicated Endpoints. You can run models on smaller instances with negligible quality impact. To learn more, see Online Quantization.
Reasoning Parser Added
The reasoning parser is now available for Model APIs and Dedicated Endpoints. When you turn it on, responses return reasoning in its own field, separate from the message content. To learn more, see Reasoning > Reasoning Parser.
K-intelligence Model Deprecated
The following Model APIs model is no longer available: K-intelligence/Midm-2.0-Base-Instruct
K-intelligence Model Deprecated
The following Model APIs model is no longer available: K-intelligence/Midm-2.0-Mini-Instruct
NVIDIA B200 GPUs Added
NVIDIA B200 GPUs are now available for Dedicated Endpoints, alongside A100, H100, and H200 GPUs. You can select B200 instances when you create an endpoint. To learn more, see Pricing > Dedicated Endpoints.
OpenAI Model Family Added
The Dedicated Endpoints GptOssForCausalLM model family is now available. For example, you can now deploy endpoints for the following model: openai/gpt-oss-20b
Linkup Web Search Integration Added
Linkup web search is now available for Model APIs as a built-in tool. You can use it to ground model responses with live web results. To learn more, see Partnering with Linkup: Built‑in AI Web Search in Friendli Serverless Endpoints.
Request Count Scaling Policy Added
A request count scaling policy is now available for Dedicated Endpoints on the Enterprise plan. You can scale replicas according to queued and in-flight requests. To learn more, see Autoscaling > Scaling Policies.
N-Gram Speculative Decoding Added
N-gram speculative decoding is now available for Dedicated Endpoints. You can turn it on for predictable tasks, where it delivers substantial performance gains. To learn more, see Introducing N-gram Speculative Decoding: Faster Inference for Structured Tasks.
Reasoning Output Token Limits Raised
Output token limits are now higher for Model APIs reasoning models. You can run demanding tasks without response truncation. To learn more, see Chat Completions API.
HyperCLOVA X Model Family Added
The Dedicated Endpoints HyperCLOVAXForCausalLM model family is now available. For example, you can now deploy endpoints for the following model: naver-hyperclovax/HyperCLOVAX-SEED-Think-14B
Qwen Model Added
The following Model APIs model is now available: Qwen/Qwen3-235B-A22B-Instruct-2507
Online Quantization Added
Online quantization is now available for Dedicated Endpoints. You can quantize models with no advance preparation and accelerate inference. To learn more, see Announcing Online Quantization: Faster, Cheaper Inference with Same Accuracy.
LG AI Research Model Added
The following Model APIs model is now available: LGAI-EXAONE/EXAONE-4.0.1-32B
DeepSeek Model Added
The following Model APIs model is now available: deepseek-ai/DeepSeek-R1-0528
rednote-hilab Model Family Added
The Dedicated Endpoints Dots1ForCausalLM model family is now available. For example, you can now deploy endpoints for the following model: rednote-hilab/dots.llm1.inst
Z.ai Model Family Added
The Dedicated Endpoints Glm4vForConditionalGeneration model family is now available. For example, you can now deploy endpoints for the following model: zai-org/GLM-4.1V-9B-Thinking
Kwai Keye Model Family Added
The Dedicated Endpoints KeyeForConditionalGeneration model family is now available. For example, you can now deploy endpoints for the following model: Kwai-Keye/Keye-VL-8B-Preview
Tencent Model Family Added
The Dedicated Endpoints HunYuanMoEV1ForCausalLM model family is now available. For example, you can now deploy endpoints for the following model: tencent/Hunyuan-A13B-Instruct
Microsoft Model Family Added
The Dedicated Endpoints PhiMoEForCausalLM model family is now available. For example, you can now deploy endpoints for the following model: microsoft/Phi-mini-MoE-instruct
MiniMax Model Family Added
The Dedicated Endpoints MiniMaxM1ForCausalLM model family is now available. For example, you can now deploy endpoints for the following model: MiniMaxAI/MiniMax-M1-80k
Baidu Model Families Added
The Dedicated Endpoints Ernie4_5_MoeForCausalLM and Ernie4_5_ForCausalLM model families are now available. For example, you can now deploy endpoints for the following models:
- baidu/ERNIE-4.5-21B-A3B-Thinking
- baidu/ERNIE-4.5-0.3B-PT
LG AI Research Model Family Added
The Dedicated Endpoints Exaone4ForCausalLM model family is now available. For example, you can now deploy endpoints for the following model: LGAI-EXAONE/EXAONE-4.0.1-32B