Stay up to date with the latest from FriendliAI: new models, new features, pricing updates, and more.
June 16, 2026
Model APIs | New Model

Z.ai Model Added

The following Model APIs model is now available:
  • zai-org/GLM-5.2
To learn more, see Models > GLM-5.2.
May 22, 2026
Model APIs | Deprecated Model

DeepSeek Model Deprecated

The following Model APIs model is no longer available:
  • deepseek-ai/DeepSeek-V3.1
To browse all models, see Models > Model APIs.
April 16, 2026
Model APIs | Deprecated Model

Qwen Model Deprecated

The following Model APIs model is no longer available:
  • Qwen/Qwen3-30B-A3B
To browse all models, see Models > Model APIs.
April 15, 2026
Model APIs | Deprecated Model

Z.ai Model Deprecated

The following Model APIs model is no longer available:
  • zai-org/GLM-4.7
To browse all models, see Models > Model APIs.
April 9, 2026
Model APIs | Deprecated Model

MiniMax Model Deprecated

The following Model APIs model is no longer available:
  • MiniMaxAI/MiniMax-M2.1
To browse all models, see Models > Model APIs.
April 7, 2026
Model APIs | New Model

Z.ai Model Added

The following Model APIs model is now available:
  • zai-org/GLM-5.1
To learn more, see Models > GLM-5.1.
April 3, 2026
Model APIs | Dedicated Endpoints | New Model Family | Pricing Update

Cohere Labs Model Family Added

The Dedicated Endpoints CohereAsrForConditionalGeneration model family is now available. For example, you can now deploy endpoints for the following model:
  • CohereLabs/cohere-transcribe-03-2026
To browse all models, see Models > Dedicated Endpoints.

Google Model Family Added

The Dedicated Endpoints Gemma4ForConditionalGeneration model family is now available. For example, you can now deploy endpoints for the following model:
  • google/gemma-4-31B-it
To browse all models, see Models > Dedicated Endpoints.

DeepSeek and Qwen Model Pricing Changed

The pricing model for the following Model APIs models is now token-based:
  • deepseek-ai/DeepSeek-V3.1
  • Qwen/Qwen3-30B-A3B
To learn more, see Pricing > Model APIs.
April 1, 2026
Model APIs | New Model

OpenAI Model Added

The following Model APIs model is now available:
  • openai/whisper-large-v3
To learn more, see Models > whisper-large-v3.
March 18, 2026
Model APIs | New Model | Pricing Update

LG AI Research Model Added

The following Model APIs model is now available:
  • LGAI-EXAONE/K-EXAONE-236B-A23B
To learn more, see Models > K-EXAONE-236B-A23B.

LG AI Research Model Pricing Changed

Cached input pricing is now available for the following Model APIs model:
  • LGAI-EXAONE/K-EXAONE-236B-A23B
To learn more, see Pricing > Model APIs.
March 7, 2026
Model APIs | New Model | Pricing Update

DeepSeek Model Added

The following Model APIs model is now available:
  • deepseek-ai/DeepSeek-V3.2
To learn more, see Models > DeepSeek-V3.2.

MiniMax and Z.ai Model Pricing Changed

Cached input pricing is now available for the following Model APIs models:
  • MiniMaxAI/MiniMax-M2.1
  • zai-org/GLM-5
To learn more, see Pricing > Model APIs.
March 4, 2026
Dedicated Endpoints | New Feature

Host KV Cache Added

Host KV Cache is now available for Dedicated Endpoints. It offloads KV cache to host memory to extend capacity beyond GPU limits. Your endpoint retains more tokens during inference.
To learn more, see Endpoints > Host KV Cache.

Draft-Model Speculative Decoding Added

Draft-model speculative decoding is now available for Dedicated Endpoints, for a curated list of target models. You can pair a target model with a draft model that proposes multiple tokens for the target to verify in parallel, which improves throughput and latency.
To learn more, see Speculative Decoding > Draft-Model Method.
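The propose-and-verify loop described above can be illustrated with a toy sketch. This is not FriendliAI's implementation: the "models" are hypothetical stand-in functions that map a token context to the next token, decoding is greedy, and verification runs in a plain loop where a real engine would score all proposed positions in one batched pass.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: token context in, next token out.
NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Baseline: one model call per generated token."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        out.append(model(out))
    return out


def speculative_decode(
    target: NextToken,
    draft: NextToken,
    prompt: List[int],
    max_new_tokens: int,
    k: int = 4,
) -> List[int]:
    """Draft proposes up to k tokens; target keeps the longest agreeing prefix."""
    out = list(prompt)
    limit = len(prompt) + max_new_tokens
    while len(out) < limit:
        # 1) The cheap draft model proposes a short run of tokens.
        proposal: List[int] = []
        ctx = list(out)
        for _ in range(min(k, limit - len(out))):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2) The target checks each proposed position. Its own token is always
        #    the one appended, so the output matches target-only decoding.
        for tok in proposal:
            expected = target(out)
            out.append(expected)
            if expected != tok:
                break  # first disagreement: discard the rest of the proposal
    return out


def toy_target(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    # Agrees with the target except after token 4, to force one rejection.
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10
```

In this greedy sketch the result is identical to decoding with the target alone; any speedup in practice comes from how often the draft's guesses are accepted.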
March 1, 2026
Model APIs | Pricing Update

MiniMax Model Pricing Changed

Cached input pricing is now available for the following Model APIs model:
  • MiniMaxAI/MiniMax-M2.5
To learn more, see Pricing > Model APIs.
February 28, 2026
Model APIs | Deprecated Model

LG AI Research Model Deprecated

The following Model APIs model is no longer available:
  • LGAI-EXAONE/EXAONE-4.0.1-32B
To browse all models, see Models > Model APIs.
February 20, 2026
Model APIs | Deprecated Model

LG AI Research Model Deprecated

The following Model APIs model is no longer available:
  • LGAI-EXAONE/K-EXAONE-236B-A23B
To browse all models, see Models > Model APIs.
February 19, 2026
Model APIs | New Model

MiniMax Model Added

The following Model APIs model is now available:
  • MiniMaxAI/MiniMax-M2.5
To learn more, see Models > MiniMax-M2.5.
February 11, 2026
Model APIs | New Model

Z.ai Model Added

The following Model APIs model is now available:
  • zai-org/GLM-5
To learn more, see Models > GLM-5.
January 21, 2026
Dedicated Endpoints | New Model Family

Z.ai Model Family Added

The Dedicated Endpoints Glm4MoeLiteForCausalLM model family is now available. For example, you can now deploy endpoints for the following model:
  • zai-org/GLM-4.7-Flash
To browse all models, see Models > Dedicated Endpoints.
January 20, 2026
Model APIs | New Model | Pricing Update

Z.ai Model Added

The following Model APIs model is now available:
  • zai-org/GLM-4.7
To learn more, see Models > GLM-4.7.

MiniMax Model Pricing Changed

The pricing model for the following Model APIs model is now token-based:
  • MiniMaxAI/MiniMax-M2.1
To learn more, see Pricing > Model APIs.
January 16, 2026
Model APIs | Deprecated Model

DeepSeek Model Deprecated

The following Model APIs model is no longer available:
  • deepseek-ai/DeepSeek-R1-0528
To browse all models, see Models > Model APIs.
January 14, 2026
Model APIs | New Model

MiniMax Model Added

The following Model APIs model is now available:
  • MiniMaxAI/MiniMax-M2.1
To learn more, see Models > MiniMax-M2.1.
January 2, 2026
Dedicated Endpoints | New Model Family

LG AI Research Model Family Added

The Dedicated Endpoints ExaoneMoEForCausalLM model family is now available. For example, you can now deploy endpoints for the following model:
  • LGAI-EXAONE/K-EXAONE-236B-A23B
To browse all models, see Models > Dedicated Endpoints.
December 31, 2025
Model APIs | New Model

LG AI Research Model Added

The following Model APIs model is now available:
  • LGAI-EXAONE/K-EXAONE-236B-A23B
To learn more, see Models > K-EXAONE-236B-A23B.
December 5, 2025
Dedicated Endpoints | New Model Family

Tencent Model Family Added

The Dedicated Endpoints HunYuanVLForConditionalGeneration model family is now available. For example, you can now deploy endpoints for the following model:
  • tencent/HunyuanOCR
To browse all models, see Models > Dedicated Endpoints.

MiniMax Model Family Added

The Dedicated Endpoints MiniMaxM2ForCausalLM model family is now available. For example, you can now deploy endpoints for the following model:
  • MiniMaxAI/MiniMax-M2
To browse all models, see Models > Dedicated Endpoints.

Google Model Family Added

The Dedicated Endpoints Gemma3TextModel model family is now available. For example, you can now deploy endpoints for the following model:
  • google/embeddinggemma-300m
To browse all models, see Models > Dedicated Endpoints.

Microsoft Model Family Added

The Dedicated Endpoints Phi4MMForCausalLM model family is now available. For example, you can now deploy endpoints for the following model:
  • microsoft/Phi-4-multimodal-instruct
To browse all models, see Models > Dedicated Endpoints.
December 1, 2025
Model APIs | New Model

DeepSeek Model Added

The following Model APIs model is now available:
  • deepseek-ai/DeepSeek-V3.1
To learn more, see Models > DeepSeek-V3.1.
November 27, 2025
Dedicated Endpoints | Improvement

Basic Plan Expanded

The Dedicated Endpoints Basic plan now includes the following features:
  • Scale replicas by queued and in-flight requests with a request count scaling policy.
  • Serve multiple LoRA adapters on a single endpoint.
  • Monitor endpoint performance with metrics.
  • Inspect endpoint activity in real time with logs.
These features were previously only available on the Enterprise plan.
November 21, 2025
Dedicated Endpoints | New Model Family

Black Forest Labs Model Family Added

The Dedicated Endpoints FluxKontextPipeline model family is now available. For example, you can now deploy endpoints for the following model:
  • black-forest-labs/FLUX.1-Kontext-dev
To browse all models, see Models > Dedicated Endpoints.

Ai2 Model Family Added

The Dedicated Endpoints Olmo3ForCausalLM model family is now available. For example, you can now deploy endpoints for the following model:
  • allenai/Olmo-3-32B-Think
To browse all models, see Models > Dedicated Endpoints.

LightOn Model Family Added

The Dedicated Endpoints LightOnOCRForConditionalGeneration model family is now available. For example, you can now deploy endpoints for the following model:
  • lightonai/LightOnOCR-1B-1025
To browse all models, see Models > Dedicated Endpoints.

PaddlePaddle Model Family Added

The Dedicated Endpoints PaddleOCRVLForConditionalGeneration model family is now available. For example, you can now deploy endpoints for the following model:
  • PaddlePaddle/PaddleOCR-VL
To browse all models, see Models > Dedicated Endpoints.

DeepSeek Model Family Added

The Dedicated Endpoints DeepseekOCRForCausalLM model family is now available. For example, you can now deploy endpoints for the following model:
  • deepseek-ai/DeepSeek-OCR
To browse all models, see Models > Dedicated Endpoints.
November 7, 2025
Dedicated Endpoints | New Model Family

Qwen Model Families Added

The Dedicated Endpoints Qwen3VLForConditionalGeneration and Qwen3VLMoeForConditionalGeneration model families are now available. For example, you can now deploy endpoints for the following models:
  • Qwen/Qwen3-VL-4B-Instruct
  • Qwen/Qwen3-VL-30B-A3B-Instruct
To browse all models, see Models > Dedicated Endpoints.

IBM Granite Model Family Added

The Dedicated Endpoints GraniteMoeHybridForCausalLM model family is now available. For example, you can now deploy endpoints for the following model:
  • ibm-granite/granite-4.0-h-small
To browse all models, see Models > Dedicated Endpoints.

rednote-hilab Model Family Added

The Dedicated Endpoints DotsOCRForCausalLM model family is now available. For example, you can now deploy endpoints for the following model:
  • rednote-hilab/dots.ocr
To browse all models, see Models > Dedicated Endpoints.
November 1, 2025
Model APIs | Pricing Update

Qwen Model Pricing Changed

The pricing model for the following Model APIs model is now token-based:
  • Qwen/Qwen3-235B-A22B-Instruct-2507
To learn more, see Pricing > Model APIs.
September 15, 2025
Dedicated Endpoints | New Model Family

Qwen Model Family Added

The Dedicated Endpoints Qwen3NextForCausalLM model family is now available. For example, you can now deploy endpoints for the following model:
  • Qwen/Qwen3-Next-80B-A3B-Instruct
To browse all models, see Models > Dedicated Endpoints.

Tencent Model Family Added

The Dedicated Endpoints HunYuanDenseV1ForCausalLM model family is now available. For example, you can now deploy endpoints for the following model:
  • tencent/Hunyuan-MT-7B
To browse all models, see Models > Dedicated Endpoints.

Swiss AI Initiative Model Family Added

The Dedicated Endpoints ApertusForCausalLM model family is now available. For example, you can now deploy endpoints for the following model:
  • swiss-ai/Apertus-8B-Instruct-2509
To browse all models, see Models > Dedicated Endpoints.

ByteDance Seed Model Family Added

The Dedicated Endpoints SeedOssForCausalLM model family is now available. For example, you can now deploy endpoints for the following model:
  • ByteDance-Seed/Seed-OSS-36B-Instruct
To browse all models, see Models > Dedicated Endpoints.
September 12, 2025
Dedicated Endpoints | New Feature

Custom Chat Templates Added

Custom chat templates are now available for Dedicated Endpoints. You can paste or upload a Jinja template when you create an endpoint.
To learn more, see Endpoints > Custom Chat Templates.

4-Bit Online Quantization Added

Online quantization with 4-bit precision is now available for Dedicated Endpoints. You can run models on smaller instances with negligible quality impact.
To learn more, see Online Quantization.
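To give a feel for what 4-bit precision means, here is a minimal sketch of symmetric per-tensor int4 quantization. The scheme is an assumption chosen for illustration; this changelog does not describe the method the service actually uses, and the function names are hypothetical.

```python
from typing import List, Tuple


def quantize_int4(weights: List[float]) -> Tuple[List[int], float]:
    """Map floats to signed 4-bit integers in [-8, 7] with one shared scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Assumption: the largest magnitude maps to 7; an all-zero tensor uses scale 1.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    quantized = [max(-8, min(7, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int4(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate floats from the 4-bit codes."""
    return [q * scale for q in quantized]
```

Each recovered weight differs from the original by at most half a quantization step (`scale / 2`), which is the rounding error such a scheme trades for the smaller memory footprint.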
September 10, 2025
Model APIs | Dedicated Endpoints | New Feature

Reasoning Parser Added

The reasoning parser is now available for Model APIs and Dedicated Endpoints. When you turn it on, responses return reasoning in its own field, separate from the message content.
To learn more, see Reasoning > Reasoning Parser.
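Conceptually, a parser of this kind separates a model's reasoning from its final answer. The sketch below assumes the raw output wraps reasoning in `<think>...</think>` tags, as several open reasoning models do; the tag and the field names `reasoning_content` and `content` are illustrative assumptions, not the documented response schema.

```python
import re
from typing import Dict

# Assumption: reasoning is delimited by <think>...</think> in the raw output.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(raw: str) -> Dict[str, str]:
    """Return reasoning and answer as separate fields (hypothetical names)."""
    reasoning = "\n".join(part.strip() for part in THINK_BLOCK.findall(raw))
    content = THINK_BLOCK.sub("", raw).strip()
    return {"reasoning_content": reasoning, "content": content}
```

For example, `split_reasoning("<think>2 + 2 is 4</think>The answer is 4.")` places `2 + 2 is 4` in the reasoning field and `The answer is 4.` in the content field.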
September 8, 2025
Model APIs | Deprecated Model

K-intelligence Model Deprecated

The following Model APIs model is no longer available:
  • K-intelligence/Midm-2.0-Base-Instruct
To browse all models, see Models > Model APIs.
September 4, 2025
Model APIs | Deprecated Model

K-intelligence Model Deprecated

The following Model APIs model is no longer available:
  • K-intelligence/Midm-2.0-Mini-Instruct
To browse all models, see Models > Model APIs.
September 1, 2025
Dedicated Endpoints | New Feature

NVIDIA B200 GPUs Added

NVIDIA B200 GPUs are now available for Dedicated Endpoints, alongside A100, H100, and H200 GPUs. You can select B200 instances when you create an endpoint.
To learn more, see Pricing > Dedicated Endpoints.
August 22, 2025
Model APIs | Dedicated Endpoints | New Model Family | New Feature

OpenAI Model Family Added

The Dedicated Endpoints GptOssForCausalLM model family is now available. For example, you can now deploy endpoints for the following model:
  • openai/gpt-oss-20b
To browse all models, see Models > Dedicated Endpoints.

Linkup Web Search Integration Added

Linkup web search is now available for Model APIs as a built-in tool. You can use it to ground model responses with live web results.
To learn more, see Partnering with Linkup: Built‑in AI Web Search in Friendli Serverless Endpoints.
August 19, 2025
Dedicated Endpoints | New Feature

Request Count Scaling Policy Added

A request count scaling policy is now available for Dedicated Endpoints on the Enterprise plan. You can scale replicas according to queued and in-flight requests.
To learn more, see Autoscaling > Scaling Policies.
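One way to picture a request-count policy is as a target number of requests per replica. The sketch below is an assumption about how such a rule could be computed, not the service's actual algorithm; `target_per_replica`, `min_replicas`, and `max_replicas` are hypothetical parameter names.

```python
import math


def desired_replicas(
    queued: int,
    in_flight: int,
    target_per_replica: int,
    min_replicas: int = 1,
    max_replicas: int = 8,
) -> int:
    """Replicas needed so each handles at most target_per_replica requests."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    load = queued + in_flight  # total outstanding requests
    wanted = math.ceil(load / target_per_replica)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, wanted))
```

With 30 queued and 10 in-flight requests and a target of 16 per replica, this rule asks for 3 replicas; with no load it falls back to `min_replicas`.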
August 8, 2025
Model APIs | Dedicated Endpoints | New Feature | Improvement

N-Gram Speculative Decoding Added

N-gram speculative decoding is now available for Dedicated Endpoints. You can turn it on for predictable tasks, where it delivers substantial performance gains.
To learn more, see Introducing N-gram Speculative Decoding: Faster Inference for Structured Tasks.
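A common formulation of n-gram speculation needs no draft model: it looks for an earlier occurrence of the most recent tokens in the context and proposes whatever followed them. The sketch below shows only that proposal step and assumes this formulation; the changelog does not specify how the feature is implemented, and `ngram_propose` is a hypothetical helper.

```python
from typing import List


def ngram_propose(tokens: List[int], n: int = 2, k: int = 4) -> List[int]:
    """Propose up to k tokens that followed the latest earlier match of the last n tokens."""
    if len(tokens) <= n:
        return []
    suffix = tokens[-n:]
    # Scan backwards so the most recent earlier occurrence wins.
    for start in range(len(tokens) - n - 1, -1, -1):
        if tokens[start:start + n] == suffix:
            return tokens[start + n:start + n + k]
    return []  # no match: fall back to normal decoding
```

This is why repetitive or structured outputs benefit most: the more often recent tokens repeat earlier context, the more proposals the target model can accept at once.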

Reasoning Output Token Limits Raised

Output token limits are now higher for Model APIs reasoning models. You can run demanding tasks without response truncation.
To learn more, see Chat Completions API.
August 1, 2025
Model APIs | Dedicated Endpoints | New Model Family | New Model

HyperCLOVA X Model Family Added

The Dedicated Endpoints HyperCLOVAXForCausalLM model family is now available. For example, you can now deploy endpoints for the following model:
  • naver-hyperclovax/HyperCLOVAX-SEED-Think-14B
To browse all models, see Models > Dedicated Endpoints.

Qwen Model Added

The following Model APIs model is now available:
  • Qwen/Qwen3-235B-A22B-Instruct-2507
To learn more, see Models > Qwen3-235B-A22B-Instruct-2507.
July 25, 2025
Dedicated Endpoints | New Feature

Online Quantization Added

Online quantization is now available for Dedicated Endpoints. You can quantize models with no advance preparation and accelerate inference.
To learn more, see Announcing Online Quantization: Faster, Cheaper Inference with Same Accuracy.
July 14, 2025
Model APIs | New Model

LG AI Research Model Added

The following Model APIs model is now available:
  • LGAI-EXAONE/EXAONE-4.0.1-32B
To learn more, see LG AI Research Partners with FriendliAI to Launch EXAONE 4.0 for Fast, Scalable API.
July 11, 2025
Model APIs | New Model

DeepSeek Model Added

The following Model APIs model is now available:
  • deepseek-ai/DeepSeek-R1-0528
To learn more, see Models > DeepSeek-R1-0528.
July 8, 2025
Dedicated Endpoints | New Model Family

rednote-hilab Model Family Added

The Dedicated Endpoints Dots1ForCausalLM model family is now available. For example, you can now deploy endpoints for the following model:
  • rednote-hilab/dots.llm1.inst
To browse all models, see Models > Dedicated Endpoints.

Z.ai Model Family Added

The Dedicated Endpoints Glm4vForConditionalGeneration model family is now available. For example, you can now deploy endpoints for the following model:
  • zai-org/GLM-4.1V-9B-Thinking
To browse all models, see Models > Dedicated Endpoints.

Kwai Keye Model Family Added

The Dedicated Endpoints KeyeForConditionalGeneration model family is now available. For example, you can now deploy endpoints for the following model:
  • Kwai-Keye/Keye-VL-8B-Preview
To browse all models, see Models > Dedicated Endpoints.

Tencent Model Family Added

The Dedicated Endpoints HunYuanMoEV1ForCausalLM model family is now available. For example, you can now deploy endpoints for the following model:
  • tencent/Hunyuan-A13B-Instruct
To browse all models, see Models > Dedicated Endpoints.

Microsoft Model Family Added

The Dedicated Endpoints PhiMoEForCausalLM model family is now available. For example, you can now deploy endpoints for the following model:
  • microsoft/Phi-mini-MoE-instruct
To browse all models, see Models > Dedicated Endpoints.

MiniMax Model Family Added

The Dedicated Endpoints MiniMaxM1ForCausalLM model family is now available. For example, you can now deploy endpoints for the following model:
  • MiniMaxAI/MiniMax-M1-80k
To browse all models, see Models > Dedicated Endpoints.

Baidu Model Families Added

The Dedicated Endpoints Ernie4_5_MoeForCausalLM and Ernie4_5_ForCausalLM model families are now available. For example, you can now deploy endpoints for the following models:
  • baidu/ERNIE-4.5-21B-A3B-Thinking
  • baidu/ERNIE-4.5-0.3B-PT
To browse all models, see Models > Dedicated Endpoints.
July 3, 2025
Dedicated Endpoints | New Model Family

LG AI Research Model Family Added

The Dedicated Endpoints Exaone4ForCausalLM model family is now available. For example, you can now deploy endpoints for the following model:
  • LGAI-EXAONE/EXAONE-4.0.1-32B
To browse all models, see Models > Dedicated Endpoints.
Last modified on June 18, 2026