September, 2025

Sep 15
Dedicated Endpoints

New Model Family

Added support for the following new model families:
  • Qwen3NextForCausalLM (e.g., Qwen/Qwen3-Next-80B-A3B-Instruct)
  • HunYuanDenseV1ForCausalLM (e.g., tencent/Hunyuan-MT-7B)
  • ApertusForCausalLM (e.g., swiss-ai/Apertus-8B-Instruct-2509)
  • SeedOssForCausalLM (e.g., ByteDance-Seed/Seed-OSS-36B-Instruct)
Sep 12
Dedicated Endpoints

Custom Chat Template Support

We now support custom chat formatting. You can paste or upload a custom Jinja template during endpoint creation. Read more

4-Bit Online Quantization Support

We now support 4-bit online quantization. By enabling this feature, you can efficiently run models on smaller instances with negligible quality impact. Read more
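To see why 4-bit weights let a model fit on smaller instances, a quick back-of-envelope calculation (weights only; activations and KV cache are ignored, and the 70B parameter count is just an example):

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GiB (weights only)."""
    return n_params * bits_per_weight / 8 / 2**30

n = 70e9  # e.g., a 70B-parameter model
fp16 = weight_memory_gib(n, 16)
int4 = weight_memory_gib(n, 4)
print(f"FP16: {fp16:.0f} GiB, 4-bit: {int4:.0f} GiB")  # FP16: 130 GiB, 4-bit: 33 GiB
```

The 4x reduction in weight memory is what makes a model that needed multi-GPU FP16 serving fit on a single smaller instance.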
Sep 10
Serverless Endpoints, Dedicated Endpoints

Reasoning Parsing Support

We now support reasoning parsing. When this feature is enabled, the response provides a separate reasoning_content field instead of including the reasoning in the content field. Read more
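To illustrate the behavior, here is a pure-Python sketch of splitting a raw completion into reasoning_content and content. The <think> delimiters are model-specific, and this is only a sketch of what the parsing does, not the server-side implementation:

```python
import re

def parse_reasoning(raw: str, open_tag: str = "<think>", close_tag: str = "</think>"):
    """Split model output into (reasoning_content, content).

    Text inside the model's thinking delimiters goes to reasoning_content;
    everything else stays in content.
    """
    pattern = re.escape(open_tag) + r"(.*?)" + re.escape(close_tag)
    reasoning = "\n".join(m.strip() for m in re.findall(pattern, raw, re.DOTALL))
    content = re.sub(pattern, "", raw, flags=re.DOTALL).strip()
    return reasoning, content

raw = "<think>The user greets me; reply politely.</think>Hello! How can I help?"
reasoning, content = parse_reasoning(raw)
print(reasoning)  # The user greets me; reply politely.
print(content)    # Hello! How can I help?
```

With parsing enabled, clients no longer need to strip thinking tags themselves before displaying the answer.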
Sep 8
Serverless Endpoints

Model Deprecation

We have deprecated the following serverless model:
  • K-intelligence/Midm-2.0-Base-Instruct
Sep 4
Serverless Endpoints

Model Deprecation

We have deprecated the following serverless model:
  • K-intelligence/Midm-2.0-Mini-Instruct
Sep 1
Dedicated Endpoints

B200 Hardware Support

We now support NVIDIA B200 GPUs alongside existing A100, H100, and H200 GPUs. Read more

August, 2025

Aug 22
Serverless Endpoints

New Built-in Integration with Linkup

New built-in web-search tool integration with Linkup has been added. Read more

New Model Family

Added support for the following new model family:
  • GptOssForCausalLM (e.g., openai/gpt-oss-20b)
Aug 19
Dedicated Endpoints

New Auto-Scaling Type ‘Request count’ Added

Enterprise plan users can now choose to scale their endpoints based on request count. The request-count scaling strategy adjusts the number of workers according to the total number of requests queued and in progress.
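As a rough sketch, a request-count policy might size the worker pool as below. The formula, parameter names, and bounds are illustrative assumptions, not the documented policy:

```python
import math

def desired_workers(queued: int, in_progress: int,
                    target_per_worker: int,
                    min_workers: int = 1, max_workers: int = 8) -> int:
    """Hypothetical request-count scaling rule: size the pool so each
    worker handles at most `target_per_worker` outstanding requests,
    clamped to the configured min/max replica counts."""
    total = queued + in_progress
    want = math.ceil(total / target_per_worker) if total else min_workers
    return max(min_workers, min(max_workers, want))

# 25 queued + 7 in-flight = 32 outstanding; at 10 per worker -> 4 workers
print(desired_workers(queued=25, in_progress=7, target_per_worker=10))  # -> 4
```

Compared with utilization-based scaling, a request-count signal reacts directly to queue buildup rather than waiting for GPU metrics to saturate.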
Aug 8
Serverless Endpoints

Increased Output Token Limits for Reasoning Models

We have increased the output token limits for reasoning models on Serverless endpoints, allowing longer reasoning outputs to be generated.

New Endpoint Feature ‘N-GRAM Speculative Decoding’

Users can now enable N-GRAM speculative decoding for their endpoints. For predictable tasks, this can deliver substantial performance gains. Read more
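N-gram speculative decoding drafts future tokens by looking up the most recent tokens in the text already generated, which is why repetitive or predictable outputs benefit most. A minimal pure-Python sketch of the idea (not the production implementation):

```python
def ngram_draft(tokens, n=3, max_draft=5):
    """Propose draft tokens via n-gram lookup: find the most recent
    earlier occurrence of the last `n` tokens and return what followed
    it. The target model then verifies the draft in a single pass."""
    if len(tokens) < n:
        return []
    suffix = tuple(tokens[-n:])
    # Scan right-to-left over earlier positions for a matching n-gram.
    for i in range(len(tokens) - n - 1, -1, -1):
        if tuple(tokens[i:i + n]) == suffix:
            return tokens[i + n : i + n + max_draft]
    return []

# Repetitive text lets the matcher guess the continuation:
history = ["the", "cat", "sat", "on", "the", "mat", ".",
           "the", "cat", "sat"]
print(ngram_draft(history))  # -> ['on', 'the', 'mat', '.', 'the']
```

Because the drafts come from a lookup rather than a second model, the speedup costs no extra GPU memory; drafts that the model rejects are simply discarded, so output quality is unchanged.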
Aug 1
Dedicated Endpoints

New Model Family

Added support for the following new model family:
  • HyperCLOVAXForCausalLM (e.g., naver-hyperclovax/HyperCLOVAX-SEED-Think-14B)
Aug 1
Serverless Endpoints

Model Release

We now support the following serverless models:
  • Qwen/Qwen3-235B-A22B-Thinking-2507
  • Qwen/Qwen3-235B-A22B-Instruct-2507
  • skt/A.X-4.0
  • skt/A.X-3.1
  • naver-hyperclovax/HyperCLOVAX-SEED-Think-14B

July, 2025

Jul 25
Dedicated Endpoints

New Endpoint Feature ‘Online Quantization’

Users can now quantize their model endpoints without any preparation, accelerating inference. Read more
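Online quantization converts weights to lower precision at load time, with no offline calibration step. A toy symmetric int8 round-trip in pure Python, purely to illustrate the arithmetic (the actual feature runs on the GPU and may use different schemes):

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: choose a scale so the
    largest-magnitude weight maps to 127, then round each weight to
    the nearest integer step."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q]

w = [0.52, -1.27, 0.003, 0.9]
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
print(q, round(max_err, 4))  # -> [52, -127, 0, 90] 0.003
```

The round-trip error is bounded by half a quantization step, which is why well-chosen schemes keep quality impact small while shrinking weight memory.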
Jul 14
Serverless Endpoints

Model Release

LG AI Research has partnered with FriendliAI to bring you EXAONE 4.0, the latest version of its EXAONE model family. Read more
  • LGAI-EXAONE/EXAONE-4.0.1-32B
Jul 11
Serverless Endpoints

Model Release

We now support the following serverless model:
  • deepseek-ai/DeepSeek-R1-0528
Jul 8
Dedicated Endpoints

New Model Family

Added support for the following new model families:
  • Dots1ForCausalLM (e.g., rednote-hilab/dots.llm1.inst)
  • Glm4vForConditionalGeneration (e.g., zai-org/GLM-4.1V-9B-Thinking)
  • KeyeForConditionalGeneration (e.g., Kwai-Keye/Keye-VL-8B-Preview)
  • HunYuanMoEV1ForCausalLM (e.g., tencent/Hunyuan-A13B-Instruct)
  • PhiMoEForCausalLM (e.g., microsoft/Phi-mini-MoE-instruct)
  • MiniMaxM1ForCausalLM (e.g., MiniMaxAI/MiniMax-M1-80k)
  • Ernie4_5_MoeForCausalLM (e.g., baidu/ERNIE-4.5-21B-A3B-Thinking)
  • Ernie4_5_ForCausalLM (e.g., baidu/ERNIE-4.5-0.3B-PT)
Jul 3
Dedicated Endpoints

New Model Family

Added support for the following new model family:
  • Exaone4ForCausalLM (e.g., LGAI-EXAONE/EXAONE-4.0.1-32B)