September, 2025
New Model Family
Added support for the following new model families:
- Qwen3NextForCausalLM (e.g., Qwen/Qwen3-Next-80B-A3B-Instruct)
- HunYuanDenseV1ForCausalLM (e.g., tencent/Hunyuan-MT-7B)
- ApertusForCausalLM (e.g., swiss-ai/Apertus-8B-Instruct-2509)
- SeedOssForCausalLM (e.g., ByteDance-Seed/Seed-OSS-36B-Instruct)
Custom Chat Template Support
We now support custom chat formatting. You can paste or upload a custom Jinja template during endpoint creation.
4-Bit Online Quantization Support
We now support 4-bit online quantization. Enabling this feature lets you run models efficiently on smaller instances with negligible impact on quality.
Model Deprecation
We have deprecated the following serverless model: K-intelligence/Midm-2.0-Base-Instruct
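To illustrate the custom chat template support announced above, here is a minimal sketch of a Jinja chat template. It assumes the common Hugging Face convention of a `messages` list of `role`/`content` dicts and an `add_generation_prompt` flag; the `<|...|>` delimiters are purely illustrative, and a real template should use the special tokens the model was trained with.

```jinja
{%- for message in messages -%}
<|{{ message['role'] }}|>
{{ message['content'] }}
{%- endfor -%}
{%- if add_generation_prompt -%}
<|assistant|>
{%- endif -%}
```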
Model Deprecation
We have deprecated the following serverless model: K-intelligence/Midm-2.0-Mini-Instruct
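The 4-bit online quantization entry above can be illustrated with a minimal group-wise symmetric int4 round-trip. This is a sketch of the general technique, not the platform's actual quantization scheme; the group size and the `quantize_4bit`/`dequantize_4bit` names are hypothetical.

```python
def quantize_4bit(weights, group_size=64):
    """Group-wise symmetric 4-bit quantization (illustrative, not the
    platform's actual scheme): each group stores one float scale plus
    one signed 4-bit integer in [-8, 7] per weight."""
    groups = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        scale = max(abs(w) for w in group) / 7 or 1.0  # avoid zero scale
        q = [max(-8, min(7, round(w / scale))) for w in group]
        groups.append((scale, q))
    return groups

def dequantize_4bit(groups):
    """Reconstruct approximate float weights from (scale, int4) groups."""
    return [q * scale for scale, qs in groups for q in qs]
```

Each weight shrinks from 16 bits to 4 bits plus a small amortized per-group scale, which is why a model that needs a large instance in fp16/bf16 can fit on a smaller one when quantized.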
August, 2025
New Auto-Scaling Type ‘Request count’ Added
Enterprise plan users can now choose to scale their endpoints based on request count. The request-count scaling strategy adjusts the number of workers according to the total number of requests queued and in progress.
Increased Output Token Limits for Reasoning Models
We have increased the output token limits for reasoning models on Serverless endpoints, allowing longer reasoning outputs to be generated.
New Endpoint Feature ‘N-GRAM Speculative Decoding’
Users can now enable N-GRAM speculative decoding for their endpoints. For predictable tasks, this can deliver substantial performance gains.
New Model Family
Added support for the following new model family:
- HyperCLOVAXForCausalLM (e.g., naver-hyperclovax/HyperCLOVAX-SEED-Think-14B)
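The N-GRAM speculative decoding feature announced above can be sketched as follows. The platform's actual drafting logic is not described here, so this is only the core idea: propose draft tokens by finding the most recent earlier occurrence of the context's last n tokens and copying what followed it; the `ngram_draft` name and parameters are hypothetical.

```python
def ngram_draft(tokens, n=3, max_draft=4):
    """Propose draft tokens by matching the last n tokens against an
    earlier occurrence in the context and copying what followed it."""
    if len(tokens) < n:
        return []
    pattern = tokens[-n:]
    # Search backwards, excluding the trivial match at the very end.
    for i in range(len(tokens) - n - 1, -1, -1):
        if tokens[i:i + n] == pattern:
            return tokens[i + n:i + n + max_draft]
    return []
```

In speculative decoding the target model then verifies all draft tokens in a single forward pass and keeps the longest accepted prefix. On repetitive text such as code or JSON, many drafts are accepted, which is why the gains are largest for predictable tasks.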
Model Release
We now support the following serverless models:
- Qwen/Qwen3-235B-A22B-Thinking-2507
- Qwen/Qwen3-235B-A22B-Instruct-2507
- skt/A.X-4.0
- skt/A.X-3.1
- naver-hyperclovax/HyperCLOVAX-SEED-Think-14B
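The request-count autoscaling strategy introduced above can be sketched as a simple control rule: pick a worker count proportional to the outstanding requests (queued plus in progress), clamped to configured bounds. The target-per-worker ratio and the bounds below are illustrative assumptions, not the platform's documented defaults.

```python
import math

def desired_workers(queued, in_progress, target_per_worker=8,
                    min_workers=1, max_workers=16):
    """Scale worker count to total outstanding requests, clamped to
    configured bounds. All parameter values here are illustrative."""
    total = queued + in_progress
    wanted = math.ceil(total / target_per_worker)
    return max(min_workers, min(max_workers, wanted))
```

Unlike CPU- or latency-based autoscaling, this reacts directly to demand: a burst of queued requests raises the worker count before utilization metrics catch up.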
July, 2025
New Model Family
Added support for the following new model families:
- Dots1ForCausalLM (e.g., rednote-hilab/dots.llm1.inst)
- Glm4vForConditionalGeneration (e.g., zai-org/GLM-4.1V-9B-Thinking)
- KeyeForConditionalGeneration (e.g., Kwai-Keye/Keye-VL-8B-Preview)
- HunYuanMoEV1ForCausalLM (e.g., tencent/Hunyuan-A13B-Instruct)
- PhiMoEForCausalLM (e.g., microsoft/Phi-mini-MoE-instruct)
- MiniMaxM1ForCausalLM (e.g., MiniMaxAI/MiniMax-M1-80k)
- Ernie4_5_MoeForCausalLM (e.g., baidu/ERNIE-4.5-21B-A3B-Thinking)
- Ernie4_5_ForCausalLM (e.g., baidu/ERNIE-4.5-0.3B-PT)