November, 2025
New Model Family
Added support for the following new model families:
- Qwen3VLForConditionalGeneration (e.g., Qwen/Qwen3-VL-4B-Instruct)
- Qwen3VLMoeForConditionalGeneration (e.g., Qwen/Qwen3-VL-30B-A3B-Instruct)
- GraniteMoeHybridForCausalLM (e.g., ibm-granite/granite-4.0-h-small)
- DotsOCRForCausalLM (e.g., rednote-hilab/dots.ocr)
Pricing Update
We have changed the pricing model for Qwen/Qwen3-235B-A22B-Instruct-2507 to token-based pricing.
October, 2025
September, 2025
New Model Family
Added support for the following new model families:
- Qwen3NextForCausalLM (e.g., Qwen/Qwen3-Next-80B-A3B-Instruct)
- HunYuanDenseV1ForCausalLM (e.g., tencent/Hunyuan-MT-7B)
- ApertusForCausalLM (e.g., swiss-ai/Apertus-8B-Instruct-2509)
- SeedOssForCausalLM (e.g., ByteDance-Seed/Seed-OSS-36B-Instruct)
Custom Chat Template Support
We now support custom chat formatting. You can paste or upload a custom Jinja template during endpoint creation.
4-Bit Online Quantization Support
We now support 4-bit online quantization. By enabling this feature, you can efficiently run models on smaller instances with negligible quality impact.
Model Deprecation
We have deprecated the following serverless models:
- K-intelligence/Midm-2.0-Base-Instruct
- K-intelligence/Midm-2.0-Mini-Instruct
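As a rough illustration of why the 4-bit online quantization announced above lets models run on smaller instances: weight memory scales with bits per parameter, so 4-bit weights take a quarter of the space of fp16. A back-of-envelope sketch (the 8B parameter count is just an example, not a platform detail):

```python
def weight_memory_gib(n_params: float, bits_per_param: int) -> float:
    """Approximate weight memory in GiB for a dense model."""
    return n_params * bits_per_param / 8 / 2**30

n_params = 8e9  # example: an 8B-parameter model
fp16 = weight_memory_gib(n_params, 16)
int4 = weight_memory_gib(n_params, 4)
print(f"fp16: {fp16:.1f} GiB, 4-bit: {int4:.1f} GiB ({fp16/int4:.0f}x smaller)")
# → fp16: 14.9 GiB, 4-bit: 3.7 GiB (4x smaller)
```

This ignores activations and KV cache, which are unaffected by weight quantization, so the end-to-end saving is somewhat smaller than 4x.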
August, 2025
New Auto-Scaling Type ‘Request count’ Added
Enterprise plan users can now choose to scale their endpoints based on request count. The request-count strategy adjusts the number of workers according to the total requests in the queue and in progress.
Increased Output Token Limits for Reasoning Models
We have increased the output token limits for reasoning models on Serverless endpoints, allowing longer reasoning outputs to be generated.
New Endpoint Feature ‘N-GRAM Speculative Decoding’
Users can now enable N-GRAM speculative decoding for their endpoints. For predictable tasks, this can deliver substantial performance gains.
New Model Family
Added support for the following new model family:
- HyperCLOVAXForCausalLM (e.g., naver-hyperclovax/HyperCLOVAX-SEED-Think-14B)
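The N-GRAM speculative decoding announced above refers to a prompt-lookup style scheme: instead of a separate draft model, draft tokens are proposed by matching the most recent n-gram against earlier context and copying what followed it, then verifying the draft with the target model. A minimal sketch of the proposal step (hypothetical, not the platform's implementation):

```python
def ngram_propose(tokens, n=3, k=5):
    """Propose up to k draft tokens by finding the most recent earlier
    occurrence of the trailing n-gram and copying what followed it."""
    if len(tokens) < n:
        return []
    tail = tokens[-n:]
    # Scan earlier context right-to-left for the same n-gram.
    for i in range(len(tokens) - n - 1, -1, -1):
        if tokens[i:i + n] == tail:
            return tokens[i + n:i + n + k]
    return []

# Repetitive contexts (code, structured documents) yield long drafts:
ctx = "the cat sat on the mat and the cat sat on".split()
print(ngram_propose(ctx, n=3, k=4))  # → ['the', 'mat', 'and', 'the']
```

This is why the feature helps most on "predictable tasks": the more often recent n-grams recur in the context, the more draft tokens are accepted per verification step.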
Model Release
We now support the following serverless models:
- Qwen/Qwen3-235B-A22B-Thinking-2507
- Qwen/Qwen3-235B-A22B-Instruct-2507
- skt/A.X-4.0
- skt/A.X-3.1
- naver-hyperclovax/HyperCLOVAX-SEED-Think-14B
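The request-count auto-scaling described at the top of this section can be thought of as sizing the worker pool to in-flight load. A hypothetical sketch (the per-worker target and replica bounds are assumed parameters, not documented ones):

```python
import math

def desired_workers(queued: int, in_progress: int,
                    target_per_worker: int,
                    min_workers: int = 1, max_workers: int = 8) -> int:
    """Scale worker count to total requests (queued + in progress),
    clamped to the endpoint's configured replica bounds."""
    total = queued + in_progress
    want = math.ceil(total / target_per_worker) if total else min_workers
    return max(min_workers, min(max_workers, want))

print(desired_workers(queued=37, in_progress=12, target_per_worker=10))  # → 5
```

Unlike utilization-based scaling, this reacts directly to queue depth, so bursty traffic triggers scale-out before workers saturate.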
July, 2025
New Model Family
Added support for the following new model families:
- Dots1ForCausalLM (e.g., rednote-hilab/dots.llm1.inst)
- Glm4vForConditionalGeneration (e.g., zai-org/GLM-4.1V-9B-Thinking)
- KeyeForConditionalGeneration (e.g., Kwai-Keye/Keye-VL-8B-Preview)
- HunYuanMoEV1ForCausalLM (e.g., tencent/Hunyuan-A13B-Instruct)
- PhiMoEForCausalLM (e.g., microsoft/Phi-mini-MoE-instruct)
- MiniMaxM1ForCausalLM (e.g., MiniMaxAI/MiniMax-M1-80k)
- Ernie4_5_MoeForCausalLM (e.g., baidu/ERNIE-4.5-21B-A3B-Thinking)
- Ernie4_5_ForCausalLM (e.g., baidu/ERNIE-4.5-0.3B-PT)