README
License: apache-2.0
Recipe
- Method: DARE-TIES (custom CPU streaming merger), weight 0.5, density 0.53, DAREx-q 0.75, seed 42. mlp.{gate,up,down}_proj kept verbatim from base (passthrough); attention, linear_attn (Gated-DeltaNet/SSM), norms, embeddings/head merged.
- Architecture: Qwen3_5 hybrid (48 linear-attn + 16 full-attn layers), multimodal, 27B, bf16.
- Built on a single Apple M4 Pro (64 GB, no GPU): CPU streaming merge, ~5 min.
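The recipe above can be illustrated with a minimal sketch on toy data. This is not the author's merger, only one reading of the listed parameters. It assumes tensors are flat lists of floats keyed by name, that the passthrough tensors can be matched by the name fragments in the hypothetical `PASSTHROUGH_KEYS`, and that "DAREx-q 0.75" means surviving deltas are divided by q = 0.75 rather than by the density. The source does not spell out any of these. With a single donor, the TIES sign election has only one delta per weight to vote on, so it is omitted here.

```python
import random

# Hypothetical name fragments for tensors copied unchanged from base (passthrough).
PASSTHROUGH_KEYS = ("mlp.gate_proj", "mlp.up_proj", "mlp.down_proj")


def dare_merge_tensor(base, donor, weight, density, rescale_q, rng):
    """Merge one flat tensor: randomly drop deltas, rescale survivors, add to base."""
    merged = []
    for b, d in zip(base, donor):
        delta = d - b
        if rng.random() < density:
            # Kept delta. Assumption: DAREx-q rescales by 1/q instead of 1/density.
            delta = delta / rescale_q
        else:
            # Dropped delta: this weight falls back to the base value.
            delta = 0.0
        merged.append(b + weight * delta)
    return merged


def merge_state(base_state, donor_state, weight=0.5, density=0.53,
                rescale_q=0.75, seed=42):
    """Merge two name -> tensor dicts, keeping passthrough tensors from base."""
    rng = random.Random(seed)  # fixed seed makes the drop mask reproducible
    out = {}
    for name, base_tensor in base_state.items():
        if any(key in name for key in PASSTHROUGH_KEYS):
            out[name] = list(base_tensor)
        else:
            out[name] = dare_merge_tensor(
                base_tensor, donor_state[name], weight, density, rescale_q, rng
            )
    return out


if __name__ == "__main__":
    # Toy tensors with made-up names and values, for illustration only.
    base = {
        "layers.0.mlp.gate_proj.weight": [1.0, 2.0],
        "layers.0.self_attn.q_proj.weight": [1.0, 1.0, 1.0, 1.0],
    }
    donor = {
        "layers.0.mlp.gate_proj.weight": [5.0, 5.0],
        "layers.0.self_attn.q_proj.weight": [1.75, 1.75, 1.75, 1.75],
    }
    print(merge_state(base, donor))
```

In the toy run, the MLP tensor comes back identical to base, and each attention weight is either the base value 1.0 (delta dropped) or 1.5 (delta 0.75 kept, divided by 0.75, then scaled by weight 0.5). A streaming merger would presumably apply the same per-tensor step while reading one tensor at a time from disk, which keeps peak memory well below the full model size.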
Status
This is a v0 test artifact for validating the K-AI leaderboard submission pipeline.
It is expected to score near the Qwen/Qwen3.6-27B baseline (the donor differs from
base by < 0.3%); it is not a tuned contender.
Model provider
websfactory
Model tree
Base
Qwen/Qwen3.6-27B
Base
dnotitia/DNA3.0-27B
Merged
this model
Modalities
Input
Video, Text, Image
Output
Text
Pricing
Dedicated Endpoints
Supported Functionality
Model APIs
Dedicated Endpoints
Container