Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

License: apache-2.0

Recipe

  • Method: DARE-TIES (custom CPU streaming merger), weight 0.5, density 0.53, DAREx-q 0.75, seed 42.
  • mlp.{gate,up,down}_proj kept verbatim from base (passthrough); attention, linear_attn (Gated-DeltaNet/SSM), norms, embeddings/head merged.
  • Architecture: Qwen3_5 hybrid (48 linear-attn + 16 full-attn layers), multimodal, 27B, bf16.
  • Built on a single Apple M4 Pro (64 GB, no GPU) — CPU streaming merge, ~5 min.

Status

This is a v0 test artifact for validating the K-AI leaderboard submission pipeline. It is expected to score near the Qwen/Qwen3.6-27B baseline (the donor differs from base by < 0.3%); it is not a tuned contender.

Model provider

websfactory

Model tree

Base

Qwen/Qwen3.6-27B

Base

dnotitia/DNA3.0-27B

Merged

this model

Modalities

Input

Video, Text, Image

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Explore FriendliAI today