FINAL-Bench

Darwin-9B-Opus

License: apache-2.0

Technical Definitions

| Term | Definition | Measurement |
| --- | --- | --- |
| Model MRI | Layer-level profiling of tensor health indicators | L2 norm, Shannon entropy, std per tensor across all layers |
| LayerMRI.compare_layers | Per-tensor A vs B quality comparison yielding optimal ratio_b | score = entropy * 0.5 + std * 0.3 + clamp(norm, 100) * 0.002 per model; ratio_b = score_b / (score_a + score_b) |
| MRI-Guided Merge | Per-tensor merge ratios derived from parent diagnostics (70% MRI + 30% genome) | final_ratio = mri_ratio * 0.7 + genome_ratio * 0.3 |
| DARE-TIES | Merge algorithm: random binary mask on delta, then weighted addition | merged = A + (B - A) * random_mask(density) * ratio |
| Transplant A / B | When MRI ratio falls below 0.05 or above 0.95, one parent is used entirely | No interpolation; direct tensor copy |
| Evolutionary Search | CMA-ES population evolution over genome space (ratio, attn, ffn, embed, density_a, density_b) | Phase 1: 200 steps heuristic proxy, Phase 2: 10 steps real benchmark |
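The scoring rule in the table above can be illustrated with a minimal standard-library sketch. It treats a tensor as a flat list of floats and estimates Shannon entropy from a fixed-width histogram. The histogram approach, the `bins=64` default, and the zero-score fallback are assumptions made for illustration; the source does not say how LayerMRI computes entropy.

```python
import math

def diagnose_tensor(values, bins=64):
    """Return L2 norm, histogram entropy (bits), and std of a flat weight list.

    Assumption: entropy is estimated from a fixed-width histogram; the
    original LayerMRI implementation may use a different estimator.
    """
    n = len(values)
    norm = math.sqrt(sum(v * v for v in values))
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)

    lo, hi = min(values), max(values)
    if hi == lo:
        entropy = 0.0  # constant tensor: all mass falls in one bin
    else:
        counts = [0] * bins
        width = (hi - lo) / bins
        for v in values:
            idx = min(int((v - lo) / width), bins - 1)  # clamp the max value
            counts[idx] += 1
        entropy = -sum((c / n) * math.log2(c / n) for c in counts if c)
    return {"norm": norm, "entropy": entropy, "std": std}

def quality_score(diag):
    """Score from the table: entropy * 0.5 + std * 0.3 + min(norm, 100) * 0.002."""
    return diag["entropy"] * 0.5 + diag["std"] * 0.3 + min(diag["norm"], 100) * 0.002

def ratio_b(diag_a, diag_b):
    """Share of the combined score that belongs to parent B."""
    score_a, score_b = quality_score(diag_a), quality_score(diag_b)
    total = score_a + score_b
    if total == 0:
        return 0.5  # assumption: treat two all-zero tensors as equal
    return score_b / total
```

For example, `ratio_b(diagnose_tensor(father_weights), diagnose_tensor(mother_weights))` returns a value between 0 and 1, where `father_weights` and `mother_weights` are hypothetical flat lists; values above 0.5 favor the Mother.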

Overview

Darwin-9B-Opus is a 9B-parameter dense reasoning model created with Darwin V5. Both parent models share the identical Qwen3.5-9B architecture: the Mother is a LoRA SFT of the same base model, not a different architecture.

| Role | Model | Training |
| --- | --- | --- |
| Father | Qwen/Qwen3.5-9B | Original pre-training + RLHF |
| Mother | Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled | LoRA SFT with text-only Claude 4.6 Opus reasoning chains |

How Darwin V5 Works

Darwin V5 does not use mergekit or any external merge library. It implements a DARE-TIES-style merge directly with PyTorch tensor operations, using MRI-guided per-tensor ratios. The algorithm is inspired by DARE-TIES but re-implemented from scratch so that every tensor can receive its own diagnostic-guided ratio.

Merge Implementation (actual code logic)

python

import torch

# model_a / model_b map tensor names to tensors loaded from the safetensor shards.
# LayerMRI, genome_type_ratio, and density_b come from the surrounding pipeline.
for key in model_a:
    ta = model_a[key]  # Father tensor
    tb = model_b[key]  # Mother tensor

    # 1. MRI diagnoses both tensors
    diag_a = LayerMRI.diagnose_tensor(ta)  # {norm, entropy, std}
    diag_b = LayerMRI.diagnose_tensor(tb)  # {norm, entropy, std}

    # 2. Quality score comparison determines ratio_b
    score_a = diag_a["entropy"] * 0.5 + diag_a["std"] * 0.3 + min(diag_a["norm"], 100) * 0.002
    score_b = diag_b["entropy"] * 0.5 + diag_b["std"] * 0.3 + min(diag_b["norm"], 100) * 0.002
    mri_ratio = score_b / (score_a + score_b)  # Higher = Mother is better

    # 3. Final ratio = MRI 70% + evolutionary genome 30%
    final_ratio = mri_ratio * 0.7 + genome_type_ratio * 0.3

    # 4. DARE-TIES merge with per-tensor ratio
    mask = torch.rand_like(tb) < density_b  # keep each delta element with probability density_b
    delta = (tb - ta) * mask
    merged = (ta + delta * final_ratio).bfloat16()
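As a worked example of the ratio formula: with `mri_ratio = 0.6` and a genome ratio of `0.4`, the final ratio is 0.6 * 0.7 + 0.4 * 0.3 = 0.54. The sketch below restates steps 3 and 4 on plain Python lists and adds the transplant rule from the definitions table. It is an illustration under stated assumptions, not the Darwin V5 code: `merge_tensor` is a hypothetical helper, and applying the 0.05 / 0.95 thresholds to the MRI ratio (rather than to the final ratio) is one reading of the source.

```python
import random

def final_ratio(mri_ratio, genome_ratio):
    """Blend from the source: 70% MRI prescription + 30% genome ratio."""
    return mri_ratio * 0.7 + genome_ratio * 0.3

def merge_tensor(ta, tb, mri_ratio, genome_ratio, density_b, rng):
    """Merge two flat weight lists (hypothetical helper, pure Python).

    ta, tb      -- Father and Mother weights as equal-length lists of floats
    density_b   -- probability of keeping each element of the delta (tb - ta)
    rng         -- a random.Random instance, so results are reproducible
    """
    # Transplant: outside [0.05, 0.95] one parent is copied without interpolation.
    if mri_ratio < 0.05:
        return list(ta)
    if mri_ratio > 0.95:
        return list(tb)

    ratio = final_ratio(mri_ratio, genome_ratio)
    merged = []
    for a, b in zip(ta, tb):
        keep = rng.random() < density_b       # random binary mask on the delta
        delta = (b - a) if keep else 0.0
        merged.append(a + delta * ratio)      # weighted addition onto the Father
    return merged
```

Calling `merge_tensor([0.0, 1.0], [1.0, 3.0], 0.5, 0.5, 1.0, random.Random(0))` keeps every delta element, so each output lies halfway between the two parents; lowering `density_b` leaves more elements at the Father's value.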

Pipeline

markdown

Phase 0: Model MRI
  For every tensor in both parents, measure:
    - L2 norm (layer energy)
    - Shannon entropy (weight distribution uniformity)
    - Standard deviation (weight spread)
  Compare A vs B quality scores -> per-tensor ratio prescription

Phase 1: Evolutionary Search (200 steps, heuristic proxy)
  Population of 20 genomes (ratio, attn, ffn, embed, density_a, density_b)
  Fitness: heuristic score based on genome balance + differentiation
  Selection -> SLERP crossover -> Gaussian mutation

Phase 2: Real Merge + Benchmark (10 steps)
  Top genomes from Phase 1 undergo actual tensor merge
  Each merge: MRI prescription (70%) + genome ratio (30%)
  Fitness: real benchmark score (ARC-Challenge)
  Best model selected and auto-uploaded

Phase 3: Health Check
  Layer-by-layer importance comparison: child vs both parents
  Detect interference (child >> parents) or function loss (parents >> child)
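The Phase 1 loop (selection, SLERP crossover, Gaussian mutation over a population of 20 six-gene genomes) could look roughly like the sketch below. Several parts are assumptions made for illustration: `proxy_fitness` is a hypothetical stand-in because the source does not give the heuristic, genes are assumed to lie in [0, 1], one "step" is treated as one generation, and elitism and truncation selection are choices of this sketch. It follows the selection/crossover/mutation listing rather than CMA-ES.

```python
import math
import random

GENES = ("ratio", "attn", "ffn", "embed", "density_a", "density_b")

def proxy_fitness(genome):
    """Hypothetical proxy: rewards genes near 0.5 (balance) and spread (differentiation)."""
    balance = 1.0 - sum(abs(g - 0.5) for g in genome) / len(genome)
    spread = max(genome) - min(genome)
    return balance + 0.5 * spread

def slerp(a, b, t):
    """Spherical linear interpolation between two vectors; falls back to lerp."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    lerp = [(1 - t) * x + t * y for x, y in zip(a, b)]
    if na == 0.0 or nb == 0.0:
        return lerp
    omega = math.acos(max(-1.0, min(1.0, dot / (na * nb))))
    if omega < 1e-6:  # nearly parallel vectors: slerp degenerates to lerp
        return lerp
    s = math.sin(omega)
    wa, wb = math.sin((1 - t) * omega) / s, math.sin(t * omega) / s
    return [wa * x + wb * y for x, y in zip(a, b)]

def evolve(steps=200, pop_size=20, sigma=0.05, seed=0):
    """Run the proxy search and return (best_genome, best_fitness_per_step)."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in GENES] for _ in range(pop_size)]
    history = []
    for _ in range(steps):
        ranked = sorted(population, key=proxy_fitness, reverse=True)
        history.append(proxy_fitness(ranked[0]))
        parents = ranked[: pop_size // 2]   # truncation selection
        children = [ranked[0]]              # elitism: keep the current best
        while len(children) < pop_size:
            pa, pb = rng.sample(parents, 2)
            child = slerp(pa, pb, rng.random())
            # Gaussian mutation, clipped back into the assumed [0, 1] range
            child = [max(0.0, min(1.0, g + rng.gauss(0.0, sigma))) for g in child]
            children.append(child)
        population = children
    return max(population, key=proxy_fitness), history
```

`best, history = evolve()` returns the best genome in `GENES` order; in the pipeline described above, the top genomes would then go to Phase 2 for real merges scored on ARC-Challenge.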

What Makes This Different from Standard Merging

| Capability | Standard DARE-TIES | Darwin V5 |
| --- | --- | --- |
| Implementation | mergekit library call | Direct PyTorch tensor operations |
| Ratio selection | Uniform ratio across all tensors | Per-tensor ratio from MRI diagnosis |
| Pre-merge analysis | None | Tensor-level norm/entropy/std profiling |
| Ratio determination | Human-set or grid search | MRI 70% + evolutionary genome 30% |
| Post-merge validation | Benchmark score only | Layer-by-layer child vs parents comparison |
| Transplant support | No | ratio < 0.05 -> use A entirely, ratio > 0.95 -> use B entirely |
| Failure diagnosis | "Score went down" | Per-tensor quality delta identifies problematic layers |
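The post-merge validation row can be made concrete with a small sketch of a layer-by-layer check. The source does not define the importance metric or what counts as "much greater", so the inputs here are hypothetical per-layer importance scores and `factor=2.0` is an arbitrary illustrative threshold.

```python
def health_check(child, father, mother, factor=2.0):
    """Classify each layer by comparing child importance with both parents.

    child, father, mother -- dicts mapping layer name -> importance score
                             (hypothetical metric; not specified in the source)
    factor                -- assumed threshold for "much greater than"
    """
    report = {}
    for layer, c in child.items():
        hi = max(father[layer], mother[layer])
        lo = min(father[layer], mother[layer])
        if c > factor * hi:
            report[layer] = "interference"    # child >> both parents
        elif c * factor < lo:
            report[layer] = "function_loss"   # both parents >> child
        else:
            report[layer] = "ok"
    return report
```

Layers flagged this way are candidates for a different ratio or a transplant in a later merge attempt.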

Model Specifications

| Property | Value |
| --- | --- |
| Architecture | Qwen3.5 Dense (Gated DeltaNet hybrid) |
| Total Parameters | 9B |
| Precision | BF16 |
| Context Length | 131,072 native |
| Languages | 201 |
| Thinking | <think> tag chain-of-thought reasoning |
| License | Apache 2.0 |

Hardware Requirements

| Setup | VRAM | Status |
| --- | --- | --- |
| BF16 Full Precision | ~20 GB | |
| NVIDIA RTX 4090 24GB | 24 GB | Comfortable |
| NVIDIA A100 40GB | 40 GB | Very comfortable |
| NVIDIA T4 16GB | 16 GB | Requires quantization |
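The figures in the table are consistent with a back-of-the-envelope estimate: 9 billion parameters at 2 bytes each (BF16) is 18 GB of weights before any runtime overhead. The sketch below shows that arithmetic. The nominal 9e9 parameter count and the 2 GB overhead allowance are assumptions chosen to roughly match the ~20 GB figure; real usage also depends on context length and KV cache size.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Memory needed for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9

def fits(num_params, bits_per_param, vram_gb, overhead_gb=2.0):
    """Rough check: weights plus an assumed fixed overhead must fit in VRAM."""
    return weight_memory_gb(num_params, bits_per_param) + overhead_gb <= vram_gb

PARAMS = 9e9  # nominal "9B"; the exact count is not given in the source

for name, vram in [("RTX 4090", 24), ("A100 40GB", 40), ("T4", 16)]:
    print(name, "BF16 fits:", fits(PARAMS, 16, vram), "| 4-bit fits:", fits(PARAMS, 4, vram))
```

Under these assumptions BF16 does not fit on a 16 GB T4, while 4-bit weights (about 4.5 GB) do, which matches the "Requires quantization" status.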

Usage

Transformers

python

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained(
    "FINAL-Bench/Darwin-9B-Opus",
    trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "FINAL-Bench/Darwin-9B-Opus",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Prove that sqrt(2) is irrational."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=4096)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
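Because the model reasons inside `<think>` tags, it is often useful to separate the reasoning from the final answer after decoding. The helper below is a minimal sketch; `split_reasoning` is a hypothetical name, and it assumes the tags survive decoding as literal text and that the opening tag may be missing when the chat template has already emitted it in the prompt.

```python
def split_reasoning(text):
    """Split decoded output into (reasoning, answer) around the think tags."""
    head, sep, tail = text.partition("</think>")
    if sep:
        # Normal case: everything before the closing tag is reasoning.
        return head.replace("<think>", "", 1).strip(), tail.strip()
    if "<think>" in text:
        # Generation stopped mid-reasoning (e.g. max_new_tokens reached).
        return text.replace("<think>", "", 1).strip(), ""
    # No tags at all: treat the whole output as the answer.
    return "", text.strip()
```

With the generation example above, the decoded string can be passed to `split_reasoning` to show only the answer to end users while keeping the reasoning for logging.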

SGLang

bash

python -m sglang.launch_server \
  --model-path FINAL-Bench/Darwin-9B-Opus \
  --tp 1 \
  --mem-fraction-static 0.90 \
  --context-length 32768 \
  --trust-remote-code

vLLM

bash

vllm serve FINAL-Bench/Darwin-9B-Opus \
  --trust-remote-code \
  --enforce-eager
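Both servers above expose an OpenAI-compatible HTTP API, so a running instance can be queried with the standard library alone. The sketch below assumes the default ports (8000 for vLLM, 30000 for SGLang) and no API key; `build_chat_request` and `ask` are hypothetical helper names, and the base URL should be adjusted to the actual deployment.

```python
import json
import urllib.request

MODEL_ID = "FINAL-Bench/Darwin-9B-Opus"

def build_chat_request(base_url, prompt, max_tokens=4096):
    """Build a POST request for the /v1/chat/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask(base_url, prompt):
    """Send the prompt to a running server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(base_url, prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With a vLLM server started as shown, `ask("http://localhost:8000", "Prove that sqrt(2) is irrational.")` would return the generated text; for SGLang the assumed default base URL is `http://localhost:30000`.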

Evolution Details

| Property | Value |
| --- | --- |
| Engine | Darwin V5 (Evolutionary Merge + Layer-Level Diagnostics) |
| Merge Method | DARE-TIES (direct PyTorch implementation, no external library) |
| MRI Integration | Per-tensor diagnosis: norm, entropy, std -> ratio prescription |
| Ratio Formula | final_ratio = mri_ratio * 0.7 + genome_ratio * 0.3 |
| Evolution | Phase 1: 200 steps proxy + Phase 2: 10 steps real benchmark |
| Best Score | 0.8508 (ARC-Challenge) |
| Infrastructure | 4 x NVIDIA H100 NVL (100GB each) |

Acknowledgements

  • Korean Government: GPU Support Program research grant
  • Qwen Team: Qwen3.5 base architecture
  • Jackrong: Claude 4.6 Opus Reasoning Distilled model
  • DARE and TIES-Merging: Yu et al., 2023 (DARE) and Yadav et al., 2023 (TIES-Merging); re-implemented, not library-dependent

Built By

| Property | Value |
| --- | --- |
| Developer | VIDRAFT |
| Engine | Darwin V5 |
| Base Architecture | Qwen3.5-9B |

Citation

bibtex

@misc{vidraft_darwin_9b_opus,
  title        = {Darwin-9B-Opus: Diagnostic-Guided Evolutionary Merge},
  author       = {VIDRAFT},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/FINAL-Bench/Darwin-9B-Opus}}
}

This model is introduced in Darwin Family.

Modalities

Input: Video, Text, Image

Output: Text
