README

License: apache-2.0

OVERVIEW

OMNISENTER BASE 16B IS A DARWIN FAMILY EVOLVED MULTIMODAL MODEL AND THE FIRST GENERATION OF THE OMNISENTER LINEAGE. IT WAS PRODUCED BY FUSING THE REASONING CAPABILITIES OF QWEN3-8B INTO THE MULTIMODAL WORLD MODEL COSMOS3-NANO VIA PER-TENSOR MRI-TRUST FUSION.

THE MODEL PRESERVES ALL OF COSMOS3-NANO'S MODALITIES (VISION, AUDIO, VIDEO UNDERSTANDING, AND GENERATION) WHILE BLENDING IN QWEN3-8B'S TEXT REASONING STRENGTHS.


PARENT MODELS

| PARENT | ARCHITECTURE | PARAMETERS | ROLE |
| --- | --- | --- | --- |
| nvidia/Cosmos3-Nano | Cosmos3ForConditionalGeneration | ~16B | MULTIMODAL WORLD MODEL |
| Qwen/Qwen3-8B | Qwen3ForCausalLM | 8B | DENSE TEXT REASONING |

MERGE SPECIFICATIONS

markdown

METHOD .................. DARWIN FAMILY MRI-TRUST FUSION
GENOME DENSITY (ρ_b) .... 0.5
MRI-TRUST COEFF (τ) ..... 0.4
TEXT TENSORS MERGED ..... 398
COSMOS EXTRAS PRESERVED . 399 (CROSS-ATTN, MOE TWINS, MODALITY)
TOTAL OUTPUT TENSORS .... 798
SHAPE MATCH RATE ........ 398/398 (100%)
MERGE TIME .............. 195S
MODEL SIZE .............. 29GB (BFLOAT16, 7 SHARDS)
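
The bookkeeping in the specification above (text tensors blended where shapes match, Cosmos-only tensors carried over unchanged) can be sketched roughly as follows. This is an illustrative stand-in only: it uses plain linear interpolation weighted by τ, which is an assumption and not the actual Darwin Family MRI-trust fusion. The genome density ρ_b is not modeled at all. The function name, the toy flat-list tensor representation, and the example tensor names are all hypothetical.

```python
# Illustrative stand-in: plain linear interpolation, NOT the actual
# Darwin Family MRI-trust fusion (its details are not given in this card).

def fuse_state_dicts(base, donor, tau=0.4):
    """Blend donor tensors into base where name and length match.

    base, donor: dict mapping tensor name -> flat list of floats
    (a toy representation; real checkpoints hold tensors).
    Returns (fused, merged_names, preserved_names).
    """
    fused, merged, preserved = {}, [], []
    for name, b in base.items():
        d = donor.get(name)
        if d is not None and len(d) == len(b):
            # Shape match: interpolate toward the donor by tau.
            fused[name] = [(1.0 - tau) * x + tau * y for x, y in zip(b, d)]
            merged.append(name)
        else:
            # No counterpart in the donor: keep the base tensor as-is.
            fused[name] = list(b)
            preserved.append(name)
    return fused, merged, preserved


# Hypothetical tensor names, for illustration only.
base = {
    "layers.0.self_attn.q_proj.weight": [1.0, 2.0],
    "layers.0.add_q_proj.weight": [5.0, 6.0],
}
donor = {"layers.0.self_attn.q_proj.weight": [3.0, 4.0]}

fused, merged, preserved = fuse_state_dicts(base, donor)
print(len(merged), "merged;", len(preserved), "preserved")
```

In this toy run the self-attention weight is blended and the cross-attention weight, which has no counterpart in the donor, is copied through untouched.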

ARCHITECTURE

markdown

OMNISENTER BASE 16B
├── TEXT BACKBONE (DARWIN-MERGED QWEN3)
│   ├── 36 TRANSFORMER LAYERS
│   ├── SELF-ATTN + MLP + NORMS PER LAYER
│   ├── EMBED_TOKENS (151,936 VOCAB)
│   └── LM_HEAD
├── CROSS-MODAL ATTENTION (FROM COSMOS3-NANO)
│   ├── ADD_Q/K/V_PROJ + TO_ADD_OUT PER LAYER
│   └── NORM_ADDED_Q/K PER LAYER
├── MOE GENERATION TWINS (FROM COSMOS3-NANO)
│   └── LAYERS.*.MLP_MOE_GEN.* + LAYERNORMS
├── VISION ENCODER
├── DIFFUSION TRANSFORMER (VIDEO/IMAGE GEN)
├── SOUND TOKENIZER
└── VAE
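
One way to see which component a checkpoint tensor belongs to is to group tensor names by pattern. The sketch below assumes the names in the tree above appear lowercased in the checkpoint keys (for example `add_q_proj`, `mlp_moe_gen`). That is an assumption, and real key names may differ. The `GROUPS` table and `classify` function are hypothetical helpers, and per-layer norms and the vision, diffusion, sound, and VAE components simply fall through to "other".

```python
import re

# Patterns are assumptions derived from the names in the architecture tree,
# lowercased; actual checkpoint keys may differ.
# Order matters: the more specific Cosmos-only patterns are checked first.
GROUPS = [
    ("moe_generation_twin", re.compile(r"layers\.\d+\.mlp_moe_gen\.")),
    ("cross_modal_attention",
     re.compile(r"\.(add_[qkv]_proj|to_add_out|norm_added_[qk])\.")),
    ("text_backbone",
     re.compile(r"layers\.\d+\.(self_attn|mlp)\.|embed_tokens|lm_head")),
]


def classify(name):
    """Return the component group for a tensor name, or 'other'."""
    for group, pattern in GROUPS:
        if pattern.search(name):
            return group
    return "other"


# Hypothetical key names, for illustration only.
examples = [
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.3.add_k_proj.weight",
    "model.layers.3.mlp_moe_gen.experts.0.weight",
    "lm_head.weight",
    "vae.decoder.conv.weight",
]
labels = {name: classify(name) for name in examples}
for name, group in labels.items():
    print(f"{group:24s} {name}")
```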

CAPABILITIES

markdown

TEXT REASONING ......... YES — ENHANCED VIA QWEN3-8B FUSION
VISION ................ YES — PRESERVED FROM COSMOS3-NANO
AUDIO ................. YES — PRESERVED FROM COSMOS3-NANO
VIDEO UNDERSTANDING ... YES — PRESERVED FROM COSMOS3-NANO
VIDEO GENERATION ...... YES — PRESERVED FROM COSMOS3-NANO
TOOL CALLING .......... BASE CAPABILITY — IMPROVEMENT VIA SFT (PLANNED)
AGENTIC BEHAVIOR ...... BASE CAPABILITY — IMPROVEMENT VIA SFT (PLANNED)
MUSIC GENERATION ...... NOT YET — ACESTEP INTEGRATION PLANNED (LINE 2)

USAGE

python

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged checkpoint in bfloat16, spreading layers across available devices.
model = AutoModelForCausalLM.from_pretrained(
    "sovthpaw/OmniSenter-Base-16B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("sovthpaw/OmniSenter-Base-16B")

# Build a chat-formatted prompt and generate a text reply.
messages = [{"role": "user", "content": "Hello, what can you do?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Note: outputs[0] contains the prompt tokens followed by the completion.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
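
For a decoder-only model, `generate` returns the prompt tokens followed by the newly generated ones, so decoding `outputs[0]` directly echoes the prompt as well. A minimal sketch of isolating the completion is shown below on toy token ids. The helper name `completion_ids` and the id values are hypothetical.

```python
def completion_ids(output_ids, prompt_length):
    """Return only the tokens generated after the prompt."""
    return output_ids[prompt_length:]


# Toy ids standing in for outputs[0] and the prompt length.
prompt = [101, 7592, 102]
generated = prompt + [2023, 2003]

new_tokens = completion_ids(generated, len(prompt))
print(new_tokens)
```

With the usage snippet above, the equivalent slice would be `outputs[0][inputs["input_ids"].shape[1]:]`, which can then be passed to `tokenizer.decode`.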

TRAINING DATA (FOR FUTURE SFT)

markdown

HERMES REASONING TOOL USE ............. 5,000 CONVERSATIONS
AURETH SFT CURRICULUM ................. 5,000 CONVERSATIONS
HERMES AGENT TRACES ................... 3,679 CONVERSATIONS
HERMES FUNCTION CALLING + THINKING .... 3,570 CONVERSATIONS
HERMES FUNCTION CALLING V1 ............ 1,893 CONVERSATIONS
─────────────────────────────────────────────────────────────
TOTAL ................................ 34,142 CONVERSATIONS
ADDITIONAL: NEMOTRON, ATROPOS, NOUS RESEARCH DATASETS

HARDWARE REQUIREMENTS

| FORMAT | VRAM | NOTES |
| --- | --- | --- |
| BFLOAT16 (SAFETENSORS) | ~32GB | FULL PRECISION, A100/2×3090 |
| 4-BIT QUANTIZED (QLORA) | ~8GB | FOR FINE-TUNING |
| Q4_K_M GGUF | ~10GB | INFERENCE ON SINGLE 3090 |
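
The VRAM figures above are roughly what a weights-only estimate gives: parameter count times bits per weight, divided by 8. The sketch below assumes a nominal 16B parameters and, for Q4_K_M, an average of about 4.85 bits per weight. Both values are assumptions for illustration. The helper `weight_gb` is hypothetical, and the estimate leaves out activations, KV cache, and framework overhead.

```python
def weight_gb(params_billion, bits_per_weight):
    """Weights-only memory estimate in decimal GB."""
    return params_billion * bits_per_weight / 8


PARAMS_B = 16  # nominal parameter count in billions (assumption)

estimates = {
    "bfloat16": weight_gb(PARAMS_B, 16),
    "4-bit": weight_gb(PARAMS_B, 4),
    # ~4.85 bits/weight is an assumed average for Q4_K_M.
    "Q4_K_M": weight_gb(PARAMS_B, 4.85),
}
for fmt, gb in estimates.items():
    print(f"{fmt:10s} ~{gb:.1f} GB")
```

Actual usage will be higher than these numbers once context length and batch size are taken into account.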

LINEAGE

markdown

COSMOS3-NANO ──┐
               ├── DARWIN MERGE ──► OMNISENTER BASE 16B (GEN-0)
QWEN3-8B ──────┘                        │
                                        ├──► GEN-1 (EVOLVED, CMA-ES)
                                        ├──► GEN-2 (EVOLVED + SFT)
                                        └──► ... CONTINUOUS EVOLUTION

CITATION

bibtex

@article{darwin2026family,
  title={Darwin Family: Training-Free Evolutionary Model Merging},
  author={Darwin Team},
  journal={arXiv preprint arXiv:2605.14386},
  year={2026}
}

ACKNOWLEDGMENTS

  • NVIDIA FOR COSMOS3-NANO
  • QWEN TEAM FOR QWEN3-8B
  • NOUS RESEARCH FOR HERMES AGENT TRAINING DATA AND INFRASTRUCTURE
  • THE DARWIN FAMILY PAPER AUTHORS FOR THE EVOLUTIONARY MERGING METHODOLOGY

TOWARDS SELF-IMPROVEMENT

NOUS RESEARCH

Model provider: sovthpaw

Model tree: nvidia/Cosmos3-Nano (base), Qwen/Qwen3-8B (base), this model (merged)

Modalities: text input, text output
