License: apache-2.0
OVERVIEW
OmniSenter Base 16B is a Darwin Family evolved multimodal model and the first generation of the OmniSenter lineage. It was produced by fusing the reasoning capabilities of Qwen3-8B into the multimodal world model Cosmos3-Nano via per-tensor MRI-Trust fusion.
The model preserves all of Cosmos3-Nano's modalities (vision, audio, video understanding, and video generation) while blending in Qwen3-8B's text-reasoning strengths.
PARENT MODELS
| Parent | Architecture | Parameters | Role |
|---|---|---|---|
| nvidia/Cosmos3-Nano | Cosmos3ForConditionalGeneration | ~16B | Multimodal world model |
| Qwen/Qwen3-8B | Qwen3ForCausalLM | 8B | Dense text reasoning |
MERGE SPECIFICATIONS
```markdown
Method .................. Darwin Family MRI-Trust fusion
Genome density (ρ_b) .... 0.5
MRI-Trust coeff (τ) ..... 0.4
Text tensors merged ..... 398
Cosmos extras preserved . 399 (cross-attn, MoE twins, modality)
Total output tensors .... 798
Shape match rate ........ 398/398 (100%)
Merge time .............. 195s
Model size .............. 29GB (bfloat16, 7 shards)
```
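The card does not define the MRI-Trust fusion rule itself. As an illustration only, the sketch below shows the bookkeeping the specification implies: tensors whose name and shape match in both parents are blended, and tensors that exist only in Cosmos3-Nano are copied through unchanged. The blend is a stand-in, a DARE-style rule that keeps a random fraction `rho` of the per-element deltas (rescaled by `1/rho`) and weights them by `tau`. Reading ρ_b and τ this way is an assumption, not the published Darwin method, and the tensor names are hypothetical. Tensors are plain Python lists so the example runs without PyTorch.

```python
import random
from typing import Dict, List, Tuple

# Toy "tensor": (shape, flat list of values). A real merge would operate on torch tensors.
Tensor = Tuple[Tuple[int, ...], List[float]]


def dare_style_blend(base: List[float], donor: List[float], rho: float, tau: float,
                     rng: random.Random) -> List[float]:
    """Hypothetical stand-in for the fusion rule: keep each delta with probability rho,
    rescale kept deltas by 1/rho, then add tau times the result to the base value."""
    out = []
    for b, d in zip(base, donor):
        delta = (d - b) / rho if rng.random() < rho else 0.0
        out.append(b + tau * delta)
    return out


def merge_state_dicts(cosmos: Dict[str, Tensor], qwen: Dict[str, Tensor],
                      rho: float = 0.5, tau: float = 0.4, seed: int = 0):
    """Blend tensors whose name and shape match in both parents; copy Cosmos-only tensors.
    Tensors that exist only in the Qwen parent are ignored in this sketch."""
    rng = random.Random(seed)
    merged: Dict[str, Tensor] = {}
    n_merged = n_preserved = 0
    for name, (shape, values) in cosmos.items():
        donor = qwen.get(name)
        if donor is not None and donor[0] == shape:
            merged[name] = (shape, dare_style_blend(values, donor[1], rho, tau, rng))
            n_merged += 1
        else:
            # Cross-attention, MoE twins, modality modules, or a shape mismatch: keep Cosmos weights.
            merged[name] = (shape, list(values))
            n_preserved += 1
    return merged, n_merged, n_preserved


# Hypothetical key names, for illustration only.
cosmos = {
    "model.layers.0.self_attn.q_proj.weight": ((2,), [1.0, 1.0]),
    "model.layers.0.self_attn.add_q_proj.weight": ((2,), [0.5, 0.5]),  # Cosmos-only
}
qwen = {"model.layers.0.self_attn.q_proj.weight": ((2,), [3.0, 3.0])}

merged, n_merged, n_preserved = merge_state_dicts(cosmos, qwen)
print(n_merged, n_preserved)  # 1 1
```

Because kept deltas are rescaled by `1/rho`, each blended element equals `base + tau * (donor - base)` in expectation, which is the usual motivation for the rescaling step in DARE-style merges.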
ARCHITECTURE
```markdown
OmniSenter Base 16B
├── Text backbone (Darwin-merged Qwen3)
│   ├── 36 transformer layers
│   ├── Self-attn + MLP + norms per layer
│   ├── embed_tokens (151,936 vocab)
│   └── lm_head
├── Cross-modal attention (from Cosmos3-Nano)
│   ├── add_q/k/v_proj + to_add_out per layer
│   └── norm_added_q/k per layer
├── MoE generation twins (from Cosmos3-Nano)
│   └── layers.*.mlp_moe_gen.* + layernorms
├── Vision encoder
├── Diffusion transformer (video/image gen)
├── Sound tokenizer
└── VAE
```
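To see which branch of the tree a checkpoint tensor belongs to, it can help to bucket state-dict keys by name. The sketch below assumes Hugging Face style key names (for example `model.layers.3.self_attn.add_q_proj.weight`) built from the module names in the tree; the `model.` prefix and overall key layout are assumptions, and the card gives no key names for the vision encoder, diffusion transformer, sound tokenizer, or VAE, so everything else falls into an `other` bucket.

```python
import re
from collections import Counter

# Key-name patterns taken from the module names in the architecture tree. The "model." prefix
# and the overall key layout are assumptions about how the checkpoint is organized.
RULES = [
    ("cross_modal_attention", re.compile(r"\.(add_[qkv]_proj|to_add_out|norm_added_[qk])\.")),
    ("moe_generation_twin", re.compile(r"\.mlp_moe_gen\.")),
    ("text_backbone", re.compile(r"^(model\.)?(embed_tokens|norm)\.|^lm_head\.")),
    ("text_backbone", re.compile(
        r"^(model\.)?layers\.\d+\.(self_attn|mlp|input_layernorm|post_attention_layernorm)\.")),
]


def component_of(key: str) -> str:
    """Return the first matching label; rule order matters (cross-attn is checked before self-attn)."""
    for label, pattern in RULES:
        if pattern.search(key):
            return label
    return "other"  # vision encoder, diffusion transformer, sound tokenizer, VAE, ...


keys = [
    "model.embed_tokens.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.self_attn.add_k_proj.weight",
    "model.layers.0.self_attn.norm_added_q.weight",
    "model.layers.0.mlp_moe_gen.up_proj.weight",
    "lm_head.weight",
    "vision_tower.blocks.0.attn.qkv.weight",  # hypothetical name
]
print(Counter(component_of(k) for k in keys))
```

Ordering the rules from most to least specific keeps the cross-modal projections, which live inside `self_attn`, from being counted as part of the merged text backbone.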
CAPABILITIES
```markdown
Text reasoning ........ Yes, enhanced via Qwen3-8B fusion
Vision ................ Yes, preserved from Cosmos3-Nano
Audio ................. Yes, preserved from Cosmos3-Nano
Video understanding ... Yes, preserved from Cosmos3-Nano
Video generation ...... Yes, preserved from Cosmos3-Nano
Tool calling .......... Base capability; improvement via SFT (planned)
Agentic behavior ...... Base capability; improvement via SFT (planned)
Music generation ...... Not yet; ACE-Step integration planned (Line 2)
```
USAGE
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "sovthpaw/OmniSenter-Base-16B"

# Load the bfloat16 weights; device_map="auto" lets Accelerate place them on available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Build the prompt with the tokenizer's chat template.
messages = [{"role": "user", "content": "Hello, what can you do?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```
TRAINING DATA (FOR FUTURE SFT)
```markdown
Hermes reasoning tool use ............. 5,000 conversations
Aureth SFT curriculum ................. 5,000 conversations
Hermes agent traces ................... 3,679 conversations
Hermes function calling + thinking .... 3,570 conversations
Hermes function calling v1 ............ 1,893 conversations
─────────────────────────────────────────────────────────────
Total ................................ 34,142 conversations
Additional: Nemotron, Atropos, Nous Research datasets
```
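Several Hermes datasets are commonly distributed in a ShareGPT-style layout, where each conversation is a list of `{"from": ..., "value": ...}` turns. Assuming that layout (the field names and the role mapping below are assumptions; check each dataset's card), a minimal converter to the `role`/`content` messages that `tokenizer.apply_chat_template` expects could look like this:

```python
from typing import Dict, List

# Assumed ShareGPT speaker names -> chat-template roles.
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant", "tool": "tool"}


def sharegpt_to_messages(conversation: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Convert [{'from': ..., 'value': ...}, ...] turns into [{'role': ..., 'content': ...}, ...]."""
    messages = []
    for turn in conversation:
        role = ROLE_MAP.get(turn["from"])
        if role is None:
            # Fail loudly rather than silently dropping turns with an unexpected speaker.
            raise ValueError(f"unknown speaker: {turn['from']!r}")
        messages.append({"role": role, "content": turn["value"]})
    return messages


example = [
    {"from": "system", "value": "You are a function-calling assistant."},
    {"from": "human", "value": "What's the weather in Paris?"},
    {"from": "gpt", "value": "I'll look that up."},
]
print([m["role"] for m in sharegpt_to_messages(example)])  # ['system', 'user', 'assistant']
```

The resulting list has the same shape as `messages` in the usage example, so it can be rendered with `tokenizer.apply_chat_template` before tokenization for SFT.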
HARDWARE REQUIREMENTS
| Format | VRAM | Notes |
|---|---|---|
| bfloat16 (safetensors) | ~32GB | Unquantized (native bf16); A100 or 2×3090 |
| 4-bit quantized (QLoRA) | ~8GB | For fine-tuning |
| Q4_K_M GGUF | ~10GB | Inference on a single 3090 |
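The VRAM column roughly tracks a weights-only estimate of parameters × bits per weight. A back-of-the-envelope sketch, assuming a nominal 16 × 10⁹ parameters and ignoring activations, KV cache, and framework overhead; the 4.85 bits/weight used for Q4_K_M is an approximate average (that GGUF format mixes quantization types), not a figure from the card:

```python
PARAMS = 16e9  # nominal parameter count (~16B); the exact count is not given in the card

# Approximate bits per weight; 4.85 for Q4_K_M is an assumed average.
BITS_PER_WEIGHT = {"bfloat16": 16, "4-bit (QLoRA/NF4)": 4, "Q4_K_M GGUF": 4.85}


def weight_gb(params: float, bits: float) -> float:
    """Weights-only footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits / 8 / 1e9


def weight_gib(params: float, bits: float) -> float:
    """Same footprint in binary gibibytes (1 GiB = 2**30 bytes)."""
    return params * bits / 8 / 2**30


for fmt, bits in BITS_PER_WEIGHT.items():
    print(f"{fmt:>20}: {weight_gb(PARAMS, bits):5.1f} GB  ({weight_gib(PARAMS, bits):5.1f} GiB)")
```

For bfloat16 this gives 32 GB, or about 29.8 GiB, which would be consistent with the 29GB checkpoint size in the merge specifications if that figure is in GiB.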
LINEAGE
```markdown
Cosmos3-Nano ──┐
               ├── Darwin merge ──► OmniSenter Base 16B (Gen-0)
Qwen3-8B ──────┘                    │
                                    ├──► Gen-1 (evolved, CMA-ES)
                                    ├──► Gen-2 (evolved + SFT)
                                    └──► ... continuous evolution
```
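Gen-1 is described as evolved with CMA-ES; the card does not say what is searched, but in training-free evolutionary merging the search typically runs over merge hyperparameters such as ρ_b and τ. Full CMA-ES also adapts a covariance matrix and evolution paths; the sketch below is a deliberately simplified (μ, λ) evolution strategy with a single step size, shown only to illustrate the sample, evaluate, select loop. `toy_fitness` is a hypothetical stand-in for "merge with these coefficients, then score the result on a benchmark".

```python
import random
from typing import Callable, List, Tuple


def toy_fitness(params: List[float]) -> float:
    """Hypothetical stand-in for a benchmark score of a merged model.
    Peaks at rho=0.5, tau=0.4 purely for illustration."""
    rho, tau = params
    return -((rho - 0.5) ** 2 + (tau - 0.4) ** 2)


def simple_es(fitness: Callable[[List[float]], float], x0: List[float], sigma: float = 0.2,
              mu: int = 4, lam: int = 16, generations: int = 60,
              seed: int = 0) -> Tuple[List[float], float]:
    """(mu, lambda)-ES with isotropic Gaussian mutations and a decaying step size.
    Simplified relative to CMA-ES: no covariance adaptation, no evolution paths."""
    rng = random.Random(seed)
    mean = list(x0)
    for _ in range(generations):
        # Sample lambda candidates around the current mean, clipped to the [0, 1] range.
        pop = [[min(1.0, max(0.0, m + sigma * rng.gauss(0.0, 1.0))) for m in mean]
               for _ in range(lam)]
        # Keep the mu fittest candidates and average them into the new mean.
        elite = sorted(pop, key=fitness, reverse=True)[:mu]
        mean = [sum(ind[i] for ind in elite) / mu for i in range(len(mean))]
        sigma *= 0.95  # fixed decay schedule instead of CMA-ES step-size adaptation
    return mean, fitness(mean)


best, score = simple_es(toy_fitness, x0=[0.2, 0.8])
print([round(v, 3) for v in best], round(score, 6))
```

In a real run each fitness call would require building and evaluating a merged checkpoint, which is why evolutionary merging methods keep the population small and the search space low-dimensional.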
CITATION
```bibtex
@article{darwin2026family,
  title={Darwin Family: Training-Free Evolutionary Model Merging},
  author={Darwin Team},
  journal={arXiv preprint arXiv:2605.14386},
  year={2026}
}
```
ACKNOWLEDGMENTS
- NVIDIA for Cosmos3-Nano
- Qwen team for Qwen3-8B
- Nous Research for Hermes agent training data and infrastructure
- The Darwin Family paper authors for the evolutionary merging methodology
TOWARDS SELF-IMPROVEMENT
NOUS RESEARCH
Model provider: sovthpaw
Model tree:
- Base: nvidia/Cosmos3-Nano
- Base: Qwen/Qwen3-8B
- Merged: this model
Modalities: input text, output text