License: MIT

Key Features

  • 397B total / 17B active parameters (Mixture-of-Experts)
  • 1,010,000-token (~1M) context window
  • SwiReasoning integration — dynamic explicit/latent reasoning switching for Pareto-superior accuracy and efficiency
  • General-purpose — strong agentic coding, reasoning, instruction-following, and multimodal performance
  • Post-trained from Qwen 3.5 397B
  • Multilingual — strong performance in Portuguese, English, Chinese, and dozens of other languages
  • MIT License — fully open for commercial and research use

Benchmark Results

Agentic Coding & Software Engineering

Benchmark | Rio 3.5 Open 397B | Qwen 3.5 397B (base) | Qwen 3.7 Plus | DeepSeek V4 Pro | Kimi-K2.6 | GPT 5.5
Terminal-Bench 2.1 | 70.8 | 52.5 | 70.3 | 67.9 | 66.7 | 78.2
DeepSWE: 23.0, 6.0, 8.0, 24.0, 70.0
SWE-Bench Pro | 58.1 | 50.9 | 57.6 | 59.0 | 59.5 | 58.6
SWE-Bench Verified | 80.2 | 76.2 | 77.7 | 80.6 | 80.2 | 82.9
SWE-Bench Multilingual: 77.0, 69.3, 75.8, 76.2, 76.7

Knowledge & Reasoning

Benchmark | Rio 3.5 Open 397B | Qwen 3.5 397B (base) | Qwen 3.7 Plus | DeepSeek V4 Pro | Kimi-K2.6 | GPT 5.5
GPQA Diamond | 90.9 | 88.4 | 90.3 | 90.1 | 90.5 | 93.6
HLE | 36.5 | 28.7 | 34.7 | 37.7 | 36.4 | 41.4
MMLU-Pro: 88.0, 87.8, 88.5, 87.5, 87.1
MMLU-Redux: 94.6, 94.9, 94.5, 94.8, 95.3
SuperGPQA: 72.3, 70.4, 71.4, 69.9, 71.3
Apex | 29.2 | 9.4 | 22.7 | 38.3 | 24.0 | 80.2

Mathematics

Benchmark | Rio 3.5 Open 397B | Qwen 3.5 397B (base) | Qwen 3.7 Plus | DeepSeek V4 Pro | Kimi-K2.6 | GPT 5.5
HMMT 2026 Feb | 93.9 | 87.9 | 92.9 | 95.2 | 92.7 | 98.5
IMOAnswerBench: 89.5, 80.9, 86.0, 89.8, 86.0

Multilingual

Benchmark | Rio 3.5 Open 397B | Qwen 3.5 397B (base) | Qwen 3.7 Plus | DeepSeek V4 Pro | Kimi-K2.6 | GPT 5.5
MMMLU: 89.8, 88.5, 89.0, 87.9, 87.5
MMLU-ProX: 85.6, 84.7, 85.4, 83.9, 83.7

Multimodal

Benchmark | Rio 3.5 Open 397B | Qwen 3.5 397B (base) | Qwen 3.7 Plus | DeepSeek V4 Pro | Kimi-K2.6 | GPT 5.5
MMMU-Pro: 78.4, 79.0, 79.0, 79.4, 81.2
MathVision: 89.1, 88.6, 90.3, 87.4
VideoMMMU: 81.6, 84.7, 85.4, 86.4

Agents & Instruction Following

Benchmark | Rio 3.5 Open 397B | Qwen 3.5 397B (base) | Qwen 3.7 Plus | DeepSeek V4 Pro | Kimi-K2.6 | GPT 5.5
MCP-Atlas | 74.2 | 74.2 | 73.2 | 73.6 | 66.6 | 75.3
IFBench | 78.4 | 76.5 | 79.1 | 77.0 | 76.0 | 76.0
IFEval: 93.4, 92.6, 94.6, 91.9, 94.5

Economic Value

Benchmark | Rio 3.5 Open 397B | Qwen 3.5 397B (base) | Qwen 3.7 Plus | DeepSeek V4 Pro | Kimi-K2.6 | GPT 5.5
GDPval (estimated) | 1533 | 1200 | 1520 | 1554 | 1482 | 1769

Gains Over Base Model (Qwen 3.5 397B)

Benchmark | Base Model | Rio 3.5 Open 397B | Δ
Terminal-Bench 2.1 | 52.5 | 70.8 | +18.3
DeepSWE | 6.0 | 23.0 | +17.0
SWE-Bench Pro | 50.9 | 58.1 | +7.2
SWE-Bench Verified | 76.2 | 80.2 | +4.0
SWE-Bench Multilingual | 69.3 | 77.0 | +7.7
GPQA Diamond | 88.4 | 90.9 | +2.5
HLE | 28.7 | 36.5 | +7.8
HMMT 2026 Feb | 87.9 | 93.9 | +6.0
IMOAnswerBench | 80.9 | 89.5 | +8.6
Apex | 9.4 | 29.2 | +19.8
GDPval (estimated) | 1200 | 1533 | +333
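
The Δ column is the Rio score minus the base score. The short script below recomputes it from the table and also prints a relative gain, which the table does not report; the helper name `gains` and the choice of rounding to one decimal are assumptions made for this sketch.

```python
# Scores copied from the table above as (base model, Rio 3.5 Open 397B).
SCORES = {
    "Terminal-Bench 2.1": (52.5, 70.8),
    "DeepSWE": (6.0, 23.0),
    "SWE-Bench Pro": (50.9, 58.1),
    "SWE-Bench Verified": (76.2, 80.2),
    "SWE-Bench Multilingual": (69.3, 77.0),
    "GPQA Diamond": (88.4, 90.9),
    "HLE": (28.7, 36.5),
    "HMMT 2026 Feb": (87.9, 93.9),
    "IMOAnswerBench": (80.9, 89.5),
    "Apex": (9.4, 29.2),
    "GDPval (estimated)": (1200, 1533),
}


def gains(base, tuned):
    """Return (absolute delta, relative gain in percent), rounded to one decimal."""
    delta = round(tuned - base, 1)
    relative = round(100.0 * (tuned - base) / base, 1)
    return delta, relative


for name, (base, tuned) in SCORES.items():
    delta, relative = gains(base, tuned)
    print(f"{name:24s} delta={delta:+.1f}  relative={relative:+.1f}%")
```

Relative gains are largest where the base score is small (DeepSWE, Apex), so the absolute and relative views rank the benchmarks differently.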

SwiReasoning: Latent/Explicit Reasoning

Rio 3.5 Open 397B integrates SwiReasoning (Shi et al., 2025), a training-free inference framework that dynamically alternates between two reasoning modes:

  • Explicit reasoning — standard chain-of-thought in natural language, where the model commits tokens to a single reasoning path
  • Latent reasoning — continuous reasoning in hidden space, where the model explores multiple implicit paths simultaneously without emitting tokens

The switching is governed by block-wise confidence estimated from entropy trends in the next-token distribution. When confidence is low (entropy trending upward), the model enters latent mode to explore alternatives. When confidence recovers, it switches back to explicit mode to commit to a solution.
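
The switching rule described above can be illustrated with a toy controller. This is a minimal sketch, not the SwiReasoning implementation: the function names `entropy` and `choose_modes`, the per-block reference entropy, and the rule of resetting that reference on every switch are assumptions made for illustration, and the probability vectors are made up.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def choose_modes(step_entropies, start_mode="explicit"):
    """Toy controller that picks a reasoning mode for each decoding step.

    Entropy above the current block's reference is read as falling confidence
    (switch to latent); entropy below it is read as recovering confidence
    (switch back to explicit). Each switch starts a new block and resets the
    reference to the entropy observed at the switch.
    """
    mode = start_mode
    reference = None
    modes = []
    for h in step_entropies:
        if reference is None:
            reference = h
        elif mode == "explicit" and h > reference:
            mode, reference = "latent", h
        elif mode == "latent" and h < reference:
            mode, reference = "explicit", h
        modes.append(mode)
    return modes


# Made-up next-token distributions: peaked, flat, flatter, peaked again.
distributions = [
    [0.90, 0.05, 0.05],
    [0.40, 0.30, 0.30],
    [0.34, 0.33, 0.33],
    [0.80, 0.10, 0.10],
]
print(choose_modes([entropy(p) for p in distributions]))
# ['explicit', 'latent', 'latent', 'explicit']
```

The sketch only tracks which mode is active; what the model actually feeds forward in latent mode is outside its scope.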

This approach achieves a Pareto-superior trade-off: higher accuracy when the token budget is unlimited, and markedly better token efficiency when the budget is constrained. As with previous Rio generations, the model was post-trained to maximize the gains obtained from latent reasoning.
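
To make "Pareto-superior" concrete: one configuration dominates another if it is no worse on both accuracy and token usage and strictly better on at least one. The check below is a sketch of that definition; the function name `dominates` and the two data points are hypothetical and are not measurements from this model.

```python
def dominates(a, b):
    """Return True if point a = (accuracy, tokens) Pareto-dominates point b.

    a must be at least as accurate as b and use no more tokens, and it must
    be strictly better on at least one of the two axes.
    """
    acc_a, tok_a = a
    acc_b, tok_b = b
    no_worse = acc_a >= acc_b and tok_a <= tok_b
    strictly_better = acc_a > acc_b or tok_a < tok_b
    return no_worse and strictly_better


# Hypothetical (accuracy %, average tokens) pairs, made up for illustration.
explicit_only = (80.0, 12000)
switching = (82.0, 9000)
print(dominates(switching, explicit_only))  # True
```

A point that is more accurate but also uses more tokens does not dominate; it is simply a different trade-off.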

How to Use

python

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prefeitura-rio/Rio-3.5-Open-397B"

# Load the tokenizer and model; device_map="auto" places the weights across the available devices.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

# Format the conversation with the model's chat template.
prompt = "Write a poem about Rio de Janeiro."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# do_sample=True makes temperature and top_p take effect; without it, generate() may decode greedily.
outputs = model.generate(
    **inputs,
    max_new_tokens=81920,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True)
print(response)

Using with vLLM

bash

# --max-model-len is set to the 1,010,000-token context window stated in this card.
vllm serve prefeitura-rio/Rio-3.5-Open-397B \
    --tensor-parallel-size 8 \
    --max-model-len 1010000 \
    --trust-remote-code
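
Once the server is running, it exposes an OpenAI-compatible HTTP API. The client below is a minimal sketch using only the standard library. It assumes the server listens on vLLM's default port 8000 on localhost; the helper names `build_chat_request` and `chat` and the `max_tokens` default are illustrative choices, and the sampling values are copied from the Transformers example above.

```python
import json
import urllib.request

# Assumption: the server started by the command above uses vLLM's default port.
BASE_URL = "http://localhost:8000/v1"
MODEL = "prefeitura-rio/Rio-3.5-Open-397B"


def build_chat_request(prompt, max_tokens=1024):
    """Build an OpenAI-style chat completion payload for a single user turn."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.6,
        "top_p": 0.95,
    }


def chat(prompt, base_url=BASE_URL):
    """POST the payload to /chat/completions and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server up, `print(chat("Write a poem about Rio de Janeiro."))` prints the model's reply.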

Using with SGLang

bash

# --context-length is set to the 1,010,000-token context window stated in this card.
python -m sglang.launch_server \
    --model-path prefeitura-rio/Rio-3.5-Open-397B \
    --tp 8 \
    --context-length 1010000 \
    --trust-remote-code
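
Both serving commands above shard the model across 8 GPUs. A back-of-the-envelope estimate shows why a single device is not enough. The sketch assumes 16-bit weights (2 bytes per parameter) and an even split across GPUs, and it ignores the KV cache, activations, and runtime overhead, so real requirements are higher; the function name `weight_memory_gb` is illustrative.

```python
def weight_memory_gb(total_params, bytes_per_param=2, num_gpus=8):
    """Return (total GB, GB per GPU) for the weights alone, split evenly."""
    total_gb = total_params * bytes_per_param / 1e9
    return total_gb, total_gb / num_gpus


# ~397B total parameters, as listed under Key Features.
total, per_gpu = weight_memory_gb(397e9)
print(f"{total:.0f} GB total, {per_gpu:.2f} GB per GPU")
# 794 GB total, 99.25 GB per GPU
```

All experts must be resident in memory even though only about 17B parameters are active per token, so the active-parameter count reduces compute per token rather than the weight footprint.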

Model Details

Developer: IplanRIO (Empresa Municipal de Informática e Planejamento S.A.)
Base Model: Qwen 3.5 397B
Architecture: Mixture-of-Experts (MoE) Transformer
Total Parameters: ~397B
Active Parameters: ~17B
Context Length: 1,010,000 tokens (1M)
Training Method: Post-training
Inference Enhancement: SwiReasoning (latent/explicit switching)
License: MIT
Languages: Multilingual (en, pt, zh, ja, ko, fr, de, es, ar, and more)

Citation

If you use SwiReasoning, please also cite:

bibtex

@misc{shi2025swireasoning,
  title={SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs},
  author={Shi, Dachuan and others},
  year={2025},
  eprint={2510.05069},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

Acknowledgments

Rio 3.5 Open 397B is built upon the exceptional work of the Qwen Team and their Qwen 3.5 model family. We also acknowledge the authors of SwiReasoning for their innovative inference framework.

Developed in Rio de Janeiro 🇧🇷 by IplanRIO.

Model provider: prefeitura-rio
Model tree: base model Qwen/Qwen3.5-397B-A17B; this model is fine-tuned from it
Modalities: input Video, Text, Image; output Text
