
License: apache-2.0

Qwen3.5-2B-ReMix-Final

Overview

Qwen3.5-2B-ReMix-Final is a precision-engineered, native Float16 (F16) fine-tune of Qwen/Qwen3.5-2B. While the previous ReMix iterations focused on the broad integration of large-scale distillation datasets, the Final version is the result of a specialized Supervised Fine-Tuning (SFT) strategy designed to maximize stability and logical coherence.

This model is specifically tuned to eliminate the "reasoning loops" common in small models. By shifting the training focus to strict instruction-following and adversarial logic handling, ReMix-Final acts as a robust, logic-first assistant for local execution.


🚀 Key Improvements & Comparison

This model marks a significant departure from the base and the initial ReMix:

  • Superior Logic Handling: While the base model is prone to repetitive cycles under stress, ReMix-Final demonstrates a vastly improved ability to traverse complex constraints and converge on an answer.
  • Instruction Following: Leveraging SFT datasets, the model adheres more strictly to formatting requirements and complex multi-step instructions.
  • Impossible-Question Awareness: Trained on custom hard-distillation sets, the model has been taught to recognize logical contradictions, allowing it to "break" a loop by identifying a problem as unsolvable.

🌟 Model Details

  • Base Model: Qwen/Qwen3.5-2B (Pre-integrated with multi-source distillation)
  • SFT Foundations:
      • allenai/Dolci-Instruct-SFT
      • nvidia/Nemotron-SFT-Instruction-Following-Chat-v2
  • Reasoning Enhancement:
      • Jackrong/DeepSeek-V4-Distill-8000x
      • 5x custom high-difficulty distillation sets targeting logical "dead-ends."
  • Format: Native F16 Merged Weights.
  • License: Apache-2.0
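
As a rough illustration of what the native F16 format implies for local execution, the weight footprint can be estimated from the parameter count. This is a back-of-the-envelope sketch: the figure of 2 billion parameters is an assumption read off the "2B" in the model name, not a published count, and the estimate covers weights only (no KV cache or activations).

```python
# Assumption: "2B" is taken as a nominal 2 billion parameters.
NOMINAL_PARAMS = 2e9
BYTES_PER_PARAM_F16 = 2  # Float16 stores each weight in 2 bytes


def f16_weight_gib(num_params=NOMINAL_PARAMS):
    """Estimate the size of F16 weights in GiB (weights only)."""
    return num_params * BYTES_PER_PARAM_F16 / 2**30


print(f"Approximate F16 weight size: {f16_weight_gib():.2f} GiB")
```

Actual memory use will be higher once the context window and runtime overhead are included.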


🎛️ Recommended Generation Parameters

| Parameter | Value | Purpose |
| --- | --- | --- |
| Temperature | 0.4 - 1.0 | Keeps reasoning focused; values near the low end are more deterministic. |
| Repetition Penalty | 1.15 - 1.2 | Acts as a safety net to help the model break out of residual loops. |
| Top K / Top P | 30 / 0.9 | Provides the model with enough vocabulary depth for technical tasks. |
| enable_thinking | True | Recommended to leverage the internal reasoning architecture. |
| context_length & max_tokens | > 4096 | Gives the model room to reason freely. Reasoning usually takes more than 4,000 to 5,000 tokens. |
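
The recommended values above can be collected into one helper that builds the keyword arguments for an OpenAI-compatible request. This is a minimal sketch, not part of the model's tooling: `build_sampling_kwargs` is a hypothetical name, and the `extra_body` keys (`top_k`, `repetition_penalty`, `chat_template_kwargs`) follow vLLM's conventions as an assumption. Other servers may use different names, for example `repeat_penalty` in llama.cpp.

```python
# Ranges copied from the table above; the helper itself is hypothetical.
RECOMMENDED = {
    "temperature": (0.4, 1.0),
    "repetition_penalty": (1.15, 1.2),
}


def build_sampling_kwargs(temperature=0.7, repetition_penalty=1.2,
                          top_k=30, top_p=0.9, max_tokens=8192,
                          enable_thinking=True):
    """Return keyword arguments for client.chat.completions.create()."""
    checks = (("temperature", temperature),
              ("repetition_penalty", repetition_penalty))
    for name, value in checks:
        low, high = RECOMMENDED[name]
        if not low <= value <= high:
            raise ValueError(
                f"{name}={value} is outside the recommended range {low}-{high}"
            )
    if max_tokens <= 4096:
        raise ValueError("max_tokens should exceed 4096 to leave room for reasoning")
    return {
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
        # Non-standard options travel in extra_body. The key names are an
        # assumption based on vLLM and may differ on other servers.
        "extra_body": {
            "top_k": top_k,
            "repetition_penalty": repetition_penalty,
            "chat_template_kwargs": {"enable_thinking": enable_thinking},
        },
    }
```

A call would then look like `client.chat.completions.create(model=..., messages=..., **build_sampling_kwargs())`.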

⚠️ Limitations & Fallback Behavior

  • Residual Looping: Despite the additional SFT training aimed at stability, 2B models can still fall back into looping patterns when faced with extreme ambiguity or recursive paradoxes. This is a significant improvement over the base and earlier ReMix, but remains a known characteristic of compact architectures.
  • Specialized Logic: The model is optimized for procedural reasoning (math, logic, code). For creative writing or general conversation, the "minimized reasoning" training may result in shorter, more direct responses than expected.
  • Verification: Always verify mathematical and technical outputs.
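
Because residual looping can still occur, a client may want a cheap check before trusting a long response. The sketch below flags output in which the same run of words repeats many times. It is an illustrative heuristic, not something shipped with the model: the function name and the thresholds (`ngram_size=8`, `max_repeats=3`) are assumptions to tune for your workload.

```python
from collections import Counter


def looks_like_loop(text, ngram_size=8, max_repeats=3):
    """Return True if any run of `ngram_size` words occurs more than `max_repeats` times."""
    words = text.split()
    if len(words) < ngram_size:
        return False
    # Slide a window over the words and count each distinct n-gram.
    ngrams = (
        tuple(words[i:i + ngram_size])
        for i in range(len(words) - ngram_size + 1)
    )
    counts = Counter(ngrams)
    return max(counts.values()) > max_repeats
```

When the check fires, one option is to retry with a repetition penalty at the upper end of the recommended range or with a lower `max_tokens` limit.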

📦 Usage (OpenAI-compatible API)

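```python
from openai import OpenAI

# Configured by environment variables (OPENAI_API_KEY, OPENAI_BASE_URL)
client = OpenAI()

# A single user turn with one image and one text question
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://qianwen-res.oss-accelerate.aliyuncs.com/Qwen3.5/demo/RealWorld/RealWorld-04.png"
                },
            },
            {
                "type": "text",
                "text": "Where is this?",
            },
        ],
    }
]

chat_response = client.chat.completions.create(
    model="ertghiu256/Qwen3.5-2b-ReMix-final",
    messages=messages,
    max_tokens=32768,
    temperature=0.7,
    top_p=0.9,
    # Non-standard sampling options are not arguments of create(), so they
    # go in extra_body. The repetition-penalty key is server-specific
    # (e.g. repeat_penalty in llama.cpp, repetition_penalty in vLLM).
    extra_body={
        "top_k": 30,
        "repeat_penalty": 1.2,
    },
)
print("Chat response:", chat_response.choices[0].message.content)
```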
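
With thinking enabled, the returned text may contain the model's reasoning before the final answer. The sketch below separates the two, assuming the server returns the reasoning inline and closes it with a `</think>` tag, as Qwen-family chat templates commonly do. Some servers strip the reasoning into a separate response field instead, in which case this helper is unnecessary. `split_reasoning` is a hypothetical name.

```python
def split_reasoning(raw_text, close_tag="</think>"):
    """Split raw model output into (reasoning, answer).

    Assumes the reasoning, if present, ends with `close_tag`.
    """
    if close_tag not in raw_text:
        # No reasoning block found: treat everything as the answer.
        return "", raw_text.strip()
    reasoning, _, answer = raw_text.partition(close_tag)
    # Drop the opening tag if the server included it.
    reasoning = reasoning.replace("<think>", "", 1).strip()
    return reasoning, answer.strip()
```

For example, `split_reasoning(chat_response.choices[0].message.content)` would give the reasoning and the answer as two strings.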

Uploaded finetuned model

  • Developed by: ertghiu256
  • License: apache-2.0
  • Finetuned from model: ertghiu256/Qwen3.5-2b-ReMix-Vision-GRPO

This qwen3_5 model was trained 2x faster with Unsloth and Hugging Face's TRL library.

Model provider

ertghiu256


Model tree

  • Base: ertghiu256/Qwen3.5-2b-ReMix-Vision-GRPO
  • Fine-tuned: this model

Modalities

  • Input: Video, Text, Image
  • Output: Text
