prithivMLmods

Q3.5-9B-DS-v4-Flash-DA

License: apache-2.0

Key Highlights

  • DeepSeek V4 Distillation: Fine-tuned using curated reasoning traces distilled from DeepSeek V4 Flash for improved multi-step reasoning capabilities.
  • Distilled-Abliterated (DA): Applies advanced refusal direction analysis and ablation-based strategies to reduce internal refusal behaviors while preserving reasoning quality.
  • Qwen3.5 Backbone: Built on top of Qwen/Qwen3.5-9B through prithivMLmods/Qwen3.5-9B-Unredacted-MAX for strong reasoning and text generation performance.
  • Instruction + Reasoning Fusion: Handles both instruction-following and complex reasoning tasks seamlessly.
  • High-Coherence Outputs: Maintains consistency across long generations with improved contextual grounding.
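
The "refusal direction" idea mentioned above is often described as directional ablation: estimate a direction in activation space that separates refusal-inducing prompts from benign ones, then project it out. The following is a minimal sketch of that generic technique on toy vectors, not the procedure actually used to train this model. The function names and the toy activations are hypothetical.

```python
import math

def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def refusal_direction(harmful_acts, harmless_acts):
    """Unit vector pointing from the harmless mean to the harmful mean."""
    diff = [h - b for h, b in zip(mean_vector(harmful_acts), mean_vector(harmless_acts))]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]

def ablate(activation, direction):
    """Remove the component of `activation` that lies along the unit `direction`."""
    dot = sum(a * d for a, d in zip(activation, direction))
    return [a - dot * d for a, d in zip(activation, direction)]

# Toy 3-dimensional activations (made-up numbers, not taken from the model).
harmful = [[2.0, 0.0, 1.0], [4.0, 0.0, 1.0]]
harmless = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]

direction = refusal_direction(harmful, harmless)
ablated = ablate([5.0, 2.0, 1.0], direction)
print(direction, ablated)
```

In a real model the activations would come from a chosen layer's residual stream, and the projection would typically be applied to weights or hidden states across many layers. Those choices are not documented here.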

Datasets Used and Training Details

| Category | Details |
|---|---|
| Base Model | Qwen/Qwen3.5-9B |
| Intermediate Model | prithivMLmods/Qwen3.5-9B-Unredacted-MAX |
| Final Model Size | 9B Parameters |
| Training Type | Multi-stage distillation + abliteration |
| Training Pipeline | TRL (Transformer Reinforcement Learning) |
| Objective | Preserve reasoning quality from larger models; reduce refusal behaviors via ablation strategies; improve instruction-following reliability |
| Reasoning Dataset | Jackrong/DeepSeek-V4-Distill-8000x (4000 random samples used) |
| Alignment / Evaluation Dataset | prithivMLmods/harm_bench |
| Training Focus | Structured reasoning, long-chain thinking, robustness across diverse prompts |
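
The table states that 4000 random samples were drawn from the reasoning dataset. The exact sampling procedure is not documented, so the snippet below is only a sketch of one reproducible way to pick such a subset. The pool size of 8000 is an assumption inferred from the dataset name, and `pick_subset` and the seed are hypothetical.

```python
import random

def pick_subset(n_total, n_keep, seed=0):
    """Return `n_keep` distinct row indices drawn uniformly from range(n_total)."""
    rng = random.Random(seed)  # local RNG so the global random state is untouched
    return sorted(rng.sample(range(n_total), n_keep))

# Assumed pool size (8000) and an arbitrary seed; neither comes from the model card.
indices = pick_subset(8000, 4000, seed=0)
print(len(indices), indices[:5])
```

The resulting index list could then be used to select rows from whatever container holds the dataset.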

Quick Start with Transformers

```bash
pip install transformers==5.8.0
# or latest
pip install git+https://github.com/huggingface/transformers.git
```

```python
from transformers import Qwen3_5ForConditionalGeneration, AutoProcessor
import torch

# Load the model; device_map="auto" places the weights on the available device(s).
model = Qwen3_5ForConditionalGeneration.from_pretrained(
    "prithivMLmods/Q3.5-9B-DS-v4-Flash-DA",
    torch_dtype="auto",
    device_map="auto"
)
processor = AutoProcessor.from_pretrained(
    "prithivMLmods/Q3.5-9B-DS-v4-Flash-DA"
)

# A single-turn, text-only conversation.
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Generate a highly detailed caption of a futuristic city skyline at sunset."
            }
        ],
    }
]

# Render the chat template to a prompt string, then tokenize it.
text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
inputs = processor(
    text=[text],
    padding=True,
    return_tensors="pt"
).to(model.device)  # follow the model's device instead of hard-coding "cuda"

generated_ids = model.generate(
    **inputs,
    max_new_tokens=512
)

# Drop the prompt tokens so that only the newly generated tokens are decoded.
generated_ids_trimmed = [
    out_ids[len(in_ids):]
    for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False
)
print(output_text)
```
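
Because the model is trained on reasoning traces, the decoded text may contain a reasoning segment before the final answer. The helper below is a small sketch for separating the two. It assumes the reasoning is terminated by a `</think>` tag, as is conventional for Qwen3-family chat templates; whether that tag survives decoding for this checkpoint is not confirmed by the model card, and `split_reasoning` is a hypothetical name.

```python
def split_reasoning(text, close_tag="</think>"):
    """Return (reasoning, answer); reasoning is empty if no closing tag is found."""
    reasoning, sep, answer = text.rpartition(close_tag)
    if not sep:
        # No closing tag: treat the whole text as the answer.
        return "", text.strip()
    # The opening tag may be part of the prompt rather than the output, so it is optional.
    reasoning = reasoning.replace("<think>", "", 1).strip()
    return reasoning, answer.strip()

reasoning, answer = split_reasoning("<think>\nstep 1\n</think>\n\nFinal caption.")
print(repr(reasoning), repr(answer))
```

With the quick-start code, this would be applied to each string in `output_text`.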

Intended Use

  • Reasoning & Chain-of-Thought Tasks: Deep multi-step reasoning powered by DeepSeek V4 distilled traces
  • Instruction Following: Hybrid prompts requiring both instruction adherence and reasoning
  • Red-Teaming & Alignment Research: Evaluating reduced-refusal systems and refusal direction analysis
  • Local High-Performance Deployment: Multi-GPU or quantized inference setups
  • Research on Abliteration: Studying the effects of ablation-based training on reasoning preservation
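
For the red-teaming and abliteration research uses listed above, a common first-pass metric is the fraction of responses that look like refusals. The sketch below uses simple substring matching, which is a crude heuristic and not the evaluation method used for this model. The marker list and function names are illustrative assumptions.

```python
# Illustrative refusal phrases; a real evaluation would need a broader, validated list.
REFUSAL_MARKERS = ("i'm sorry", "i cannot", "i can't", "i am unable", "as an ai")

def is_refusal(response, markers=REFUSAL_MARKERS):
    """Heuristic: does the start of the response contain a known refusal phrase?"""
    head = response.strip().lower()[:200]
    head = head.replace("\u2019", "'")  # normalize curly apostrophes
    return any(marker in head for marker in markers)

def refusal_rate(responses):
    """Fraction of responses flagged as refusals (0.0 for an empty list)."""
    if not responses:
        return 0.0
    return sum(is_refusal(r) for r in responses) / len(responses)

samples = [
    "I'm sorry, but I can't help with that.",
    "Sure, here is the summary.",
    "I cannot assist with this request.",
    "Step 1: gather the inputs.",
]
print(refusal_rate(samples))
```

Substring matching misses paraphrased refusals and can flag benign text, so results from a check like this are best treated as a rough signal.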

Limitations & Risks

Important Note: This model intentionally minimizes built-in safety refusals.

  • Sensitive Content Risk: May produce unrestricted or controversial outputs
  • User Responsibility: Requires careful and ethical usage
  • High Compute Demand: At 9B parameters, the model needs significant VRAM or optimized (for example, quantized) inference
  • Abliteration Trade-offs: Reduced refusal may impact safety alignment and output filtering
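
To put the compute demand in rough numbers, weight memory can be estimated as parameter count times bytes per parameter. The sketch below assumes a nominal 9e9 parameters and covers weights only; the KV cache, activations, and framework overhead are not included, so real usage will be higher. `weight_memory_gib` is a hypothetical helper.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gib(n_params, dtype):
    """Approximate memory for the weights alone, in GiB (2**30 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30

# Nominal 9B parameter count; the exact count of the checkpoint may differ.
for dtype in ("bf16", "int8", "int4"):
    print(f"{dtype}: {weight_memory_gib(9e9, dtype):.1f} GiB")
```

Under these assumptions the weights take about 16.8 GiB in bf16, 8.4 GiB in int8, and 4.2 GiB in int4.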

Model provider

prithivMLmods

Model tree

Base

prithivMLmods/Qwen3.5-9B-Unredacted-MAX

Fine-tuned

this model

Modalities

Input

Video, Text, Image

Output

Text
