prithivMLmods

Q3.6-35B-A3B-abliterated-0520-MAX-STOR-check

License: apache-2.0

Evaluation [Self Reported]

| Metric         | Result          |
|----------------|-----------------|
| Refusal Rate   | N/A             |
| Test Setup     | N/A             |
| Inference Type | text-generation |
| Dataset        | N/A             |

Note: This release does not introduce new benchmark evaluations and primarily focuses on repackaging, sharding updates, and Transformers compatibility improvements over the base model.


Key Highlights

  • Latest Transformers Compatibility: Re-sharded and optimized for improved compatibility with recent Transformers releases.

  • Optimized Model Sharding: Updated shard structure for improved download reliability, storage handling, and inference efficiency.

  • Streamlined Inference Packaging: Repository structure optimized for easier integration into large-scale inference workflows.

  • 35B MoE Architecture (A3B): Built on Qwen/Qwen3.6-35B-A3B, leveraging a Mixture-of-Experts design for scalable reasoning capacity.

  • Improved Deployment Stability: Designed for more consistent loading and inference behavior across environments.

  • Preserved Model Behavior: No modifications to weights or architecture; all behavior remains aligned with the original model lineage.
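
In Qwen model names, a suffix such as "A3B" conventionally denotes roughly 3B parameters activated per token out of the total, which a Mixture-of-Experts layer achieves by routing each token to a few experts. The following is a minimal sketch of top-k routing in plain Python. The expert count, the value of k, the router scores, and the toy expert functions are all illustrative assumptions, not the actual configuration of this model, and real implementations differ in details such as whether the softmax is applied before or after the top-k selection.

```python
import math

def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    # Indices of the k largest logits, highest first (ties keep original order).
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax restricted to the selected experts; subtract the max for stability.
    m = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_forward(x, experts, router_logits, k):
    """Combine the outputs of the selected experts, weighted by the router."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))

# Toy setup: 4 hypothetical experts, each a simple scalar function.
toy_experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]
toy_logits = [0.1, 2.0, 2.0, -1.0]  # illustrative router scores for one token

routes = top_k_route(toy_logits, k=2)
output = moe_forward(3.0, toy_experts, toy_logits, k=2)
```

Only the selected experts run for a given token, which is why per-token compute tracks the active parameter count rather than the total.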


Base Model Signatures:

This model is a re-sharded repackaging of the following base model, optimized for the latest Transformers version: https://huggingface.co/huihui-ai/Huihui-Qwen3.6-35B-A3B-abliterated
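
To make "re-sharded" concrete: sharded safetensors checkpoints are typically accompanied by a `model.safetensors.index.json` file whose `weight_map` records which shard file holds each tensor. The sketch below groups tensors by shard for a small inline index. The tensor names, shard file names, shard count, and `total_size` value are made up for illustration and do not describe the actual layout of this repository.

```python
import json
from collections import defaultdict

# Hypothetical excerpt of a model.safetensors.index.json file.
# All names and sizes here are illustrative assumptions.
index_json = """
{
  "metadata": {"total_size": 1000},
  "weight_map": {
    "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.mlp.gate.weight": "model-00001-of-00002.safetensors",
    "lm_head.weight": "model-00002-of-00002.safetensors"
  }
}
"""

def tensors_per_shard(index):
    """Group tensor names by the shard file that stores them."""
    shards = defaultdict(list)
    for tensor_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(tensor_name)
    return dict(shards)

index = json.loads(index_json)
layout = tensors_per_shard(index)
for shard_file in sorted(layout):
    print(shard_file, len(layout[shard_file]))
```

Re-sharding changes only this mapping and the file boundaries; the tensors themselves stay the same, which is consistent with the statement that weights are not modified.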


Quick Start with Transformers

bash

pip install transformers==5.8.0
# or
pip install git+https://github.com/huggingface/transformers.git

python

from transformers import Qwen3_5MoeForConditionalGeneration, AutoProcessor

MODEL_ID = "prithivMLmods/Q3.6-35B-A3B-abliterated-0520-MAX-STOR-check"

# Load the weights; device_map="auto" places layers on the available devices
# (requires the accelerate package).
model = Qwen3_5MoeForConditionalGeneration.from_pretrained(
    MODEL_ID,
    torch_dtype="auto",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

# A single-turn, text-only conversation in chat format.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Explain how transformer models work in simple terms."}
        ],
    }
]

# Render the chat template to a prompt string, then tokenize it.
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(
    text=[text],
    padding=True,
    return_tensors="pt",
).to(model.device)  # follow the model's device instead of hard-coding "cuda"

generated_ids = model.generate(**inputs, max_new_tokens=256)

# Drop the prompt tokens so that only the newly generated tokens are decoded.
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)
print(output_text)
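
The trimming step in the snippet above relies on `generate` returning the prompt tokens followed by the new tokens for each sequence. A minimal sketch with plain Python lists shows the same slicing in isolation. The token IDs and the `trim_prompt` helper name are hypothetical and used only for illustration.

```python
def trim_prompt(input_ids, generated_ids):
    """Remove each prompt from the front of its generated sequence."""
    return [out[len(inp):] for inp, out in zip(input_ids, generated_ids)]

# Two prompts of equal (padded) length and their full generated sequences.
prompt_batch = [[101, 7, 8], [101, 9, 10]]
generated_batch = [[101, 7, 8, 42, 43], [101, 9, 10, 44, 45]]

new_tokens = trim_prompt(prompt_batch, generated_batch)
```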

Intended Use

  • Multimodal & Language Research: Studying large-scale MoE behavior and inference characteristics.

  • Red-Teaming & Evaluation: Testing robustness across complex and adversarial prompts.

  • High-Performance Local Deployment: Running large Mixture-of-Experts models on multi-GPU setups.

  • Research Prototyping: Experimenting with scalable transformer architectures.


Limitations & Risks

Important Note: This model inherits the behavior and limitations of its base model.

  • Output Variability: Responses may vary depending on sampling settings and prompt structure.

  • High Compute Requirements: A 35B MoE model requires significant GPU memory and optimized inference strategies such as quantization or tensor parallelism.

  • Deployment Constraints: Performance depends heavily on hardware configuration and runtime optimization.

  • General Model Limitations: May produce incorrect or incomplete outputs in complex scenarios.
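
As a rough illustration of the compute requirement, the memory needed for the weights alone can be estimated from the parameter count and the precision. The sketch below assumes a nominal 35e9 parameters (taken from the "35B" in the model name, not an exact count) and ignores the KV cache, activations, and framework overhead, so real usage will be higher.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30

NUM_PARAMS = 35e9  # assumption: nominal total from the model name

estimates = {
    name: weight_memory_gib(NUM_PARAMS, bits)
    for name, bits in [("bf16", 16), ("int8", 8), ("int4", 4)]
}
for name, gib in estimates.items():
    print(f"{name}: ~{gib:.1f} GiB")
```

Although only a fraction of the parameters is active per token, all experts generally need to be resident in memory or offloaded, so the total parameter count sets the memory budget rather than the active count.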

Model provider

prithivMLmods

Model tree

Base: prithivMLmods/Qwen3.6-35B-A3B-Uncensored-Aggressive
Fine-tuned: this model

Modalities

Input: Video, Text, Image
Output: Text
