prithivMLmods
Q3.6-35B-A3B-abliterated-0520-MAX-STOR-check
License: apache-2.0
Evaluation [Self Reported]
| Metric | Result |
|---|---|
| Refusal Rate | N/A |
| Test Setup | N/A |
| Inference Type | text-generation |
| Dataset | N/A |
Note: This release does not introduce new benchmark evaluations and primarily focuses on repackaging, sharding updates, and Transformers compatibility improvements over the base model.
Key Highlights
- Latest Transformers Compatibility: Re-sharded and optimized for compatibility with recent Transformers releases.
- Optimized Model Sharding: Updated shard structure for improved download reliability, storage handling, and inference efficiency.
- Streamlined Inference Packaging: Repository structure organized for easier integration into large-scale inference workflows.
- 35B MoE Architecture (A3B): Built on Qwen/Qwen3.6-35B-A3B, a Mixture-of-Experts design in which "A3B" denotes roughly 3B parameters active per token, giving large total capacity at a much lower per-token compute cost.
- Improved Deployment Stability: Designed for more consistent loading and inference behavior across environments.
- Preserved Model Behavior: No modifications to weights or architecture relative to the base model; behavior remains aligned with the original model lineage.
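To make the Mixture-of-Experts idea concrete, here is a toy sketch of generic top-k routing for a single token: a router scores every expert, only the `top_k` highest-scoring experts run, and their outputs are mixed with softmax weights renormalized over the selected experts. This is a simplified illustration under assumed toy settings (four scalar "experts", hand-picked router logits), not the Qwen3.6 implementation, whose expert count, `top_k`, and gating details may differ. Running only a few experts per token is what lets an A3B model activate a small fraction of its parameters on each step.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, router_logits, experts, top_k=2):
    """Route one token to its top_k experts and mix their outputs.

    x: the token representation (a single float here, for simplicity)
    router_logits: one score per expert; a real model computes these with a learned router
    experts: list of callables, one per expert
    """
    # Indices of the top_k highest-scoring experts (ties keep the lower index first).
    top = sorted(range(len(experts)), key=lambda i: router_logits[i], reverse=True)[:top_k]
    # Renormalize gate weights over the selected experts only.
    gates = softmax([router_logits[i] for i in top])
    # Only the selected experts are evaluated; the others are skipped for this token.
    out = sum(g * experts[i](x) for g, i in zip(gates, top))
    return out, top


# Four hypothetical toy experts: expert k scales its input by (k + 1).
experts = [lambda x, k=k: (k + 1) * x for k in range(4)]
y, used = moe_forward(1.0, router_logits=[0.1, 2.0, 2.0, -1.0], experts=experts, top_k=2)
print(used, y)
```

Here experts 1 and 2 tie for the highest score, so each gets a gate weight of 0.5 and the other two experts never run.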
Base Model:
This model is a re-sharded copy of the base model below, optimized for the latest Transformers release: https://huggingface.co/huihui-ai/Huihui-Qwen3.6-35B-A3B-abliterated
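Re-sharding changes how the weights are split across `*.safetensors` files; a sharded checkpoint also ships an index file whose `weight_map` maps each tensor name to the shard that holds it, alongside a `total_size` entry. The sketch below assumes a simple greedy packing under a byte budget and builds an index in that shape. The tensor names and sizes are hypothetical, and Transformers' own `save_pretrained` sharding logic may differ in detail.

```python
import json


def shard_weights(tensor_sizes: dict[str, int], max_shard_bytes: int) -> dict:
    """Greedily pack tensors, in insertion order, into shards of at most max_shard_bytes.

    Returns an index shaped like a sharded safetensors index: total size plus a
    map from tensor name to shard filename.
    """
    shards: list[list[str]] = [[]]
    current = 0
    for name, size in tensor_sizes.items():
        # Start a new shard if this tensor would overflow the current one
        # (a single oversized tensor still gets a shard of its own).
        if shards[-1] and current + size > max_shard_bytes:
            shards.append([])
            current = 0
        shards[-1].append(name)
        current += size

    n = len(shards)
    weight_map = {}
    for i, names in enumerate(shards, start=1):
        filename = f"model-{i:05d}-of-{n:05d}.safetensors"
        for name in names:
            weight_map[name] = filename
    return {"metadata": {"total_size": sum(tensor_sizes.values())}, "weight_map": weight_map}


# Hypothetical tensor names and sizes in bytes, for illustration only.
sizes = {
    "embed_tokens.weight": 4_000,
    "layers.0.mlp.experts.0.weight": 3_000,
    "layers.0.mlp.experts.1.weight": 3_000,
    "lm_head.weight": 4_000,
}
index = shard_weights(sizes, max_shard_bytes=7_000)
print(json.dumps(index, indent=2))
```

Smaller shards make interrupted downloads cheaper to resume, at the cost of more files to open at load time.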
Quick Start with Transformers
```bash
pip install transformers==5.8.0
# or, for the latest development version:
pip install git+https://github.com/huggingface/transformers.git
```
```python
from transformers import Qwen3_5MoeForConditionalGeneration, AutoProcessor
import torch

model_id = "prithivMLmods/Q3.6-35B-A3B-abliterated-0520-MAX-STOR-check"

# Load the weights in their stored dtype and spread them across the available devices.
model = Qwen3_5MoeForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Explain how transformer models work in simple terms."}
        ],
    }
]

# Render the chat template to a prompt string, then tokenize it.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Send inputs to the model's device instead of hard-coding "cuda",
# which is more robust when device_map="auto" places layers automatically.
inputs = processor(text=[text], padding=True, return_tensors="pt").to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=256)

# generate() returns prompt + completion; keep only the newly generated tokens.
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)
print(output_text[0])  # batch_decode returns a list; print the single reply
```
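The trimming step near the end of the Quick Start relies on `generate()` returning each prompt followed by its completion. This minimal pure-Python sketch, using hypothetical token IDs in plain lists instead of tensors, shows why slicing off `len(in_ids)` tokens leaves only the reply: prompts padded into one batch share a single length, so the same cut works for every row.

```python
def trim_prompt(input_ids, generated_ids):
    """Drop the prompt tokens from each generated sequence.

    Each generated sequence starts with its (padded) prompt, so slicing off
    the first len(prompt) tokens leaves only the newly generated tokens.
    """
    return [out[len(inp):] for inp, out in zip(input_ids, generated_ids)]


# Hypothetical token IDs: two left-padded prompts (0 = pad) of equal length.
prompts = [[0, 11, 12], [21, 22, 23]]
outputs = [[0, 11, 12, 101, 102], [21, 22, 23, 201, 202]]
print(trim_prompt(prompts, outputs))  # [[101, 102], [201, 202]]
```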
Intended Use
- Multimodal & Language Research: Studying large-scale MoE behavior and inference characteristics.
- Red-Teaming & Evaluation: Testing robustness across complex and adversarial prompts.
- High-Performance Local Deployment: Running large Mixture-of-Experts models on multi-GPU setups.
- Research Prototyping: Experimenting with scalable transformer architectures.
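For multi-GPU local deployment, `from_pretrained(..., device_map="auto")` also accepts a `max_memory` mapping that caps how much of each device the automatic placement may use. The helper below is a hypothetical convenience that builds such a mapping while reserving some headroom on each GPU for activations and the KV cache; the GPU sizes and the headroom value are assumptions to adjust for your hardware. Note that `device_map="auto"` places whole layers on different devices, which is not the same as tensor parallelism.

```python
from typing import Optional


def build_max_memory(gpu_gib: list[int], headroom_gib: int = 2, cpu_gib: Optional[int] = None) -> dict:
    """Build a max_memory mapping for from_pretrained(..., device_map="auto").

    gpu_gib: total memory of each visible GPU in GiB, in device order 0, 1, ...
    headroom_gib: memory left free on each GPU for activations and the KV cache (an assumption).
    cpu_gib: optional CPU RAM budget for layers that do not fit on the GPUs.
    """
    max_memory = {i: f"{max(total - headroom_gib, 0)}GiB" for i, total in enumerate(gpu_gib)}
    if cpu_gib is not None:
        max_memory["cpu"] = f"{cpu_gib}GiB"
    return max_memory


# Example: two hypothetical 80 GiB GPUs, keeping 4 GiB free on each.
print(build_max_memory([80, 80], headroom_gib=4))
```

The result can be passed as `max_memory=build_max_memory([80, 80])` alongside `device_map="auto"` in the Quick Start's `from_pretrained` call.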
Limitations & Risks
Important Note: This model inherits the behavior and limitations of its base model.
- Output Variability: Responses may vary depending on sampling settings and prompt structure.
- High Compute Requirements: Although only about 3B parameters are active per token, all 35B parameters must still be loaded, so the model needs significant GPU memory and optimized inference strategies such as quantization or tensor parallelism.
- Deployment Constraints: Performance depends heavily on hardware configuration and runtime optimization.
- General Model Limitations: May produce incorrect or incomplete outputs in complex scenarios.
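A back-of-the-envelope estimate makes the memory requirement above concrete. This sketch assumes the nominal 35B total and 3B active parameter counts implied by the model name and counts weights only (no KV cache, activations, or runtime overhead), so treat the results as lower bounds rather than measured figures.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory needed to hold the weights alone, in GiB (2**30 bytes)."""
    return num_params * bits_per_param / 8 / 2**30


# Assumptions: nominal counts taken from the model name "35B-A3B".
TOTAL_PARAMS = 35e9   # every expert must be loaded, so memory scales with this
ACTIVE_PARAMS = 3e9   # parameters used per token; this drives compute, not memory

for label, bits in [("bf16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gib(TOTAL_PARAMS, bits):.1f} GiB of weights")
print(f"Active per token: ~{ACTIVE_PARAMS / TOTAL_PARAMS:.0%} of parameters")
```

At bf16 the weights alone come to roughly 65 GiB, more than most single consumer GPUs hold, which is why quantization or multi-GPU placement is usually needed.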
Model provider
prithivMLmods
Model tree
- Base: prithivMLmods/Qwen3.6-35B-A3B-Uncensored-Aggressive
- Fine-tuned: this model
Modalities
- Input: Video, Text, Image
- Output: Text
Pricing
- Dedicated Endpoints
Supported Functionality
- Model APIs
- Dedicated Endpoints
- Container