License: apache-2.0

Compression for the Model
Qwen3.5-9B-abliterated-v2-MAX
| Format | Description | Link |
|---|---|---|
| GGUF | Quantized GGUF format | https://huggingface.co/prithivMLmods/Qwen3.5-9B-abliterated-v2-MAX/tree/main/GGUF |
| NVFP4 | NVFP4 compressed model | https://huggingface.co/prithivMLmods/Qwen3.5-9B-abliterated-v2-MAX-NVFP4 |
| FP8 | FP8 compressed model | https://huggingface.co/prithivMLmods/Qwen3.5-9B-abliterated-v2-MAX-FP8 |
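As a rough guide to choosing between the formats above, the sketch below estimates weight-only memory at different precisions. It is a back-of-the-envelope calculation, not a measurement: `NOMINAL_PARAMS` takes "9B" at face value, the 16-bit baseline is an assumption, and the figures ignore quantization scale factors, the KV cache, and activations. GGUF is left out because its size depends on the quantization level chosen.

```python
# Rough weight-only memory estimate. All constants here are illustrative
# assumptions, not values published in this model card.
NOMINAL_PARAMS = 9e9  # "9B" taken at face value, not an exact parameter count

# Nominal bits per weight; real files carry extra metadata and scale factors.
BITS_PER_WEIGHT = {
    "BF16 (assumed baseline)": 16,
    "FP8": 8,
    "NVFP4": 4,
}


def weight_gigabytes(num_params: float, bits_per_weight: int) -> float:
    """Return the size of the weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name}: ~{weight_gigabytes(NOMINAL_PARAMS, bits):.1f} GB")
```

Halving the bit width halves the nominal weight footprint, which is the main reason to pick a compressed variant for a memory-constrained GPU.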
Base Model Signatures:
This model has been re-sharded and optimized for the latest Transformers version from the base model: https://huggingface.co/huihui-ai/Huihui-Qwen3.5-9B-abliterated
Key Highlights
- **Optimized Packaging & Sharding**: Improved repository structure for smoother downloads, loading, and deployment across environments.
- **Stable Transformers Compatibility**: Updated layout for better compatibility with modern Transformers versions and inference pipelines.
- **9B Parameter Architecture**: Built on Qwen3.5-9B, balancing efficiency and capability for local and research use.
- **Efficient Deployment Design**: Designed for lightweight inference, experimentation, and scalable integration.
- **Preserved Model Behavior**: No changes to weights or core architecture; performance remains consistent with the original base model lineage.
- **Improved Reliability in Loading**: Reduced friction in model initialization and multi-device inference setups.
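To make the sharding point concrete, here is a minimal sketch of how a sharded checkpoint index can be inspected. It assumes the repository follows the usual Transformers layout, in which a `model.safetensors.index.json` file holds a `weight_map` from tensor names to shard files. The tensor names, shard count, and `group_by_shard` helper are hypothetical and do not describe this repository's actual contents.

```python
from collections import defaultdict


def group_by_shard(index: dict) -> dict:
    """Group tensor names by the shard file that stores them."""
    shards = defaultdict(list)
    for tensor_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(tensor_name)
    return dict(shards)


# Hypothetical index contents, shaped like model.safetensors.index.json.
example_index = {
    "metadata": {"total_size": 0},  # placeholder value, not a real size
    "weight_map": {
        "layers.0.weight": "model-00001-of-00002.safetensors",
        "layers.1.weight": "model-00001-of-00002.safetensors",
        "lm_head.weight": "model-00002-of-00002.safetensors",
    },
}

for shard_file, tensor_names in group_by_shard(example_index).items():
    print(shard_file, len(tensor_names))
```

With the real index file loaded via `json.load`, the same helper would show how evenly tensors are spread across shards.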
Quick Start with Transformers
```bash
pip install transformers==5.4.0
# or
pip install git+https://github.com/huggingface/transformers.git
```
```python
from transformers import Qwen3_5ForConditionalGeneration, AutoProcessor
import torch

# Load the model; device_map="auto" places the weights on the available device(s).
model = Qwen3_5ForConditionalGeneration.from_pretrained(
    "prithivMLmods/Qwen3.5-9B-abliterated-v2-MAX",
    torch_dtype="auto",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("prithivMLmods/Qwen3.5-9B-abliterated-v2-MAX")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Explain how transformer models work in simple terms."}
        ],
    }
]

# Render the chat template to a prompt string, then tokenize it.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Move inputs to the model's device rather than hard-coding "cuda".
inputs = processor(text=[text], padding=True, return_tensors="pt").to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=256)
# Drop the prompt tokens so that only newly generated tokens are decoded.
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
```
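The `generated_ids_trimmed` step in the snippet above is easy to misread, so here is a small standalone sketch of the same slicing using plain Python lists in place of tensors. The token IDs and the `trim_prompt_tokens` helper are made up for illustration. The idea is that `generate` returns the prompt followed by the new tokens, so the prompt length is sliced off each row.

```python
def trim_prompt_tokens(input_ids, generated_ids):
    """Remove each prompt's tokens from the front of its generated sequence."""
    return [out[len(inp):] for inp, out in zip(input_ids, generated_ids)]


# Made-up token IDs: each generated row starts with a copy of its prompt.
prompt_batch = [[101, 7, 8], [101, 9, 10]]
generated_batch = [[101, 7, 8, 42, 43], [101, 9, 10, 55]]

print(trim_prompt_tokens(prompt_batch, generated_batch))
```

Without this step, the decoded text would repeat the prompt before the model's answer.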
Intended Use
- **Multimodal and Language Research**: Studying behavior of compact 9B-scale transformer models under different inference settings.
- **Red-Teaming & Evaluation**: Testing robustness across adversarial prompts and edge-case inputs.
- **Efficient Local Deployment**: Running lightweight yet capable models on consumer GPUs or optimized cloud setups.
- **Research Prototyping**: Exploring model behavior, alignment, and inference optimization techniques.
Limitations & Risks
Important Note: This model inherits behavior from its base model with minimal modification.
- **Output Variability**: Responses may vary depending on sampling strategy and prompt formulation.
- **Resource Dependency**: Although the model is efficient, GPU acceleration is recommended for optimal performance.
- **No Architectural Changes**: Improvements are limited to packaging and compatibility, not core model capabilities.
- **General Model Limitations**: May still produce incorrect, incomplete, or inconsistent outputs in complex scenarios.
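The note on output variability can be illustrated without loading the model. The sketch below applies temperature scaling to a toy set of logits, which is one common way sampling strategy changes the next-token distribution. The logits and temperatures are arbitrary illustrative values, not outputs or recommended settings for this model.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.5]  # arbitrary scores for three candidate tokens
for temperature in (0.5, 1.0, 1.5):
    probs = softmax_with_temperature(toy_logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top token, giving more repeatable outputs. Higher temperatures flatten the distribution, so repeated runs diverge more.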
Model provider
prithivMLmods
Model tree
- Base: Qwen/Qwen3.5-9B
- Quantized: this model

Modalities
- Input: Video, Text, Image
- Output: Text