License: apache-2.0

Evaluation [Self Reported]
| Field | Value |
|---|---|
| Refusal Rate (harm_bench) | 0 / 250 (0%) |
| Test Setup | 250 random harmful prompts |
| Inference Pipeline | Transformers |
| Inference Type | text-generation |
| Dataset | harm_bench |
Note: The model was evaluated on 250 randomly sampled harmful prompts based on the harm_bench dataset and refused none of them (0 / 250). For details on the prompts, refer to the harm_bench dataset page.
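A count of 0 refusals out of 250 gives an observed rate of 0%, but a finite sample cannot show that the true rate is exactly zero. The sketch below puts an upper bound on it. It assumes each prompt is an independent trial. For zero observed events in n trials, the exact one-sided 95% upper bound is 1 - 0.05^(1/n), which is close to the "rule of three" approximation 3/n. The function names are illustrative and are not part of any evaluation harness used for this card.

```python
def refusal_rate(refusals: int, total: int) -> float:
    """Observed refusal rate as a fraction in [0, 1]."""
    if total <= 0:
        raise ValueError("total must be positive")
    return refusals / total


def zero_event_upper_bound(total: int, alpha: float = 0.05) -> float:
    """Exact one-sided (1 - alpha) upper confidence bound on the true rate
    when 0 events are observed in `total` independent trials.

    Solves (1 - p) ** total == alpha for p (Clopper-Pearson, zero-event case).
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return 1.0 - alpha ** (1.0 / total)


n = 250  # number of prompts in the reported test setup
print(f"observed rate:   {refusal_rate(0, n):.1%}")
print(f"95% upper bound: {zero_event_upper_bound(n):.2%}")
print(f"rule of three:   {3 / n:.2%}")
```

With n = 250, both estimates land near 1.2%. Under the independence assumption, the reported result is therefore consistent with a true refusal rate of up to roughly 1.2% on this prompt distribution.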
Key Highlights
- Heretic Stable Training: Refined to reduce internal refusal behaviors while improving response stability and the coherence of long-form multimodal generation.
- 8B Multimodal Architecture: Based on Qwen3-VL-8B-Instruct, delivering strong vision-language understanding and detailed reasoning capabilities.
- Enhanced Visual Reasoning: Optimized for deep analysis of artistic, technical, forensic, abstract, and research-oriented visual content.
- High-Fidelity Captioning: Generates rich and descriptive captions suitable for metadata generation, accessibility pipelines, and dataset enrichment.
- Dynamic Resolution Handling: Maintains native Qwen3-VL support for multiple aspect ratios and high-resolution image processing.
- Stable Instruction Following: Tuned to preserve conversational coherence and reduce generation instability during extended reasoning tasks.
Quick Start with Transformers
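```bash
pip install transformers==5.9.0
# or install from source:
pip install git+https://github.com/huggingface/transformers.git

# qwen_vl_utils (imported in the Python example) is distributed as qwen-vl-utils
pip install qwen-vl-utils
```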
```python
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

# Load the Heretic Stable model; device_map="auto" places weights on available devices
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "prithivMLmods/Qwen3-VL-8B-Heretic-Stable",
    dtype="auto",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("prithivMLmods/Qwen3-VL-8B-Heretic-Stable")

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": "Provide a detailed caption and reasoning for this image."},
        ],
    }
]

# Render the chat template to a prompt string and load the image(s) it references
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to(model.device)  # follow the model's device instead of hard-coding "cuda"

generated_ids = model.generate(**inputs, max_new_tokens=256)

# Drop the prompt tokens so only the newly generated text is decoded
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text[0])
```
Intended Use
- Advanced Multimodal Research: Exploring reasoning behavior and multimodal robustness across diverse prompts.
- Visual Dataset Enrichment: Producing detailed captions for historical, artistic, scientific, or technical datasets.
- Behavioral Alignment Research: Studying the effects of refusal-reduction and abliteration-based fine-tuning strategies.
- Creative Vision-Language Applications: Supporting storytelling, world-building, visual narration, and scene interpretation workflows.
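For the Visual Dataset Enrichment use case, the single-image Quick Start can be wrapped in a loop that writes one caption record per image. The sketch below is a minimal version of that wrapper, under several assumptions. `generate_fn` is a hypothetical caller-supplied function that takes a `messages` list and returns the decoded caption, for example a wrapper around the Quick Start preprocessing, `model.generate`, and decoding. The JSONL layout and the field names `image`, `prompt`, and `caption` are illustrative choices, not a format defined by this model.

```python
import json
from pathlib import Path
from typing import Callable, Iterable, Iterator

# Prompt reused from the Quick Start; adjust it for your dataset.
DEFAULT_PROMPT = "Provide a detailed caption and reasoning for this image."


def build_messages(image: str, prompt: str = DEFAULT_PROMPT) -> list:
    """Build a single-turn chat with the same shape as the Quick Start `messages`."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": prompt},
            ],
        }
    ]


def caption_records(
    images: Iterable[str],
    generate_fn: Callable[[list], str],  # hypothetical: messages -> caption text
    prompt: str = DEFAULT_PROMPT,
) -> Iterator[dict]:
    """Yield one record per image, calling `generate_fn` lazily for each one."""
    for image in images:
        caption = generate_fn(build_messages(image, prompt))
        yield {"image": image, "prompt": prompt, "caption": caption}


def write_jsonl(records: Iterable[dict], out_path: str) -> int:
    """Write records as JSON Lines (one object per line); return the count written."""
    count = 0
    with Path(out_path).open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Keeping generation behind `generate_fn` separates the model call from the bookkeeping. You can test the loop with a stub such as `lambda messages: "stub caption"` before attaching the real model. Because `caption_records` is a generator, `write_jsonl` streams records to disk one at a time rather than holding every caption in memory.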
Limitations & Risks
Important Notice: This model intentionally minimizes conventional refusal mechanisms.
- Sensitive Output Generation: The model may produce explicit, controversial, or unrestricted outputs, depending on the prompt.
- User Responsibility: Outputs should be used responsibly and in accordance with applicable legal and ethical standards.
- Large Hardware Requirements: High-resolution multimodal inference may require substantial GPU memory and compute resources.
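A lower bound on GPU memory is parameter count × bytes per parameter, which covers the weights alone. The sketch below assumes a nominal 8 billion parameters, since this card does not state the checkpoint's exact count. Activations, the KV cache, and the extra vision tokens from high-resolution images all add to this figure. The format table is an approximation: 4-bit schemes such as INT4 or FP4 also store scales that are not counted here.

```python
# Approximate storage cost per parameter for common weight formats.
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16": 2.0,
    "fp16": 2.0,
    "fp8": 1.0,
    "4bit": 0.5,  # INT4/FP4; ignores per-group scale and zero-point overhead
}


def weight_memory_gb(num_params: float, fmt: str = "bf16") -> float:
    """Memory needed to hold the weights alone, in decimal gigabytes (1e9 bytes).

    Excludes activations, the KV cache, vision-token buffers for
    high-resolution images, and framework overhead.
    """
    return num_params * BYTES_PER_PARAM[fmt] / 1e9


# Assumption: a nominal 8e9 parameters; check the checkpoint for the exact count.
for fmt in ("bf16", "fp8", "4bit"):
    print(f"{fmt}: ~{weight_memory_gb(8e9, fmt):.0f} GB for weights")
```

Under this assumption, 16-bit weights need about 16 GB before any inference-time memory is counted. This helps explain why the FP8 and FP4 resources listed in the Acknowledgements are relevant for fitting the model on smaller GPUs.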
Model Lineage
- Base Model: Qwen/Qwen3-VL-8B-Instruct
- Intermediate Variant: prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX
- Current Release: prithivMLmods/Qwen3-VL-8B-Heretic-Stable
Acknowledgements
I would like to thank the authors of the following works:
- Maxime Labonne — Uncensor any LLM with abliteration
- NVIDIA Transformer Engine Docs — Using FP8 and FP4 with Transformer Engine
- Remove Refusals with Transformers by Sumandora
- LLM Compressor by the vLLM project
- NVIDIA FP8 Introduction
Model provider
prithivMLmods
Modalities
- Input: Text, Image
- Output: Text