prithivMLmods
Qwen3-VL-4B-Instruct-abliterated-v1
README
License: apache-2.0
Base Model Signatures:
This model has been re-sharded and optimized for the latest Transformers version from the base model: https://huggingface.co/huihui-ai/Huihui-Qwen3-VL-4B-Instruct-abliterated.
Quick Start with Transformers
```python
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info
import torch

# Load the model and its processor; dtype and device placement are picked automatically.
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "prithivMLmods/Qwen3-VL-4B-Instruct-abliterated-v1",
    torch_dtype="auto",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("prithivMLmods/Qwen3-VL-4B-Instruct-abliterated-v1")

# One user turn containing an image and a text instruction.
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": "Provide a detailed caption and reasoning for this image."},
        ],
    }
]

# Render the chat prompt and collect the vision inputs referenced in the messages.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to("cuda")

# Generate, then drop the prompt tokens so only the newly generated tokens are decoded.
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
```
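The trimming step in the quick start can look opaque at first: `generate` returns each prompt followed by the newly generated tokens, so the prompt prefix is sliced off before decoding. A minimal sketch of that slicing on plain Python lists, assuming made-up token IDs and a hypothetical helper name (`trim_prompt_tokens`), neither of which comes from the model card:

```python
def trim_prompt_tokens(input_ids, generated_ids):
    """Drop the echoed prompt prefix from each generated sequence."""
    return [out[len(inp):] for inp, out in zip(input_ids, generated_ids)]


# Toy token IDs (invented for illustration): every output starts with its prompt.
prompt_batch = [[101, 7, 8], [101, 9]]
output_batch = [[101, 7, 8, 42, 43], [101, 9, 55]]

trimmed = trim_prompt_tokens(prompt_batch, output_batch)
print(trimmed)  # [[42, 43], [55]]
```

The same comprehension works on the tensors in the quick start because slicing a tensor row by the prompt length leaves only the continuation.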
Intended Use
This model is suited for:
- Generating detailed, uncensored captions and reasoning for general-purpose or artistic datasets.
- Research in content moderation, red-teaming, and generative safety evaluation.
- Enabling descriptive captioning and reasoning for visual datasets typically excluded from mainstream models.
- Creative applications such as storytelling, art generation, or multimodal reasoning tasks.
- Captioning and reasoning for non-standard aspect ratios and stylized visual content.
Limitations
- May produce explicit, sensitive, or offensive descriptions depending on image content and prompts.
- Not recommended for production systems requiring strict content moderation.
- Output style, tone, and reasoning can vary depending on input phrasing.
- Accuracy may vary for unfamiliar, synthetic, or highly abstract visual content.
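Since the model may emit explicit text and is not recommended where strict moderation is required, a deployment that still uses it would likely need its own check on generated captions. The following is only a sketch under stated assumptions: `BLOCKED_TERMS` and `gate_caption` are hypothetical names, the term list is a placeholder, and none of this is part of the model or of Transformers. A real system would more plausibly rely on a dedicated moderation classifier than on keyword matching.

```python
import re

# Hypothetical placeholder terms; a real deployment would define its own policy.
BLOCKED_TERMS = {"explicit", "gore"}


def gate_caption(caption, blocked_terms=BLOCKED_TERMS):
    """Return (allowed, hits): allowed is False if any blocked term appears as a word."""
    words = set(re.findall(r"[a-z']+", caption.lower()))
    hits = sorted(words & set(blocked_terms))
    return (len(hits) == 0, hits)


print(gate_caption("A calm lake at dawn."))
print(gate_caption("A scene with GORE and explicit detail."))
```

Matching on whole lowercase words keeps the check simple but misses paraphrases and misspellings, which is why it should be treated as a last-resort guard rather than a moderation layer.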
Model provider: prithivMLmods
Model tree:
- Base: Qwen/Qwen3-VL-4B-Instruct
- Fine-tuned: this model
Modalities:
- Input: Text, Image
- Output: Text