Dedicated Endpoints
Run this model inference on single tenant GPU with unmatched speed and reliability at scale.
Get help setting up a custom Dedicated Endpoints.
Talk with our engineer to get a quote for reserved GPU instances with discounts.
README
License: apache-2.0Usage
python
from transformers import AutoModelForCausalLM, AutoTokenizermodel = AutoModelForCausalLM.from_pretrained("malvavisc0/qwen3.5-9b-opus-agent-gptq-int8",device_map="auto")tokenizer = AutoTokenizer.from_pretrained("malvavisc0/qwen3.5-9b-opus-agent-gptq-int8")
Benchmarks
Same benchmarks as the original model:
| Model | ARC | ARC/E | BoolQ |
|---|---|---|---|
| Qwen3.5-9B-Opus-Agent | 0.589 | 0.747 | 0.901 |
Notes
- Quantized with GPTQ 8-bit using gptqmodel 7.1.0
- Act-aware quantization enabled
- Compatible with vLLM for efficient inference
Model provider
malvavisc0
Model tree
Base
armand0e/Qwen3.5-9B-Opus-Agent
Quantized
this model
Modalities
Input
Video, Text, Image
Output
Text
Pricing
Dedicated Endpoints
View detailsSupported Functionality
Model APIs
Dedicated Endpoints
Container
More information