Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

License: apache-2.0

Usage

python

from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained(
"malvavisc0/qwen3.5-9b-opus-agent-gptq-int8",
device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("malvavisc0/qwen3.5-9b-opus-agent-gptq-int8")

Benchmarks

Same benchmarks as the original model:

ModelARCARC/EBoolQ
Qwen3.5-9B-Opus-Agent0.5890.7470.901

Notes

  • Quantized with GPTQ 8-bit using gptqmodel 7.1.0
  • Act-aware quantization enabled
  • Compatible with vLLM for efficient inference

Model provider

malvavisc0

Model tree

Base

armand0e/Qwen3.5-9B-Opus-Agent

Quantized

this model

Modalities

Input

Video, Text, Image

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Explore FriendliAI today