Dedicated Endpoints
Run inference for this model on a single-tenant GPU with unmatched speed and reliability at scale.
Get help setting up a custom Dedicated Endpoint.
Talk with our engineers to get a quote for reserved GPU instances with discounts.
README
License: apache-2.0
Use with mlx
bash
pip install mlx-lm
python
from mlx_lm import load, generate

model, tokenizer = load("ggolani/Qwen3.6-40B-Claude-4.6-Opus-Deckard-Heretic-Uncensored-Thinking-mlx-4Bit")

prompt = "hello"

if hasattr(tokenizer, "apply_chat_template") and tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
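Because this is a "Thinking" variant, the string returned by generate may contain the model's reasoning ahead of the final answer. The sketch below shows one way to separate the two. It assumes the reasoning is delimited by `<think>` and `</think>` tags, as is common for Qwen3-style thinking models; this page does not confirm that format, so check real output first. The helper name `split_thinking` is hypothetical and not part of mlx-lm.

```python
def split_thinking(response: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a raw generation string.

    Assumption: reasoning ends with a literal "</think>" tag. The opening
    "<think>" tag may be missing from the output when the chat template
    has already placed it in the prompt, so it is treated as optional.
    """
    head, sep, tail = response.partition("</think>")
    if not sep:
        # No closing tag found: treat the whole string as the answer.
        return "", response.strip()
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


# Stand-in for the `response` string returned by generate(...)
sample = "<think>The user greeted me.</think>\n\nHello! How can I help?"
reasoning, answer = split_thinking(sample)
print(answer)
```

In practice you would pass the `response` from the snippet above instead of `sample`. If the output contains no `</think>` tag, the helper returns it unchanged as the answer.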
Model provider
ggolani
Model tree
Base
DavidAU/Qwen3.6-40B-Claude-4.6-Opus-Deckard-Heretic-Uncensored-Thinking
Quantized
this model
Modalities
Input
Video, Text, Image
Output
Text
Pricing
Dedicated Endpoints
Supported Functionality
Model APIs
Dedicated Endpoints
Container