lilyzhng
qwen3.5-9b-tau2-sft-lora
Dedicated Endpoints
Run this model inference on single tenant GPU with unmatched speed and reliability at scale.
Get help setting up a custom Dedicated Endpoints.
Talk with our engineer to get a quote for reserved GPU instances with discounts.
README
vLLM (no merge upload)
bash
vllm serve Qwen/Qwen3.5-9B --enable-lora --lora-modules sft=lilyzhng/qwen3.5-9b-tau2-sft-lora \--max-lora-rank 32 --dtype bfloat16
Model provider
lilyzhng
Model tree
Base
Qwen/Qwen3.5-9B
Adapter
this model
Modalities
Input
Video, Text, Image
Output
Text
Pricing
Dedicated Endpoints
View detailsSupported Functionality
Model APIs
Dedicated Endpoints
Container
More information