License: apache-2.0
⚠️ Moved: this repo is deprecated
The kubelm-qwen3.5-2b-v1 LoRA adapter has been consolidated into the
single model repo alongside the Q4_K_M GGUF and model card:
👉 rbentaarit/kubelm-qwen3.5-2b-v1
The adapter now lives under adapter/ in that repo. This standalone
-lora repo is kept only so existing links keep resolving; it receives
no further updates. Pull the adapter from the consolidated repo:
```python
from huggingface_hub import snapshot_download
# Fetch only the files under adapter/ from the consolidated repo
local_dir = snapshot_download("rbentaarit/kubelm-qwen3.5-2b-v1", allow_patterns="adapter/*")
```
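The `allow_patterns` filter is what keeps the download limited to the adapter. The sketch below illustrates which files a pattern like `"adapter/*"` would select. It assumes `huggingface_hub` matches patterns shell-style, as Python's `fnmatch` does. The file listing and the `select_files` helper are hypothetical and only stand in for the real repo contents.

```python
from fnmatch import fnmatch

# Hypothetical file listing for the consolidated repo; the real names may differ.
repo_files = [
    "README.md",
    "adapter/adapter_config.json",
    "adapter/adapter_model.safetensors",
    "kubelm-qwen3.5-2b-v1-Q4_K_M.gguf",
]


def select_files(files, allow_patterns):
    """Hypothetical helper: keep files matching any shell-style pattern."""
    if isinstance(allow_patterns, str):
        allow_patterns = [allow_patterns]
    return [f for f in files if any(fnmatch(f, p) for p in allow_patterns)]


selected = select_files(repo_files, "adapter/*")
print(selected)
# ['adapter/adapter_config.json', 'adapter/adapter_model.safetensors']
```

With `fnmatch` semantics, `*` also matches `/`, so any nested files under `adapter/` would be selected as well. The GGUF and the model card are skipped.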
For everything else (GGUF, serving instructions, eval numbers, provenance), see the consolidated repo's model card.
Model provider: rbentaarit
Model tree:
Base: Qwen/Qwen3.5-2B
Adapter: this model
Modalities:
Input: Video, Text, Image
Output: Text
Pricing: Dedicated Endpoints
Supported functionality: Model APIs, Dedicated Endpoints, Container