Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

License: apache-2.0

⚠️ Moved — this repo is deprecated

The kubelm-qwen3.5-2b-v1 LoRA adapter has been consolidated into the single model repo alongside the Q4_K_M GGUF and model card:

👉 rbentaarit/kubelm-qwen3.5-2b-v1

The adapter now lives under adapter/ in that repo. This standalone -lora repo is kept only so existing links keep resolving; it receives no further updates. Pull the adapter from the consolidated repo:

python

from huggingface_hub import snapshot_download
snapshot_download("rbentaarit/kubelm-qwen3.5-2b-v1", allow_patterns="adapter/*")

For everything else (GGUF, serving instructions, eval numbers, provenance) see the consolidated repo's card.

Model provider

rbentaarit

Model tree

Base

Qwen/Qwen3.5-2B

Adapter

this model

Modalities

Input

Video, Text, Image

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Explore FriendliAI today