README

License: apache-2.0

Model details

  • Base (inference): Qwen/Qwen2.5-1.5B-Instruct — RelayOps loads the adapter over this full-precision base.
  • Trained on: unsloth/qwen2.5-1.5b-instruct-unsloth-bnb-4bit (Unsloth 4-bit QLoRA); the adapter loads on either base.
  • Method: Unsloth + LoRA/QLoRA (adapter only)
  • Task: single-label intent classification, output {"intent": "<label>"}
  • Labels: reset_device, device_status, device_faq, billing, greeting, unknown
  • Dataset: 2,400 examples (400 per intent), built from curated seeds plus deterministic template paraphrases. Each example carries a group id so paraphrase families don't leak across splits.
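The group-id idea in the dataset bullet can be illustrated with a small sketch. This is not the project's actual split code: the `group_id` field name, the hash-based assignment, and the 0.3 test fraction are assumptions made for illustration. The point it shows is that a whole paraphrase family lands on one side of the split.

```python
import hashlib


def group_aware_split(examples, test_fraction=0.3, seed=13):
    """Assign whole groups to train or test so no group spans both.

    `examples` is a list of dicts with a hypothetical "group_id" key.
    """
    train, test = [], []
    for ex in examples:
        # Hash seed + group id to a stable value in [0, 1); every example
        # in the same group gets the same value, hence the same side.
        digest = hashlib.sha256(f"{seed}:{ex['group_id']}".encode()).hexdigest()
        bucket = int(digest[:8], 16) / 0x100000000
        (test if bucket < test_fraction else train).append(ex)
    return train, test


# Toy data: 10 paraphrase families of 4 messages each.
examples = [{"text": f"msg {i}", "group_id": f"g{i // 4}"} for i in range(40)]
train, test = group_aware_split(examples)
```

Because assignment depends only on the seed and the group id, the split is reproducible and adding new examples to an existing family never moves that family across the boundary.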

Intended use & scope

  • Input: one customer chat message. Output: one intent label as JSON.
  • The model classifies intent only. It does not decide risk, route, permissions, billing, offers, or account access — those are enforced by RelayOps' deterministic access gate and router (policy stays out of model weights).
  • Confidence is read from the model's own token probabilities at inference, not baked into labels.
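A minimal sketch of how a caller might consume the output contract described above. The function names are hypothetical, falling back to `unknown` on malformed output is an assumption, and taking confidence as the product of the label tokens' probabilities is one common choice rather than a confirmed detail of RelayOps.

```python
import json
import math

LABELS = {"reset_device", "device_status", "device_faq",
          "billing", "greeting", "unknown"}


def parse_intent(raw_output: str) -> str:
    """Return the intent from a {"intent": "<label>"} string.

    Assumption: anything malformed or outside the label set becomes "unknown".
    """
    try:
        payload = json.loads(raw_output)
    except json.JSONDecodeError:
        return "unknown"
    intent = payload.get("intent") if isinstance(payload, dict) else None
    return intent if isinstance(intent, str) and intent in LABELS else "unknown"


def sequence_confidence(token_logprobs):
    """Probability of the generated label tokens: exp of the summed log-probs."""
    return math.exp(sum(token_logprobs))
```

Keeping the parser strict means a free-text or off-taxonomy generation can never reach downstream code as a made-up label.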

Out-of-scope use

Do not use this model to make billing, payment, plan-change, access-control, offer, or customer-eligibility decisions. It predicts intent only; those decisions belong to RelayOps' deterministic access gate, router, and human escalation. It is not a general-purpose intent model — it is trained on six telecom intents over synthetic data.

Evaluation

Split                                          Accuracy  Macro-F1
Held-out (seed-13, group-aware, 726 ex)        0.999     0.999
Hand-written adversarial / paraphrase (24 ex)  0.958     0.804

Baselines on the same sets (accuracy, held-out / adversarial): keyword 0.506 / 0.250; Complement NB 0.933 / 0.667.

Honest caveat. The held-out set is template-generated synthetic data, so high in-distribution scores are expected even with anti-leakage splits. Treat the held-out number as routing-slice validation, not a production benchmark; the adversarial set is the truer generalization signal, and the adversarial macro-F1 (0.804 < 0.958 accuracy) shows the model is still uneven on the hardest classes.

Limitations

  • Trained on synthetic telecom data for six intents; not a general intent model.
  • Out-of-taxonomy / mixed-intent / abusive messages map to unknown, which RelayOps escalates — the model does not resolve them.
  • Adversarial set is small (24); per-class adversarial recall and a larger set are follow-ups.
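The per-class recall follow-up mentioned above could be computed along these lines. This is a sketch on invented toy labels, and `per_class_recall` is a hypothetical helper rather than part of the repository.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Map each true label to the fraction of its examples predicted correctly."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / support[label] for label in sorted(support)}


# Toy data (illustrative only).
toy_true = ["billing", "billing", "unknown", "unknown", "unknown", "greeting"]
toy_pred = ["billing", "billing", "unknown", "billing", "billing", "greeting"]
recall = per_class_recall(toy_true, toy_pred)
```

Here `billing` and `greeting` have recall 1.0 while `unknown` has recall 1/3. With only a handful of examples per class, a single error shifts a class's recall by a large step, which is why a larger adversarial set is needed before these numbers are stable.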

How to use (in RelayOps)

bash

RELAYOPS_INTENT_MODEL=<this-repo-or-local-adapter-dir> \
python -m src.eval.run_intent_eval

or in code:

python

from src.router.registry import get_classifier
clf = get_classifier("finetuned") # reads RELAYOPS_INTENT_MODEL
clf.classify("my internet is down") # -> Classification(intent=reset_device, ...)

Reproduce

Training recipe: src/router/finetune_train.py (Unsloth LoRA). Data export: src/eval/export_finetune_data.py. Colab notebook: notebooks/finetune_intent_colab.ipynb.

Model provider: venkatamanideep

Model tree

  • Base: Qwen/Qwen2.5-1.5B-Instruct
  • Adapter: this model

Modalities

  • Input: Text
  • Output: Text
