Ayush0110

toolforge-qwen7b-r64


License: apache-2.0

What it does

Routes a query to one (or several) of these tools, or to a direct answer:

web_search, calculator, weather, wikipedia, datetime, dictionary, translate, unit_converter, web_reader — plus no_tool (answer directly) and multi_tool (chained calls).

Output format:

```markdown
<tool_calls>[{"name": "weather", "arguments": {"location": "Tokyo"}}]</tool_calls>
```
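A minimal sketch for turning this output into Python objects. It assumes the payload between the tags is always a JSON list of `{"name", "arguments"}` objects and that a reply without the tags is a direct (no_tool) answer. How multi_tool replies are laid out is also an assumption: the sketch expects several entries in the same list. `parse_tool_calls` is a hypothetical helper, not part of any published package.

```python
import json
import re
from typing import Optional

# Captures the payload between the tags; DOTALL lets the JSON span several lines.
_TOOL_CALLS = re.compile(r"<tool_calls>(.*?)</tool_calls>", re.DOTALL)


def parse_tool_calls(text: str) -> Optional[list]:
    """Return the list of tool calls, or None when the model answered directly.

    Assumes the payload is a JSON list of {"name": ..., "arguments": {...}}
    objects; malformed JSON raises ValueError (json.JSONDecodeError).
    """
    match = _TOOL_CALLS.search(text)
    if match is None:
        return None  # no tags: treat the reply as a direct (no_tool) answer
    calls = json.loads(match.group(1))
    if not isinstance(calls, list):
        raise ValueError("expected a JSON list inside <tool_calls>")
    return calls


reply = '<tool_calls>[{"name": "weather", "arguments": {"location": "Tokyo"}}]</tool_calls>'
print(parse_tool_calls(reply))
# [{'name': 'weather', 'arguments': {'location': 'Tokyo'}}]
```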

Evaluation (honest, non-circular)

Measured on a hand-written, non-circular test set (36 realistic, indirectly phrased queries, hand-labeled — no teacher model involved), comparing the base model against this adapter on identical inputs. Grading is format-agnostic: a prediction counts if the correct tool is identified in any recognizable format, so the base model isn't penalized for not using the trained format.
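The grading script itself is not included in this card. The sketch below is one hypothetical way to implement strict vs. format-agnostic scoring: the strict extractor reads tool names only from the trained `<tool_calls>` JSON, the lenient one also accepts a JSON `"name"` field or `name(...)` call syntax anywhere in the reply, and both compare the set of predicted tool names with the hand label (arguments are not scored). The helper names and the lenient patterns are assumptions, not the author's grader.

```python
import json
import re

TOOLS = {"web_search", "calculator", "weather", "wikipedia", "datetime",
         "dictionary", "translate", "unit_converter", "web_reader"}


def strict_tools(text):
    """Tool names from the trained <tool_calls>[...]</tool_calls> format only."""
    m = re.search(r"<tool_calls>(.*?)</tool_calls>", text, re.DOTALL)
    if m is None:
        return set()
    try:
        return {call["name"] for call in json.loads(m.group(1))}
    except (ValueError, TypeError, KeyError):
        return set()  # tags present but payload unusable: nothing counts


def lenient_tools(text):
    """Known tool names found as a JSON "name" value or as name(...) syntax."""
    found = set(re.findall(r'"name"\s*:\s*"(\w+)"', text))
    found |= set(re.findall(r"\b(\w+)\s*\(", text))
    return found & TOOLS


def accuracy(preds, golds, extract):
    """Fraction of replies whose extracted tool set equals the gold set."""
    hits = sum(extract(p) == set(g) for p, g in zip(preds, golds))
    return hits / len(golds)


# Toy replies (not model outputs): trained format, call syntax, direct answer.
preds = ['<tool_calls>[{"name": "weather", "arguments": {"location": "Oslo"}}]</tool_calls>',
         'weather(location="Oslo")',
         "2 plus 2 is 4."]
golds = [["weather"], ["weather"], []]
print(round(accuracy(preds, golds, strict_tools), 3))  # 0.667
print(accuracy(preds, golds, lenient_tools))           # 1.0
```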

| Model | Routing accuracy (lenient) | Strict-format accuracy |
|---|---|---|
| Base Qwen2.5-7B-Instruct | 75.0% | 75.0% |
| ToolForge (this adapter) | 83.3% | 83.3% |
| Gain from fine-tuning | +8.3 pp | +8.3 pp |
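On a 36-query test set each query is worth about 2.8 pp. The sketch below recovers the whole-query counts behind the percentages: 75.0% and 83.3% are reachable only as 27/36 and 30/36, so the +8.3 pp gain corresponds to three additional correctly routed queries. `counts_matching` is an illustrative helper.

```python
def counts_matching(pct: float, n: int) -> list[int]:
    """Every k in 0..n whose accuracy k/n, in percent rounded to one decimal, equals pct."""
    return [k for k in range(n + 1) if round(100 * k / n, 1) == pct]


N = 36  # size of the hand-labeled test set
base = counts_matching(75.0, N)
adapter = counts_matching(83.3, N)
print(base, adapter)                               # [27] [30]
print(adapter[0] - base[0])                        # 3 more correct queries
print(round(100 * (adapter[0] - base[0]) / N, 1))  # 8.3 (pp)
```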

Key point: strict and lenient scores are identical for both models — base Qwen already emits parseable tool-call formats, so the improvement comes from better routing decisions, not output formatting. Gains concentrate on disambiguating web_search vs wikipedia, unit_converter vs calculator, and multi-tool selection.

A separate ablation on a held-out split of the (teacher-labeled) synthetic data reports ~86%, but that number is partly circular and is best read as an internal hyperparameter comparison. The table above is the unbiased estimate.


Limitations

  • Fixed tool set. This is a specialist router for the 9 tools above. It does not generalize to arbitrary, prompt-supplied function schemas the way a general function-calling model does. Adding a tool requires retraining. The tradeoff is intentional: a small, cheap, self-hostable router for a known tool set, instead of a large general model on every call.
  • Over-triggering on chit-chat. Fine-tuning slightly increases the tendency to call a tool on no-tool conversational queries (e.g. "what is 2 plus 2") — a precision/recall tradeoff.
  • Trained on synthetic data (template-generated + Gemini-distilled), English only.
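The precision/recall framing in the second bullet can be made concrete by scoring only the binary decision "call a tool or not". A minimal sketch on made-up toy labels (not results from this model): one extra tool call on a no-tool query lowers precision while leaving recall untouched. `call_precision_recall` is a hypothetical helper.

```python
def call_precision_recall(pred, gold):
    """Precision and recall of the binary decision "call at least one tool".

    pred, gold: equal-length sequences of bools (True = a tool is / should be called).
    """
    tp = sum(p and g for p, g in zip(pred, gold))
    fp = sum(p and not g for p, g in zip(pred, gold))
    fn = sum(g and not p for p, g in zip(pred, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy labels: five queries, the last two need no tool; the router over-triggers once.
gold = [True, True, True, False, False]
pred = [True, True, True, True, False]
print(call_precision_recall(pred, gold))  # (0.75, 1.0)
```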

How to use

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Qwen/Qwen2.5-7B-Instruct"

# Load the base model, then attach the LoRA adapter on top of it.
tok = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(base, "Ayush0110/toolforge-qwen7b-r64")
model.eval()

# System prompt listing the fixed tool set and the expected output format.
SYS = ("You are a tool-routing assistant. Given a user query, decide which tool(s) "
       "to call and with what arguments. If no tool is needed, respond directly. "
       "You have access to: web_search, calculator, weather, wikipedia, datetime, "
       "dictionary, translate, unit_converter, web_reader. "
       'Output tool calls as: <tool_calls>[{"name": "tool", "arguments": {...}}]</tool_calls>')

msgs = [{"role": "system", "content": SYS},
        {"role": "user", "content": "is it jacket weather in Copenhagen right now"}]

# Render with the Qwen chat template and decode greedily.
text = tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
inputs = tok(text, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
# -> <tool_calls>[{"name": "weather", "arguments": {"location": "Copenhagen"}}]</tool_calls>
```
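Once the JSON between the `<tool_calls>` tags has been extracted, the calls still have to be executed. A minimal dispatch sketch, assuming one Python function per tool whose keyword parameters match the argument names the model emits; only `location` for weather is confirmed by the example above, and `dispatch` and `fake_weather` are hypothetical. Rejecting names outside the nine trained tools mirrors the fixed-tool-set limitation.

```python
import json

# The nine tools the adapter was trained on; any other name is rejected.
KNOWN_TOOLS = frozenset({
    "web_search", "calculator", "weather", "wikipedia", "datetime",
    "dictionary", "translate", "unit_converter", "web_reader",
})


def dispatch(calls, handlers):
    """Run each parsed call through its registered handler, in order."""
    results = []
    for call in calls:
        name = call["name"]
        if name not in KNOWN_TOOLS:
            raise ValueError(f"model emitted an unknown tool: {name!r}")
        if name not in handlers:
            raise NotImplementedError(f"no handler registered for {name!r}")
        # Assumes argument names match the handler's keyword parameters.
        results.append(handlers[name](**call.get("arguments", {})))
    return results


def fake_weather(location):
    """Hypothetical stub; swap in a real weather client."""
    return f"(stub) weather for {location}"


payload = '[{"name": "weather", "arguments": {"location": "Copenhagen"}}]'
print(dispatch(json.loads(payload), {"weather": fake_weather}))
# ['(stub) weather for Copenhagen']
```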

Training details

| Setting | Value |
|---|---|
| Base | Qwen/Qwen2.5-7B-Instruct |
| Quantization | 4-bit NF4 + double quant |
| LoRA | r=64, α=128, dropout=0.05; targets: q, k, v, o, gate, up, down |
| Optimizer / LR | AdamW, 2e-4, cosine schedule, 10% warmup |
| Batch | 4 × 4 grad-accum = 16 effective |
| Epochs | 3 (best checkpoint at eval_loss ≈ 0.14) |
| Data | 1,173 examples (template-generated + Gemini-2.5-flash distilled) |
| Hardware | single T4 (16 GB), ~2.4 h |
| Tracking | Weights & Biases |
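A back-of-envelope sketch of two quantities the table implies. The step counts assume all 1,173 examples were used for training with no dropped batches; the real count depends on the train/eval split and on the trainer's rounding. The learning-rate curve is the common linear-warmup plus half-cosine decay (the shape of Hugging Face's `get_cosine_schedule_with_warmup` with default settings). The adapter size uses r × (d_in + d_out) parameters per adapted layer, with shapes assumed from the public Qwen2.5-7B config (hidden size 3584, MLP size 18944, 4 KV heads of 128 dims, 28 layers); check these against the model's `config.json`.

```python
import math

# Values from the training table.
PEAK_LR = 2e-4
EFFECTIVE_BATCH = 4 * 4   # per-device batch x gradient-accumulation steps
EPOCHS = 3
NUM_EXAMPLES = 1_173      # assumption: every example is a training example


def lr_at(step: int, total_steps: int, warmup_steps: int, peak: float = PEAK_LR) -> float:
    """Linear warmup from 0 to peak, then half-cosine decay back to 0."""
    if step < warmup_steps:
        return peak * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))


def lora_params(shapes, r: int) -> int:
    """LoRA adds A (r x d_in) and B (d_out x r): r * (d_in + d_out) per layer."""
    return sum(r * (d_in + d_out) for d_in, d_out in shapes)


steps_per_epoch = math.ceil(NUM_EXAMPLES / EFFECTIVE_BATCH)  # 74
total_steps = steps_per_epoch * EPOCHS                        # 222
warmup_steps = math.ceil(0.10 * total_steps)                  # 23 (10% warmup)
print(lr_at(warmup_steps, total_steps, warmup_steps))         # 0.0002 (peak)

# Assumed Qwen2.5-7B shapes: hidden 3584, MLP 18944, k/v width 4 x 128 = 512.
H, FF, KV = 3584, 18944, 512
layer = [(H, H), (H, KV), (H, KV), (H, H),   # q, k, v, o
         (H, FF), (H, FF), (FF, H)]          # gate, up, down
print(lora_params(layer, r=64) * 28)         # 161480704 over 28 layers
print(128 / 64)                              # LoRA scaling alpha / r = 2.0
```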

License

Apache-2.0 (inherits from the Qwen2.5 base model).


Model tree

Base: Qwen/Qwen2.5-7B-Instruct
Adapter: this model

Modalities

Input: Text
Output: Text
