barha

granite-4.1-3b-tool-selector


Inputs and outputs

Input (user turn, JSON-encoded):

json
{
  "query": "What's the weather in Paris and the time in Tokyo?",
  "tools": [
    {"name": "get_weather", "description": "Get the current weather for a city."},
    {"name": "get_time",    "description": "Get the current local time for a city."},
    {"name": "send_email",  "description": "Send an email to a recipient."}
  ]
}

Output (assistant turn, JSON):

json
{"selected_tools": ["get_weather", "get_time"]}

The full prompt uses Granite's role-tagged tokens directly (the base tokenizer has no chat_template):

markdown
<|start_of_role|>system<|end_of_role|>You are a tool-selection assistant. Given a user query and a list of available tools, return the names of the tools that should be called.<|end_of_text|>
<|start_of_role|>user<|end_of_role|>{...JSON above...}<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>

How to use

python
import json

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, then attach the LoRA adapter on top of it.
base = AutoModelForCausalLM.from_pretrained("ibm-granite/granite-4.1-3b", torch_dtype="bfloat16", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("barha/granite-4.1-3b-tool-selector")
model = PeftModel.from_pretrained(base, "barha/granite-4.1-3b-tool-selector")
model.eval()

SYSTEM = "You are a tool-selection assistant. Given a user query and a list of available tools, return the names of the tools that should be called."
user_payload = {
    "query": "What's the weather in Paris?",
    "tools": [{"name": "get_weather", "description": "Get the current weather for a city."}],
}

# Build the prompt by hand, since the tokenizer has no chat_template.
prompt = (
    f"<|start_of_role|>system<|end_of_role|>{SYSTEM}<|end_of_text|>\n"
    f"<|start_of_role|>user<|end_of_role|>{json.dumps(user_payload)}<|end_of_text|>\n"
    "<|start_of_role|>assistant<|end_of_role|>"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)  # greedy decoding
# Decode only the newly generated tokens and cut at the end-of-turn token.
gen = tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=False)
print(gen.split("<|end_of_text|>", 1)[0].strip())
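The snippet above prints the raw generated text. A minimal sketch of turning that text into a list of tool names follows, assuming the output has the `{"selected_tools": [...]}` shape shown under "Inputs and outputs". `parse_selected_tools` is a hypothetical helper, not part of the adapter's repository, and dropping names that were never offered is an added safeguard rather than documented behavior.

```python
import json
from typing import List, Optional


def parse_selected_tools(generated: str, offered: List[str]) -> Optional[List[str]]:
    """Return the selected tool names, or None if the text cannot be parsed."""
    # Keep only the text before the end-of-turn token.
    text = generated.split("<|end_of_text|>", 1)[0].strip()
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    names = payload.get("selected_tools")
    if not isinstance(names, list) or not all(isinstance(n, str) for n in names):
        return None
    # Assumption: silently drop any name that was not in the offered tool list.
    allowed = set(offered)
    return [n for n in names if n in allowed]
```

Returning `None` on malformed output keeps parse failures distinct from a valid but empty selection, so a caller can retry or fall back explicitly.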

Training data

Salesforce/xlam-function-calling-60k (gated dataset on Hugging Face), filtered to keep only rows whose answers list is non-empty (no-call abstentions are excluded). 90/10 train/val split, deterministic (no shuffle), seed 42.

For each row, the JSON-encoded tools list is reduced to {name, description} pairs (parameters dropped — the adapter routes by name only) and concatenated with the query in the user turn.

After filtering: 54,000 train / 6,000 validation examples.
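The preprocessing described above can be sketched in plain Python. This is an illustration under stated assumptions, not the author's training script: it assumes each row carries a `query` string plus JSON-encoded `tools` and `answers` strings, that each answer has a `name` field, and that the target is the `selected_tools` object shown under "Inputs and outputs". The helper names `to_example` and `split_90_10` are hypothetical.

```python
import json
from typing import List, Optional, Tuple


def to_example(row: dict) -> Optional[dict]:
    """Convert one raw row into a (user, assistant) pair, or None if it is filtered out."""
    answers = json.loads(row["answers"])
    if not answers:
        return None  # no-call abstention: excluded from training
    # Keep only name and description; parameters are dropped.
    tools = [
        {"name": t["name"], "description": t["description"]}
        for t in json.loads(row["tools"])
    ]
    user = json.dumps({"query": row["query"], "tools": tools})
    # Assumption: repeated calls to the same tool collapse to one name, order preserved.
    names = list(dict.fromkeys(a["name"] for a in answers))
    return {"user": user, "assistant": json.dumps({"selected_tools": names})}


def split_90_10(examples: List) -> Tuple[List, List]:
    """Deterministic split with no shuffle: first 90% train, last 10% validation."""
    cut = len(examples) * 9 // 10
    return examples[:cut], examples[cut:]
```

With 60,000 kept examples, `split_90_10` yields 54,000 training and 6,000 validation examples, matching the counts above.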

Training procedure

Base model: ibm-granite/granite-4.1-3b (bf16, ungated, Apache 2.0)
Adapter: LoRA, r=8, alpha=16, dropout=0.05, bias="none"
Target modules: q_proj, k_proj, v_proj, o_proj
Trainable parameters: ~5.2 M (~0.17 % of base)
Epochs: 3
Optimizer: AdamW, lr=2e-4, cosine schedule, warmup ratio 0.03
Effective batch size: 16 (4 GPUs × per-device 4 × accum 1)
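For readers who want to recreate a comparable adapter, the LoRA settings listed above map onto `peft`'s `LoraConfig` roughly as follows. This is a sketch, not the original training configuration: `task_type` is an assumption that the card does not state, and `LORA_KWARGS` and `build_lora_config` are hypothetical names.

```python
# Hyperparameters copied from the list above.
LORA_KWARGS = {
    "r": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.05,
    "bias": "none",
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "task_type": "CAUSAL_LM",  # assumption: not stated in the card
}


def build_lora_config():
    """Build a peft LoraConfig; peft is imported lazily so the dict is usable without it."""
    from peft import LoraConfig

    return LoraConfig(**LORA_KWARGS)
```

Passing the result to `peft.get_peft_model(base, build_lora_config())` and calling `print_trainable_parameters()` on the returned model is one way to compare the trainable-parameter count with the figure quoted above.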

Loss curve

epoch	step	train loss	eval loss
0.41	1400	0.55	0.554
0.95	3200	0.39	—
1.41	4760	0.38	—
2.03	6850	0.32	—
2.84

Eval loss fell from 0.554 to 0.383 by the end of epoch 3, with no sign of overfitting.
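The epoch and step columns in the table can be cross-checked against the training set size and batch size given in the earlier sections. The short calculation below is only a consistency check, assuming every optimizer step consumes one full batch of 16 examples.

```python
train_examples = 54_000
effective_batch = 4 * 4 * 1  # GPUs x per-device batch x gradient accumulation
steps_per_epoch = train_examples // effective_batch
total_steps = 3 * steps_per_epoch  # three epochs

# Convert the logged steps from the table back into fractional epochs.
logged_steps = (1400, 3200, 4760, 6850)
epochs = [round(step / steps_per_epoch, 2) for step in logged_steps]

print(steps_per_epoch, total_steps)  # 3375 10125
print(epochs)  # [0.41, 0.95, 1.41, 2.03]
```

The recomputed epochs agree with the first four rows of the table.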

Evaluation

Greedy generation on the held-out 6,000-example val split (same 90/10 deterministic slice of xlam-function-calling-60k used during training; the adapter never saw these examples). Predictions parsed from the JSON selected_tools field, scored as sets of tool names.

metric	value
exact set match	0.9930 (5,958 / 6,000)
macro F1	0.9960
precision	0.9963
recall	0.9960
parse failure rate	0.0007 (4 / 6,000)
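A minimal sketch of set-based scoring in the spirit of the metrics above. It is not the `eval_adapter.py` script. It assumes precision, recall, and F1 are computed per example and then averaged, which the card does not spell out, and that a parse failure is scored as an empty prediction. `score` is a hypothetical function name.

```python
from typing import Dict, List, Optional, Sequence


def score(preds: Sequence[Optional[List[str]]], golds: Sequence[List[str]]) -> Dict[str, float]:
    """Score predictions as sets of tool names; a pred of None marks a parse failure."""
    n = len(golds)
    exact = parse_fail = 0
    p_sum = r_sum = f_sum = 0.0
    for pred, gold in zip(preds, golds):
        if pred is None:
            parse_fail += 1
        pred_set = set(pred or [])  # assumption: parse failure counts as empty
        gold_set = set(gold)
        tp = len(pred_set & gold_set)
        p = tp / len(pred_set) if pred_set else 0.0
        r = tp / len(gold_set) if gold_set else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        if pred is not None and pred_set == gold_set:
            exact += 1
        p_sum, r_sum, f_sum = p_sum + p, r_sum + r, f_sum + f
    return {
        "exact_set_match": exact / n,
        "macro_f1": f_sum / n,
        "precision": p_sum / n,
        "recall": r_sum / n,
        "parse_failure_rate": parse_fail / n,
    }
```

Comparing sets rather than lists means tool order and repeated names in the output do not affect the score.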

Generation wall-time: ~15.4 min on 1× A100 (bs=8, max_new_tokens=128, greedy).

Reproduce with train/tool_selector/eval_adapter.py and the train/jobs/tool-selector-eval.yaml AppWrapper.

Caveat — in-distribution only. Train and val are both deterministic slices of the same dataset, so this is an upper bound under matched distribution. The numbers do not speak to OOD generalization (different tool taxonomies, ambiguous queries, adversarial tool lists). Run your own behavioral eval against your real tool set before relying on this adapter in production.


