junwatu

ono-gemma-4-12b-fable5-agent

README

License: other

Training

| Item | Value |
| --- | --- |
| Dataset | tool_use rows only (~3,600), CoT capped at 1,200 chars |
| Train / val split | 95% / 5% (seed=42) |
| Epochs | 3 |
| Learning rate | 1e-5 (cosine, 3% warmup) |
| Effective batch size | 16 (batch 1 × grad accum 16) |
| Max sequence length | 3,072 tokens |
| Loss masking | User + CoT masked → train only on call JSON |
| Optimizer | AdamW 8-bit |
| GPU | NVIDIA H200 on Modal |
| Train loss | 0.937 |
| Eval loss | 0.400 |
| Training time | ~3h 48m |
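
To make the schedule row concrete, here is a minimal sketch of a cosine schedule with 3% warmup using the table's values. It assumes linear warmup from zero and decay to zero, and it treats the approximate "~3,600" row count as exactly 3,600; the actual training script may differ on both points, so the step counts are illustrative only.

```python
import math

PEAK_LR = 1e-5          # from the training table
WARMUP_FRACTION = 0.03  # from the training table
EPOCHS = 3
EFFECTIVE_BATCH = 16

# Assumption: "~3,600" rows is treated as exactly 3,600 for illustration.
approx_rows = 3600
train_rows = approx_rows * 95 // 100                        # 95% train split
steps_per_epoch = math.ceil(train_rows / EFFECTIVE_BATCH)   # optimizer steps
total_steps = steps_per_epoch * EPOCHS
warmup_steps = max(1, round(total_steps * WARMUP_FRACTION))


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero (assumed shape)."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_steps, warmup_steps, lr_at(warmup_steps))
```

Under these assumptions the run has 642 optimizer steps, of which the first 19 are warmup.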

Vision and audio towers are present in the unified Gemma 4 checkpoint but were frozen during text-only training.
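
The loss-masking row in the table can be illustrated with a small sketch. It assumes the common convention of setting masked label positions to -100, which is the default `ignore_index` of PyTorch's cross-entropy loss; the helper name `build_labels` and the toy token ids are hypothetical, and the real training code may place the mask boundary differently (for example, before or after the call marker).

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_labels(prompt_ids, cot_ids, call_ids):
    """Concatenate the three segments and supervise only the call segment."""
    input_ids = list(prompt_ids) + list(cot_ids) + list(call_ids)
    masked = len(prompt_ids) + len(cot_ids)
    # User prompt and chain-of-thought positions contribute no loss.
    labels = [IGNORE_INDEX] * masked + list(call_ids)
    return input_ids, labels


# Toy token ids, purely illustrative (not real Gemma vocabulary ids).
input_ids, labels = build_labels([1, 2, 3], [10, 11], [20, 21, 22])
print(labels)  # [-100, -100, -100, -100, -100, 20, 21, 22]
```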

Evaluation

Batch evaluation on 50 held-out Fable-5 samples (seed=42, max_new_tokens=1024, temperature=0.2):

| Metric | Result |
| --- | --- |
| Tool name accuracy | 56% |
| call block emitted | 96% |
| Parseable tool JSON | 94% |

These numbers are indicative only and do not meet production reliability thresholds.
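
To show why 50 samples give only an indicative figure, the following sketch computes a 95% Wilson score interval for the tool name accuracy. The choice of the Wilson interval is an illustration added here, not something the evaluation reports, and 28 correct out of 50 is simply 56% expressed as a count.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# 56% of 50 samples corresponds to 28 correct tool names.
low, high = wilson_interval(28, 50)
print(f"{low:.3f} to {high:.3f}")
```

The interval spans roughly 42% to 69%, so the true accuracy could plausibly sit well above or below the reported 56%.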

Recommended inference settings:

| Parameter | Value |
| --- | --- |
| max_new_tokens | 1024 |
| temperature | 0.2 |
| do_sample | true (or greedy for max consistency) |
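
As a sketch, the two options in the table can be written as keyword-argument sets for `generate`. The names `SAMPLED` and `GREEDY` are illustrative; in the greedy set `temperature` is left out because sampling parameters have no effect when `do_sample` is false.

```python
# Recommended sampled decoding (values from the table above).
SAMPLED = {"max_new_tokens": 1024, "do_sample": True, "temperature": 0.2}

# Greedy decoding for maximum run-to-run consistency.
GREEDY = {"max_new_tokens": 1024, "do_sample": False}

# Usage, assuming `model` and `inputs` are prepared as in the quick start:
# output_ids = model.generate(**inputs, **GREEDY)
```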

Prompt format

Each turn is wrapped in Gemma chat tokens (<start_of_turn>, <end_of_turn>), and the model turn uses an explicit thought → call structure:

```markdown
<start_of_turn>user
{agent context: tool defs, history, task}<end_of_turn>
<start_of_turn>model
thought
{chain-of-thought reasoning}
call
{'tool': 'Edit', 'input': {'file_path': '...', 'old_string': '...', 'new_string': '...'}}<end_of_turn>
```

At inference, start the model turn and let it generate from thought:

```python
prompt = (
    f"<start_of_turn>user\n{context}<end_of_turn>\n"
    f"<start_of_turn>model\nthought\n"
)
```

Quick start

```python
import torch
from transformers import AutoModelForMultimodalLM, AutoTokenizer

model_id = "junwatu/ono-gemma-4-12b-fable5-agent"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMultimodalLM.from_pretrained(
    model_id,
    dtype=torch.bfloat16,
    device_map="auto",
)
model.eval()

context = "You are a coding agent. List all Python files in the current directory."
# Open the model turn with the thought marker so generation starts with the reasoning.
prompt = (
    f"<start_of_turn>user\n{context}<end_of_turn>\n"
    f"<start_of_turn>model\nthought\n"
)

inputs = tokenizer(prompt, return_tensors="pt")
# Text-only input: both type-id tensors are all zeros.
inputs["token_type_ids"] = torch.zeros_like(inputs["input_ids"])
inputs["mm_token_type_ids"] = torch.zeros_like(inputs["input_ids"])
inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=1024,
        temperature=0.2,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens; special tokens are kept so <end_of_turn> stays visible.
response = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=False,
)
print(response)
```

Important: Gemma 4 unified models require token_type_ids and mm_token_type_ids (all zeros for text-only) even when not using vision or audio.
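
The generated text still has to be split into its thought and call parts. The sketch below is one possible way to do that, not part of the model's tooling: `parse_response` is a hypothetical helper, and it assumes the call marker sits on its own line as in the prompt format. Because the documented example payload uses single quotes (a Python-style dict rather than strict JSON), the parser tries `json.loads` first and falls back to `ast.literal_eval`.

```python
import ast
import json

END_OF_TURN = "<end_of_turn>"


def parse_response(response: str):
    """Return (thought, call_dict_or_None) from generated text."""
    text = response.split(END_OF_TURN, 1)[0]
    # Prefix a newline so a response that starts directly with "call" still matches.
    thought, sep, payload = ("\n" + text).partition("\ncall\n")
    if not sep:
        return text.strip(), None  # no call block emitted
    payload = payload.strip()
    for loader in (json.loads, ast.literal_eval):
        try:
            call = loader(payload)
        except (ValueError, SyntaxError):
            continue
        if isinstance(call, dict):
            return thought.strip(), call
    return thought.strip(), None  # call block present but not parseable


# Made-up sample output; the tool input shown here is illustrative only.
sample = (
    "List files with a glob.\ncall\n"
    "{'tool': 'Bash', 'input': {'command': 'ls *.py'}}<end_of_turn>"
)
thought, call = parse_response(sample)
print(thought)
print(call)
```

A `None` call signals either a missing call block or an unparseable payload, which correspond to the two failure modes tracked in the evaluation table.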

Supported tools (from training data)

Common tool names seen in Fable-5 traces include Bash, Edit, Read, Write, Grep, WebSearch, TaskUpdate, PowerShell, and MCP-prefixed tools. Accuracy varies by tool type.
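
Given the modest tool name accuracy, it may be worth checking a parsed call against an allowlist before dispatching it. This is a minimal sketch under stated assumptions: `is_known_tool` is a hypothetical helper, the set contains only the tool names listed above, and the `mcp__` prefix is a guess, since the exact form of the MCP prefix is not given here.

```python
KNOWN_TOOLS = {
    "Bash", "Edit", "Read", "Write", "Grep",
    "WebSearch", "TaskUpdate", "PowerShell",
}
MCP_PREFIX = "mcp__"  # assumption: the actual prefix is not documented here


def is_known_tool(call) -> bool:
    """True if `call` is a dict whose 'tool' is allowlisted or MCP-prefixed."""
    if not isinstance(call, dict):
        return False
    name = call.get("tool")
    if not isinstance(name, str):
        return False
    return name in KNOWN_TOOLS or name.startswith(MCP_PREFIX)


print(is_known_tool({"tool": "Edit", "input": {}}))
print(is_known_tool({"tool": "DeleteEverything", "input": {}}))
```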

Limitations

  • Not for production: this is an experimental checkpoint with 56% tool name accuracy on a 50-sample eval set, and it is unsuitable for live agent deployment without further work.
  • Training sequences were truncated to 3,072 tokens, so longer contexts were never seen in full during training.
  • Sampling matters: a low temperature (0.2) and a sufficient max_new_tokens (1024) are important for reliable call block generation.
  • Multimodal weights are included but unused; only text LM weights were fine-tuned.
  • Trained on a single agent trace style (Fable-5); may not generalize to other tool schemas without further fine-tuning.

License

Built on google/gemma-4-12B-it. Use is subject to the Gemma license terms. Fable-5 dataset: Glint-Research/Fable-5-traces.

Model provider

junwatu

Model tree

Base

google/gemma-4-12B-it

Fine-tuned

this model

Modalities

Input

Video, Audio, Text, Image

Output

Text
