Intended Use
This model is intended for research and prototyping around LLM-driven humanoid robot assistants.
It is designed to convert user commands into structured JSON responses such as:
{
  "type": "tool_call",
  "response": "Searching for the cup.",
  "tool": "search_object",
  "arguments": {
    "object": "cup"
  }
}
The expected use case is a controlled robot-agent pipeline:
User command
→ LLM JSON response
→ Planner / Controller validation
→ Tool execution
→ Robot or environment state update
The model is fine-tuned to return a valid JSON object with exactly these fields:
{
  "type": "tool_call",
  "response": "short natural language explanation",
  "tool": "tool_name",
  "arguments": {}
}
Valid type values are:
chat
tool_call
clarify
refuse
For chat, clarify, and refuse, the tool field should be null.
When no arguments are required, arguments should be an empty object:
{
  "type": "tool_call",
  "response": "Checking visible objects.",
  "tool": "get_visible_objects",
  "arguments": {}
}
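As a minimal sketch of how the schema rules above could be checked before an output is accepted, the helper below parses a raw model response and lists any violations. The name `check_schema` is hypothetical, and the rule that a `tool_call` must carry a non-empty tool name is an assumption inferred from the examples rather than something the repository states.

```python
import json

REQUIRED_FIELDS = {"type", "response", "tool", "arguments"}
VALID_TYPES = {"chat", "tool_call", "clarify", "refuse"}


def check_schema(raw: str) -> list[str]:
    """Return a list of schema problems; an empty list means the output conforms."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]
    if set(data) != REQUIRED_FIELDS:
        # The remaining checks need all four fields, so stop here.
        return [f"fields must be exactly {sorted(REQUIRED_FIELDS)}"]

    problems = []
    kind = data["type"]
    if not isinstance(kind, str) or kind not in VALID_TYPES:
        problems.append(f"unknown type: {kind!r}")
    if not isinstance(data["response"], str):
        problems.append("response must be a string")
    if kind == "tool_call":
        # Assumption: a tool_call always names a tool.
        if not isinstance(data["tool"], str) or not data["tool"]:
            problems.append("tool_call requires a tool name")
    elif data["tool"] is not None:
        problems.append("tool must be null for chat, clarify, and refuse")
    if not isinstance(data["arguments"], dict):
        problems.append("arguments must be a JSON object")
    return problems
```

An empty result means the four-field contract holds. It says nothing about whether the named tool exists or whether its arguments make sense, which remains the job of the downstream Planner or Controller.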
The model was fine-tuned around robot-assistant tools such as:
get_visible_objects
get_robot_status
search_object
pick_object
place_object
stop
Example outputs:
{
  "type": "tool_call",
  "response": "Checking robot status.",
  "tool": "get_robot_status",
  "arguments": {}
}

{
  "type": "tool_call",
  "response": "Attempting to pick up the bottle.",
  "tool": "pick_object",
  "arguments": {
    "object": "bottle"
  }
}

{
  "type": "tool_call",
  "response": "Placing the bottle on the table.",
  "tool": "place_object",
  "arguments": {
    "object": "bottle",
    "destination": "table"
  }
}
The model was trained with explicit role tags:
SYSTEM_TAG = "<|system|>"
USER_TAG = "<|user|>"
ASSISTANT_TAG = "<|assistant|>"
A typical prompt follows this structure:
<|system|>
You are OAX, a humanoid robot assistant. Always return a valid JSON object with exactly these fields: type, response, tool, arguments.
<|user|>
Find the cup.
<|assistant|>
{"type":"tool_call","response":"Searching for the cup.","tool":"search_object","arguments":{"object":"cup"}}
During inference, the prompt should end with the <|assistant|> tag so that the model generates the next JSON response.
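The prompt structure above could be assembled with a small helper like the one below. This is only a sketch: `build_prompt` is a hypothetical name, and the newline placed between each tag and its content mirrors the layout shown above rather than a documented training format, so the exact whitespace may need adjusting.

```python
SYSTEM_TAG = "<|system|>"
USER_TAG = "<|user|>"
ASSISTANT_TAG = "<|assistant|>"


def build_prompt(system_prompt: str, user_message: str) -> str:
    """Assemble a single-turn prompt that ends with the assistant tag."""
    parts = [
        SYSTEM_TAG, system_prompt.strip(),
        USER_TAG, user_message.strip(),
        ASSISTANT_TAG,  # left open so the model writes the JSON response next
    ]
    # Assumption: tags and content are separated by single newlines.
    return "\n".join(parts)


prompt = build_prompt(
    "You are OAX, a humanoid robot assistant. Always return a valid JSON "
    "object with exactly these fields: type, response, tool, arguments.",
    "Find the cup.",
)
```

The resulting string can be passed to the tokenizer as-is. Whether a trailing newline after the final tag helps is not stated in the source and would need to be checked empirically.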
Model Structure
This repository contains the model in two parts:
- The base_model folder contains the pre-trained 1B LLaMA-style model.
- The lora_adapter folder contains the supervised fine-tuned adapter used for JSON tool-calling behaviour.
A typical loading flow is:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

repo_id = "orhanaydinn/OAX-1B-Humanoid"

# The tokenizer is stored alongside the base model.
tokenizer = AutoTokenizer.from_pretrained(
    repo_id,
    subfolder="base_model",
    trust_remote_code=True,
    use_fast=False,
)

# Load the pre-trained base model; use half precision only when a GPU is available.
base_model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    subfolder="base_model",
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    trust_remote_code=True,
    low_cpu_mem_usage=True,
)

# Attach the fine-tuned LoRA adapter for inference only.
model = PeftModel.from_pretrained(
    base_model,
    repo_id,
    subfolder="lora_adapter",
    is_trainable=False,
)
model.eval()
Depending on the local setup, it may be more reliable to download the repository first and then load the base_model/ and lora_adapter/ folders as local paths.
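One way to follow the local-path approach is sketched below, using `snapshot_download` from `huggingface_hub` to fetch the repository once and then pointing the loaders at the two folders on disk. The helper names `adapter_paths` and `load_from_local` are hypothetical, and the heavy imports are deferred into the function so the path helper can be used without them.

```python
from pathlib import Path


def adapter_paths(snapshot_dir: str) -> tuple[Path, Path]:
    """Return the base_model/ and lora_adapter/ folders inside a local snapshot."""
    root = Path(snapshot_dir)
    return root / "base_model", root / "lora_adapter"


def load_from_local(repo_id: str = "orhanaydinn/OAX-1B-Humanoid"):
    """Download the repository once, then load both parts from local paths."""
    # Imported lazily so that adapter_paths() works without these packages.
    import torch
    from huggingface_hub import snapshot_download
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base_dir, adapter_dir = adapter_paths(snapshot_download(repo_id=repo_id))
    tokenizer = AutoTokenizer.from_pretrained(
        str(base_dir), trust_remote_code=True, use_fast=False
    )
    base_model = AutoModelForCausalLM.from_pretrained(
        str(base_dir),
        torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
        trust_remote_code=True,
        low_cpu_mem_usage=True,
    )
    # Attach the LoRA adapter from its local folder, for inference only.
    model = PeftModel.from_pretrained(base_model, str(adapter_dir), is_trainable=False)
    model.eval()
    return tokenizer, model
```

Calling `load_from_local()` returns the tokenizer and the adapter-wrapped model. The loader arguments are copied from the flow above, so any setup-specific changes made there would apply here as well.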
Example System Prompt
The model works best when the system prompt clearly defines the JSON schema and tool-calling rules:
You are OAX, a humanoid robot assistant.
Respond briefly and clearly.
When replying, always output a valid JSON object with exactly these fields: type, response, tool, arguments.
Valid type values are: chat, tool_call, clarify, refuse.
Use tool=null for chat, clarify, and refuse.
Use an empty object for arguments when no arguments are needed.
Do not add extra fields.
Do not use low-level motor or servo commands.
Do not hallucinate perception results.
If the request is incomplete, ask for clarification.
If the request is unsafe or unsupported, refuse.
Notes on Safety and Validation
This model is intended to act as a high-level reasoning layer, not as a direct actuator controller.
The model may occasionally produce imperfect, premature, or inconsistent tool calls. For this reason, it should be used with an external validation layer such as a Planner or Controller before any action is executed.
A recommended architecture is:
LLM output
→ JSON parsing
→ Controller validation
→ Action repair or rejection
→ Tool execution
→ State update
This separation is important because the model output should not directly change the robot or environment state without deterministic validation.
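The validation stages above might look roughly like the following sketch, which parses the output, checks the tool against an allowlist, rejects calls with missing arguments, and repairs calls by dropping undeclared arguments. `TOOL_SPECS` and `validate_action` are hypothetical names. The required arguments are taken from the example outputs in this card, and `stop` is assumed to take none.

```python
import json

# Assumed tool specs: required argument names per tool.
TOOL_SPECS = {
    "get_visible_objects": set(),
    "get_robot_status": set(),
    "search_object": {"object"},
    "pick_object": {"object"},
    "place_object": {"object", "destination"},
    "stop": set(),  # assumption: takes no arguments
}


def validate_action(raw_output: str) -> tuple:
    """Return ("execute", tool, args), ("reply", text), or ("reject", reason)."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return ("reject", "output is not valid JSON")
    if not isinstance(data, dict):
        return ("reject", "output is not a JSON object")

    kind = data.get("type")
    if kind in ("chat", "clarify", "refuse"):
        # Text-only responses never reach a tool.
        return ("reply", str(data.get("response", "")))
    if kind != "tool_call":
        return ("reject", f"unknown type: {kind!r}")

    tool = data.get("tool")
    if not isinstance(tool, str) or tool not in TOOL_SPECS:
        return ("reject", f"unknown tool: {tool!r}")
    args = data.get("arguments")
    if not isinstance(args, dict):
        return ("reject", "arguments is not a JSON object")

    required = TOOL_SPECS[tool]
    missing = sorted(required - set(args))
    if missing:
        return ("reject", f"missing arguments: {missing}")
    # Repair step: keep only the arguments the tool declares.
    repaired = {name: args[name] for name in sorted(required)}
    return ("execute", tool, repaired)
```

Only an `"execute"` result would be handed to the tool layer. A real Controller would also check the robot and environment state, for example whether the object is actually visible, which this sketch does not attempt.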
Limitations
This model is experimental and was developed for a research prototype.
Known limitations include:
- It may occasionally call a tool too early.
- It may produce an incorrect object or destination name.
- It may require a Controller to normalise or repair tool arguments.
- It is not designed for direct low-level robot control.
- It should not be used for safety-critical robotic control without additional verification, safety constraints, and human supervision.
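To illustrate what normalising or repairing an object or destination name could involve, the sketch below maps a model-produced name onto a list of known names using fuzzy matching from Python's standard `difflib` module. The function name `normalise_name`, the 0.8 similarity cutoff, and the idea of matching against a known-names list are all assumptions made for illustration, not the project's actual Controller logic.

```python
from difflib import get_close_matches
from typing import Optional


def normalise_name(name: str, known_names: list[str], cutoff: float = 0.8) -> Optional[str]:
    """Map a model-produced name onto a known name, or return None if nothing is close."""
    cleaned = name.strip().lower()
    candidates = [known.lower() for known in known_names]
    if cleaned in candidates:
        return cleaned
    # Fall back to the single closest candidate above the similarity cutoff.
    matches = get_close_matches(cleaned, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None


visible = ["bottle", "cup", "table"]
repaired = normalise_name(" Botle", visible)
unknown = normalise_name("sofa", visible)
```

A `None` result is a natural point at which to reject the call or ask the user for clarification rather than guess.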
Research Context
OAX-1B-Humanoid was developed as part of a humanoid robot assistant project involving:
- Natural language interaction
- Structured JSON tool calling
- Vision-aware robot commands
- Planner and Controller validation
- Pick, place, search, status, and visible-object behaviours
The model is intended for experimentation with LLM-based robot-agent interfaces and high-level humanoid robot decision-making.