orhanaydinn

OAX-1B-Humanoid-Merged

License: apache-2.0

Intended Use

This model is intended for research and prototyping around LLM-driven humanoid robot assistants.

It is designed to convert user commands into structured JSON responses such as:

json
{
  "type": "tool_call",
  "response": "Searching for the cup.",
  "tool": "search_object",
  "arguments": {
    "object": "cup"
  }
}

The expected use case is a controlled robot-agent pipeline:

text
User command
→ LLM JSON response
→ Planner / Controller validation
→ Tool execution
→ Robot or environment state update

Output Format

The model is fine-tuned to return a valid JSON object with exactly these fields:

json
{
  "type": "tool_call",
  "response": "short natural language explanation",
  "tool": "tool_name",
  "arguments": {}
}

Valid type values are:

chat
tool_call
clarify
refuse

For chat, clarify, and refuse, the tool field should be null.

When no arguments are required, arguments should be an empty object:

json
{
  "type": "tool_call",
  "response": "Checking visible objects.",
  "tool": "get_visible_objects",
  "arguments": {}
}
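The schema rules above can be checked mechanically before a reply is passed on. The following is a minimal sketch, not part of the released code: the function name `validate_response` and the error messages are illustrative assumptions, and only the field names, type values, and null-tool rule come from this section.

```python
import json

VALID_TYPES = {"chat", "tool_call", "clarify", "refuse"}
REQUIRED_FIELDS = {"type", "response", "tool", "arguments"}


def validate_response(raw: str) -> dict:
    """Parse a model reply and check it against the schema described above.

    Hypothetical helper: raises ValueError with a short reason on failure.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc

    if not isinstance(data, dict):
        raise ValueError("top-level value must be a JSON object")
    # "Exactly these fields": no missing keys and no extra keys.
    if set(data) != REQUIRED_FIELDS:
        raise ValueError(f"expected fields {sorted(REQUIRED_FIELDS)}, got {sorted(data)}")
    if not isinstance(data["type"], str) or data["type"] not in VALID_TYPES:
        raise ValueError(f"unknown type: {data['type']!r}")
    if not isinstance(data["response"], str):
        raise ValueError("response must be a string")
    if not isinstance(data["arguments"], dict):
        raise ValueError("arguments must be an object")

    if data["type"] == "tool_call":
        if not isinstance(data["tool"], str) or not data["tool"]:
            raise ValueError("tool_call requires a tool name")
    elif data["tool"] is not None:
        # chat, clarify, and refuse must carry tool = null.
        raise ValueError("tool must be null unless type is tool_call")
    return data
```

A caller would typically wrap `validate_response` in a `try`/`except ValueError` and treat any failure as a rejected reply.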

Tool-Calling Behaviour

The model was fine-tuned around robot-assistant tools such as:

get_visible_objects
get_robot_status
search_object
pick_object
place_object
stop

Example outputs:

json
{
  "type": "tool_call",
  "response": "Checking robot status.",
  "tool": "get_robot_status",
  "arguments": {}
}

json
{
  "type": "tool_call",
  "response": "Attempting to pick up the bottle.",
  "tool": "pick_object",
  "arguments": {
    "object": "bottle"
  }
}

json
{
  "type": "tool_call",
  "response": "Placing the bottle on the table.",
  "tool": "place_object",
  "arguments": {
    "object": "bottle",
    "destination": "table"
  }
}
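The examples above suggest a fixed argument set per tool, which a Controller can check before execution. The table below is a sketch inferred from those examples only. The argument set for `stop` is an assumption, since no example is shown for it, and `check_tool_call` is a hypothetical helper name.

```python
# Expected argument names per tool, inferred from the example outputs above.
TOOL_ARGUMENTS = {
    "get_visible_objects": set(),
    "get_robot_status": set(),
    "search_object": {"object"},
    "pick_object": {"object"},
    "place_object": {"object", "destination"},
    "stop": set(),  # assumption: the README shows no example for stop
}


def check_tool_call(tool: str, arguments: dict) -> list[str]:
    """Return a list of problems; an empty list means the call looks well-formed."""
    if tool not in TOOL_ARGUMENTS:
        return [f"unknown tool: {tool}"]

    expected = TOOL_ARGUMENTS[tool]
    given = set(arguments)
    problems = []
    for name in sorted(expected - given):
        problems.append(f"missing argument: {name}")
    for name in sorted(given - expected):
        problems.append(f"unexpected argument: {name}")
    # Every argument in the examples is a plain, non-empty string.
    for name in sorted(expected & given):
        value = arguments[name]
        if not isinstance(value, str) or not value.strip():
            problems.append(f"argument {name} must be a non-empty string")
    return problems
```

Returning a list of problems rather than raising on the first one lets a Controller decide whether a call is repairable or should be rejected outright.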

Prompt Format

The model was trained with explicit role tags:

python
SYSTEM_TAG = "<|system|>"
USER_TAG = "<|user|>"
ASSISTANT_TAG = "<|assistant|>"

A typical prompt follows this structure:

text
<|system|>
You are OAX, a humanoid robot assistant. Always return a valid JSON object with exactly these fields: type, response, tool, arguments.

<|user|>
Find the cup.

<|assistant|>
{"type":"tool_call","response":"Searching for the cup.","tool":"search_object","arguments":{"object":"cup"}}

During inference, the prompt should end with:

text
<|assistant|>

so that the model generates the next JSON response.
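A small helper can assemble an inference prompt in this layout. This is a sketch only: `build_prompt` is a hypothetical name, and the exact whitespace between sections (one newline after each tag, one blank line between turns) is an assumption read off the example above rather than a confirmed training detail.

```python
SYSTEM_TAG = "<|system|>"
USER_TAG = "<|user|>"
ASSISTANT_TAG = "<|assistant|>"


def build_prompt(system_prompt: str, user_message: str) -> str:
    """Build a single-turn prompt that ends with the assistant tag.

    Assumption: each tag sits on its own line and turns are separated
    by one blank line, mirroring the example prompt in this README.
    """
    return (
        f"{SYSTEM_TAG}\n{system_prompt.strip()}\n\n"
        f"{USER_TAG}\n{user_message.strip()}\n\n"
        f"{ASSISTANT_TAG}\n"
    )
```

For example, `build_prompt("You are OAX, a humanoid robot assistant.", "Find the cup.")` yields a string whose last line is the assistant tag, leaving the JSON reply for the model to generate.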

Model Structure

This repository contains the model in two parts:

text
base_model/
lora_adapter/

The base_model folder contains the pre-trained 1B LLaMA-style model.

The lora_adapter folder contains the supervised fine-tuned adapter used for JSON tool-calling behaviour.

A typical loading flow is:

python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

repo_id = "orhanaydinn/OAX-1B-Humanoid"

# Load the tokenizer from the base_model subfolder of the repository.
tokenizer = AutoTokenizer.from_pretrained(
    repo_id,
    subfolder="base_model",
    trust_remote_code=True,
    use_fast=False
)

# Load the base model: float16 when a GPU is available, float32 on CPU.
base_model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    subfolder="base_model",
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    trust_remote_code=True,
    low_cpu_mem_usage=True
)

# Attach the LoRA adapter to the base model for inference only.
model = PeftModel.from_pretrained(
    base_model,
    repo_id,
    subfolder="lora_adapter",
    is_trainable=False
)

# Switch to evaluation mode (disables dropout).
model.eval()

Depending on the local setup, it may be more reliable to download the repository first and then load the base_model/ and lora_adapter/ folders as local paths.
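One way to do this is sketched below, assuming the `huggingface_hub` package is installed and that the repository layout matches the two folders described above. The helper names `component_paths` and `load_from_local` are illustrative, and the repository ID is copied from the loading snippet in this section.

```python
from pathlib import Path


def component_paths(repo_root):
    """Return the (base_model, lora_adapter) folders inside a downloaded repo."""
    root = Path(repo_root)
    return root / "base_model", root / "lora_adapter"


def load_from_local(repo_id="orhanaydinn/OAX-1B-Humanoid"):
    """Download the whole repository once, then load both parts from disk."""
    # Heavy imports stay inside the function so importing this module is cheap.
    import torch
    from huggingface_hub import snapshot_download
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # snapshot_download returns the local folder holding the repository files.
    repo_root = snapshot_download(repo_id=repo_id)
    base_path, adapter_path = component_paths(repo_root)

    dtype = torch.float16 if torch.cuda.is_available() else torch.float32
    tokenizer = AutoTokenizer.from_pretrained(
        str(base_path), trust_remote_code=True, use_fast=False
    )
    base_model = AutoModelForCausalLM.from_pretrained(
        str(base_path),
        torch_dtype=dtype,
        trust_remote_code=True,
        low_cpu_mem_usage=True,
    )
    # Attach the adapter from its local folder instead of a hub subfolder.
    model = PeftModel.from_pretrained(base_model, str(adapter_path), is_trainable=False)
    model.eval()
    return tokenizer, model
```

Loading from local paths avoids relying on `subfolder` resolution against the hub, which is where setups tend to differ.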

Example System Prompt

The model works best when the system prompt clearly defines the JSON schema and tool-calling rules:

text
You are OAX, a humanoid robot assistant.
Respond briefly and clearly.
When replying, always output a valid JSON object with exactly these fields: type, response, tool, arguments.
Valid type values are: chat, tool_call, clarify, refuse.
Use tool=null for chat, clarify, and refuse.
Use an empty object for arguments when no arguments are needed.
Do not add extra fields.
Do not use low-level motor or servo commands.
Do not hallucinate perception results.
If the request is incomplete, ask for clarification.
If the request is unsafe or unsupported, refuse.

Notes on Safety and Validation

This model is intended to act as a high-level reasoning layer, not as a direct actuator controller.

The model may occasionally produce imperfect, premature, or inconsistent tool calls. For this reason, it should be used with an external validation layer such as a Planner or Controller before any action is executed.

A recommended architecture is:

text
LLM output
→ JSON parsing
→ Controller validation
→ Action repair or rejection
→ Tool execution
→ State update

This separation matters because model output must pass deterministic validation before it is allowed to change the robot or environment state.
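The flow above can be sketched as a single gatekeeping function. This is an illustrative outline under stated assumptions, not the project's Controller: `extract_first_json` and `handle_model_output` are hypothetical names, tools are assumed to be plain Python callables that take the state dict as their first argument, and the repair step is left out, so invalid actions are simply rejected.

```python
import inspect
import json

NON_TOOL_TYPES = {"chat", "clarify", "refuse"}


def extract_first_json(text):
    """Return the first JSON object embedded in text, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            value, _ = decoder.raw_decode(text, start)
            return value
        except json.JSONDecodeError:
            start = text.find("{", start + 1)
    return None


def handle_model_output(text, tools, state):
    """Parse, validate, and only then execute one model reply."""
    action = extract_first_json(text)
    if action is None:
        return {"status": "rejected", "reason": "no JSON object found"}

    kind = action.get("type")
    if kind in NON_TOOL_TYPES:
        # These replies never touch the robot or the environment.
        return {"status": "reply", "message": action.get("response", "")}
    if kind != "tool_call":
        return {"status": "rejected", "reason": f"unknown type: {kind!r}"}

    tool_name = action.get("tool")
    if not isinstance(tool_name, str) or tool_name not in tools:
        return {"status": "rejected", "reason": f"tool not allowed: {tool_name!r}"}
    arguments = action.get("arguments")
    if not isinstance(arguments, dict):
        return {"status": "rejected", "reason": "arguments must be an object"}

    func = tools[tool_name]
    try:
        # Check the arguments against the tool signature before calling it.
        inspect.signature(func).bind(state, **arguments)
    except TypeError as exc:
        return {"status": "rejected", "reason": f"bad arguments: {exc}"}

    result = func(state, **arguments)
    state["last_action"] = tool_name  # state changes only after validation
    return {"status": "executed", "result": result}
```

Because the set of executable tools is whatever the caller passes in as `tools`, the model cannot trigger anything outside that allow-list, whatever text it produces.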

Limitations

This model is experimental and was developed for a research prototype.

Known limitations include:

It may occasionally call a tool too early.
It may produce an incorrect object or destination name.
It may require a Controller to normalise or repair tool arguments.
It is not designed for direct low-level robot control.
It should not be used for safety-critical robotic control without additional verification, safety constraints, and human supervision.
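Two of these limitations, premature tool calls and inexact object names, are the kind of thing a Controller can catch with simple deterministic checks. The sketch below is one possible approach and not the project's implementation: `normalise_name` and `repair_pick` are hypothetical helpers, and `visible_objects` is assumed to be a set of lower-case names supplied by the perception side.

```python
def normalise_name(name: str) -> str:
    """Lower-case a name, collapse whitespace, and drop a leading article."""
    words = name.strip().lower().split()
    if words and words[0] in {"a", "an", "the"}:
        words = words[1:]
    return " ".join(words)


def repair_pick(arguments: dict, visible_objects: set) -> tuple:
    """Return (repaired_arguments, None) if a pick may proceed, else (None, reason)."""
    raw = arguments.get("object")
    if not isinstance(raw, str):
        return None, "missing object"
    name = normalise_name(raw)
    if not name:
        return None, "missing object"
    if name not in visible_objects:
        # Guards against a pick issued before the object has been located.
        return None, f"{name} has not been seen yet; search first"
    return {"object": name}, None
```

With `visible_objects = {"bottle"}`, a request for `"The Bottle"` is normalised and allowed, while a request for `"cup"` is turned down with a reason the planner can act on.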

Research Context

OAX-1B-Humanoid was developed as part of a humanoid robot assistant project involving:

Natural language interaction
Structured JSON tool calling
Vision-aware robot commands
Planner and Controller validation
Pick, place, search, status, and visible-object behaviours

The model is intended for experimentation with LLM-based robot-agent interfaces and high-level humanoid robot decision-making.