comradelemoncake

pluggable-specialists-robotics

Model Details

Developed by: J. D. Bohrman
Affiliation: Independent Researcher
Base model: Qwen/Qwen2.5-Coder-1.5B
Model type: LoRA adapter for causal language modeling
License: Check the parent repository for the current project license
Repository: https://github.com/comradelemoncake/sst
Paper: papers/sstt-paper in the repository

Intended Use

This adapter is intended for the SSTT robotics-coordination experiments in packages/sst-world-robotics.

Input:

natural-language instruction
structured world state JSON

Output:

a structured JSON edit describing the requested state transition

The best reported result adds a compact cooperative narrowing hint (the scoutSummary guidance) to the prompt state, enabled with:

--use-scout
--top-k 3

The published evaluation path also applies a narrow omission repair. If the model emits {"op":"assign_task","robotId":"..."} without a taskId, and the prompt state includes scoutSummary.targetTaskId, then taskId is filled from that field before validation.
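A minimal Python sketch of the omission-repair rule follows. It assumes that the model output has already been parsed into a dict and that scoutSummary sits at the top level of the prompt-state dict. The function name repair_assign_task_omission is hypothetical and is not taken from training/validate.py.

```python
from typing import Any


def repair_assign_task_omission(pred: dict[str, Any], state: dict[str, Any]) -> dict[str, Any]:
    """Fill a missing taskId on assign_task from scoutSummary.targetTaskId.

    Hypothetical helper: any output that does not match the narrow rule
    is returned unchanged (the very same object).
    """
    # Only assign_task edits that name a robot but omit the task qualify.
    if pred.get("op") != "assign_task" or "robotId" not in pred or "taskId" in pred:
        return pred
    # Assumption: scoutSummary is a top-level key of the prompt state.
    scout = state.get("scoutSummary")
    if not isinstance(scout, dict) or "targetTaskId" not in scout:
        return pred
    repaired = dict(pred)  # copy so the raw model output stays inspectable
    repaired["taskId"] = scout["targetTaskId"]
    return repaired
```

Because the rule fires only for this one op and only when the hint carries a target, it cannot change outputs that already include a taskId.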

Out of Scope

This model is not intended as a standalone robotics policy, controller, or safety-critical deployment model. It is a research artifact for structured decision generation over synthetic benchmark states.

Training Data

The training and evaluation data are synthetic and code-generated. The harder benchmark slice used for the evaluation results is packages/data-hard-v2.

Evaluation

Best validated result on the hard_v2 robotics test slice (n=250):

configuration: compact scoutSummary guidance, top_k=3, omission repair enabled
valid_json: 250/250
valid_schema: 250/250
exact_op: 250/250
exact_match: 175/250
exact_match_rate: 0.700
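The four metrics form a ladder of increasingly strict checks. A plausible per-example computation is sketched below. The schema table REQUIRED_FIELDS and the helper names are hypothetical because the card only shows the fields of assign_task, and the exact definitions in training/validate.py may differ.

```python
import json
from typing import Any

# Hypothetical schema: only assign_task's fields (op, robotId, taskId) appear
# in the model card; other ops are only required to carry a string "op".
REQUIRED_FIELDS: dict[str, set[str]] = {"assign_task": {"op", "robotId", "taskId"}}


def score_example(raw_output: str, gold: dict[str, Any]) -> dict[str, bool]:
    """Score one generation against its reference edit."""
    scores = {"valid_json": False, "valid_schema": False,
              "exact_op": False, "exact_match": False}
    try:
        pred = json.loads(raw_output)
    except json.JSONDecodeError:
        return scores  # unparseable output fails every check
    scores["valid_json"] = True

    if not isinstance(pred, dict) or not isinstance(pred.get("op"), str):
        return scores
    required = REQUIRED_FIELDS.get(pred["op"], {"op"})
    scores["valid_schema"] = required.issubset(pred.keys())

    scores["exact_op"] = pred["op"] == gold.get("op")
    scores["exact_match"] = pred == gold  # the whole edit must match
    return scores


def summarize(results: list[dict[str, bool]]) -> dict[str, float]:
    """Convert per-example booleans into rates, e.g. 175 hits of 250 -> 0.7."""
    n = len(results)
    return {key: sum(r[key] for r in results) / n for key in results[0]}
```

Under this reading, the reported result means every output parsed, passed the schema check, and chose the correct op, while 75 of 250 edits still differed from the reference in some other field.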

Comparison variants from the same benchmark:

baseline: 148/250 = 0.592
prune scout top-3: 136/250 = 0.544
scout summary top-3: 170/250 = 0.680
scout summary top-3 + omission repair: 175/250 = 0.700
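The comparison can also be read as differences from the baseline. The snippet below only re-derives numbers from the counts listed above (n=250). compare_to_baseline is an illustrative helper, not part of the repository.

```python
N = 250
VARIANTS = {
    "baseline": 148,
    "prune scout top-3": 136,
    "scout summary top-3": 170,
    "scout summary top-3 + omission repair": 175,
}


def compare_to_baseline(counts: dict[str, int], n: int,
                        baseline: str = "baseline") -> dict[str, tuple[float, int, float]]:
    """Return (exact_match_rate, delta in examples, delta in points) per variant."""
    base = counts[baseline]
    out = {}
    for name, hits in counts.items():
        rate = round(hits / n, 3)
        delta_points = round(100 * (hits - base) / n, 1)
        out[name] = (rate, hits - base, delta_points)
    return out


for name, (rate, delta, points) in compare_to_baseline(VARIANTS, N).items():
    print(f"{name}: {rate:.3f} ({delta:+d} examples, {points:+.1f} points)")
```

Scout summary guidance alone accounts for 22 of the 27 additional exact matches over the baseline. The omission repair adds the remaining 5 (2 percentage points). Pruning with the scout instead loses 12 examples relative to the baseline.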

Saved evaluation artifacts live in:

packages/sst-world-robotics/evals/eval_hard_v2_baseline_250.json
packages/sst-world-robotics/evals/eval_hard_v2_scout_prune_top3_250.json
packages/sst-world-robotics/evals/eval_hard_v2_scout_summary_top3_250.json
packages/sst-world-robotics/evals/eval_hard_v2_scout_summary_top3_repaired_250.json

How to Run

Local evaluation:

```bash
cd packages/sst-world-robotics
python3 training/validate.py \
  --model trained-model/final \
  --test ../data-hard-v2/test.jsonl \
  --max-examples 250 \
  --use-scout \
  --top-k 3
```

Single-example inference:

```bash
cd packages/sst-world-robotics
python3 training/run_inference.py \
  --model trained-model/final \
  --instruction "Switch the best-linked recon unit into mission mode" \
  --state-json '<json here>' \
  --use-scout \
  --top-k 3
```
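The '<json here>' placeholder must be replaced with the world state serialized as a single JSON string. When the script is called from Python, an argument list avoids shell-quoting problems with that JSON. This sketch reproduces only the flags shown in the command above. The names build_inference_cmd and state.json are hypothetical.

```python
import json
from typing import Any


def build_inference_cmd(instruction: str, state: dict[str, Any],
                        model: str = "trained-model/final",
                        top_k: int = 3) -> list[str]:
    """Build the run_inference.py argv, serializing the state dict to JSON."""
    return [
        "python3", "training/run_inference.py",
        "--model", model,
        "--instruction", instruction,
        # Compact separators keep the argument on one short line.
        "--state-json", json.dumps(state, separators=(",", ":")),
        "--use-scout",
        "--top-k", str(top_k),
    ]
```

For example, load a state with json.loads(Path("state.json").read_text()). Then run subprocess.run(build_inference_cmd(instruction, state), cwd="packages/sst-world-robotics", check=True) from the repository root. Because each list element is passed to the process as one argument, no shell quoting is involved.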

Limitations

The benchmark data are synthetic.
The strongest result depends on structured cooperative guidance in the prompt state.
reassign_task remains the hardest observed operation family, both in spot checks and in aggregate results.
This adapter should be evaluated with the accompanying validation logic, not treated as unconstrained free-form generation.
