
Model Details

  • Developed by: J. D. Bohrman
  • Affiliation: Independent Researcher
  • Base model: Qwen/Qwen2.5-Coder-1.5B
  • Model type: LoRA adapter for causal language modeling
  • License: see the parent repository for the current project license
  • Repository: https://github.com/comradelemoncake/sst
  • Paper: papers/sstt-paper in the repository

Intended Use

This adapter is intended for SSTT robotics-coordination experiments in packages/sst-world-robotics.

Input:

  • natural-language instruction
  • structured world state JSON

Output:

  • a structured JSON edit describing the requested state transition

The best reported result adds a compact cooperative narrowing hint (scoutSummary) to the prompt state. It is enabled with these flags:

  • --use-scout
  • --top-k 3

The published evaluation path also applies a narrow omission repair:

  • if the model emits {"op":"assign_task","robotId":"..."} without taskId
  • and the prompt state includes scoutSummary.targetTaskId
  • then taskId is filled from that field before validation
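The omission repair is small enough to sketch directly. The sketch below is in Python, the language of the repository's training scripts. It is a hypothetical illustration and not the code in training/validate.py. The function name repair_missing_task_id is invented here, and the sketch assumes scoutSummary is a top-level key of the prompt state JSON.

```python
import copy
from typing import Any


def repair_missing_task_id(edit: dict[str, Any], state: dict[str, Any]) -> dict[str, Any]:
    """Fill a missing taskId on an assign_task edit from scoutSummary.targetTaskId.

    Hypothetical helper. Assumes scoutSummary is a top-level key of the state.
    Returns a repaired copy, or the original edit unchanged when the repair does not apply.
    """
    # Only assign_task edits that name a robot but omit taskId are eligible.
    if edit.get("op") != "assign_task" or "robotId" not in edit or "taskId" in edit:
        return edit
    scout = state.get("scoutSummary")
    if not isinstance(scout, dict) or "targetTaskId" not in scout:
        # No scout hint in the prompt state: leave the edit for the validator to judge.
        return edit
    repaired = copy.deepcopy(edit)
    repaired["taskId"] = scout["targetTaskId"]
    return repaired


# Hypothetical IDs, for illustration only.
state = {"scoutSummary": {"targetTaskId": "task-7"}}
print(repair_missing_task_id({"op": "assign_task", "robotId": "r2"}, state))
# {'op': 'assign_task', 'robotId': 'r2', 'taskId': 'task-7'}
```

The repair runs before validation, so a repaired edit is still checked against the schema like any other output. Edits of every other op, and edits that already carry a taskId, pass through untouched.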

Out of Scope

This model is not intended as a standalone robotics policy, controller, or safety-critical deployment model. It is a research artifact for structured decision generation over synthetic benchmark states.

Training Data

The training and evaluation data are synthetic and code-generated. The harder benchmark slice used in the evaluation below is packages/data-hard-v2.

Evaluation

Best validated result on the hard_v2 robotics test slice (n=250):

  • configuration: compact scoutSummary guidance, top_k=3, omission repair enabled
  • valid_json: 250/250
  • valid_schema: 250/250
  • exact_op: 250/250
  • exact_match: 175/250
  • exact_match_rate: 0.700
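The four counts above can be computed in a single pass over raw model outputs. The sketch below shows one way to do it, under stated assumptions: valid_json means the output parses, valid_schema means a JSON object with a string "op" field, exact_op compares the op field, and exact_match compares the whole parsed object with the reference. The function name score is hypothetical. The schema check is deliberately minimal, and training/validate.py may apply a stricter schema.

```python
import json
from typing import Any


def score(predictions: list[str], references: list[dict[str, Any]]) -> dict[str, Any]:
    """Count validity and match metrics over raw model outputs (hypothetical sketch)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    counts: dict[str, Any] = {"valid_json": 0, "valid_schema": 0, "exact_op": 0, "exact_match": 0}
    for raw, ref in zip(predictions, references):
        try:
            pred = json.loads(raw)
        except json.JSONDecodeError:
            continue  # unparseable output fails every metric
        counts["valid_json"] += 1
        # Assumed minimal schema: a JSON object carrying a string "op".
        if not (isinstance(pred, dict) and isinstance(pred.get("op"), str)):
            continue
        counts["valid_schema"] += 1
        if pred["op"] == ref.get("op"):
            counts["exact_op"] += 1
        # Comparing parsed objects ignores key order and whitespace in the raw text.
        if pred == ref:
            counts["exact_match"] += 1
    n = len(references)
    counts["exact_match_rate"] = round(counts["exact_match"] / n, 3) if n else 0.0
    return counts


ref = {"op": "assign_task", "robotId": "r1", "taskId": "t2"}  # hypothetical reference edit
preds = [
    '{"taskId": "t2", "robotId": "r1", "op": "assign_task"}',  # same object, different key order
    '{"op": "assign_task"',  # truncated JSON
    '{"op": "assign_task", "robotId": "r1", "taskId": "t9"}',  # right op, wrong task
]
print(score(preds, [ref] * 3))
# {'valid_json': 2, 'valid_schema': 2, 'exact_op': 2, 'exact_match': 1, 'exact_match_rate': 0.333}
```

Under these definitions the metrics nest: each one can only count examples that passed the one before it. That ordering is consistent with the reported 250/250/250/175.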

Comparison variants from the same benchmark:

  • baseline: 148/250 = 0.592
  • prune scout top-3: 136/250 = 0.544
  • scout summary top-3: 170/250 = 0.680
  • scout summary top-3 + omission repair: 175/250 = 0.700
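The rates and the gains over the baseline follow directly from the reported counts. The script below recomputes them. It uses only the numbers listed above.

```python
# Reported exact-match counts on the hard_v2 test slice (n=250), copied from the list above.
N = 250
variants = {
    "baseline": 148,
    "prune scout top-3": 136,
    "scout summary top-3": 170,
    "scout summary top-3 + omission repair": 175,
}

baseline = variants["baseline"]
for name, hits in variants.items():
    rate = hits / N
    delta = (hits - baseline) / N  # absolute change in exact-match rate vs. baseline
    print(f"{name:40s} {hits}/{N} = {rate:.3f}  ({delta:+.3f} vs. baseline)")
```

The script prints rates of 0.592, 0.544, 0.680 and 0.700, with deltas of +0.000, -0.048, +0.088 and +0.108. Pruning to the top 3 candidates loses 12 exact matches relative to the baseline, while the summary hint gains 22. The omission repair contributes 5 of the 27 extra exact matches in the best configuration.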

Saved evaluation artifacts live in:

  • packages/sst-world-robotics/evals/eval_hard_v2_baseline_250.json
  • packages/sst-world-robotics/evals/eval_hard_v2_scout_prune_top3_250.json
  • packages/sst-world-robotics/evals/eval_hard_v2_scout_summary_top3_250.json
  • packages/sst-world-robotics/evals/eval_hard_v2_scout_summary_top3_repaired_250.json

How to Run

Local evaluation:

```bash
cd packages/sst-world-robotics
python3 training/validate.py \
  --model trained-model/final \
  --test ../data-hard-v2/test.jsonl \
  --max-examples 250 \
  --use-scout \
  --top-k 3
```

Single-example inference:

```bash
cd packages/sst-world-robotics
python3 training/run_inference.py \
  --model trained-model/final \
  --instruction "Switch the best-linked recon unit into mission mode" \
  --state-json '<json here>' \
  --use-scout \
  --top-k 3
```
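Hand-quoting a world-state JSON inside `--state-json '...'` is error-prone. A small wrapper can assemble the same command as an argument list instead. The sketch below uses only the flags shown in the single-example command. The helper name build_inference_cmd is hypothetical, and the placeholder state is not the repository's real world-state schema.

```python
import json
import shlex
from typing import Any


def build_inference_cmd(instruction: str, state: dict[str, Any], top_k: int = 3) -> list[str]:
    """Assemble the run_inference.py call as an argv list (hypothetical helper).

    Passing a list rather than a shell string avoids quoting problems in the state JSON.
    Assumes the working directory is packages/sst-world-robotics.
    """
    return [
        "python3", "training/run_inference.py",
        "--model", "trained-model/final",
        "--instruction", instruction,
        "--state-json", json.dumps(state, separators=(",", ":")),
        "--use-scout",
        "--top-k", str(top_k),
    ]


# Hypothetical placeholder; substitute a real world-state object from the benchmark data.
state = {"placeholder": "replace with a real world-state object"}
cmd = build_inference_cmd("Switch the best-linked recon unit into mission mode", state)
print(shlex.join(cmd))  # shell-safe rendering, for copying into a terminal
# To execute from packages/sst-world-robotics: subprocess.run(cmd, check=True)
```

Serializing with `json.dumps` keeps the state argument valid JSON regardless of the quotes or spaces it contains.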

Limitations

  • The benchmark data are synthetic.
  • The strongest result depends on structured cooperative guidance in the prompt state.
  • reassign_task remains the hardest operation family observed in both spot checks and aggregate results.
  • This adapter should be evaluated with the accompanying validation logic, not treated as unconstrained free-form generation.

Model provider

comradelemoncake

Modalities

  • Input: text
  • Output: text
