Model Details
- Developed by: J. D. Bohrman
- Affiliation: Independent Researcher
- Base model: Qwen/Qwen2.5-Coder-1.5B
- Model type: LoRA adapter for causal language modeling
- License: check the parent repository for the current project license
- Repository: https://github.com/comradelemoncake/sst
- Paper: `papers/sstt-paper` in the repository
Intended Use
This adapter is intended for SSTT robotics-coordination experiments in `packages/sst-world-robotics`.
Input:
- natural-language instruction
- structured world state JSON
Output:
- a structured JSON edit describing the requested state transition
The best reported result uses a compact cooperative narrowing hint (`scoutSummary` guidance in the prompt state), enabled with:
`--use-scout --top-k 3`
The published evaluation path also applies a narrow omission repair:
- if the model emits `{"op":"assign_task","robotId":"..."}` without `taskId`,
- and the prompt state includes `scoutSummary.targetTaskId`,
- then `taskId` is filled from that field before validation.
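The repair rule is small enough to sketch directly. The following is a minimal Python re-implementation under stated assumptions, not the repository's code: the function name `repair_assign_task_omission` is hypothetical, edits and states are assumed to be already-parsed JSON objects (Python dicts), and only a missing `taskId` key (not an explicit `null`) triggers the repair.

```python
from typing import Any


def repair_assign_task_omission(edit: dict[str, Any], state: dict[str, Any]) -> dict[str, Any]:
    """Fill a missing taskId on an assign_task edit from scoutSummary.targetTaskId.

    Hypothetical sketch of the rule described in this card, not the repository's
    implementation. Returns a new dict when a repair happens; otherwise returns
    the edit unchanged.
    """
    # Only assign_task edits that lack a taskId key are candidates.
    if edit.get("op") != "assign_task" or "taskId" in edit:
        return edit
    # The prompt state must carry a scout summary with a target task id.
    scout = state.get("scoutSummary")
    if not isinstance(scout, dict) or "targetTaskId" not in scout:
        return edit
    # Copy so the raw model output is preserved for logging.
    repaired = dict(edit)
    repaired["taskId"] = scout["targetTaskId"]
    return repaired
```

For example, `{"op": "assign_task", "robotId": "r1"}` together with a state containing `{"scoutSummary": {"targetTaskId": "t7"}}` becomes `{"op": "assign_task", "robotId": "r1", "taskId": "t7"}` (the IDs are made up for illustration). Edits with any other `op`, or with a `taskId` already present, pass through untouched.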
Out of Scope
This model is not intended as a standalone robotics policy, controller, or safety-critical deployment model. It is a research artifact for structured decision generation over synthetic benchmark states.
Training Data
The training and evaluation data are synthetic and code-generated. The harder benchmark slice used for the evaluation results is `packages/data-hard-v2`.
Evaluation
Best validated result on the hard_v2 robotics test slice (n=250):
- configuration: compact `scoutSummary` guidance, `top_k=3`, omission repair enabled
- valid_json: 250/250
- valid_schema: 250/250
- exact_op: 250/250
- exact_match: 175/250
- exact_match_rate: 0.700
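The metric names above suggest a layered check on each output. The Python sketch below counts them under assumed definitions that are not taken from `training/validate.py`: `valid_json` means the output parses with `json.loads`, `exact_op` means the parsed object's `"op"` equals the gold `"op"`, and `exact_match` means the parsed object equals the gold edit key for key. `valid_schema` is left out because the schema is not published in this card, and the function name `score_edits` is hypothetical.

```python
import json
from typing import Any


def score_edits(raw_outputs: list[str], gold_edits: list[dict[str, Any]]) -> dict[str, Any]:
    """Count valid_json, exact_op and exact_match over a batch of raw model outputs.

    Assumed metric definitions (hypothetical, not the repository's validator):
    - valid_json: the raw output parses with json.loads
    - exact_op: the parsed output is an object whose "op" equals the gold "op"
    - exact_match: the parsed object equals the gold edit exactly
    """
    if len(raw_outputs) != len(gold_edits):
        raise ValueError("outputs and gold edits must have the same length")
    counts: dict[str, Any] = {"valid_json": 0, "exact_op": 0, "exact_match": 0}
    for raw, gold in zip(raw_outputs, gold_edits):
        try:
            pred = json.loads(raw)
        except json.JSONDecodeError:
            continue  # unparseable output fails every metric
        counts["valid_json"] += 1
        if isinstance(pred, dict) and pred.get("op") == gold.get("op"):
            counts["exact_op"] += 1
            # A full match implies the op matched, so this check is nested.
            if pred == gold:
                counts["exact_match"] += 1
    n = len(gold_edits)
    counts["n"] = n
    counts["exact_match_rate"] = counts["exact_match"] / n if n else 0.0
    return counts
```

Under these definitions a result like the one reported (250 valid JSON, 250 exact op, 175 exact matches) would mean every failure has the right operation but at least one wrong field value.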
Comparison variants on the same benchmark (exact matches out of 250):
- baseline: 148/250 = 0.592
- prune scout top-3: 136/250 = 0.544
- scout summary top-3: 170/250 = 0.680
- scout summary top-3 + omission repair: 175/250 = 0.700
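The gaps between variants can be re-derived with a few lines of arithmetic. This Python sketch only uses the counts listed above; it does not read the saved evaluation artifacts, and the helper names are illustrative.

```python
# Exact-match counts on the hard_v2 test slice (n=250), copied from the list above.
N = 250
RESULTS = {
    "baseline": 148,
    "prune scout top-3": 136,
    "scout summary top-3": 170,
    "scout summary top-3 + omission repair": 175,
}


def rate(matches: int, n: int = N) -> float:
    """Exact-match rate, rounded to three decimals like the reported figures."""
    return round(matches / n, 3)


def deltas_vs_baseline(results: dict[str, int]) -> dict[str, tuple[int, float]]:
    """Return (extra exact matches, rate difference) for each variant vs. the baseline."""
    base = results["baseline"]
    return {
        name: (matches - base, round(rate(matches) - rate(base), 3))
        for name, matches in results.items()
        if name != "baseline"
    }


if __name__ == "__main__":
    for name, (extra, diff) in deltas_vs_baseline(RESULTS).items():
        print(f"{name}: {extra:+d} matches, {diff:+.3f} exact_match_rate")
```

Run as a script, it reports -12 matches (-0.048) for pruning, +22 (+0.088) for the scout summary, and +27 (+0.108) with omission repair, so the repair accounts for 5 of the 27 extra exact matches over the baseline (175 - 170).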
Saved evaluation artifacts live in:
- packages/sst-world-robotics/evals/eval_hard_v2_baseline_250.json
- packages/sst-world-robotics/evals/eval_hard_v2_scout_prune_top3_250.json
- packages/sst-world-robotics/evals/eval_hard_v2_scout_summary_top3_250.json
- packages/sst-world-robotics/evals/eval_hard_v2_scout_summary_top3_repaired_250.json
How to Run
Local evaluation:
```bash
cd packages/sst-world-robotics
python3 training/validate.py \
  --model trained-model/final \
  --test ../data-hard-v2/test.jsonl \
  --max-examples 250 \
  --use-scout \
  --top-k 3
```
Single-example inference:
```bash
cd packages/sst-world-robotics
python3 training/run_inference.py \
  --model trained-model/final \
  --instruction "Switch the best-linked recon unit into mission mode" \
  --state-json '<json here>' \
  --use-scout \
  --top-k 3
```
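Hand-quoting a nested world state inside `'<json here>'` is error-prone. A minimal Python wrapper can serialise the state with `json.dumps` and call the script with an argument list, so no shell quoting is involved. It uses only the flags shown in the command above; the function names and the `package_dir` default (which assumes you start from the repository root) are hypothetical, and stdout is returned unparsed because the script's output format is not documented here.

```python
import json
import subprocess
from typing import Any


def build_inference_argv(instruction: str, state: dict[str, Any],
                         model_dir: str = "trained-model/final",
                         top_k: int = 3) -> list[str]:
    """Assemble the run_inference.py command line using only the documented flags."""
    return [
        "python3", "training/run_inference.py",
        "--model", model_dir,
        "--instruction", instruction,
        # json.dumps yields the JSON text; with an argv list (no shell)
        # it needs no extra quoting or escaping.
        "--state-json", json.dumps(state),
        "--use-scout",
        "--top-k", str(top_k),
    ]


def run_inference(instruction: str, state: dict[str, Any],
                  package_dir: str = "packages/sst-world-robotics") -> str:
    """Run the script from the package directory and return its raw stdout.

    Hypothetical helper: assumes the current directory is the repository root.
    """
    result = subprocess.run(
        build_inference_argv(instruction, state),
        cwd=package_dir,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

Calling `run_inference("Switch the best-linked recon unit into mission mode", state)` with a real world-state dict mirrors the shell command above.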
Limitations
- The benchmark data are synthetic.
- The strongest result depends on structured cooperative guidance in the prompt state.
- `reassign_task` remains the hardest visible operation family in both spot checks and aggregate results.
- This adapter should be evaluated with the accompanying validation logic, not treated as unconstrained free-form generation.
Model provider: comradelemoncake
Model tree: Qwen/Qwen2.5-Coder-1.5B (base) > this model (adapter)
Modalities: text input, text output