Model Details
- Developed by: J. D. Bohrman
- Affiliation: Independent Researcher
- Base model: Qwen/Qwen2.5-Coder-1.5B
- Model type: LoRA adapter for causal language modeling
- License: Check the parent repository for the current project license
- Repository: https://github.com/comradelemoncake/sst
- Paper: papers/sstt-paper in the repository
Intended Use
This adapter is intended for robotics coordination SSTT experiments in packages/sst-world-robotics.
Input:
- natural-language instruction
- structured world state JSON
Output:
- a structured JSON edit describing the requested state transition
The best reported result uses a compact cooperative narrowing hint, supplied as scoutSummary guidance in the prompt state.
The published evaluation path also applies a narrow omission repair:
- if the model emits {"op":"assign_task","robotId":"..."} without taskId
- and the prompt state includes scoutSummary.targetTaskId
- then taskId is filled from that field before validation
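The omission repair described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual implementation: the function name repair_missing_task_id is hypothetical, the example IDs are made up, and it assumes scoutSummary sits at the top level of the state object.

```python
import json
from typing import Any, Dict


def repair_missing_task_id(edit: Dict[str, Any], state: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `edit` with taskId filled in, or `edit` unchanged.

    Hypothetical helper: the repair only fires for assign_task edits that
    carry a robotId but no taskId, and only when the prompt state exposes
    scoutSummary.targetTaskId (assumed here to be a top-level key).
    """
    if edit.get("op") != "assign_task":
        return edit
    if "robotId" not in edit or "taskId" in edit:
        return edit

    scout_summary = state.get("scoutSummary")
    if not isinstance(scout_summary, dict):
        return edit

    target_task_id = scout_summary.get("targetTaskId")
    if target_task_id is None:
        return edit

    # Copy so the raw model output stays available for logging or comparison.
    repaired = dict(edit)
    repaired["taskId"] = target_task_id
    return repaired


# Illustrative values only; real states come from the benchmark data.
state = {"scoutSummary": {"targetTaskId": "task-7"}}
raw_output = '{"op":"assign_task","robotId":"r2"}'
print(repair_missing_task_id(json.loads(raw_output), state))
```

Edits for any other operation, or edits that already carry a taskId, pass through untouched, which keeps the repair narrow.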
Out of Scope
This model is not intended as a standalone robotics policy, controller, or safety-critical deployment model. It is a research artifact for structured decision generation over synthetic benchmark states.
Training Data
The training and evaluation data are synthetic and code-generated. The relevant harder benchmark slice is packages/data-hard-v2.
Evaluation
Best validated result on the hard_v2 robotics test slice (n=250):
- configuration: compact scoutSummary guidance, top_k=3, omission repair enabled
- valid_json: 250/250
- valid_schema: 250/250
- exact_op: 250/250
- exact_match: 175/250
- exact_match_rate: 0.700
Comparison variants from the same benchmark:
- baseline: 148/250 = 0.592
- prune scout top-3: 136/250 = 0.544
- scout summary top-3: 170/250 = 0.680
- scout summary top-3 + omission repair: 175/250 = 0.700
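To make the comparison easier to read, the short sketch below recomputes each exact-match rate from the counts listed above and shows the difference from the baseline. The counts are taken from this card; the summarize helper is only an illustration and is not part of the repository.

```python
from typing import Dict, List, Tuple

N = 250  # size of the hard_v2 robotics test slice

# Exact-match counts as reported in this card.
EXACT_MATCHES: Dict[str, int] = {
    "baseline": 148,
    "prune scout top-3": 136,
    "scout summary top-3": 170,
    "scout summary top-3 + omission repair": 175,
}


def summarize(counts: Dict[str, int], n: int) -> List[Tuple[str, float, float]]:
    """Return (variant, exact_match_rate, delta_vs_baseline) for each variant."""
    baseline_rate = counts["baseline"] / n
    return [(name, count / n, count / n - baseline_rate) for name, count in counts.items()]


for name, rate, delta in summarize(EXACT_MATCHES, N):
    print(f"{name}: {rate:.3f} ({delta:+.3f} vs baseline)")
```

On these numbers, pruning alone falls 0.048 below the baseline, while the scout summary gains 0.088 and the omission repair adds a further 0.020.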
Saved evaluation artifacts live in:
- packages/sst-world-robotics/evals/eval_hard_v2_baseline_250.json
- packages/sst-world-robotics/evals/eval_hard_v2_scout_prune_top3_250.json
- packages/sst-world-robotics/evals/eval_hard_v2_scout_summary_top3_250.json
- packages/sst-world-robotics/evals/eval_hard_v2_scout_summary_top3_repaired_250.json
How to Run
Local evaluation:
cd packages/sst-world-robotics
python3 training/validate.py \
--model trained-model/final \
--test ../data-hard-v2/test.jsonl \
--max-examples 250 \
--use-scout \
--top-k 3
Single-example inference:
cd packages/sst-world-robotics
python3 training/run_inference.py \
--model trained-model/final \
--instruction "Switch the best-linked recon unit into mission mode" \
--state-json '<json here>' \
--use-scout \
--top-k 3
Limitations
- The benchmark data are synthetic.
- The strongest result depends on structured cooperative guidance in the prompt state.
- reassign_task remains the hardest visible operation family in spot checks and aggregate results.
- This adapter should be evaluated with the accompanying validation logic, not treated as unconstrained free-form generation.