License: apache-2.0
Training Traces
Training-time Daytona/Harbor rollouts for this run are uploaded as a companion dataset: penfever/a3-rl-DCAgent_exp_rpt_e2egit-large
The dataset contains the last episode of each trial (per make_and_upload_trace_dataset --episodes last). These are the same rollouts the policy was trained on after rollback / truncation.
Training Logs
training_logs/ contains metrics.csv, vllm_metrics.csv, trial_stats.csv, report.md, and reward_plot.png, all produced by parse_skyrl_metrics.py, plus the raw trainer_log.jsonl and *.out files. The raw files are kept for archival because Jupiter has no W&B network access.
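As a rough sketch of how the CSV logs might be inspected locally, the snippet below reads a file such as training_logs/metrics.csv with the Python standard library. The column names are not documented here, so the helpers discover them from the header row rather than assuming any; load_metrics, numeric_columns, and last_value are hypothetical names introduced only for illustration.

```python
import csv
from pathlib import Path


def load_metrics(path):
    """Read a CSV file into a list of dicts keyed by its header row."""
    with Path(path).open(newline="") as f:
        return list(csv.DictReader(f))


def numeric_columns(rows):
    """Return the columns whose non-empty values all parse as floats."""
    if not rows:
        return []
    names = []
    for col in rows[0]:
        if col is None:  # DictReader stores surplus fields under the key None
            continue
        values = [r.get(col) for r in rows if r.get(col) not in (None, "")]
        if not values:
            continue
        try:
            for v in values:
                float(v)
        except (TypeError, ValueError):
            continue
        names.append(col)
    return names


def last_value(rows, column):
    """Return the last non-empty value of a column as a float, or None."""
    for row in reversed(rows):
        value = row.get(column)
        if value not in (None, ""):
            return float(value)
    return None
```

Calling numeric_columns(load_metrics("training_logs/metrics.csv")) would list the plottable columns, and last_value can then report the final logged value of any of them. The same helpers should work on vllm_metrics.csv and trial_stats.csv, assuming those files also carry a header row.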
RL Config
See rl_config.json for the full Hydra overrides used to launch the run.
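Hydra overrides are normally written as dotted key=value pairs, so one way to read the config is to flatten it into dotted keys. The sketch below assumes only that rl_config.json is valid JSON; its actual layout is not described here, and flatten and load_flat_config are hypothetical helpers, not part of the training code.

```python
import json


def flatten(node, prefix=""):
    """Flatten nested dicts and lists into a {dotted.key: leaf_value} dict."""
    if isinstance(node, dict):
        items = [(str(k), v) for k, v in node.items()]
    elif isinstance(node, list):
        items = [(str(i), v) for i, v in enumerate(node)]
    else:
        # Leaf value: record it under the path accumulated so far.
        return {prefix: node}
    flat = {}
    for key, value in items:
        child = f"{prefix}.{key}" if prefix else key
        flat.update(flatten(value, child))
    return flat


def load_flat_config(path):
    """Load a JSON file and return its flattened dotted-key view."""
    with open(path) as f:
        return flatten(json.load(f))
```

Printing the sorted items of load_flat_config("rl_config.json") gives one line per setting, which makes two runs easy to diff. Note that empty dicts and lists have no leaves and so do not appear in the flattened output.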
Model provider: laion
Model tree
Base: laion/GLM-4_7-swesmith-sandboxes-with_tests-oracle_verified_120s-maxeps-131k-fixthink
Fine-tuned: this model
Modalities
Input: Text
Output: Text