README
License: apache-2.0
Result (LawBench 191-class, 913-case held-out test, top-1 accuracy)
| approach | acc |
|---|---|
| prior SOTA | 0.450 |
| SIA gpt-oss-120b (W+H) | 0.701 |
| TF-IDF harness (no LLM) | 0.760 |
| best: this LoRA (vLLM, rope_theta fix) ⊕ TF-IDF ensemble | 0.77 |
The best result beats both the prior SOTA (0.450) and SIA's 0.701. Honest caveat: the winning 0.77 is an ensemble of this LoRA (served via vLLM with a rope_theta fix) and a TF-IDF char-ngram classifier; the LoRA's contribution is only the marginal lift over the 0.760 harness, roughly 0.01 in top-1 accuracy. Naive LoRA inference without the rope_theta fix scored far lower.
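To make the caveat concrete, here is a minimal sketch of what a TF-IDF char-ngram classifier and a simple ensemble with an LLM prediction could look like. Everything in it is an assumption for illustration: the n-gram range, the nearest-centroid scoring, the `llm_weight` bonus, and the names `CharTfidfCentroid` and `ensemble_predict` are hypothetical. This card does not state how the LoRA and TF-IDF outputs are actually combined; the real harness is in the repository listed under Replicate.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=2, n_max=4):
    """Return all character n-grams of length n_min..n_max (assumed range)."""
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


class CharTfidfCentroid:
    """Toy TF-IDF char-ngram classifier: cosine similarity to class centroids."""

    def fit(self, texts, labels):
        docs = [Counter(char_ngrams(t)) for t in texts]
        # Document frequency: each n-gram counted once per document.
        df = Counter(g for d in docs for g in d)
        n = len(docs)
        self.idf = {g: math.log((1 + n) / (1 + c)) + 1.0 for g, c in df.items()}
        sums = defaultdict(Counter)
        for d, y in zip(docs, labels):
            for g, w in self._vec(d).items():
                sums[y][g] += w
        self.centroids = {y: self._normalize(v) for y, v in sums.items()}
        return self

    @staticmethod
    def _normalize(vec):
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        return {g: w / norm for g, w in vec.items()}

    def _vec(self, counts):
        # Unseen n-grams are dropped because they have no IDF weight.
        return self._normalize(
            {g: c * self.idf[g] for g, c in counts.items() if g in self.idf})

    def scores(self, text):
        """Cosine similarity between the text and every class centroid."""
        v = self._vec(Counter(char_ngrams(text)))
        return {y: sum(w * c.get(g, 0.0) for g, w in v.items())
                for y, c in self.centroids.items()}


def ensemble_predict(text, tfidf, llm_label, llm_weight=0.3):
    """Hypothetical rule: add a fixed bonus to the LLM's label, then argmax."""
    s = tfidf.scores(text)
    if llm_label in s:
        s[llm_label] += llm_weight
    return max(s, key=s.get)


def top1_accuracy(preds, golds):
    """Fraction of cases where the predicted label equals the gold label."""
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)


if __name__ == "__main__":
    train = ["theft of property from a shop", "stole goods theft",
             "assault causing bodily injury", "violent assault and injury"]
    gold = ["theft", "theft", "assault", "assault"]
    clf = CharTfidfCentroid().fit(train, gold)
    # llm_label stands in for the label produced by the LoRA model.
    print(ensemble_predict("theft of goods", clf, llm_label="theft"))
```

With `llm_weight=0` the ensemble reduces to the TF-IDF harness alone, which is one way to read the 0.760 row as the baseline the LoRA is lifting.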
Files
- adapter_model.safetensors: the LoRA weights
- adapter_config.json, tokenizer.json, chat_template.jinja
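A quick way to check what the adapter expects before serving it is to read adapter_config.json directly. The sketch below assumes the file follows the usual PEFT LoRA layout (keys such as `base_model_name_or_path`, `peft_type`, `r`, `lora_alpha`, `target_modules`). The file's contents are not shown on this card, so any key may be absent; the helper `summarize_adapter_config` is a hypothetical name.

```python
import json
from pathlib import Path


def summarize_adapter_config(path):
    """Return the LoRA fields most useful for a sanity check.

    Assumes a PEFT-style adapter_config.json; missing keys come back as None.
    """
    cfg = json.loads(Path(path).read_text(encoding="utf-8"))
    keys = ("base_model_name_or_path", "peft_type", "r",
            "lora_alpha", "target_modules")
    return {k: cfg.get(k) for k in keys}


if __name__ == "__main__":
    # Path is an assumption: run from the directory holding the adapter files.
    config_path = Path("adapter_config.json")
    if config_path.exists():
        for key, value in summarize_adapter_config(config_path).items():
            print(f"{key}: {value}")
```

If `base_model_name_or_path` does not name openai/gpt-oss-120b, the adapter is being paired with the wrong base model.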
Replicate
Task + harness: github.com/evo-hq/evo-posttrainbench (branch evo-variant,
src/eval/tasks/lawbench). Optimizer: evo 0.5.0-alpha.13 (github.com/evo-hq/evo,
release/0.5). Run: scripts/run.sh run lawbench openai/gpt-oss-120b <hours>.
Model provider: alok97
Model tree: base openai/gpt-oss-120b; adapter: this model
Modalities: text input, text output