README
License: apache-2.0
Task
Given the ablation-free context of a research paper, the model proposes candidate ablation objectives, each expressed as a Target Module (the component to ablate) paired with a Research Question it is meant to answer.
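As a rough illustration of the output described above, one candidate objective can be held as a pair of fields. This is only a sketch: the class name `AblationObjective`, its field names, and the example values are hypothetical, and the model's actual output format is not specified in this card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AblationObjective:
    """One candidate ablation objective (hypothetical container, not the model's schema)."""

    target_module: str      # the component to ablate
    research_question: str  # the question the ablation is meant to answer

    def as_text(self) -> str:
        # Render the pair as two labeled lines for display or logging.
        return (
            f"Target Module: {self.target_module}\n"
            f"Research Question: {self.research_question}"
        )


# Made-up example values, for illustration only.
example = AblationObjective(
    target_module="auxiliary contrastive loss",
    research_question="Does the auxiliary loss contribute to the reported gains?",
)
print(example.as_text())
```

A paper would typically yield several such pairs, one per proposed ablation.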
Training data
SFT on train/sft_task1_45961.jsonl, then GRPO on train/RL_task1_30K.jsonl, from SlowGuess/abforge-data
(derived from CC-licensed research papers). Evaluation uses the held-out AblationBench split
(eval/ablationbench_200.jsonl) of the same dataset.
Related models (Task 1)
- SlowGuess/ABForge-Qwen3-8B-Task1 (this model)
- SlowGuess/ABForge-Qwen3-8B-Task1-SFT
- SlowGuess/ABForge-Qwen3-8B-Task1-RL
Evaluation
Reproduce AblationBench evaluation with the SlowGuess/Abforge_1 code:
```bash
git clone https://github.com/SlowGuess/Abforge_1 && cd Abforge_1
huggingface-cli download SlowGuess/abforge-data --repo-type dataset --local-dir data
export MODEL_PATH=SlowGuess/ABForge-Qwen3-8B-Task1

# 1. Generate predictions on AblationBench
python run_inference_local.py --task 1 \
  --input data/eval/ablationbench_200.jsonl \
  --output preds.jsonl \
  --model-path "$MODEL_PATH" --dtype bf16 --max-new-tokens 4096

# 2. Score against the fixed AblationBench rubric (Claude judge)
export ANTHROPIC_API_KEY=<your-key>
python scripts/eval_task1_claude.py --input preds.jsonl --output scored.jsonl
```
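Before spending judge API calls in step 2, it may be worth checking that step 1 produced a well-formed predictions file. The sketch below is not part of the Abforge_1 code. It assumes that `preds.jsonl` holds one JSON record per input record, which this card does not confirm, and the helper names `count_jsonl_records` and `check_predictions` are hypothetical.

```python
import json
from pathlib import Path


def count_jsonl_records(path):
    """Count non-empty lines in a JSONL file, checking that each parses as JSON."""
    count = 0
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                json.loads(line)
            except json.JSONDecodeError as error:
                raise ValueError(
                    f"{path}: line {line_number} is not valid JSON"
                ) from error
            count += 1
    return count


def check_predictions(input_path, preds_path):
    """Return (n_inputs, n_preds, ok), where ok means the record counts match.

    Assumes one prediction per input record (an assumption, not documented here).
    """
    n_inputs = count_jsonl_records(input_path)
    n_preds = count_jsonl_records(preds_path)
    return n_inputs, n_preds, n_inputs == n_preds
```

With the paths from the commands above, this could be called as `check_predictions("data/eval/ablationbench_200.jsonl", "preds.jsonl")`.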
Links
- Dataset: SlowGuess/abforge-data
- Code: SlowGuess/Abforge_1
Citation
```bibtex
@misc{abforge,
  title        = {ABForge: A Post-Training Pipeline for Paper-Grounded Ablation Design},
  author       = {ABForge authors},
  year         = {2026},
  howpublished = {\url{https://github.com/SlowGuess/Abforge_1}}
}
```
Model provider
SlowGuess
Model tree
- Base: Qwen/Qwen3-8B
- Fine-tuned: this model
Modalities
- Input: Text
- Output: Text