Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more
Container

Run this model inference with full control and performance in your environment.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

License: apache-2.0

Task

Given a paper's context and a goal, the model produces a detailed, controlled ablation experiment design plan (objective, setup, variants, fixed protocols and metrics).

Training data

SFT on train/sft_task2_37019.jsonl, from SlowGuess/abforge-data (derived from CC-licensed research papers). Evaluation uses the held-out AblationBench split (eval/ablationbench_200.jsonl) of the same dataset.

Related models (Task 2)

Evaluation

Reproduce AblationBench evaluation with the SlowGuess/Abforge_1 code:

bash

git clone https://github.com/SlowGuess/Abforge_1 && cd Abforge_1
huggingface-cli download SlowGuess/abforge-data --repo-type dataset --local-dir data
export MODEL_PATH=SlowGuess/ABForge-Qwen3-8B-Task2-SFT
# 1. Generate predictions on AblationBench
python run_inference_local.py --task 2 \
--input data/eval/ablationbench_200.jsonl \
--output preds.jsonl \
--model-path "$MODEL_PATH" --dtype bf16 --max-new-tokens 4096
# 2. Score against the fixed AblationBench rubric (Claude judge)
export ANTHROPIC_API_KEY=<your-key>
python scripts/eval_task2_claude_rubric_v2.py --input preds.jsonl --output scored.jsonl

Links

Citation

bibtex

@misc{abforge,
title = {ABForge: A Post-Training Pipeline for Paper-Grounded Ablation Design},
author = {ABForge authors},
year = {2026},
howpublished = {\url{https://github.com/SlowGuess/Abforge_1}}
}

Model provider

SlowGuess

Model tree

Base

Qwen/Qwen3-8B

Fine-tuned

this model

Modalities

Input

Text

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Explore FriendliAI today