herooooooooo

first-hf-run-pi-mono-gemma4-e2b-final

Deploy Dedicated

Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

Job IDs

Table with columns: Kind, Name, Job ID, Final Stage, Link
Kind	Name	Job ID	Final Stage	Link
sweep	lr1e-4-r8-alpha16	`6a39cb3ec612b71be257880a`	ERROR_WITH_ADAPTER_SAVED	job
sweep	lr2e-4-r16-alpha32	`6a39cb3ec612b71be257880c`	ERROR_WITH_ADAPTER_SAVED	job
sweep	lr5e-5-r16-alpha32	`6a39cb3fc7d51fa1097d6085`	ERROR_WITH_ADAPTER_SAVED	job
adapter-eval	lr1e-4-r8-alpha16	`6a3a50133d2ca349dc7bf6ac`	COMPLETED	job
adapter-eval	lr2e-4-r16-alpha32	`6a3a5013e902455642c9d107`	COMPLETED	job
adapter-eval	lr5e-5-r16-alpha32	`6a3a5014e902455642c9d109`	COMPLETED	job
merge	merge-selected-adapter	`6a3a52a5f6cddbe979170025`	COMPLETED	job
eval	humaneval	`6a3a5750f6cddbe97917004e`	COMPLETED	job
eval	mbpp	`6a3a5750f6cddbe979170050`	COMPLETED	job

Sweep Results

Table with columns: Run, Adapter repo, Held-out eval loss, Job id reported by child
Run	Adapter repo	Held-out eval loss	Job id reported by child
`lr1e-4-r8-alpha16`	`herooooooooo/first-hf-run-pi-mono-gemma4-e2b-adapter-lr1e-4-r8-alpha16`	2.2324344871670063	`6a3a50133d2ca349dc7bf6ac`
`lr2e-4-r16-alpha32`	`herooooooooo/first-hf-run-pi-mono-gemma4-e2b-adapter-lr2e-4-r16-alpha32`	2.0485327514494878	`6a3a5013e902455642c9d107`

Inspect AI Evals

Table with columns: Benchmark, Accuracy, StdErr, Completed, Return code
Benchmark	Accuracy	StdErr	Completed	Return code
humaneval	0.0	0.0	164/164	`0`
mbpp	0.0	0.0	1285/1285	`0`

Raw Inspect JSON logs and compact summaries are in evals/.

Known Eval Limits

Inspect evals used a custom Gemma chat template passed through -M chat_template=... because the stock Inspect HF provider handed ChatMessage objects to a dict-oriented tokenizer template.
Inspect evals used max_connections=8 on l4x1 to complete the full MBPP run within the HF Job timeout.
Adapter selection loss was recomputed by adapter-only recovery jobs because the original sweep jobs pushed adapters but crashed in the Trackio callback before writing eval_results.json.
The README table reports Inspect accuracy and stderr; it does not publish separate pass@k columns.
Inspect used the local Hugging Face Transformers backend on l4x1 with max_connections=8, not vLLM throughput.
The eval sandbox is inside the HF Job, not Docker; this is less isolated than leaderboard-grade Docker execution.

Dataset and Privacy Notes

The source dataset was exported with pi-share-hf, including deterministic redaction, deny-pattern filtering, TruffleHog scanning, and LLM review before upload. This run consumes the published redacted dataset only.

The SFT converter:

splits by session file before extracting examples;
strips assistant thinking blocks and thinking signatures;
represents tool calls as text;
folds tool results into user context;
omits image, audio, and video payloads for text-only training.

Reproducibility

The generated child scripts used by the controller are stored under run_scripts/. The article-style write-up is in ARTICLE.md.

Model provider

herooooooooo

Model tree

Base

google/gemma-4-E2B-it

Fine-tuned

this model

Modalities

Input

Text, Image

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Model card

Explore FriendliAI today

Get started Talk to an engineer

Job IDs

Table with columns: Kind, Name, Job ID, Final Stage, Link
Kind	Name	Job ID	Final Stage	Link
sweep	lr1e-4-r8-alpha16	`6a39cb3ec612b71be257880a`	ERROR_WITH_ADAPTER_SAVED	job
sweep	lr2e-4-r16-alpha32	`6a39cb3ec612b71be257880c`	ERROR_WITH_ADAPTER_SAVED	job
sweep	lr5e-5-r16-alpha32	`6a39cb3fc7d51fa1097d6085`	ERROR_WITH_ADAPTER_SAVED	job
adapter-eval	lr1e-4-r8-alpha16	`6a3a50133d2ca349dc7bf6ac`	COMPLETED	job
adapter-eval	lr2e-4-r16-alpha32	`6a3a5013e902455642c9d107`	COMPLETED	job
adapter-eval	lr5e-5-r16-alpha32	`6a3a5014e902455642c9d109`	COMPLETED	job
merge	merge-selected-adapter	`6a3a52a5f6cddbe979170025`	COMPLETED	job
eval	humaneval	`6a3a5750f6cddbe97917004e`	COMPLETED	job
eval	mbpp	`6a3a5750f6cddbe979170050`	COMPLETED	job

Sweep Results

Table with columns: Run, Adapter repo, Held-out eval loss, Job id reported by child
Run	Adapter repo	Held-out eval loss	Job id reported by child
`lr1e-4-r8-alpha16`	`herooooooooo/first-hf-run-pi-mono-gemma4-e2b-adapter-lr1e-4-r8-alpha16`	2.2324344871670063	`6a3a50133d2ca349dc7bf6ac`
`lr2e-4-r16-alpha32`	`herooooooooo/first-hf-run-pi-mono-gemma4-e2b-adapter-lr2e-4-r16-alpha32`	2.0485327514494878	`6a3a5013e902455642c9d107`

Inspect AI Evals

Table with columns: Benchmark, Accuracy, StdErr, Completed, Return code
Benchmark	Accuracy	StdErr	Completed	Return code
humaneval	0.0	0.0	164/164	`0`
mbpp	0.0	0.0	1285/1285	`0`

Raw Inspect JSON logs and compact summaries are in evals/.

Known Eval Limits

Inspect evals used a custom Gemma chat template passed through -M chat_template=... because the stock Inspect HF provider handed ChatMessage objects to a dict-oriented tokenizer template.
Inspect evals used max_connections=8 on l4x1 to complete the full MBPP run within the HF Job timeout.
Adapter selection loss was recomputed by adapter-only recovery jobs because the original sweep jobs pushed adapters but crashed in the Trackio callback before writing eval_results.json.
The README table reports Inspect accuracy and stderr; it does not publish separate pass@k columns.
Inspect used the local Hugging Face Transformers backend on l4x1 with max_connections=8, not vLLM throughput.
The eval sandbox is inside the HF Job, not Docker; this is less isolated than leaderboard-grade Docker execution.

Dataset and Privacy Notes

The SFT converter:

splits by session file before extracting examples;
strips assistant thinking blocks and thinking signatures;
represents tool calls as text;
folds tool results into user context;
omits image, audio, and video payloads for text-only training.

Reproducibility

The generated child scripts used by the controller are stored under run_scripts/. The article-style write-up is in ARTICLE.md.