README

A 1B model can explain the correct browser action before it can reliably choose it.

This repository contains a LoRA adapter trained on top of MiniCPM5-1B for lightweight browser planning tasks.

The original goal was simple:

Can a 1B model decide the next browser action given a task and an observation?

Actions include:

  • search
  • open_page
  • extract
  • back
  • finish
  • refine_search
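The action set above can be treated as a closed vocabulary, so that anything else the model emits is rejected rather than executed. A minimal sketch follows; the names `BrowserAction` and `parse_action` are illustrative and do not come from this repository.

```python
from enum import Enum
from typing import Optional


class BrowserAction(str, Enum):
    """The six actions listed in this README (class name is hypothetical)."""
    SEARCH = "search"
    OPEN_PAGE = "open_page"
    EXTRACT = "extract"
    BACK = "back"
    FINISH = "finish"
    REFINE_SEARCH = "refine_search"


def parse_action(text: str) -> Optional[BrowserAction]:
    """Map raw model output to a known action, or None if it is not recognized."""
    cleaned = text.strip().lower()
    try:
        return BrowserAction(cleaned)
    except ValueError:
        return None
```

Returning `None` for unknown strings lets the caller decide whether to retry generation or stop, instead of acting on a malformed prediction.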

Key Findings

1. Data quality beats data quantity

Adding large amounts of similar trajectory data produced almost no improvement.

However, adding only ~200 carefully designed hard examples significantly improved replanning behavior.

2. Adding actions creates both capability and confusion

Introducing the back action allowed the model to recover from wrong pages and paywalls.

However, the model quickly learned to overuse back as a universal solution.
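One simple way to notice this kind of overuse is to look at how often each action is predicted across an evaluation set. The sketch below is only an illustration: the helper name `action_share` and the sample predictions are made up and are not results from this project.

```python
from collections import Counter
from typing import Dict, Iterable


def action_share(actions: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of predictions that went to each action."""
    counts = Counter(actions)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {action: count / total for action, count in counts.items()}


# Hypothetical predictions, for illustration only.
predictions = ["back", "back", "extract", "back"]
share = action_share(predictions)
print(share["back"])  # 0.75
```

A share for `back` that is far higher than the share of cases that actually call for it would be one signal of the shortcut behavior described above.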

3. Reason-First training dramatically improves planning

Action-only planning: 4/12

Reason-First planning: 10/12

This gain required only 40 reasoning examples and less than 10 seconds of additional training.

The most important result:

The model already understood the state of the environment.

It failed because it learned shortcut action heuristics.

Forcing the model to explicitly generate a reason before selecting an action dramatically improved decision quality.
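To put the two scores on a common scale, they can be converted to accuracies. The sketch assumes each score counts correct next-action choices over 12 evaluation cases, which the README does not state explicitly. The `accuracy` helper is illustrative.

```python
from typing import Sequence


def accuracy(predicted: Sequence[str], expected: Sequence[str]) -> float:
    """Fraction of cases where the predicted action equals the expected one."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


# The two reported scores, expressed as percentages.
print(f"Action-only:  {4 / 12:.1%}")   # 33.3%
print(f"Reason-First: {10 / 12:.1%}")  # 83.3%
```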

Example

Task: Find Apple stock price
Observation: Price displayed prominently on page.
Reason: The requested information is already available.
Action: extract

Task: Find CEO of OpenAI
Observation: Page discusses Microsoft CEO.
Reason: The page is irrelevant to the requested information.
Action: back
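The Task / Observation / Reason / Action fields in these examples suggest a simple text layout that can be both generated and parsed. The following is a sketch under that assumption. The exact template used for training is not given in this README, and `format_example`, `parse_step`, and `Step` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class Step(NamedTuple):
    reason: str
    action: str


def format_example(task: str, observation: str, reason: str, action: str) -> str:
    """Render one Reason-First example, with the reason placed before the action."""
    return (
        f"Task: {task}\n"
        f"Observation: {observation}\n"
        f"Reason: {reason}\n"
        f"Action: {action}"
    )


# The reason is captured lazily up to the "Action:" label; the action is one token.
_STEP_PATTERN = re.compile(
    r"Reason:\s*(?P<reason>.+?)\s*Action:\s*(?P<action>\w+)", re.DOTALL
)


def parse_step(text: str) -> Optional[Step]:
    """Extract the reason and action from model output, or None if either is missing."""
    match = _STEP_PATTERN.search(text)
    if match is None:
        return None
    return Step(match.group("reason").strip(), match.group("action"))
```

Because the pattern requires a `Reason:` field before `Action:`, output that skips the reason is rejected, which matches the Reason-First ordering described above.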

Training

Base model: openbmb/MiniCPM5-1B
Method: LoRA fine-tuning
Framework: Unsloth + PEFT
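For readers unfamiliar with PEFT, a LoRA setup is usually described by a `LoraConfig`. The sketch below shows its general shape only. The rank, alpha, dropout, and target module names are assumptions chosen for illustration, not the hyperparameters used to train this adapter, and the Unsloth side of the setup is not shown.

```python
def build_lora_config(rank: int = 16, alpha: int = 32, dropout: float = 0.0):
    """Build an illustrative PEFT LoraConfig (all default values are assumptions)."""
    # Imported lazily so this module can be imported without peft installed.
    from peft import LoraConfig

    return LoraConfig(
        r=rank,
        lora_alpha=alpha,
        lora_dropout=dropout,
        # Assumed attention projection names; check the base model's module names.
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        task_type="CAUSAL_LM",
    )
```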

Limitations

The model performs well on simple browser planning and replanning scenarios.

However, it still struggles with:

  • multi-step recovery chains
  • long-horizon planning
  • complex search strategy generation
  • comparison tasks requiring multiple sources

Conclusion

This project suggests that explicit reasoning may act as a lightweight regularizer for small planning models.

A 1B model can often explain the correct action before it can reliably choose it.

This repository contains only the LoRA adapter.

The base model must be downloaded separately.
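Loading the adapter on top of the base model might look like the sketch below, which uses the standard `transformers` and `peft` loading calls. The `adapter_path` argument is a placeholder for wherever this adapter is stored, and `trust_remote_code=True` is an assumption about what the base model requires.

```python
def load_planner(adapter_path: str, base_model: str = "openbmb/MiniCPM5-1B"):
    """Load the base model, then attach the LoRA adapter from adapter_path."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the base model ships custom code and needs trust_remote_code.
    tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
    base = AutoModelForCausalLM.from_pretrained(base_model, trust_remote_code=True)

    # Wrap the base model with the adapter weights and switch to inference mode.
    model = PeftModel.from_pretrained(base, adapter_path)
    model.eval()
    return tokenizer, model
```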

Model provider

Georgefifth

Model tree

Base

openbmb/MiniCPM5-1B

Adapter

this model

Modalities

Input

Text

Output

Text
