Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more
Container

Run this model inference with full control and performance in your environment.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

License: apache-2.0

Introduction

We present Lean Finder, a semantic search engine for Lean and mathlib that understands and aligns with the intents of mathematicians, published at ICLR 2026. Progress in formal theorem proving is often hindered by the difficulty of locating relevant theorems and the steep learning curve of the Lean 4 language. Existing Lean search engines rely primarily on informalizations (natural language translations of formal statements), while largely overlooking the mismatch with real-world user queries.

Lean Finder proposes a user-centered semantic search tailored to the needs of mathematicians. Our approach:

  1. User-centered query synthesis: Analyzes and clusters the semantics of public Lean discussions, then fine-tunes text embeddings on synthesized queries that emulate user intents across diverse intent clusters.
  2. Diverse input modalities: Supports informalized statements, synthetic user queries, augmented proof states, and formal statements. Model was trained with over 1.4M query-code pairs in total.
  3. Preference alignment: Further aligns Lean Finder with mathematicians' preferences via DPO using human and LLM feedback collected from a deployed web service.

Performance

  • 81.6% Top-3 preference rate in a user study with real Lean users.
  • 30%+ relative improvement (Recall@1) over previous search engines and GPT-4o on informalized statement queries.
  • 16%+ relative improvement on proof state queries.
  • Compatible with LLM-based theorem provers as a plug-and-play retrieval backbone.

Model Versions

The model in this repository's main revision is a newly optimized version that is significantly stronger than the model described in the paper. The model corresponding to the paper release is available in the old-model revision. For details on the differences and how to access each version, see the GitHub repository.

Usage

Try the web service at leanfinder.github.io.

For code, dataset, and instructions for hosting the model locally, see the GitHub repository.

Citation

bibtex

@article{lu2025lean,
title={Lean finder: Semantic search for mathlib that understands user intents},
author={Lu, Jialin and Emond, Kye and Yang, Kaiyu and Chaudhuri, Swarat and Sun, Weiran and Chen, Wuyang},
journal={arXiv preprint arXiv:2510.15940},
year={2025}
}

Model provider

delta-lab-ai

delta-lab-ai

Model tree

Base

this model

Modalities

Input

Text

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Explore FriendliAI today