
README

License: apache-2.0

Overview

Zen Reranker is optimized for:

  • Retrieval-Augmented Generation (RAG) — re-score retrieved passages for LLM context
  • Search quality improvement — rerank initial BM25/dense retrieval results
  • Cross-lingual retrieval — strong multilingual performance
  • DSO integration — compatible with Hanzo's Decentralized Semantic Optimization
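
The retrieve-then-rerank flow behind the first two bullets can be sketched as follows. This is a minimal illustration, not Zen Reranker's API: `rerank_top_k` and `overlap_score` are hypothetical names, and the word-overlap scorer is only a stand-in so the example runs without downloading the model. In practice `score_fn` would call the reranker on each (query, passage) pair.

```python
from typing import Callable, List, Tuple


def overlap_score(query: str, passage: str) -> float:
    """Stand-in scorer (an assumption): fraction of query words found in the passage."""
    q_words = set(query.lower().split())
    p_words = set(passage.lower().split())
    return len(q_words & p_words) / len(q_words) if q_words else 0.0


def rerank_top_k(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every first-stage candidate against the query and keep the best top_k."""
    scored = [(passage, score_fn(query, passage)) for passage in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Candidates as they might come back from a BM25 or dense first stage.
candidates = [
    "Berlin is in Germany.",
    "Paris is the capital of France.",
    "France borders Spain.",
]
top = rerank_top_k("capital of France", candidates, overlap_score, top_k=2)
```

Passing the scorer in as a function keeps the pipeline code independent of which model produces the scores.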

Quick Start

python

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "zenlm/zen-reranker"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, torch_dtype=torch.float16)

def rerank(query, passages):
    # Encode each (query, passage) pair together so the model scores them jointly.
    pairs = [[query, p] for p in passages]
    inputs = tokenizer(
        pairs, padding=True, truncation=True,
        max_length=512, return_tensors="pt"
    )
    with torch.no_grad():
        scores = model(**inputs).logits.squeeze(-1)
    # Sort passages by score, highest first.
    ranked = sorted(zip(passages, scores.tolist()), key=lambda x: x[1], reverse=True)
    return ranked

query = "What is the capital of France?"
passages = ["Paris is the capital of France.", "Berlin is in Germany.", "Madrid is in Spain."]
results = rerank(query, passages)
for passage, score in results:
    print(f"{score:.3f}: {passage}")
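
The scores printed by the Quick Start are raw logits, which are unbounded. If a bounded score is easier to threshold, one common option is to pass each logit through a sigmoid. The sketch below assumes the model emits a single relevance logit per pair, as the `squeeze(-1)` call suggests; whether a sigmoid yields calibrated probabilities for this model is not stated in the README. `filter_by_probability`, the 0.5 threshold, and the example logits are hypothetical.

```python
import math
from typing import List, Tuple


def sigmoid(logit: float) -> float:
    """Map a raw logit to a value between 0 and 1, avoiding overflow for large inputs."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    exp_logit = math.exp(logit)
    return exp_logit / (1.0 + exp_logit)


def filter_by_probability(
    ranked: List[Tuple[str, float]], threshold: float = 0.5
) -> List[Tuple[str, float]]:
    """Convert (passage, logit) pairs to (passage, probability) and drop low scores."""
    with_probs = [(passage, sigmoid(logit)) for passage, logit in ranked]
    return [(passage, prob) for passage, prob in with_probs if prob >= threshold]


# Hypothetical logits, made up for illustration; real values come from rerank().
ranked = [("Paris is the capital of France.", 4.2), ("Berlin is in Germany.", -3.1)]
kept = filter_by_probability(ranked, threshold=0.5)
```

Because the sigmoid is monotonic, it changes the scale of the scores but not their order.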

With sentence-transformers

python

from sentence_transformers import CrossEncoder

model = CrossEncoder("zenlm/zen-reranker")
# predict() returns one score per (query, passage) pair, in input order.
scores = model.predict([
    ["What is the capital of France?", "Paris is the capital of France."],
    ["What is the capital of France?", "Berlin is in Germany."],
])

Specifications

| Attribute    | Value                          |
|--------------|--------------------------------|
| Parameters   | 4B                             |
| Architecture | Qwen3ForSequenceClassification |
| Context      | 32,768 tokens                  |
| Languages    | 100+ (multilingual)            |
| License      | Apache 2.0                     |
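
As a rough sizing aid, the parameter count translates into weight memory by simple arithmetic. The sketch below assumes "4B" means exactly 4 billion parameters and counts weights only; activations, the attention cache, and framework overhead are not included, so real usage will be higher. `weight_memory_gib` is a hypothetical helper.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory for model weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3


params = 4e9  # "4B" taken as exactly 4 billion parameters (an assumption)
fp16_gib = weight_memory_gib(params, 2)  # float16, as loaded in the Quick Start
fp32_gib = weight_memory_gib(params, 4)  # float32, for comparison
print(f"fp16: {fp16_gib:.1f} GiB, fp32: {fp32_gib:.1f} GiB")
# prints "fp16: 7.5 GiB, fp32: 14.9 GiB"
```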

Use Cases

  1. RAG pipelines — rerank retrieved chunks before passing to LLM
  2. Search engines — improve document ranking quality
  3. QA systems — score answer candidates for relevance
  4. Semantic deduplication — score similarity for clustering
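
For the deduplication use case, one simple approach is a greedy pass that keeps a text only if it scores below a threshold against everything already kept. The sketch below is an illustration under assumptions: `deduplicate` and `jaccard` are hypothetical names, the 0.8 threshold is arbitrary, and the word-overlap similarity stands in for a reranker score so the example runs offline.

```python
from typing import Callable, List


def jaccard(a: str, b: str) -> float:
    """Stand-in similarity (an assumption): word-set Jaccard overlap in [0, 1]."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 1.0


def deduplicate(
    texts: List[str],
    similarity: Callable[[str, str], float],
    threshold: float = 0.8,
) -> List[str]:
    """Greedy dedup: keep a text only if it is not too similar to any kept text."""
    kept: List[str] = []
    for text in texts:
        if all(similarity(text, seen) < threshold for seen in kept):
            kept.append(text)
    return kept


docs = [
    "the cat sat on the mat",
    "the cat sat on the mat today",
    "stock markets fell sharply",
]
unique = deduplicate(docs, jaccard, threshold=0.8)
```

A cross-encoder score is not guaranteed to be symmetric in its two inputs, so when swapping in model scores it may be worth scoring both orderings of a pair and combining them.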

Abliteration

Like all Zen models, Zen Reranker is abliterated — refusal bias has been removed using directional ablation via hanzoai/remove-refusals.

Technique: "Refusal in Language Models Is Mediated by a Single Direction" (Arditi et al.)
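
The core operation in directional ablation is projecting a single direction out of a vector: given an activation x and a direction r, compute x' = x - (x · r̂) r̂, where r̂ is r scaled to unit length. The sketch below shows only that projection on toy vectors. It is not the hanzoai/remove-refusals code, and it does not cover how the direction is found or where in the model it is applied; `ablate_direction` is a hypothetical name.

```python
import math
from typing import List


def ablate_direction(activation: List[float], direction: List[float]) -> List[float]:
    """Remove the component of `activation` that lies along `direction`."""
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    unit = [d / norm for d in direction]
    projection = sum(a * u for a, u in zip(activation, unit))
    return [a - projection * u for a, u in zip(activation, unit)]


# Toy 3-dimensional vectors, made up for illustration.
x = [2.0, 1.0, -1.0]
r = [1.0, 0.0, 0.0]
x_ablated = ablate_direction(x, r)
```

After the projection, the result has no component along r, while the components orthogonal to r are unchanged.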

Model Family

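| Model         | Parameters | Use Case        |
|---------------|------------|-----------------|
| Zen Nano      | 0.6B       | Edge AI         |
| Zen Scribe    | 4B         | Writing         |
| Zen Pro       | 8B         | Professional AI |
| Zen Reranker  | 4B         | Retrieval       |
| Zen Embedding | not listed | Embeddings      |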

Citation

bibtex

@misc{zen-reranker-2025,
  title={Zen Reranker: High-Performance Neural Reranking},
  author={Hanzo AI and Zoo Labs Foundation},
  year={2025},
  url={https://huggingface.co/zenlm/zen-reranker}
}

Part of the Zen model ecosystem by Hanzo AI (Techstars '17) and Zoo Labs Foundation.

Model provider: zenlm

Modalities: text input, text output
