License: apache-2.0
Overview
Zen Reranker is optimized for:
- Retrieval-Augmented Generation (RAG) — re-score retrieved passages for LLM context
- Search quality improvement — rerank initial BM25/dense retrieval results
- Cross-lingual retrieval — strong multilingual performance
- DSO integration — compatible with Hanzo's Decentralized Semantic Optimization
Quick Start
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "zenlm/zen-reranker"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, torch_dtype=torch.float16)
model.eval()  # inference only: disable dropout


def rerank(query, passages):
    # Build one [query, passage] pair per candidate passage.
    pairs = [[query, p] for p in passages]
    inputs = tokenizer(pairs, padding=True, truncation=True,
                       max_length=512, return_tensors="pt")
    with torch.no_grad():
        # One relevance logit per pair; higher means more relevant.
        scores = model(**inputs).logits.squeeze(-1)
    # Sort passages by score, best first.
    ranked = sorted(zip(passages, scores.tolist()), key=lambda x: x[1], reverse=True)
    return ranked


query = "What is the capital of France?"
passages = ["Paris is the capital of France.", "Berlin is in Germany.", "Madrid is in Spain."]
results = rerank(query, passages)
for passage, score in results:
    print(f"{score:.3f}: {passage}")
```
With sentence-transformers
```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("zenlm/zen-reranker")
scores = model.predict([
    ["What is the capital of France?", "Paris is the capital of France."],
    ["What is the capital of France?", "Berlin is in Germany."],
])
```
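The `predict` call above returns one score per input pair, in input order. A minimal sketch of turning those scores into a top-k ranking might look like the following. The helper `rank_passages` and the stand-in scorer `fake_predict` are hypothetical names introduced here for illustration; `fake_predict` only counts shared words so the snippet runs without downloading the model, and its scores say nothing about what Zen Reranker would return.

```python
from typing import Callable, List, Sequence, Tuple


def rank_passages(
    predict: Callable[[List[List[str]]], Sequence[float]],
    query: str,
    passages: List[str],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every [query, passage] pair in one batch and return the top_k passages."""
    pairs = [[query, p] for p in passages]
    scores = predict(pairs)  # assumed: one score per pair, in input order
    ranked = sorted(
        zip(passages, (float(s) for s in scores)),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:top_k]


def fake_predict(pairs: List[List[str]]) -> List[float]:
    """Hypothetical stand-in for model.predict: counts query words found in the passage."""
    out = []
    for query, passage in pairs:
        q_words = {w.strip(".,?!").lower() for w in query.split()}
        p_words = {w.strip(".,?!").lower() for w in passage.split()}
        out.append(float(len(q_words & p_words)))
    return out


top = rank_passages(
    fake_predict,
    "What is the capital of France?",
    ["Berlin is in Germany.", "Paris is the capital of France.", "Madrid is in Spain."],
    top_k=2,
)
for passage, score in top:
    print(f"{score:.1f}: {passage}")
```

With the real model, passing `model.predict` from the `CrossEncoder` instance in place of `fake_predict` should work the same way, since it also takes a list of pairs and returns one score per pair.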
Specifications
| Attribute | Value |
|---|---|
| Parameters | 4B |
| Architecture | Qwen3ForSequenceClassification |
| Context | 32,768 tokens |
| Languages | 100+ (multilingual) |
| License | Apache 2.0 |
Use Cases
- RAG pipelines — rerank retrieved chunks before passing to LLM
- Search engines — improve document ranking quality
- QA systems — score answer candidates for relevance
- Semantic deduplication — score similarity for clustering
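As a rough illustration of the semantic deduplication use case, the sketch below keeps a text only if it is not too similar to any text already kept. The function names, the 0.8 threshold, and the word-overlap `jaccard` scorer are all assumptions made for this example; in practice the `similarity` argument would wrap a call to the reranker, and a suitable threshold would have to be tuned on real data.

```python
from typing import Callable, List


def deduplicate(
    texts: List[str],
    similarity: Callable[[str, str], float],
    threshold: float = 0.8,
) -> List[str]:
    """Greedy dedup: keep a text only if it scores below threshold against every kept text."""
    kept: List[str] = []
    for text in texts:
        if all(similarity(text, other) < threshold for other in kept):
            kept.append(text)
    return kept


def jaccard(a: str, b: str) -> float:
    """Hypothetical stand-in scorer: word-set overlap, ignoring case and punctuation."""
    wa = {w.strip(".,?!").lower() for w in a.split()}
    wb = {w.strip(".,?!").lower() for w in b.split()}
    union = wa | wb
    return len(wa & wb) / len(union) if union else 1.0


docs = [
    "Paris is the capital of France.",
    "paris is the capital of france",
    "Berlin is in Germany.",
]
unique = deduplicate(docs, jaccard, threshold=0.8)
print(unique)
```

The greedy pass compares each text against every kept text, so it makes a quadratic number of scorer calls in the worst case; for large collections a cheaper first-stage filter before the reranker would likely be needed.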
Abliteration
Like all Zen models, Zen Reranker is abliterated — refusal bias has been removed using directional ablation via hanzoai/remove-refusals.
Technique: "Refusal in Language Models Is Mediated by a Single Direction" (Arditi et al.)
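The core operation of directional ablation can be sketched as removing the component of a vector that lies along a chosen direction: x' = x - (x · r̂) r̂, where r̂ is the unit-length direction. The snippet below shows only that projection on made-up numbers. How hanzoai/remove-refusals finds the direction, and whether it applies the projection to activations or to weights, is not stated here, so the function name and vectors are illustrative assumptions.

```python
import math
from typing import List


def ablate_direction(x: List[float], direction: List[float]) -> List[float]:
    """Return x with its component along `direction` removed: x - (x . r_hat) * r_hat."""
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    r_hat = [d / norm for d in direction]  # unit vector along the direction
    coeff = sum(xi * ri for xi, ri in zip(x, r_hat))  # projection length of x on r_hat
    return [xi - coeff * ri for xi, ri in zip(x, r_hat)]


# Made-up example vectors, not taken from any real model.
activation = [3.0, 4.0, 1.0]
refusal_dir = [0.0, 2.0, 0.0]
cleaned = ablate_direction(activation, refusal_dir)
print(cleaned)
```

After the projection, the result has no component left along the ablated direction, while the components orthogonal to it are unchanged.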
Model Family
| Model | Parameters | Use Case |
|---|---|---|
| Zen Nano | 0.6B | Edge AI |
| Zen Scribe | 4B | Writing |
| Zen Pro | 8B | Professional AI |
| Zen Reranker | 4B | Retrieval |
| Zen Embedding | — | Embeddings |
Citation
```bibtex
@misc{zen-reranker-2025,
  title={Zen Reranker: High-Performance Neural Reranking},
  author={Hanzo AI and Zoo Labs Foundation},
  year={2025},
  url={https://huggingface.co/zenlm/zen-reranker}
}
```
Part of the Zen model ecosystem by Hanzo AI (Techstars '17) and Zoo Labs Foundation.
Model provider
zenlm
Modalities
Input: Text
Output: Text