Model Description
ProtGPT3-10B-dpo is the DPO-aligned version of ProtGPT3-10B. It is part of the ProtGPT3 family, an open-source suite of promptable and aligned protein language models for protein design.
ProtGPT3-10B-dpo was further aligned with Direct Preference Optimization (DPO) to improve generation quality. The alignment procedure shifts the model toward protein sequences with higher predicted structural confidence and reduced low-complexity content, while preserving sequence diversity. For protein generation, the -dpo version of each model is recommended over its base model.
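For readers unfamiliar with DPO, the sketch below computes the standard DPO loss for a single preference pair, as defined in the original DPO formulation. It is illustrative only: the function name, the example log-probabilities, and the value of `beta` are assumptions, and this card does not state the hyperparameters or the pair construction used for ProtGPT3.

```python
import math

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair of sequences.

    Each argument is the summed log-probability of a full sequence under
    either the policy being trained or the frozen reference model.
    `beta` is a hypothetical value, not the one used for ProtGPT3.
    """
    # Implicit rewards: how much more the policy favors each sequence
    # than the reference model does, scaled by beta.
    chosen_reward = beta * (policy_logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (policy_logp_rejected - ref_logp_rejected)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

# When the policy equals the reference, the margin is zero.
baseline = dpo_loss(-10.0, -10.0, -10.0, -10.0)
# Raising the chosen sequence and lowering the rejected one reduces the loss.
improved = dpo_loss(-5.0, -20.0, -10.0, -10.0)
```

The loss decreases as the policy assigns relatively more probability to the preferred sequence (here, one with higher predicted structural confidence or without low-complexity content) than the reference model does.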
For more information and guidance on generating sequences with ProtGPT3-10B-dpo, see the detailed description in the ProtGPT3-1.3B model card and replace the model name (i.e., model_name=AI4PD/ProtGPT3-10B-dpo).
Uses
Direct Use
ProtGPT3-10B-dpo can be used for single-sequence autoregressive protein generation. Users can generate protein sequences unconditionally or condition generation on an amino-acid prefix.
Compared with the base ProtGPT3-10B checkpoint, this DPO-aligned model is intended for users who want generations biased toward higher-complexity sequences with improved predicted structural confidence.
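A minimal sketch of prefix-conditioned generation follows. It assumes the checkpoint loads through the generic Hugging Face `AutoTokenizer` and `AutoModelForCausalLM` classes and accepts a raw amino-acid string as the prompt; neither is confirmed by this card, and the sampling settings are placeholders. The helper names `check_prefix` and `generate_from_prefix` are hypothetical. The exact prompt format, including how to generate unconditionally, is described in the ProtGPT3-1.3B model card and should take precedence over this sketch.

```python
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

def check_prefix(prefix):
    """Return the upper-cased prefix, or raise if it has non-standard residues."""
    prefix = prefix.strip().upper()
    bad = sorted(set(prefix) - AMINO_ACIDS)
    if not prefix or bad:
        raise ValueError(f"invalid amino-acid prefix (unexpected characters: {bad})")
    return prefix

def generate_from_prefix(prefix, model_name="AI4PD/ProtGPT3-10B-dpo",
                         num_sequences=4, max_new_tokens=256):
    """Sample continuations of an amino-acid prefix (assumed prompt format)."""
    # Imported lazily so check_prefix works without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
    model.eval()

    inputs = tokenizer(check_prefix(prefix), return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            do_sample=True,           # sample instead of greedy decoding
            top_p=0.95,               # placeholder sampling settings
            temperature=1.0,
            max_new_tokens=max_new_tokens,
            num_return_sequences=num_sequences,
        )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

Calling `generate_from_prefix("MKT")` would download the 10B checkpoint, so a GPU with enough memory is needed. Validating the prefix first avoids spending that time on a malformed prompt.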
Downstream Use
The model may be used in protein design workflows, computational screening pipelines, protein variant generation, and candidate sequence proposal. Generated sequences can be further evaluated with structure prediction, sequence-complexity filters, solubility filters, fitness predictors, or experimental validation.
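As one example of a sequence-complexity filter, the sketch below flags sequences that contain a window of low Shannon entropy. This is a generic stand-in and not the low-complexity definition used during alignment; the function names, the window size, and the entropy threshold are illustrative assumptions.

```python
import math
from collections import Counter

def shannon_entropy(window):
    """Shannon entropy (in bits) of the residue composition of a string."""
    counts = Counter(window)
    total = len(window)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def has_low_complexity(seq, window=12, min_entropy=2.2):
    """True if any window of `window` residues has entropy below `min_entropy`.

    Sequences shorter than `window` are scored as a single window.
    """
    n_windows = max(len(seq) - window + 1, 1)
    return any(
        shannon_entropy(seq[i:i + window]) < min_entropy
        for i in range(n_windows)
    )

def filter_candidates(seqs, **kwargs):
    """Keep only sequences without a low-complexity window."""
    return [s for s in seqs if not has_low_complexity(s, **kwargs)]

candidates = ["ACDEFGHIKLMNPQRSTVWY", "MKT" + "Q" * 20]
kept = filter_candidates(candidates)
```

In a screening pipeline, a filter of this kind would typically run before more expensive steps such as structure prediction, so that repetitive candidates are discarded early.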
Out-of-Scope Use
The model should not be used as the sole basis for experimental, clinical, environmental, or safety-critical decisions. Generated sequences require downstream computational and experimental validation. The model is not guaranteed to generate functional, soluble, safe, synthesizable, or experimentally successful proteins.
The model should not be used for irresponsible or harmful biological design applications.
Bias, Risks, and Limitations
ProtGPT3-10B-dpo learns from public protein sequence datasets and may reproduce biases present in those datasets. DPO alignment reduces low-complexity generations and improves generation quality according to the alignment objectives (pLDDT and reduction of low-complexity regions (lcr) as a binary objective; see the main manuscript). Even so, generated sequences may still be nonfunctional, unstable, insoluble, repetitive, biologically implausible, or unsuitable for a user’s intended application.
The DPO alignment objective uses predicted structural confidence and low-complexity filtering as proxy objectives. These proxies do not guarantee biological function, experimental success, safety, solubility, or manufacturability.
As with other generative protein models, ProtGPT3-10B-dpo may present dual-use risks if applied irresponsibly.
Citation
BibTeX:
@article{protgpt3,
  title={ProtGPT3: an Open-source family of Promptable and Aligned Protein Language Models},
  author={Anonymous Authors},
  year={2026}
}
More Information
All models and code are released through the Hugging Face ecosystem and accompanying code repository.
Model provider
AI4PD
Modalities
Input: Text
Output: Text