⚙️ Model Details
| Property | Value |
|---|---|
| Base Model | meta-llama/Meta-Llama-3-8B (bnb-4bit quantized) |
| Fine-tuning Method | LoRA (Parameter-Efficient Fine-Tuning) |
| Framework | Unsloth + Hugging Face Transformers |
| Quantization | 4-bit (bitsandbytes) |
| Domain | Drug Discovery / Computational Biology |
| Task | Drug–Target Interaction (DTI) reasoning |
| License | Apache 2.0 |
🔬 Capabilities
DrDTI-Reasoner is trained to handle the following tasks:
- DTI Classification — Predict whether a drug interacts with a given protein target
- Bioactivity Estimation — Classify compounds as Active or Inactive
- Potency Approximation — Estimate pXC50 values where applicable
- Molecular Interpretation — Parse and reason over SMILES molecular representations
- Target Analysis — Interpret protein targets from UniProt IDs or gene names
- Assay-Aware Reasoning — Incorporate assay metadata (mechanism, technology, mode) into predictions
- Biological Explanation — Generate short, human-readable justifications for predictions
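For the potency task listed above, note that pXC50 is conventionally the negative base-10 logarithm of the XC50 (for example an IC50 or EC50) expressed in mol/L. The sketch below converts between the two under that convention. The function names are illustrative and are not part of the model's interface.

```python
import math


def pxc50_to_nanomolar(pxc50: float) -> float:
    """Convert a pXC50 value to an XC50 concentration in nM.

    Assumes the usual convention pXC50 = -log10(XC50 in mol/L).
    """
    molar = 10.0 ** (-pxc50)
    return molar * 1e9


def nanomolar_to_pxc50(xc50_nm: float) -> float:
    """Convert an XC50 concentration in nM back to pXC50."""
    if xc50_nm <= 0:
        raise ValueError("XC50 must be a positive concentration")
    return -math.log10(xc50_nm * 1e-9)


if __name__ == "__main__":
    # A pXC50 of 7.4 corresponds to roughly 40 nM.
    print(round(pxc50_to_nanomolar(7.4), 1))
    print(round(nanomolar_to_pxc50(100.0), 2))
```

Higher pXC50 means higher potency: each unit corresponds to a tenfold drop in the concentration needed for the half-maximal effect.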
The model expects structured biomedical input in the following format:
Drug (SMILES): <SMILES string>
Target protein: <UniProt ID or protein name>
Assay mechanism: <optional>
Assay technology: <optional>
Note: Input formatting quality directly affects prediction accuracy. Always validate SMILES strings and use standard UniProt identifiers when possible.
Example Input
Drug (SMILES): CC1=CC=C(C=C1)S(=O)(=O)N
Target protein: P00533 (EGFR)
Assay mechanism: Inhibition
Assay technology: Biochemical
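The input format above can be assembled programmatically. The following is a minimal sketch using only the Python standard library. `build_prompt`, `has_balanced_brackets`, and `looks_like_uniprot_accession` are hypothetical helper names, and dropping the optional assay lines when they are not supplied is an assumption about how the model handles missing fields. The bracket check is only a cheap sanity test and does not replace parsing the SMILES with a cheminformatics toolkit.

```python
import re
from typing import Optional

# Accession pattern as published in the UniProt help pages.
UNIPROT_ACCESSION = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)


def looks_like_uniprot_accession(text: str) -> bool:
    """Return True if text matches the UniProt accession pattern."""
    return bool(UNIPROT_ACCESSION.match(text))


def has_balanced_brackets(smiles: str) -> bool:
    """Check that parentheses and square brackets pair up.

    This does not prove the SMILES is chemically valid.
    """
    closers = {")": "(", "]": "["}
    stack = []
    for ch in smiles:
        if ch in "([":
            stack.append(ch)
        elif ch in closers:
            if not stack or stack.pop() != closers[ch]:
                return False
    return not stack


def build_prompt(
    smiles: str,
    target: str,
    mechanism: Optional[str] = None,
    technology: Optional[str] = None,
) -> str:
    """Assemble the structured input; optional fields are omitted if unset."""
    smiles = smiles.strip()
    if not smiles or not has_balanced_brackets(smiles):
        raise ValueError(f"SMILES looks malformed: {smiles!r}")
    lines = [f"Drug (SMILES): {smiles}", f"Target protein: {target.strip()}"]
    if mechanism:
        lines.append(f"Assay mechanism: {mechanism}")
    if technology:
        lines.append(f"Assay technology: {technology}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt("CC1=CC=C(C=C1)S(=O)(=O)N", "P00533 (EGFR)",
                       mechanism="Inhibition", technology="Biochemical"))
```

Checking the accession before building the prompt (for example `looks_like_uniprot_accession("P00533")`) catches typos early, while gene names such as `EGFR` simply fail the pattern and can be passed through as free text.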
The model returns structured predictions:
Active: true | false
pXC50: <numeric value> | null
Reason: <short biological explanation of the predicted interaction>
Example Output
Active: true
pXC50: 7.4
Reason: The compound's sulfonamide group forms key hydrogen bonds with the EGFR
ATP-binding pocket, suggesting moderate inhibitory activity consistent
with the predicted pXC50.
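Because the output follows a fixed three-field layout, it can be turned into a typed record for downstream filtering. The sketch below is one possible parser, assuming the model emits exactly the `Active`, `pXC50`, and `Reason` fields shown above, with `Reason` last and possibly wrapped over several lines. `Prediction` and `parse_prediction` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Prediction:
    active: bool
    pxc50: Optional[float]  # None when the model returns "null"
    reason: str


def parse_prediction(text: str) -> Prediction:
    """Parse the Active / pXC50 / Reason layout into a Prediction."""
    active_m = re.search(
        r"^Active:\s*(true|false)\s*$", text, re.IGNORECASE | re.MULTILINE
    )
    pxc50_m = re.search(r"^pXC50:\s*(\S+)\s*$", text, re.MULTILINE)
    # Reason is assumed to be the last field, so capture to the end of the text.
    reason_m = re.search(r"^Reason:\s*(.*)", text, re.DOTALL | re.MULTILINE)
    if not (active_m and pxc50_m and reason_m):
        raise ValueError("output does not follow the Active / pXC50 / Reason layout")

    raw = pxc50_m.group(1)
    pxc50 = None if raw.lower() == "null" else float(raw)
    # Collapse wrapped lines in the explanation into single spaces.
    reason = " ".join(reason_m.group(1).split())
    return Prediction(active_m.group(1).lower() == "true", pxc50, reason)


if __name__ == "__main__":
    sample = "Active: false\npXC50: null\nReason: No supporting evidence."
    print(parse_prediction(sample))
```

Raising on malformed output, rather than guessing, keeps generations that drift from the expected layout out of any aggregated results.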
📊 Training Data
DrDTI-Reasoner was trained on curated drug–target interaction datasets containing:
- Molecular structures in SMILES format
- Protein target identifiers (UniProt IDs and gene names)
- Bioassay metadata (mechanism, technology, and mode)
- Binary activity labels (active / inactive)
- Potency values (pXC50) where available
Key sources include ChEMBL and BindingDB bioactivity databases.
📈 Evaluation Results
Evaluated on a held-out test split from the ChEMBL bioactivity dataset:
| Metric | Value |
|---|---|
| Accuracy | 0.84 |
| F1 Score | 0.82 |
| ROC-AUC | 0.89 |
⚠️ These are preliminary research metrics. Results may vary across target classes, assay types, and chemical scaffolds that are underrepresented in the training data. Scaffold-split benchmarking is planned for a future release.
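For readers who want to compute the same three metrics on their own test split, here is a dependency-free sketch. It assumes binary labels (1 = active, 0 = inactive) and, for ROC-AUC, a continuous score per example. How such a score was derived for the reported numbers is not stated here, so the values in the example are purely hypothetical toy data.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive (active) class: 2TP / (2TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random active outscores a random inactive.

    Ties count as 0.5. This pairwise form is O(n_pos * n_neg), which is
    fine for small evaluation sets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one active and one inactive label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the model's evaluation.
    y_true = [1, 0, 1, 1, 0, 0]
    scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.6]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    print(accuracy(y_true, y_pred), f1_score(y_true, y_pred), roc_auc(y_true, scores))
```

Accuracy and F1 depend on the decision threshold (0.5 in the toy example), whereas ROC-AUC depends only on how the scores rank actives against inactives.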
⚠️ Limitations
Please read carefully before use:
- Research only — This is not a clinical or regulatory-grade system
- Probabilistic outputs — Predictions are model-generated estimates, not experimentally verified results
- Statistical SMILES understanding — Molecular interpretation is learned from data, not from physical simulation or quantum chemistry
- Format-sensitive — Prediction quality degrades with poorly formatted or non-standard inputs
- Not medically validated — This model has not been assessed for safety or efficacy in any clinical context
🚀 Intended Use
DrDTI-Reasoner is designed for:
✅ Computational drug discovery research
✅ Bioactivity prediction experiments
✅ Machine learning benchmarking in cheminformatics
✅ Educational use in bioinformatics and AI
✅ Early-stage hit identification and prioritization workflows
Not intended for:
❌ Clinical diagnosis or treatment decisions
❌ Regulatory or pharmaceutical approval processes
❌ Any application directly impacting human health
Disclaimer: DrDTI-Reasoner is a research tool. All predictions should be interpreted in the context of existing literature and validated experimentally before any downstream use.