License: apache-2.0
Adapter Description
| Property | Value |
|---|---|
| Base Model | google/gemma-2-9b-it |
| Translation Direction | Hausa → English |
| LoRA Rank (r) | 64 |
| LoRA Alpha | 128 |
| Training Method | QLoRA (4-bit quantization) |
| Domain | Scientific/Academic texts |
Why LoRA?
LoRA (Low-Rank Adaptation) enables efficient fine-tuning by training only a small number of additional parameters. This adapter adds only ~32.0M parameters to the base model while achieving strong translation performance.
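As a rough illustration of why the adapter stays small: LoRA keeps each targeted weight matrix frozen and learns two low-rank factors instead, so a layer with `d_in` inputs and `d_out` outputs gains `r * (d_in + d_out)` trainable parameters rather than `d_in * d_out`. The sketch below is a minimal calculation under that standard formulation. The function names and the 4096 x 4096 layer size are illustrative assumptions, not Gemma's actual dimensions.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one linear layer.

    Standard LoRA learns A (rank x d_in) and B (d_out x rank).
    """
    return rank * (d_in + d_out)


def lora_scaling(alpha: float, rank: int) -> float:
    """Factor applied to the low-rank update B @ A in standard LoRA (alpha / r)."""
    return alpha / rank


# Illustrative layer size (an assumption, not taken from the model card).
d_in, d_out = 4096, 4096
added = lora_param_count(d_in, d_out, rank=64)
full = d_in * d_out
print(f"LoRA adds {added:,} parameters vs {full:,} in the frozen layer ({added / full:.1%})")
print(f"Update scaling for alpha=128, r=64: {lora_scaling(128, 64)}")
```

With the rank and alpha listed for this adapter, the low-rank update is scaled by `alpha / r`, assuming the default (non rank-stabilized) LoRA scaling.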
Evaluation Results
Performance on the AfriScience-MT test set:
| Split | BLEU | chrF | SSA-COMET |
|---|---|---|---|
| Test | - | - | - |
Metrics explanation:
- BLEU: Measures n-gram overlap with reference translations (0-100, higher is better)
- chrF: Character-level F-score, robust for morphologically rich languages (0-100, higher is better)
- SSA-COMET: Neural metric trained for Sub-Saharan African languages (McGill-NLP/ssa-comet-stl), shown as a percentage (0-100, higher is better)
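To make the character-level idea behind chrF concrete, here is a simplified, single-reference sketch. It averages character n-gram precision and recall over orders 1 to 6 and combines them with an F-beta score (beta = 2, which weights recall more heavily). This is a teaching approximation, not the implementation used to score this adapter; `simple_chrf` and its whitespace handling are assumptions, and reported scores should come from an established tool such as sacreBLEU.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring spaces."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF-style score on a 0-100 scale (single reference)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one side is too short to have n-grams of this order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return 100 * (1 + beta**2) * p * r / (beta**2 * p + r)


print(simple_chrf("the soil is dry", "the soil is very dry"))
```

Because matching happens on characters rather than whole words, a hypothesis that gets a word stem right but an affix wrong still earns partial credit, which is why the metric suits morphologically rich languages.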
Usage
Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

# Configure 4-bit quantization (recommended for memory efficiency)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b-it",
    quantization_config=bnb_config,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-9b-it")

# Load LoRA adapter
adapter_name = "dsfsi/gemma_2_9b_it-lora-r64-hau-eng"
model = PeftModel.from_pretrained(base_model, adapter_name)
model.eval()

# Prepare translation prompt
# NOTE: the sample below is an English placeholder; replace it with the Hausa text to translate.
source_text = "Climate change significantly impacts agricultural productivity in sub-Saharan Africa."
instruction = "Translate the following Hausa scientific text to English."

# Format for Gemma chat template
messages = [{"role": "user", "content": f"{instruction}\n\n{source_text}"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Generate translation
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        num_beams=5,
        early_stopping=True,
        pad_token_id=tokenizer.pad_token_id,
    )

# Decode only the generated part
generated = outputs[0][inputs["input_ids"].shape[1]:]
translation = tokenizer.decode(generated, skip_special_tokens=True)
print(translation)
```
Without Quantization (Full Precision)
```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# For GPUs with sufficient memory (>24GB for larger models)
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b-it",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
model = PeftModel.from_pretrained(base_model, "dsfsi/gemma_2_9b_it-lora-r64-hau-eng")
```
Training Details
Hyperparameters
| Parameter | Value |
|---|---|
| LoRA Rank (r) | 64 |
| LoRA Alpha | 128 |
| LoRA Dropout | 0.05 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Epochs | 3 |
| Batch Size | 2 |
| Learning Rate | 2e-04 |
| Max Sequence Length | 512 |
| Gradient Accumulation | 4 |
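For readers wiring these values into their own training script, the sketch below collects the table into plain Python and works out the effective batch size per optimizer step. The dictionary keys mirror the argument names of PEFT's `LoraConfig`; the variable names, the `effective_batch_size` helper, and the single-device default are assumptions, since the card does not say how many GPUs were used.

```python
# Values copied from the hyperparameter table above.
lora_hyperparameters = {
    "r": 64,
    "lora_alpha": 128,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
}
per_device_batch_size = 2
gradient_accumulation_steps = 4
max_sequence_length = 512


def effective_batch_size(batch_size: int, accumulation_steps: int, num_devices: int = 1) -> int:
    """Sequences contributing to each optimizer step."""
    return batch_size * accumulation_steps * num_devices


# Assumption: a single device.
sequences_per_step = effective_batch_size(per_device_batch_size, gradient_accumulation_steps)
# Upper bound: every sequence padded or truncated to the maximum length.
max_tokens_per_step = sequences_per_step * max_sequence_length
print(sequences_per_step, max_tokens_per_step)
```

Gradient accumulation trades wall-clock time for memory: only `per_device_batch_size` sequences are held on the GPU at once, while the optimizer still sees gradients averaged over the larger effective batch.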
Hardware Requirements
| Configuration | VRAM Required |
|---|---|
| 4-bit (QLoRA) | ~8-12 GB |
| 8-bit | ~16-20 GB |
| Full precision | ~24-40 GB |
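A back-of-the-envelope check on these figures is to compute the memory taken by the weights alone, which is the parameter count times the bytes per parameter. The sketch below assumes roughly 9 billion base parameters (inferred from the model name, not stated in this card); `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: roughly 9 billion parameters, inferred from the model name.
APPROX_BASE_PARAMS = 9e9

for label, bits in [("4-bit", 4), ("8-bit", 8), ("bfloat16", 16)]:
    print(f"{label}: ~{weight_memory_gb(APPROX_BASE_PARAMS, bits):.1f} GB for weights")
```

The ranges in the table sit above these weight-only estimates, presumably because inference also needs room for activations, the key-value cache (which grows with beam search and sequence length), quantization metadata, and framework overhead.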
Reproducibility
To reproduce this adapter:
```bash
# Clone the AfriScience-MT repository
git clone https://github.com/afriscience-mt/afriscience-mt.git
cd afriscience-mt

# Install dependencies
pip install -r requirements.txt

# Run LoRA training
python -m afriscience_mt.scripts.run_lora_training \
    --data_dir ./data \
    --source_lang hau \
    --target_lang eng \
    --model_name google/gemma-2-9b-it \
    --model_type gemma \
    --lora_rank 64 \
    --output_dir ./output \
    --num_epochs 3 \
    --batch_size 4 \
    --load_in_4bit
```
Limitations
- Domain Specificity: Optimized for scientific/academic texts; may underperform on casual or colloquial language.
- Language Direction: Only supports Hausa → English translation.
- Base Model Required: Must be used with the google/gemma-2-9b-it base model.
- Context Length: Maximum context is model-dependent; longer texts should be chunked.
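One simple way to chunk longer documents is to split on sentence boundaries and pack sentences into groups that stay under a word budget, then translate each group separately. The sketch below is a minimal illustration of that idea; `chunk_sentences`, the punctuation-based splitter, and the 200-word default are assumptions, and a word count is only a loose proxy for the model's token count.

```python
import re


def chunk_sentences(text: str, max_words: int = 200) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words is kept intact as its own chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n_words = len(sentence.split())
        if current and count + n_words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks


document = "First sentence here. Second sentence follows. Third one ends it."
for chunk in chunk_sentences(document, max_words=6):
    print(chunk)
```

Keeping sentences whole avoids cutting a clause in half, at the cost of occasionally exceeding the budget when one sentence is very long. For a stricter limit, the word count could be replaced with the tokenizer's own token count.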
Citation
If you use this model, please cite the AfriScience-MT paper (arXiv:2605.29741):
```bibtex
@article{abdulmumin2026afriscience,
  title   = {AfriScience-MT: Towards Decolonizing Science in Africa through Text Translation},
  author  = {Abdulmumin, Idris and Gwadabe, Tajuddeen and Muhammad, Shamsuddeen Hassan and Adelani, David Ifeoluwa and Khalo, Nomonde and Ahmad, Ibrahim Said and Modupe, Abiodun and Mumm, Anina and Biyela, Sibusiso and Rabie, Michelle and Havemann, Johanna and Rei, Marek and Abbott, Jade and Marivate, Vukosi},
  journal = {arXiv preprint arXiv:2605.29741},
  year    = {2026},
  url     = {https://arxiv.org/abs/2605.29741}
}
```
License
This adapter is released under the Apache 2.0 License.
Acknowledgments
- Base model: google/gemma-2-9b-it
- LoRA implementation: PEFT
- Evaluation: SSA-COMET for African language assessment
Model provider
dsfsi
Model tree
Base
google/gemma-2-9b-it
Adapter
this model
Modalities
Input
Text
Output
Text