dsfsi/gemma_2_9b_it-lora-r4-zul-eng API & Inference Endpoint

Adapter Description

Property	Value
Base Model	google/gemma-2-9b-it
Translation Direction	isiZulu → English
LoRA Rank (r)	4
LoRA Alpha	8
Training Method	QLoRA (4-bit quantization)
Domain	Scientific/Academic texts

Why LoRA?

LoRA (Low-Rank Adaptation) enables efficient fine-tuning by training only a small number of additional parameters. This adapter adds only ~2.0M parameters to the base model while achieving strong translation performance.
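As a rough illustration of why an adapter stays small, the sketch below counts the parameters LoRA adds to a single linear layer. A rank-r update to a weight of shape d_out × d_in is stored as two factors with r · (d_in + d_out) entries in total. The 1024 × 1024 layer is a hypothetical example, not a Gemma dimension, and `lora_added_params` is an illustrative helper name.

```python
def lora_added_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)

# Hypothetical 1024 x 1024 projection, adapted at rank 4 (the rank used by this adapter)
full = 1024 * 1024
added = lora_added_params(1024, 1024, rank=4)
print(added, full, f"{100 * added / full:.2f}%")  # 8192 1048576 0.78%
```

Summing this quantity over every targeted module in every layer gives the total adapter size.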

Evaluation Results

Performance on the AfriScience-MT test set:

Split	BLEU	chrF	SSA-COMET
Test	-	-	-

Metrics explanation:

BLEU: Measures n-gram overlap with reference translations (0-100, higher is better)
chrF: Character-level F-score, robust for morphologically rich languages (0-100, higher is better)
SSA-COMET: Neural metric trained for Sub-Saharan African languages, shown as percentage (0-100, higher is better) (McGill-NLP/ssa-comet-stl)
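To make the character-level idea behind chrF concrete, here is a simplified sentence-level sketch in plain Python. It averages clipped character n-gram precision and recall over n = 1..6 and combines them with an F-score weighted towards recall (beta = 2). This is an illustration only, not the scorer used for the results table; whitespace handling and other details differ from established implementations such as sacreBLEU, which should be preferred for reported numbers. `simple_chrf` is a hypothetical name.

```python
from collections import Counter

def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))

def simple_chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF on a 0-100 scale (illustrative only)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip n-gram orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return 100 * (1 + beta**2) * p * r / (beta**2 * p + r)

print(simple_chrf("the cat sat", "the cat sat"))  # identical strings score 100.0
```

Because matching happens on characters rather than whole words, a hypothesis that gets a stem right but an affix wrong still earns partial credit, which is why the metric suits morphologically rich languages.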

Usage

Quick Start

python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

# Configure 4-bit quantization (recommended for memory efficiency)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b-it",
    quantization_config=bnb_config,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-9b-it")

# Load LoRA adapter
adapter_name = "dsfsi/gemma_2_9b_it-lora-r4-zul-eng"
model = PeftModel.from_pretrained(base_model, adapter_name)
model.eval()

# Prepare translation prompt (the source text must be isiZulu: this adapter translates isiZulu → English)
source_text = "<isiZulu source text>"  # placeholder: replace with the isiZulu scientific text to translate
instruction = "Translate the following isiZulu scientific text to English."

# Format for Gemma chat template
messages = [{"role": "user", "content": f"{instruction}\n\n{source_text}"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Generate translation
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        num_beams=5,
        early_stopping=True,
        pad_token_id=tokenizer.pad_token_id,
    )

# Decode only the generated part
generated = outputs[0][inputs["input_ids"].shape[1]:]
translation = tokenizer.decode(generated, skip_special_tokens=True)
print(translation)

Without Quantization (Full Precision)

python
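# Load the base model in bfloat16 without quantization.
# Needs a GPU with enough memory (see Hardware Requirements).
from transformers import AutoModelForCausalLM
from peft import PeftModel
import torch

base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b-it",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
model = PeftModel.from_pretrained(base_model, "dsfsi/gemma_2_9b_it-lora-r4-zul-eng")
model.eval()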

Training Details

Hyperparameters

Parameter	Value
LoRA Rank (r)	4
LoRA Alpha	8
LoRA Dropout	0.05
Target Modules	q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Epochs	3
Batch Size	2
Learning Rate	2e-04
Max Sequence Length	512
Gradient Accumulation	4
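
Two derived quantities follow directly from the table. The sketch below computes the effective batch size per optimizer step and the LoRA scaling factor. It assumes training on a single device and the standard LoRA scaling of alpha / r; neither assumption is stated in the table.

```python
# Values from the hyperparameter table above
batch_size = 2
gradient_accumulation = 4
lora_rank = 4
lora_alpha = 8

# Sequences contributing to each optimizer step (assuming a single device)
effective_batch_size = batch_size * gradient_accumulation

# Standard LoRA multiplies the low-rank update by alpha / r
lora_scaling = lora_alpha / lora_rank

print(effective_batch_size, lora_scaling)  # 8 2.0
```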

Hardware Requirements

Configuration	VRAM Required
4-bit (QLoRA)	~8-12 GB
8-bit	~16-20 GB
Full precision	~24-40 GB
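
The figures above can be sanity-checked with a weights-only estimate: parameter count times bits per parameter. The sketch assumes roughly 9 billion parameters, inferred from the "9b" in the model name, and ignores activations, the KV cache, and framework overhead, which is why the ranges in the table are higher. `weight_memory_gb` is an illustrative helper name.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory for model weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

NUM_PARAMS = 9e9  # assumption: roughly 9 billion parameters, per the "9b" in the model name

for label, bits in [("4-bit", 4), ("8-bit", 8), ("bfloat16", 16)]:
    print(f"{label}: ~{weight_memory_gb(NUM_PARAMS, bits):.1f} GB for weights")
```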

Reproducibility

To reproduce this adapter:

bash
# Clone the AfriScience-MT repository
git clone https://github.com/afriscience-mt/afriscience-mt.git
cd afriscience-mt

# Install dependencies
pip install -r requirements.txt

# Run LoRA training
python -m afriscience_mt.scripts.run_lora_training \
    --data_dir ./data \
    --source_lang zul \
    --target_lang eng \
    --model_name google/gemma-2-9b-it \
    --model_type gemma \
    --lora_rank 4 \
    --output_dir ./output \
    --num_epochs 3 \
    --batch_size 4 \
    --load_in_4bit

Limitations

Domain Specificity: Optimized for scientific/academic texts; may underperform on casual or colloquial language.
Language Direction: Only supports isiZulu → English translation.
Base Model Required: Must be used with the google/gemma-2-9b-it base model.
Context Length: The adapter was trained with a maximum sequence length of 512 tokens (see Hyperparameters); longer texts should be chunked before translation.
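
One way to chunk longer texts is to split on sentence boundaries and pack whole sentences greedily up to a size budget. The sketch below is a minimal version under stated assumptions: it uses whitespace-separated word counts as a stand-in for token counts, and `chunk_text` and `max_words` are hypothetical names. In practice the length would be measured with the model's tokenizer.

```python
import re

def chunk_text(text: str, max_words: int = 150) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_words words."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and count + n > max_words:
            # Budget would be exceeded: close the current chunk first
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)  # a single over-long sentence becomes its own chunk
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks

print(chunk_text("A b c. D e f. G h i.", max_words=6))  # ['A b c. D e f.', 'G h i.']
```

Each chunk can then be translated separately and the outputs joined in order.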

Citation

If you use this model, please cite the AfriScience-MT paper (arXiv:2605.29741):

bibtex
@article{abdulmumin2026afriscience,
  title   = {AfriScience-MT: Towards Decolonizing Science in Africa through Text Translation},
  author  = {Abdulmumin, Idris and Gwadabe, Tajuddeen and Muhammad, Shamsuddeen Hassan and Adelani, David Ifeoluwa and Khalo, Nomonde and Ahmad, Ibrahim Said and Modupe, Abiodun and Mumm, Anina and Biyela, Sibusiso and Rabie, Michelle and Havemann, Johanna and Rei, Marek and Abbott, Jade and Marivate, Vukosi},
  journal = {arXiv preprint arXiv:2605.29741},
  year    = {2026},
  url     = {https://arxiv.org/abs/2605.29741}
}

License

This adapter is released under the Apache 2.0 License.

Acknowledgments

Base model: google/gemma-2-9b-it
LoRA implementation: PEFT
Evaluation: SSA-COMET for African language assessment
