Model details
- Developed by: Paula Guerrero and Iker Gutierrez
- Affiliation: University of the Basque Country (EHU)
- Model type: LoRA adapter for HiTZ/Latxa-Qwen3-VL-8B-Instruct
- Languages: Catalan (ca), Basque (eu)
- Domain: Clinical translation
- Direction: ca->eu only
- Base model: HiTZ/Latxa-Qwen3-VL-8B-Instruct
- Continued from: pguerrero-igutierrez/Latxa-Qwen3-8B-General-eu-ca
- Repository: pguerrero-igutierrez/Latxa-Qwen3-8B-Clinical-v2-ca-eu
- Collection: pguerrero-igutierrez/mt-domain-adaptation-ca-eu
Intended use
This model is intended for research on continued domain adaptation for low-resource clinical Catalan-Basque translation.
Supported prompting direction:
ca->eu: Tradueix aquest text clínic del català al basc:\n\n{source}
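The template above can be filled programmatically. The following is a minimal sketch, assuming the `{source}` placeholder is simply replaced by the raw Catalan text; `PROMPT_TEMPLATE` and `build_prompt` are hypothetical names and are not part of the released code.

```python
# Prompt template copied from this card; {source} is the Catalan input text.
PROMPT_TEMPLATE = "Tradueix aquest text clínic del català al basc:\n\n{source}"


def build_prompt(source: str) -> str:
    """Return the ca->eu prompt for one Catalan source text (hypothetical helper)."""
    # Stripping surrounding whitespace is an assumption; the card does not specify it.
    return PROMPT_TEMPLATE.format(source=source.strip())


prompt = build_prompt("  El pacient presenta febre alta.  ")
print(prompt)
```

Keeping the template in one constant makes it less likely that inference prompts drift from the wording listed above.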
Out-of-scope use
- Medical decision-making
- Clinical deployment without expert review
- Any reverse direction (eu->ca)
- Translation outside the clinical domain
Training data
The adapter uses the same back-translated clinical corpus as clinicalv1:
backtranslated-corpus/eu-clinical_backtranslated.json
Synthetic Catalan is used as source and original Basque as target.
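To illustrate the source/target assignment, here is a small sketch that turns back-translated records into (prompt, target) pairs. The JSON schema of the corpus is not documented in this card, so the field names `ca_synthetic` and `eu_original`, the sample record, and the helper `to_sft_pairs` are all hypothetical.

```python
import json

# Prompt wording taken from this card.
PROMPT_TEMPLATE = "Tradueix aquest text clínic del català al basc:\n\n{source}"

# Hypothetical record layout and illustrative sentence pair; the real file may differ.
sample_json = json.dumps([
    {
        "ca_synthetic": "El pacient presenta febre alta.",
        "eu_original": "Pazienteak sukar handia du.",
    }
])


def to_sft_pairs(records):
    """Map back-translated records to (prompt, target) pairs for ca->eu training."""
    pairs = []
    for rec in records:
        # Synthetic (machine-translated) Catalan goes on the source side.
        prompt = PROMPT_TEMPLATE.format(source=rec["ca_synthetic"])
        # The original, human-written Basque is the training target.
        target = rec["eu_original"]
        pairs.append((prompt, target))
    return pairs


pairs = to_sft_pairs(json.loads(sample_json))
```

With this assignment the model always learns to produce human-written Basque, while any back-translation noise stays on the input side.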
Training procedure
- LoRA rank: 16
- LoRA alpha: 32
- LoRA dropout: 0.05
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Quantization: 4-bit NF4
- Max sequence length: 768
- Epochs: 3
- Batch size: 4
- Gradient accumulation: 8
- Learning rate: 5e-5
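Two derived quantities follow from the values listed above. The sketch below assumes that the batch size is per device, that training ran on a single device, and that standard LoRA scaling (alpha divided by rank) was used; none of these is stated in this card. `TrainingHyperparams` is a hypothetical container, not the project's actual configuration class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingHyperparams:
    """Hyperparameters as listed in this card (hypothetical container)."""
    lora_rank: int = 16
    lora_alpha: int = 32
    lora_dropout: float = 0.05
    max_seq_length: int = 768
    epochs: int = 3
    batch_size: int = 4
    grad_accum_steps: int = 8
    learning_rate: float = 5e-5

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA multiplies the low-rank update by alpha / rank (assumed here).
        return self.lora_alpha / self.lora_rank

    def effective_batch_size(self, num_devices: int = 1) -> int:
        # Sequences contributing to each optimizer step; num_devices=1 is an assumption.
        return self.batch_size * self.grad_accum_steps * num_devices


hp = TrainingHyperparams()
print(hp.lora_scaling)            # 2.0
print(hp.effective_batch_size())  # 32
```

Under these assumptions, each optimizer step sees 32 sequences and the adapter update is scaled by a factor of 2.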
Evaluation
Results on the clinical held-out test set:
| Direction | chrF++ | BLEU | TER | COMET |
|---|---|---|---|---|
| ca->eu | 38.73 | 18.50 | 104.64 | 75.02 |
In the project experiments, this continued-adaptation model performed slightly below the direct clinical SFT model (clinicalv1) across the reported clinical metrics.
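TER is an error rate (lower is better) and is normalized by reference length, so it can exceed 100 when a hypothesis needs more edits than the reference has words. The sketch below illustrates this with a simplified, shift-free variant based on word-level edit distance; it is not the full TER algorithm and not the project's evaluation code, and the function names are hypothetical.

```python
def word_edit_distance(hyp, ref):
    """Levenshtein distance over tokens (insertions, deletions, substitutions)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # drop a hypothesis token
                            curr[j - 1] + 1,      # add a missing reference token
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def simplified_ter(hypothesis: str, reference: str) -> float:
    """Edits per reference word, times 100. Real TER also allows block shifts."""
    hyp, ref = hypothesis.split(), reference.split()
    return 100.0 * word_edit_distance(hyp, ref) / len(ref)


# A three-token hypothesis sharing no tokens with a two-token reference
# needs three edits, giving a score above 100.
print(simplified_ter("x y z", "a b"))  # 150.0
print(simplified_ter("a b", "a b"))    # 0.0
```

This shows only that scores above 100 are arithmetically possible; it does not explain the specific cause of the value reported above.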
Limitations
- Supports only the ca->eu direction
- Trained on synthetic-source data
- Evaluation is automatic only; expert medical review remains necessary
- Must not be used for diagnosis or patient care without human oversight
Usage
import torch
from peft import PeftModel
from transformers import AutoTokenizer, Qwen3VLForConditionalGeneration

base_id = "HiTZ/Latxa-Qwen3-VL-8B-Instruct"
adapter_id = "pguerrero-igutierrez/Latxa-Qwen3-8B-Clinical-v2-ca-eu"

# Load the tokenizer and the base model, then attach the LoRA adapter.
tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
base_model = Qwen3VLForConditionalGeneration.from_pretrained(
    base_id,
    device_map="auto",
    # Use bfloat16 on GPU and fall back to float32 on CPU.
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base_model, adapter_id)

# Catalan prompt in the supported ca->eu format.
prompt = "Tradueix aquest text clínic del català al basc:\n\nEl pacient presenta febre alta."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)

# The decoded text contains the prompt followed by the generated translation.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Citation
@misc{guerrero-gutierrez-2026-caeu-mt,
title = {Domain Adaptation for Catalan-Basque Machine Translation via Synthetic Data and Continued Fine-Tuning},
author = {Guerrero, Paula and Gutierrez, Iker},
year = {2026},
note = {Unpublished manuscript}
}
Contact
- Paula Guerrero: pguerrero005@ikasle.ehu.eus
- Iker Gutierrez: igutierrez134@ikasle.ehu.eus