README
Model details
- Developed by: Paula Guerrero and Iker Gutierrez
- Affiliation: University of the Basque Country (EHU)
- Model type: LoRA adapter for `HiTZ/Latxa-Qwen3-VL-8B-Instruct`
- Languages: Catalan (`ca`), Basque (`eu`)
- Domain: Literary translation
- Base model: `HiTZ/Latxa-Qwen3-VL-8B-Instruct`
- Repository: `pguerrero-igutierrez/Latxa-Qwen3-8B-Literary-v1-ca-eu`
- Collection: `pguerrero-igutierrez/mt-domain-adaptation-ca-eu`
Sources
- Hugging Face repository: https://huggingface.co/pguerrero-igutierrez/Latxa-Qwen3-8B-Literary-v1-ca-eu
- Hugging Face collection: https://huggingface.co/collections/pguerrero-igutierrez/mt-domain-adaptation-ca-eu
- Project repository: https://github.com/pguerrero-igutierrez/MT-domain-adaptation
- Paper source: https://github.com/pguerrero-igutierrez/MT-domain-adaptation/tree/main/paper
Intended use
This model is intended for literary translation research in the Catalan-Basque pair, especially when no direct in-domain parallel corpus is available.
Supported prompting directions:
- `eu->ca`: `Itzuli testu hau euskaratik katalanera:\n\n{source}`
- `ca->eu`: `Tradueix aquest text del català al basc:\n\n{source}`
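The two prompt templates listed above can be filled programmatically. The following is a minimal sketch, not part of the released code: the `PROMPTS` dictionary and the `build_prompt` helper are hypothetical names, and only the template strings themselves come from this card.

```python
# Prompt templates copied from the model card.
# PROMPTS and build_prompt are hypothetical names used for illustration.
PROMPTS = {
    "eu->ca": "Itzuli testu hau euskaratik katalanera:\n\n{source}",
    "ca->eu": "Tradueix aquest text del català al basc:\n\n{source}",
}


def build_prompt(direction: str, source: str) -> str:
    """Return the instruction prompt for one supported direction."""
    if direction not in PROMPTS:
        raise ValueError(f"Unsupported direction: {direction!r}")
    # str.replace is used instead of str.format so that literary text
    # containing curly braces does not raise a formatting error.
    return PROMPTS[direction].replace("{source}", source.strip())


if __name__ == "__main__":
    print(build_prompt("ca->eu", "La nit era tranquil·la."))
```

Rejecting unknown directions early keeps prompts aligned with the two formats the adapter is documented to support.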
Out-of-scope use
- Publication for human readers without literary post-editing
- Translation outside the literary register
- High-stakes or professional workflows without review
Training data
The adapter was trained on two synthetic literary corpora:
- `backtranslated-corpus/ca-literary_trilingual.json`
- `backtranslated-corpus/eu-literary-EhuHac.jsonl`
The EU->CA direction uses synthetic Basque as source and original Catalan as target. The CA->EU direction uses synthetic Catalan as source and original Basque as target.
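The orientation of the back-translated pairs can be illustrated with a short sketch. It assumes each record pairs one original literary text with its machine-generated translation; the function name `orient_pair` and the output keys `direction`, `source`, and `target` are hypothetical and do not reflect the actual corpus schema.

```python
def orient_pair(original: str, synthetic: str, original_lang: str) -> dict:
    """Build one training example from a back-translated pair.

    The human-written (original) text is always the target, and the
    synthetic back-translation is always the source, as described above.
    Key names are illustrative, not the real corpus fields.
    """
    if original_lang not in ("ca", "eu"):
        raise ValueError(f"Unsupported language: {original_lang!r}")
    # The source language is whichever of the two is not the original.
    source_lang = "eu" if original_lang == "ca" else "ca"
    return {
        "direction": f"{source_lang}->{original_lang}",
        "source": synthetic,
        "target": original,
    }


if __name__ == "__main__":
    # Original Catalan text paired with a placeholder synthetic Basque source.
    print(orient_pair("La nit era tranquil·la.", "<synthetic Basque>", "ca"))
```

Keeping the original text on the target side means the model is always trained to produce human-written literary prose, while any back-translation noise stays on the input side.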
Training procedure
- LoRA rank: 16
- LoRA alpha: 32
- LoRA dropout: 0.05
- Target modules: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
- Quantization: 4-bit NF4
- Max sequence length: 768
- Epochs: 3
- Batch size: 4
- Gradient accumulation: 8
- Learning rate: `5e-5`
- Scheduler: cosine
- Warmup ratio: 0.05
- Seed: 42
- Checkpoint selection: best validation BLEU
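A few quantities follow from the hyperparameters above. The sketch below derives them under stated assumptions: the batch size is taken to be per device, training runs on a single device unless `num_devices` says otherwise, LoRA uses the standard `alpha / rank` scaling, and `num_examples` is a hypothetical dataset size (the card does not report one). Step counts are approximate, since trainers differ in how they round.

```python
import math

# Values copied from the training procedure list above.
HPARAMS = {
    "lora_rank": 16,
    "lora_alpha": 32,
    "batch_size": 4,
    "grad_accum": 8,
    "epochs": 3,
    "warmup_ratio": 0.05,
}


def derived_quantities(hp: dict, num_examples: int, num_devices: int = 1) -> dict:
    """Compute effective batch size, LoRA scaling, and approximate step counts."""
    # Examples consumed per optimizer update.
    effective_batch = hp["batch_size"] * hp["grad_accum"] * num_devices
    # Optimizer updates per pass over the data (rounded up).
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    total_steps = steps_per_epoch * hp["epochs"]
    return {
        # Standard LoRA scales the low-rank update by alpha / rank.
        "lora_scaling": hp["lora_alpha"] / hp["lora_rank"],
        "effective_batch": effective_batch,
        "total_steps": total_steps,
        "warmup_steps": math.ceil(total_steps * hp["warmup_ratio"]),
    }


if __name__ == "__main__":
    # 10_000 is a placeholder dataset size, not a figure from the card.
    print(derived_quantities(HPARAMS, num_examples=10_000))
```

Under these assumptions each optimizer update sees 32 examples and the LoRA update is scaled by a factor of 2.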
Evaluation
Results on the literary held-out test set:
| Direction | chrF++ | BLEU | TER | COMET |
|---|---|---|---|---|
| eu->ca | 36.13 | 8.96 | 85.08 | 69.61 |
| ca->eu | 26.90 | 2.44 | 99.60 | 65.29 |
| Overall | 31.34 | 6.12 | 91.91 | 66.37 |
In the project experiments, this direct literary SFT model slightly but consistently outperformed the continued-adaptation literary variant.
Limitations
- Uses synthetic supervision rather than human-translated in-domain CA-EU literary parallel data
- Literary quality is only partially reflected by overlap-based metrics
- CA->EU remains the harder literary direction in the reported experiments
Usage
```python
import torch
from peft import PeftModel
from transformers import AutoTokenizer, Qwen3VLForConditionalGeneration

base_id = "HiTZ/Latxa-Qwen3-VL-8B-Instruct"
adapter_id = "pguerrero-igutierrez/Latxa-Qwen3-8B-Literary-v1-ca-eu"

# Load the tokenizer and base model, then attach the LoRA adapter.
tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
base_model = Qwen3VLForConditionalGeneration.from_pretrained(
    base_id,
    device_map="auto",
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base_model, adapter_id)

# Translate one Catalan sentence into Basque using the ca->eu prompt.
prompt = "Tradueix aquest text del català al basc:\n\nLa nit era tranquil·la."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
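With decoder-only generation, the returned sequence normally begins with the prompt tokens, so the decoded string in the usage example repeats the instruction before the translation. A minimal sketch of trimming that prefix is shown here on plain Python lists so it runs without a GPU; `new_tokens_only` is a hypothetical helper, not part of `transformers` or `peft`.

```python
from typing import List, Sequence


def new_tokens_only(output_ids: Sequence[int], prompt_len: int) -> List[int]:
    """Return only the ids generated after the first `prompt_len` prompt ids."""
    if prompt_len < 0 or prompt_len > len(output_ids):
        raise ValueError("prompt_len must lie between 0 and len(output_ids)")
    # Everything before prompt_len is the echoed prompt.
    return list(output_ids[prompt_len:])


if __name__ == "__main__":
    # Toy ids: three prompt tokens followed by two generated tokens.
    print(new_tokens_only([11, 12, 13, 21, 22], prompt_len=3))
```

In the usage example, the equivalent step would be to take `prompt_len = inputs["input_ids"].shape[1]` and decode `outputs[0][prompt_len:]` instead of `outputs[0]`, assuming a single unpadded prompt.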
Citation
```bibtex
@misc{guerrero-gutierrez-2026-caeu-mt,
  title  = {Domain Adaptation for Catalan-Basque Machine Translation via Synthetic Data and Continued Fine-Tuning},
  author = {Guerrero, Paula and Gutierrez, Iker},
  year   = {2026},
  note   = {Unpublished manuscript}
}
```
Contact
- Paula Guerrero: pguerrero005@ikasle.ehu.eus
- Iker Gutierrez: igutierrez134@ikasle.ehu.eus
Model provider
- pguerrero-igutierrez
Model tree
- Base: `HiTZ/Latxa-Qwen3-VL-8B-Instruct`
- Adapter: this model
Modalities
- Input: Text, Image
- Output: Text