Latxa-Qwen3-8B-Literary-v1-ca-eu

Model details

Developed by: Paula Guerrero and Iker Gutierrez
Affiliation: University of the Basque Country (EHU)
Model type: LoRA adapter for HiTZ/Latxa-Qwen3-VL-8B-Instruct
Languages: Catalan (ca), Basque (eu)
Domain: Literary translation
Base model: HiTZ/Latxa-Qwen3-VL-8B-Instruct
Repository: pguerrero-igutierrez/Latxa-Qwen3-8B-Literary-v1-ca-eu
Collection: pguerrero-igutierrez/mt-domain-adaptation-ca-eu

Sources

Hugging Face repository: https://huggingface.co/pguerrero-igutierrez/Latxa-Qwen3-8B-Literary-v1-ca-eu
Hugging Face collection: https://huggingface.co/collections/pguerrero-igutierrez/mt-domain-adaptation-ca-eu
Project repository: https://github.com/pguerrero-igutierrez/MT-domain-adaptation
Paper source: https://github.com/pguerrero-igutierrez/MT-domain-adaptation/tree/main/paper

Intended use

This model is intended for literary translation research in the Catalan-Basque pair, especially when no direct in-domain parallel corpus is available.

Supported prompting directions:

eu->ca: Itzuli testu hau euskaratik katalanera:\n\n{source}
ca->eu: Tradueix aquest text del català al basc:\n\n{source}
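
The two templates above can be filled in programmatically. The following is a minimal sketch, not part of the released code: the names `PROMPTS` and `build_prompt` are hypothetical, and the `\n\n` in each template is read as two newline characters.

```python
# Prompt templates copied from the model card. The names PROMPTS and
# build_prompt are illustrative and not part of the released code.
PROMPTS = {
    "eu->ca": "Itzuli testu hau euskaratik katalanera:\n\n{source}",
    "ca->eu": "Tradueix aquest text del català al basc:\n\n{source}",
}


def build_prompt(direction: str, source: str) -> str:
    """Return the instruction prompt for one supported direction."""
    if direction not in PROMPTS:
        raise ValueError(f"unsupported direction: {direction!r}")
    # Strip surrounding whitespace so the prompt ends with the source text.
    return PROMPTS[direction].format(source=source.strip())


print(build_prompt("ca->eu", "La nit era tranquil·la."))
```

Rejecting unknown directions early avoids silently sending the adapter a prompt format it was not trained on.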

Out-of-scope use

Publishing translations for human readers without literary post-editing
Translation outside the literary register
High-stakes or professional workflows without review

Training data

The adapter was trained on two synthetic literary corpora:

backtranslated-corpus/ca-literary_trilingual.json
backtranslated-corpus/eu-literary-EhuHac.jsonl

For the eu->ca direction, the source side is synthetic (back-translated) Basque and the target side is original Catalan. For the ca->eu direction, the source side is synthetic (back-translated) Catalan and the target side is original Basque.
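
The pairing rule described above can be sketched as follows. This is an illustration under stated assumptions: the helper name `make_pair` and the output keys `direction`, `source`, and `target` are hypothetical, and the real corpus files may use a different record layout.

```python
from typing import Dict


def make_pair(original: str, synthetic: str, original_lang: str) -> Dict[str, str]:
    """Build one training example with the synthetic text as the source.

    original_lang is the language of the human-written text ("ca" or "eu").
    The synthetic side is always the source and the original text is always
    the target, so the model learns to produce human-written output.
    """
    if original_lang == "ca":
        direction = "eu->ca"  # synthetic Basque -> original Catalan
    elif original_lang == "eu":
        direction = "ca->eu"  # synthetic Catalan -> original Basque
    else:
        raise ValueError(f"unexpected language code: {original_lang!r}")
    return {"direction": direction, "source": synthetic, "target": original}


example = make_pair(
    original="La nit era tranquil·la.",
    synthetic="<synthetic Basque sentence>",  # placeholder, not real data
    original_lang="ca",
)
print(example["direction"])
```

Keeping the human-written text on the target side means that noise in the synthetic text affects only the input the model reads, not the output it is trained to generate.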

Training procedure

LoRA rank: 16
LoRA alpha: 32
LoRA dropout: 0.05
Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Quantization: 4-bit NF4
Max sequence length: 768
Epochs: 3
Batch size: 4
Gradient accumulation: 8
Learning rate: 5e-5
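
Two derived quantities follow from the values above: the LoRA scaling factor and the effective batch size. The sketch below computes both. The class name `AdapterSetup` is hypothetical, and the calculation assumes that the batch size is per device and that a single device was used, neither of which the card states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdapterSetup:
    """Hyperparameters copied from the list above."""

    lora_rank: int = 16
    lora_alpha: int = 32
    lora_dropout: float = 0.05
    batch_size: int = 4
    grad_accumulation: int = 8
    learning_rate: float = 5e-5
    epochs: int = 3
    max_seq_length: int = 768

    def lora_scaling(self) -> float:
        # Standard LoRA scales the low-rank update by alpha / rank.
        return self.lora_alpha / self.lora_rank

    def effective_batch_size(self, num_devices: int = 1) -> int:
        # Sequences per optimizer step. num_devices=1 is an assumption,
        # since the card does not say how many GPUs were used.
        return self.batch_size * self.grad_accumulation * num_devices


setup = AdapterSetup()
print(setup.lora_scaling())          # 2.0
print(setup.effective_batch_size())  # 32
```

The field names loosely mirror the `r`, `lora_alpha`, and `lora_dropout` arguments of `peft.LoraConfig`, so the values can be carried over to an actual training configuration.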

Evaluation

Results on the literary held-out test set:

| Direction | chrF++ | BLEU | TER | COMET |
|---|---|---|---|---|
| `eu->ca` | 36.13 | 8.96 | 85.08 | 69.61 |
| `ca->eu` | 26.90 | 2.44 | 99.60 | 65.29 |
| Overall | 31.34 | 6.12 | 91.91 | |
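
The card does not say how the Overall row was computed. The quick check below, using only the numbers reported above, shows that it is not the unweighted mean of the two directions. That would be consistent with scoring the pooled test set as one corpus, but this reading is an assumption rather than something the card confirms.

```python
# Scores copied from the table above: (eu->ca, ca->eu, reported Overall).
reported = {
    "chrF++": (36.13, 26.90, 31.34),
    "BLEU": (8.96, 2.44, 6.12),
    "TER": (85.08, 99.60, 91.91),
}

# Gap between the reported Overall score and the plain average of the
# two per-direction scores, for each metric.
gaps = {}
for metric, (eu_ca, ca_eu, overall) in reported.items():
    mean = (eu_ca + ca_eu) / 2
    gaps[metric] = overall - mean
    print(f"{metric}: unweighted mean {mean:.2f}, reported Overall {overall:.2f}")
```

When comparing against other systems, it is therefore safer to compare per-direction rows than to average them by hand.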

In the project experiments, this direct literary SFT model slightly but consistently outperformed the continued-adaptation literary variant.

Limitations

Uses synthetic supervision rather than human-translated in-domain CA-EU literary parallel data
Literary quality is only partially reflected by overlap-based metrics
CA->EU remains the harder literary direction in the reported experiments

Usage

```python
import torch
from peft import PeftModel
from transformers import AutoTokenizer, Qwen3VLForConditionalGeneration

base_id = "HiTZ/Latxa-Qwen3-VL-8B-Instruct"
adapter_id = "pguerrero-igutierrez/Latxa-Qwen3-8B-Literary-v1-ca-eu"

# Load the tokenizer and the base model, then attach the LoRA adapter.
tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
base_model = Qwen3VLForConditionalGeneration.from_pretrained(
    base_id,
    device_map="auto",
    # Use bfloat16 on GPU and fall back to float32 on CPU.
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base_model, adapter_id)

# ca->eu prompt, following the template listed under "Intended use".
prompt = "Tradueix aquest text del català al basc:\n\nLa nit era tranquil·la."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens so the prompt is not echoed back.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

Citation

```bibtex
@misc{guerrero-gutierrez-2026-caeu-mt,
  title        = {Domain Adaptation for Catalan-Basque Machine Translation via Synthetic Data and Continued Fine-Tuning},
  author       = {Guerrero, Paula and Gutierrez, Iker},
  year         = {2026},
  note         = {Unpublished manuscript}
}
```

Contact

Paula Guerrero: pguerrero005@ikasle.ehu.eus
Iker Gutierrez: igutierrez134@ikasle.ehu.eus
