README
Model Description
The model is specialized in constrained generation for structured information extraction. Given a restaurant review (in English or Spanish), it enforces a strict JSON output matching a predefined schema of 11 valid aspect categories and 4 valid polarities, effectively bypassing the need for separate heuristic parsers.
Allowed Schema
- Aspect Categories: restaurant_general, food_quality, service, ambience, food_style_options, food_prices, restaurant_prices, drinks_quality, drinks_style_options, location, drinks_prices.
- Sentiment Polarities: positive, negative, neutral, conflict.
- Developed by: Roger Baiges Trilla & Guillem Olivart Garrofé
- Language(s): Bilingual (English and Spanish)
- License: Apache-2.0
- Finetuned from model: Qwen/Qwen3.5-2B
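The allowed schema above can be enforced on the consumer side as well. The following is a minimal sketch, not part of the repository: it assumes the model returns a flat JSON object that maps each aspect category to a polarity (the shape shown in the expected output of the inference example), and `validate_prediction` is a hypothetical helper name.

```python
import json

# Labels copied from the "Allowed Schema" list in this README.
ASPECT_CATEGORIES = {
    "restaurant_general", "food_quality", "service", "ambience",
    "food_style_options", "food_prices", "restaurant_prices",
    "drinks_quality", "drinks_style_options", "location", "drinks_prices",
}
POLARITIES = {"positive", "negative", "neutral", "conflict"}


def validate_prediction(raw: str) -> dict:
    """Parse a model response and raise ValueError if it violates the schema.

    Assumption: the response is a flat JSON object {aspect: polarity}.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for aspect, polarity in data.items():
        if aspect not in ASPECT_CATEGORIES:
            raise ValueError(f"unknown aspect category: {aspect!r}")
        if not isinstance(polarity, str) or polarity not in POLARITIES:
            raise ValueError(f"unknown polarity for {aspect!r}: {polarity!r}")
    return data
```

A check like this is cheap to run on every response and catches the rare case where generation drifts outside the 11 categories or 4 polarities.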
Experimental Results & Performance
The model was iteratively evaluated on the devel.json dataset across the four mandatory stages of the project. The QLoRA fine-tuning stage proved to be the most robust approach to combat dataset imbalance, yielding the highest scores across all macro and micro metrics.
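The card does not say how the macro and micro scores were computed. As an illustration only, the sketch below shows one common convention: each review is scored as a set of (aspect, polarity) pairs, micro-F1 pools the counts over all aspects, and macro-F1 averages the per-aspect F1. The function names and the per-aspect grouping are assumptions, not the project's scoring script.

```python
from collections import Counter


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold: list, pred: list) -> tuple:
    """gold and pred are parallel lists of {aspect: polarity} dicts."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        g_pairs, p_pairs = set(g.items()), set(p.items())
        for aspect, _ in g_pairs & p_pairs:   # exact aspect + polarity match
            tp[aspect] += 1
        for aspect, _ in p_pairs - g_pairs:   # predicted but not in gold
            fp[aspect] += 1
        for aspect, _ in g_pairs - p_pairs:   # in gold but missed
            fn[aspect] += 1
    aspects = set(tp) | set(fp) | set(fn)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = (
        sum(f1(tp[a], fp[a], fn[a]) for a in aspects) / len(aspects)
        if aspects else 0.0
    )
    return micro, macro


gold = [{"food_quality": "positive", "service": "negative"}, {"ambience": "positive"}]
pred = [{"food_quality": "positive", "service": "neutral"}, {"ambience": "positive"}]
micro, macro = micro_macro_f1(gold, pred)
print(f"micro={micro:.3f} macro={macro:.3f}")
```

Under this convention a system that predicts nothing scores 0.0 on both metrics, which is how an empty baseline would behave.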
Main Development-Set Results
| Configuration | Precision (Macro) | F1 (Macro) | F1 (Micro) |
|---|---|---|---|
| Empty baseline | 0.0% | 0.0% | 0.0% |
| Top 3 majority baseline | 60.4% | 55.6% | 56.2% |
| Best Zero-Shot Sweep | 74.6% | 67.9% | 68.3% |
| Best Few-Shot | 79.6% | 73.9% | 73.9% |
| Best Full LoRA Checkpoint | 86.4% | 84.4% | 86.4% |
| Best QLoRA Checkpoint (This Model) | 87.4% | 85.3% | 87.3% |
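Key Takeaway: Unlike prompting, fine-tuning changes the behavior of the small language model (SLM) itself, giving a gain of 17.4 percentage points in F1-macro over the best zero-shot configuration (67.9% to 85.3%). The QLoRA checkpoint also scores slightly above the best full LoRA checkpoint on every reported metric, which suggests that 4-bit quantization did not degrade extraction quality.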
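The gains discussed in the takeaway are differences between rows of the results table, measured in percentage points. A short check of the arithmetic, using only the F1-macro values from the table (the dictionary keys are shorthand labels chosen here):

```python
# F1-macro (%) copied from the development-set results table.
f1_macro = {
    "zero_shot": 67.9,
    "few_shot": 73.9,
    "full_lora": 84.4,
    "qlora": 85.3,
}


def gain(better: str, baseline: str) -> float:
    """Difference in percentage points, rounded to one decimal."""
    return round(f1_macro[better] - f1_macro[baseline], 1)


print(gain("qlora", "zero_shot"))  # 17.4
print(gain("qlora", "few_shot"))   # 11.4
print(gain("qlora", "full_lora"))  # 0.9
```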
Training Prompt Template
To reproduce the reported 85.3% F1-macro, run inference with the same constraint-enforcement prompt that was used during training; other prompts may produce different results.
The complete schema, guidelines, and instructions are explicitly stored in the repository within the absa_prompt.json file, separated into system and user roles to maximize Qwen's instruction-following capabilities.
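To make the system/user split concrete, here is a small sketch of how such a file can be turned into chat messages. Only the `system` and `user` keys and the `{language}` and `{text}` placeholders are taken from the inference example in this README; the prompt wording below is invented for illustration and is not the content of absa_prompt.json, and `build_messages` is a hypothetical helper.

```python
import json

# Hypothetical stand-in for absa_prompt.json (wording invented for illustration).
EXAMPLE_PROMPT_JSON = """
{
  "system": "You extract aspect-level sentiment from restaurant reviews as JSON.",
  "user": "Language: {language}\\nReview: {text}"
}
"""


def build_messages(prompt_data: dict, language: str, text: str) -> list:
    """Fill the user template and return system + user chat messages."""
    user_content = prompt_data["user"].format(language=language, text=text)
    return [
        {"role": "system", "content": prompt_data["system"]},
        {"role": "user", "content": user_content},
    ]


prompt_data = json.loads(EXAMPLE_PROMPT_JSON)
messages = build_messages(prompt_data, "en", "Great tapas, slow service.")
for message in messages:
    print(message["role"], "->", message["content"])
```

Keeping the instructions in the system message and only the review in the user message means the same file can be reused unchanged for every review.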
How to Get Started (Inference Example)
This model is a PEFT adapter. The example below automatically fetches the official absa_prompt.json prompt from this Hugging Face repository at runtime, formats the text, and runs the inference:
```python
import json

from huggingface_hub import hf_hub_download
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer, BitsAndBytesConfig

repo_id = "guillemolivart/qwen-2b-absa-qlora"

# 1. Download and load the prompt template from the repository files
prompt_file_path = hf_hub_download(repo_id=repo_id, filename="absa_prompt.json")
with open(prompt_file_path, "r", encoding="utf-8") as f:
    prompt_data = json.load(f)

# 2. Load the model and tokenizer in quantized 4-bit for high efficiency
model = AutoPeftModelForCausalLM.from_pretrained(
    repo_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# 3. Define your custom multilingual review
review_text = "La comida buenísima, especialmente la carne, de precio correcto, pero los camareros tardaron una eternidad en llevarnos la cuenta."
review_language = "es"  # Supports "es" and "en"

# Format the user part of the prompt
user_content = prompt_data["user"].format(language=review_language, text=review_text)

# Build the ChatML message array using the separated system and user roles
messages = [
    {"role": "system", "content": prompt_data["system"]},
    {"role": "user", "content": user_content},
]

# return_dict=True returns input_ids and attention_mask together
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Greedy decoding: temperature has no effect when do_sample=False, so it is omitted
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# 4. Decode only the newly generated tokens, skipping the prompt
prompt_length = inputs["input_ids"].shape[1]
generated_tokens = outputs[0][prompt_length:]
response = tokenizer.decode(generated_tokens, skip_special_tokens=True)
print(response)
# Expected Output: {"food_quality": "positive", "food_prices": "neutral", "service": "negative"}
```
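The decoded response is a string, so downstream code still has to turn it into a Python object. The sketch below is one defensive way to do that, assuming the response may occasionally carry stray text around the JSON object; `extract_json_object` is a hypothetical helper, not part of the repository.

```python
import json


def extract_json_object(response: str) -> dict:
    """Return the first JSON object found in the response, or {} if none parses."""
    decoder = json.JSONDecoder()
    start = response.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores anything that follows it.
            obj, _ = decoder.raw_decode(response, start)
            return obj
        except json.JSONDecodeError:
            start = response.find("{", start + 1)
    return {}


print(extract_json_object('{"food_quality": "positive", "service": "negative"}'))
print(extract_json_object("no structured output here"))
```

Returning an empty dictionary on failure treats an unparseable response as "no aspects found"; raising an exception instead is a reasonable alternative when silent failures are not acceptable.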
- Model provider: guillemolivart
- Model tree: Base: Qwen/Qwen3.5-2B; Adapter: this model
- Modalities: Input: Video, Text, Image; Output: Text