
Model Description

This model specializes in constrained generation for structured information extraction, specifically aspect-based sentiment analysis (ABSA) of restaurant reviews. Given a review in English or Spanish, it is trained to return strict JSON that uses only the 11 aspect categories and 4 polarities of a predefined schema. As a result, its output does not need a separate heuristic parser.

Allowed Schema

  • Aspect Categories: restaurant_general, food_quality, service, ambience, food_style_options, food_prices, restaurant_prices, drinks_quality, drinks_style_options, location, drinks_prices.

  • Sentiment Polarities: positive, negative, neutral, conflict.

  • Developed by: Roger Baiges Trilla & Guillem Olivart Garrofé

  • Language(s): Bilingual (English and Spanish)

  • License: Apache-2.0

  • Finetuned from model: Qwen/Qwen3.5-2B
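Because a fine-tuned model cannot strictly guarantee its output, it can help to check each response against the allowed schema on the client side before using it downstream. The sketch below is a minimal validator. It assumes the response is a flat JSON object mapping each detected aspect category to one polarity, which is the shape shown in the expected output of the inference example. `validate_absa_output`, `ALLOWED_ASPECTS` and `ALLOWED_POLARITIES` are hypothetical names, not part of this repository.

```python
import json

# Hypothetical constants mirroring the allowed schema listed above.
ALLOWED_ASPECTS = frozenset({
    "restaurant_general", "food_quality", "service", "ambience",
    "food_style_options", "food_prices", "restaurant_prices",
    "drinks_quality", "drinks_style_options", "location", "drinks_prices",
})
ALLOWED_POLARITIES = frozenset({"positive", "negative", "neutral", "conflict"})


def validate_absa_output(raw: str) -> dict[str, str]:
    """Parse a model response and reject anything outside the allowed schema.

    Assumption: the response is a flat JSON object mapping aspect -> polarity.
    Raises ValueError for malformed JSON or out-of-schema labels.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object mapping aspect -> polarity")
    for aspect, polarity in data.items():
        if aspect not in ALLOWED_ASPECTS:
            raise ValueError(f"unknown aspect category: {aspect!r}")
        # Check the type first so unhashable values (e.g. lists) also raise ValueError.
        if not isinstance(polarity, str) or polarity not in ALLOWED_POLARITIES:
            raise ValueError(f"invalid polarity for {aspect!r}: {polarity!r}")
    return data


print(validate_absa_output('{"food_quality": "positive", "service": "negative"}'))
```

Rejecting out-of-schema labels with an exception, rather than silently dropping them, makes drift in the model's output visible during evaluation.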


Experimental Results & Performance

The model was evaluated iteratively on the development set (devel.json) across the project's four mandatory stages. QLoRA fine-tuning proved the most robust way to counter dataset imbalance. It yielded the highest score on every macro and micro metric.

Main Development-Set Results

| Configuration | Precision (Macro) | F1 (Macro) | F1 (Micro) |
|---|---|---|---|
| Empty baseline | 0.0% | 0.0% | 0.0% |
| Top-3 majority baseline | 60.4% | 55.6% | 56.2% |
| Best zero-shot sweep | 74.6% | 67.9% | 68.3% |
| Best few-shot | 79.6% | 73.9% | 73.9% |
| Best full LoRA checkpoint | 86.4% | 84.4% | 86.4% |
| Best QLoRA checkpoint (this model) | 87.4% | 85.3% | 87.3% |

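Key Takeaway: Fine-tuning shifts the decision boundaries of the small language model (SLM) itself. It gives a +17.4 percentage-point F1-macro gain over the best zero-shot configuration (85.3% vs. 67.9%). The 4-bit QLoRA checkpoint matched or slightly exceeded the full LoRA checkpoint on every metric. On this development set, therefore, 4-bit quantization does not appear to have degraded extraction quality.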
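The table reports both macro and micro averages, which weight the data differently under class imbalance. Micro-F1 pools every prediction into one set of counts. Macro-F1 gives each aspect category equal weight, so errors on rare categories pull it down more. The sketch below illustrates this difference under a hypothetical scoring scheme: a true positive is an exact (aspect, polarity) match, and macro-F1 is averaged over aspect categories. The project's actual evaluation script for devel.json is not described here and may differ. `f1` and `micro_macro_f1` are illustrative names.

```python
from collections import Counter


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold: list[dict[str, str]], pred: list[dict[str, str]]) -> tuple[float, float]:
    """Score per-review {aspect: polarity} dicts as exact (aspect, polarity) matches.

    Hypothetical scheme: micro-F1 pools TP/FP/FN over all pairs; macro-F1
    averages per-aspect-category F1 over categories seen in gold or pred.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of reviews")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        g_pairs, p_pairs = set(g.items()), set(p.items())
        for aspect, _ in g_pairs & p_pairs:  # correct aspect and polarity
            tp[aspect] += 1
        for aspect, _ in p_pairs - g_pairs:  # predicted pair not in gold
            fp[aspect] += 1
        for aspect, _ in g_pairs - p_pairs:  # gold pair the model missed
            fn[aspect] += 1
    aspects = set(tp) | set(fp) | set(fn)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[a], fp[a], fn[a]) for a in aspects) / len(aspects) if aspects else 0.0
    return micro, macro


gold = [{"food_quality": "positive", "service": "negative"}, {"ambience": "neutral"}, {"food_quality": "positive"}]
pred = [{"food_quality": "positive", "service": "positive"}, {"ambience": "neutral"}, {"food_quality": "positive"}]
micro, macro = micro_macro_f1(gold, pred)
print(f"micro-F1={micro:.3f}  macro-F1={macro:.3f}")  # micro-F1=0.750  macro-F1=0.667
```

In this toy example the single `service` error costs micro-F1 only a quarter of its pooled score. In the macro average, however, it accounts for a full third.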


Training Prompt Template

The model relies on a specific constraint-enforcement prompt. To reproduce the reported 85.3% F1-macro, use the same prompt it was trained with.

The complete schema, guidelines, and instructions are stored in the repository's absa_prompt.json file. They are split into system and user roles to make the most of Qwen's instruction following.
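The inference example reads absa_prompt.json as a dict with a `"system"` string and a `"user"` template containing `{language}` and `{text}` placeholders. The sketch below isolates that step so it can be checked without downloading the model. `build_messages` is a hypothetical helper, and `example_prompt` is a made-up stand-in, not the real prompt text.

```python
def build_messages(prompt_data: dict[str, str], text: str, language: str) -> list[dict[str, str]]:
    """Turn a {"system": ..., "user": ...} prompt template into chat messages.

    Assumes the "user" template has {language} and {text} placeholders, as in
    the inference example; the real templates live in absa_prompt.json.
    """
    if language not in {"en", "es"}:  # the card lists only English and Spanish
        raise ValueError(f"unsupported language: {language!r}")
    return [
        {"role": "system", "content": prompt_data["system"]},
        {"role": "user", "content": prompt_data["user"].format(language=language, text=text)},
    ]


# Hypothetical stand-in for absa_prompt.json (NOT the real prompt text).
example_prompt = {
    "system": "You extract aspect-level sentiment as strict JSON.",
    "user": "Language: {language}\nReview: {text}",
}
messages = build_messages(example_prompt, "Great food, slow service.", "en")
print(messages[1]["content"])
```

`str.format` treats every `{` and `}` in a template as a placeholder delimiter. Any literal JSON braces in a user template would therefore need to be written as `{{` and `}}` for this formatting step to work.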

How to Get Started (Inference Example)

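This model is a PEFT adapter for Qwen/Qwen3.5-2B. The example below does three things: it fetches the official absa_prompt.json prompt from this Hugging Face repository at runtime, fills in the review, and runs inference with the base model loaded in 4-bit. It requires the peft, transformers, and bitsandbytes packages.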

```python
import json

import torch
from huggingface_hub import hf_hub_download
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer, BitsAndBytesConfig

repo_id = "guillemolivart/qwen-2b-absa-qlora"

# 1. Download and load the prompt template (system + user roles) from the repository files
prompt_file_path = hf_hub_download(repo_id=repo_id, filename="absa_prompt.json")
with open(prompt_file_path, "r", encoding="utf-8") as f:
    prompt_data = json.load(f)

# 2. Load the base model in 4-bit with the adapter applied, plus the tokenizer.
#    NF4 + bfloat16 compute are common QLoRA settings (assumed, not stated by the authors).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoPeftModelForCausalLM.from_pretrained(
    repo_id,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# 3. Define the review to analyze (bilingual: English or Spanish).
#    English: "The food was delicious, especially the meat, fairly priced,
#    but the waiters took forever to bring us the bill."
review_text = "La comida buenísima, especialmente la carne, de precio correcto, pero los camareros tardaron una eternidad en llevarnos la cuenta."
review_language = "es"  # Supports "es" and "en"

# Fill the user template and build the chat messages from the separate system and user roles
user_content = prompt_data["user"].format(language=review_language, text=review_text)
messages = [
    {"role": "system", "content": prompt_data["system"]},
    {"role": "user", "content": user_content},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Greedy decoding (do_sample=False) keeps the JSON output deterministic;
# temperature has no effect without sampling, so it is omitted.
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# 4. Decode only the newly generated tokens, skipping the echoed prompt
prompt_length = inputs["input_ids"].shape[-1]
response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(response)
# Expected output (as reported by the authors):
# {"food_quality": "positive", "food_prices": "neutral", "service": "negative"}
```
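Small models sometimes wrap their JSON in extra text, such as a Markdown code fence. Qwen chat templates with reasoning enabled may also leave a `<think>...</think>` block in the decoded response. A defensive parsing step keeps `json.loads` from failing on such wrappers. The hypothetical `extract_json_object` below is one way to do this, an assumption rather than part of the published pipeline. It strips any reasoning block and parses the span from the first `{` to the last `}`.

```python
import json
import re

# Optional reasoning block such as "<think>...</think>" (assumed; only relevant
# if the chat template leaves reasoning text in the decoded response).
THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def extract_json_object(response: str) -> dict:
    """Parse the first-to-last {...} span of a decoded response as JSON."""
    cleaned = THINK_BLOCK.sub("", response)
    start, end = cleaned.find("{"), cleaned.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model response")
    # json.JSONDecodeError (a ValueError subclass) propagates if the span is malformed.
    return json.loads(cleaned[start : end + 1])


print(extract_json_object('```json\n{"food_quality": "positive"}\n```'))
```

The first-to-last brace heuristic assumes the response contains a single top-level object, which matches the flat aspect-to-polarity output shown above.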

  • Model provider: guillemolivart

  • Model tree: Qwen/Qwen3.5-2B (base); this model (adapter)

  • Modalities: input Video, Text, Image; output Text
