License: MIT
Model Description
QLoRA adapter (a LoRA fine-tune of a 4-bit quantized base model) for multi-label intent classification of client messages in Russian.
- Base model: Qwen/Qwen3-8B
- Adapter type: LoRA (via PEFT)
- Quantization: 4-bit NF4 (QLoRA) with double quantization, compute dtype bfloat16
- Task: Multi-label sequence classification
- Language: Russian
- Number of labels: 7
Usage
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification, BitsAndBytesConfig
from peft import PeftModel, PeftConfig

path = 'AIPsy/qwen3-8b-client-intent-classification-ru-7'
config = PeftConfig.from_pretrained(path)

# tokenizer
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
tokenizer.pad_token = tokenizer.eos_token

# quantization config: 4-bit NF4 with double quantization, bfloat16 compute dtype
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# load the quantized base model with a 7-label classification head, then attach the LoRA adapter
model_client = AutoModelForSequenceClassification.from_pretrained(
    config.base_model_name_or_path,
    device_map="cuda:0",
    quantization_config=quantization_config,
    num_labels=7,
    use_cache=True,
)
model_client = PeftModel.from_pretrained(model_client, path)
model_client.config.pad_token_id = tokenizer.pad_token_id

# Example input in Russian. English gloss:
# Therapist: I really know very little about why you came. Could you tell me something about it?
# Client: It's a long story. I can't find myself. Everything I do seems wrong. If there is any
#   criticism or someone says anything about me, I just can't accept it. When I had a job,
#   if anyone said something critical, it just broke me.
# Therapist: You feel that everything is going wrong, and you are overwhelmed by criticism.
text = '''Терапевт: Я действительно очень мало знаю о том, почему вы пришли. Не могли бы вы рассказать мне кое-что об этом?
Клиент: Это долгая история. Я не могу найти себя. Все, что я делаю, кажется ошибочным. Если есть какая-то критика или кто-то говорит что-либо обо мне, я просто не могу принять это.
Когда у меня была работа, если кто-нибудь сказал что-то критическое, это просто разбило меня.
Терапевт: Вы чувствуете, что все идет не так, и вы подавлены критикой.'''

inputs = tokenizer(text, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model_client(**inputs.to(model_client.device))

# multi-label decision: a label is on when its logit > 0 (equivalently, sigmoid > 0.5)
binary_tensor = (outputs.logits > 0).int()
print(binary_tensor)  # e.g. tensor([[1, 1, 0, 0, 1, 1, 0]], ...)

# label names in model output order (English: Providing Information, Request for Information,
# Maintaining Dialogue, Approval, Disapproval, Reflection, Problem-Solving)
client_categories_list = ['Информирование', 'Запрос информации', 'Ведение диалога',
                          'Одобрение', 'Неодобрение', 'Рефлексия', 'Решение проблемы']

# binary_tensor has shape (1, 7); squeeze() drops the batch dimension before pairing with names
list_categories = [
    cat
    for label, cat in zip(binary_tensor.squeeze(), client_categories_list)
    if label.item()
]
print(list_categories)
# ['Информирование', 'Запрос информации', 'Неодобрение', 'Рефлексия']
```
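The usage example thresholds each raw logit at 0. Because every label has its own sigmoid, this is mathematically the same as keeping labels whose probability exceeds 0.5, and several labels (or none) can fire at once. The sketch below decodes one row of logits without loading the model, with an adjustable threshold and optional English names. It assumes the 7 logits follow the order of `client_categories_list`, with English glosses taken from the order of the Metrics table; `decode_intents` and the example logits are hypothetical, not part of the released code.

```python
import math

# (Russian label, English gloss) in model output order; glosses follow the Metrics table order.
CLIENT_CATEGORIES = [
    ("Информирование", "Providing Information"),
    ("Запрос информации", "Request for Information"),
    ("Ведение диалога", "Maintaining Dialogue"),
    ("Одобрение", "Approval"),
    ("Неодобрение", "Disapproval"),
    ("Рефлексия", "Reflection"),
    ("Решение проблемы", "Problem-Solving"),
]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_intents(logits, threshold=0.5, english=False):
    """Return the names of all labels whose sigmoid probability exceeds `threshold`.

    With threshold=0.5 this reproduces the `logits > 0` rule (up to float rounding
    for logits extremely close to 0). Raising the threshold trades recall for precision.
    """
    if len(logits) != len(CLIENT_CATEGORIES):
        raise ValueError(f"expected {len(CLIENT_CATEGORIES)} logits, got {len(logits)}")
    name_idx = 1 if english else 0
    return [
        names[name_idx]
        for names, logit in zip(CLIENT_CATEGORIES, logits)
        if sigmoid(logit) > threshold
    ]


# Hypothetical logits with sign pattern [+, +, -, -, +, +, -], matching the card's example output.
example_logits = [2.1, 0.7, -1.3, -2.4, 0.4, 1.8, -0.9]
print(decode_intents(example_logits))
# ['Информирование', 'Запрос информации', 'Неодобрение', 'Рефлексия']
print(decode_intents(example_logits, english=True))
# ['Providing Information', 'Request for Information', 'Disapproval', 'Reflection']
print(decode_intents(example_logits, threshold=0.8))
# ['Информирование', 'Рефлексия']
```

With a real model, `outputs.logits[0].tolist()` would give a plain list to pass in.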
Dataset
The source material consisted of recordings of psychotherapy sessions publicly available on YouTube. After speaker diarization and transcription of the recordings, 1,934 client utterances were annotated by six experts working in two teams of three. Annotation followed detailed guidelines comprising the intention taxonomy, definitions of each category, and illustrative examples.
Training Details
- Quantization: QLoRA (NF4, double quant, bfloat16)
- Framework: PEFT + BitsAndBytes + Transformers
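The card does not state the training loss. Assuming the standard multi-label objective (an independent sigmoid per label with binary cross-entropy, which is what Transformers' sequence-classification heads apply when `problem_type="multi_label_classification"`), the per-example loss over the 7 labels can be sketched in plain Python. The function name and example logits are hypothetical; the formula is the numerically stable form that PyTorch's `BCEWithLogitsLoss` computes with mean reduction.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over labels, computed directly from raw logits.

    Uses the stable form max(x, 0) - x*y + log(1 + exp(-|x|)), which equals
    -[y*log(sigmoid(x)) + (1 - y)*log(1 - sigmoid(x))] without overflow.
    """
    if len(logits) != len(targets):
        raise ValueError("logits and targets must have the same length")
    total = 0.0
    for x, y in zip(logits, targets):
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


# Hypothetical 7-label multi-hot target: labels at indices 0, 1, 4, 5 are present.
target = [1, 1, 0, 0, 1, 1, 0]
confident_right = [4.0, 3.0, -3.0, -4.0, 3.0, 4.0, -3.0]
confident_wrong = [-x for x in confident_right]

print(round(bce_with_logits(confident_right, target), 4))  # small loss
print(round(bce_with_logits(confident_wrong, target), 4))  # large loss
```

Each label contributes its own term, so a message can be penalized for a missed label and a spurious label at the same time, which is what makes the head multi-label rather than single-choice.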
Metrics
Per-category precision, recall, and F1 score on the test set
| Intentions | Precision | Recall | F1-score |
|---|---|---|---|
| 1. Providing Information | 0.93 | 0.98 | 0.95 |
| 2. Request for Information | 0.80 | 0.64 | 0.71 |
| 3. Maintaining Dialogue | 0.75 | 0.84 | 0.79 |
| 4. Approval | 0.76 | 0.75 | 0.75 |
| 5. Disapproval | 0.81 | 0.63 | 0.71 |
| 6. Reflection | 0.86 | 0.81 | 0.83 |
| 7. Problem-Solving | 0.87 | 0.76 | 0.81 |
| Weighted average | 0.87 | 0.84 | 0.85 |
| Macro average (F1) | | | 0.8 |
| Micro average (F1) | | | 0.86 |
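As a consistency check, the plain mean of the seven per-category F1 values in the table is 5.55 / 7 ≈ 0.79, in line with the reported macro F1 of 0.8 (the table values are rounded). The sketch below reproduces that check and shows how micro, macro, and support-weighted F1 differ on multi-hot label rows. The toy `y_true`/`y_pred` data is made up for illustration and is not from the test set; the averaging definitions match scikit-learn's `f1_score` with `average="micro"`, `"macro"`, and `"weighted"` apart from zero-division handling, but `averaged_f1` itself is a hypothetical helper.

```python
# Per-category F1 scores copied from the Metrics table (rounded to 2 decimals).
PER_CLASS_F1 = {
    "Providing Information": 0.95,
    "Request for Information": 0.71,
    "Maintaining Dialogue": 0.79,
    "Approval": 0.75,
    "Disapproval": 0.71,
    "Reflection": 0.83,
    "Problem-Solving": 0.81,
}


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); 0.0 when the label is never true and never predicted."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def averaged_f1(y_true, y_pred):
    """Micro, macro and support-weighted F1 for rows of multi-hot (0/1) labels."""
    n_labels = len(y_true[0])
    tp, fp, fn = [0] * n_labels, [0] * n_labels, [0] * n_labels
    for true_row, pred_row in zip(y_true, y_pred):
        for j, (t, p) in enumerate(zip(true_row, pred_row)):
            if t and p:
                tp[j] += 1
            elif p:
                fp[j] += 1
            elif t:
                fn[j] += 1
    per_label = [f1_from_counts(tp[j], fp[j], fn[j]) for j in range(n_labels)]
    support = [tp[j] + fn[j] for j in range(n_labels)]  # true occurrences of each label
    total_support = sum(support)
    return {
        # micro: pool TP/FP/FN over all labels, then compute a single F1
        "micro": f1_from_counts(sum(tp), sum(fp), sum(fn)),
        # macro: unweighted mean of per-label F1, so rare labels count as much as common ones
        "macro": sum(per_label) / n_labels,
        # weighted: per-label F1 weighted by support, so frequent labels dominate
        "weighted": (sum(f * s for f, s in zip(per_label, support)) / total_support
                     if total_support else 0.0),
    }


macro_from_table = sum(PER_CLASS_F1.values()) / len(PER_CLASS_F1)
print(f"{macro_from_table:.3f}")  # 0.793

# Hypothetical toy data: 4 messages, 3 labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [1, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 1]]
print(averaged_f1(y_true, y_pred))
```

In the table, the weighted and micro scores sit above the macro score, which is what one would expect when the most frequent category (Providing Information) also has the highest F1.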
Model provider
AIPsy
Model tree
- Base: Qwen/Qwen3-8B
- Adapter: this model
Modalities
- Input: Text
- Output: Text