License: apache-2.0
The Problem
Financial institutions process thousands of customer complaints daily across mobile apps, websites, contact centres, email, and regulatory portals. These complaints arrive as free-form text and must be manually categorized before investigation and resolution can begin.
The result is predictable:
- Complaints are routed to the wrong operational teams
- Manual review effort increases
- Resolution times become longer
- Customer experience deteriorates
- Regulatory complaint handling becomes more expensive
Traditional classifiers typically predict one label at a time and struggle with the nuanced language used in consumer finance complaints.
This project frames the problem as a structured generation task: a single model call extracts all required complaint taxonomy fields at once.
What the Model Does
Given a customer complaint narrative, the model generates:
```json
{
  "product": "Checking or savings account",
  "sub_product": "Checking account",
  "issue": "Unauthorized transactions or other transaction problem",
  "sub_issue": "Debit card issue"
}
```
These fields map directly to the CFPB complaint taxonomy and can be consumed by routing systems, workflow engines, complaint management platforms, and analytics pipelines.
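Downstream systems consume these four fields directly, so it helps to reject malformed generations before routing. The following is a minimal sketch and is not part of the project. `parse_taxonomy` and `REQUIRED_KEYS` are hypothetical names. The check assumes the output must be a JSON object with exactly the four keys shown above.

```python
import json

# Hypothetical schema: the four CFPB fields the model is trained to emit
REQUIRED_KEYS = ("product", "sub_product", "issue", "sub_issue")


def parse_taxonomy(raw: str) -> dict:
    """Parse a model generation and check it has exactly the four CFPB fields.

    Raises ValueError (json.JSONDecodeError is a subclass) on malformed output,
    so a caller can send the complaint to manual review instead of routing it.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in data]
    extra = sorted(set(data) - set(REQUIRED_KEYS))
    if missing or extra:
        raise ValueError(f"missing keys: {missing}, unexpected keys: {extra}")
    # Return the fields in a fixed order for downstream consumers
    return {key: data[key] for key in REQUIRED_KEYS}


raw = (
    '{"product": "Checking or savings account", "sub_product": "Checking account", '
    '"issue": "Unauthorized transactions or other transaction problem", '
    '"sub_issue": "Debit card issue"}'
)
print(parse_taxonomy(raw)["product"])  # Checking or savings account
```

Catching `ValueError` alone covers both the non-JSON case and the wrong-shape case.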
Model Details
| Property | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-7B-Instruct |
| Fine-Tuning Method | LoRA (PEFT) |
| Training Hardware | AMD Instinct MI300X |
| Precision | bfloat16 |
| Task Type | Structured JSON Generation |
| Output Format | CFPB Taxonomy JSON |
Training Configuration
LoRA Adapter
| Parameter | Value |
|---|---|
| Rank (r) | 16 |
| Alpha | 32 |
| Dropout | 0.05 |
| Target Modules | q_proj, k_proj, v_proj, o_proj |
Only LoRA adapter weights were updated during training.
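Only the adapter is trained, so the trainable parameter count follows from the table above:

- Each targeted projection W (out × in) gains two low-rank factors, A (r × in) and B (out × r).
- The update B·A is scaled by alpha / r = 32 / 16 = 2.

The sketch below does the arithmetic. It assumes the published Qwen2.5-7B dimensions: hidden size 3584, 28 layers, and 28 query heads plus 4 key/value heads of dimension 128. Verify these against the model's `config.json`. Under those assumptions the adapter has about 10.1M parameters, or roughly 40 MB in fp32 and 20 MB in bf16. That is the same order of magnitude as the ~50 MB adapter figure in the results summary.

```python
# Assumed Qwen2.5-7B(-Instruct) dimensions; verify against the model's config.json
HIDDEN_SIZE = 3584
NUM_LAYERS = 28
NUM_ATTENTION_HEADS = 28
NUM_KV_HEADS = 4
HEAD_DIM = HIDDEN_SIZE // NUM_ATTENTION_HEADS  # 128

RANK = 16  # LoRA r from the adapter table


def lora_params(in_features: int, out_features: int, r: int) -> int:
    # A is (r x in_features), B is (out_features x r); the base weight stays frozen
    return r * (in_features + out_features)


kv_dim = NUM_KV_HEADS * HEAD_DIM  # 512: grouped-query attention shrinks k_proj/v_proj outputs
per_layer = (
    lora_params(HIDDEN_SIZE, HIDDEN_SIZE, RANK)  # q_proj
    + lora_params(HIDDEN_SIZE, kv_dim, RANK)  # k_proj
    + lora_params(HIDDEN_SIZE, kv_dim, RANK)  # v_proj
    + lora_params(HIDDEN_SIZE, HIDDEN_SIZE, RANK)  # o_proj
)
total = per_layer * NUM_LAYERS

print(f"{total:,} trainable parameters")  # 10,092,544 trainable parameters
print(f"fp32: {total * 4 / 1e6:.1f} MB, bf16: {total * 2 / 1e6:.1f} MB")  # fp32: 40.4 MB, bf16: 20.2 MB
```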
Training Hyperparameters
| Parameter | Value |
|---|---|
| Epochs | 5 |
| Batch Size / Device | 8 |
| Gradient Accumulation | 4 |
| Effective Batch Size | 32 |
| Learning Rate | 1e-4 |
| Optimizer | AdamW |
| Scheduler | Linear |
| Precision | bf16 |
| Max Sequence Length | 1024 |
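Two of these rows are derived rather than independent.

- The effective batch size is the per-device batch times the gradient-accumulation steps times the device count: 8 × 4 × 1 = 32 on the single MI300X used for training.
- The linear scheduler decays the learning rate from 1e-4 toward zero over the run.

The sketch below computes both. It assumes a single device and zero warmup steps, since the card does not state a warmup value. The decay has the same shape as Hugging Face's `get_linear_schedule_with_warmup`. The `total_steps` value in the usage lines is illustrative only.

```python
def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    # Examples contributing to each optimizer step
    return per_device * grad_accum * num_devices


def linear_lr(step: int, total_steps: int, base_lr: float = 1e-4, warmup_steps: int = 0) -> float:
    """Learning rate at `step` under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


print(effective_batch_size(per_device=8, grad_accum=4))  # 32
print(linear_lr(step=500, total_steps=1000))  # 5e-05
```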
Training Convergence
| Step | Training Loss | Validation Loss |
|---|---|---|
| 100 | 1.7377 | 1.7092 |
| 200 | 1.6316 | 1.6485 |
| 300 | 1.6508 | 1.6295 |
| 400 | 1.6078 | 1.6204 |
| 500 | 1.6090 | 1.6145 |
| 600 | 1.6191 | 1.6101 |
| 700 | 1.5926 | 1.6058 |
| 800 | 1.6128 | 1.6034 |
| 900 | 1.6076 | 1.6012 |
| 1000 | 1.5874 | 1.5997 |
| 1100 | 1.6001 | 1.5984 |
Validation loss decreased at every logged step, from 1.709 to 1.598. Training loss stayed close to it throughout, which is consistent with the base model adapting to the CFPB complaint taxonomy without overfitting.
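These convergence observations can be recomputed from the logged values. The `LOG` list in this short script is copied from the Training Convergence table.

```python
# (step, training loss, validation loss) from the Training Convergence table
LOG = [
    (100, 1.7377, 1.7092),
    (200, 1.6316, 1.6485),
    (300, 1.6508, 1.6295),
    (400, 1.6078, 1.6204),
    (500, 1.6090, 1.6145),
    (600, 1.6191, 1.6101),
    (700, 1.5926, 1.6058),
    (800, 1.6128, 1.6034),
    (900, 1.6076, 1.6012),
    (1000, 1.5874, 1.5997),
    (1100, 1.6001, 1.5984),
]

val = [v for _, _, v in LOG]
monotonic = all(a > b for a, b in zip(val, val[1:]))  # strictly decreasing?
total_drop = val[0] - val[-1]
early_share = (val[0] - val[2]) / total_drop  # share of the drop reached by step 300
max_gap = max(abs(v - t) for _, t, v in LOG)  # largest train/validation difference

print(monotonic)  # True
print(f"{total_drop:.4f}")  # 0.1108
print(f"{early_share:.0%}")  # 72%
print(f"{max_gap:.4f}")  # 0.0285
```

Most of the improvement arrives by step 300. The largest train/validation gap (0.0285) occurs at the first logged step.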
Dataset
Source: CFPB Consumer Complaint Database
The model was trained to predict four operational complaint fields:
- Product
- Sub-Product
- Issue
- Sub-Issue
The task is formulated as structured JSON generation rather than independent classification.
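One way to express this formulation is to turn each CFPB record into a chat example whose assistant turn is the target JSON. The card does not show its preprocessing, so the sketch below rests on several assumptions:

- It reuses the system prompt from the inference example.
- It reads the CFPB export's `Consumer complaint narrative`, `Product`, `Sub-product`, `Issue` and `Sub-issue` columns.
- It writes missing values as JSON `null`.

`to_chat_example` is a hypothetical helper, and the record in the usage lines is made up for illustration.

```python
import json

# System prompt reused from the inference example in this card (assumed for training)
SYSTEM_PROMPT = (
    "You are a banking complaint classification assistant. "
    "Given a consumer complaint narrative, extract the CFPB ticket fields "
    "as a JSON object with keys: product, sub_product, issue, sub_issue."
)

# CFPB export column -> JSON key emitted by the model
FIELD_MAP = {
    "Product": "product",
    "Sub-product": "sub_product",
    "Issue": "issue",
    "Sub-issue": "sub_issue",
}


def to_chat_example(row: dict) -> dict:
    """Turn one CFPB complaint record into a system/user/assistant training example."""
    # Missing columns become None, i.e. JSON null (an assumption)
    target = {key: row.get(column) for column, key in FIELD_MAP.items()}
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": row["Consumer complaint narrative"]},
            {"role": "assistant", "content": json.dumps(target)},
        ]
    }


row = {
    "Consumer complaint narrative": "My debit card was used for purchases I did not make.",
    "Product": "Checking or savings account",
    "Sub-product": "Checking account",
    "Issue": "Unauthorized transactions or other transaction problem",
    "Sub-issue": "Debit card issue",
}
print(to_chat_example(row)["messages"][2]["content"])
```

Putting all four fields in one assistant target is what lets a single generation predict them jointly, rather than training four separate classifiers.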
Inference with Constrained Decoding
Inference uses a two-stage approach:
Stage 1
The fine-tuned model generates structured JSON.
Stage 2
Generated values are aligned to the nearest canonical CFPB label using TF-IDF similarity matching.
This improves robustness when the model generates labels that are semantically correct but differ slightly from official CFPB terminology.
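Stage 2 can be illustrated with a small, dependency-free TF-IDF nearest-label search. This is a sketch, not the project's code. The project may use scikit-learn (listed under Dependencies) instead. The sketch borrows only scikit-learn's default smoothed IDF formula and L2 normalisation. `LabelAligner` is a hypothetical name, and the three issue labels are an illustrative subset of the CFPB taxonomy.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric runs: "Debit card issue" -> ["debit", "card", "issue"]
    return re.findall(r"[a-z0-9]+", text.lower())


class LabelAligner:
    """Snap a free-text prediction to the closest canonical label (TF-IDF + cosine)."""

    def __init__(self, labels: list[str]):
        self.labels = list(labels)
        docs = [Counter(tokenize(label)) for label in self.labels]
        n = len(docs)
        # Document frequency: how many labels contain each term
        df = Counter(term for doc in docs for term in doc)
        # Smoothed IDF, same formula as scikit-learn's default: ln((1 + n) / (1 + df)) + 1
        self.idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
        self.vectors = [self._vectorize(doc) for doc in docs]

    def _vectorize(self, counts: Counter) -> dict[str, float]:
        # Raw term count x IDF, L2-normalised; terms absent from every label are dropped
        vec = {t: c * self.idf[t] for t, c in counts.items() if t in self.idf}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def align(self, prediction: str) -> tuple[str, float]:
        """Return (closest label, cosine score); a score of 0.0 means no shared terms."""
        query = self._vectorize(Counter(tokenize(prediction)))
        scores = [
            sum(w * vec.get(t, 0.0) for t, w in query.items()) for vec in self.vectors
        ]
        best = max(range(len(scores)), key=scores.__getitem__)
        return self.labels[best], scores[best]


# Illustrative subset of CFPB issue labels; a real aligner would load the full list
ISSUES = [
    "Unauthorized transactions or other transaction problem",
    "Managing an account",
    "Problem with a purchase shown on your statement",
]
aligner = LabelAligner(ISSUES)
label, score = aligner.align("Unauthorized transaction problem")
print(label)  # Unauthorized transactions or other transaction problem
```

`align` always returns some label, so a caller may want to treat a low score as "no confident match". The card does not say whether the project applies such a threshold.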
Evaluation Results
Evaluated on 250 held-out CFPB complaints.
Baseline refers to the original Qwen2.5-7B-Instruct model without fine-tuning.
Product Classification Performance
| Metric | Baseline | Fine-Tuned | Improvement |
|---|---|---|---|
| Exact Match | 0.0100 | 0.9080 | +0.8980 |
| Precision | 0.5180 | 0.9082 | +0.3902 |
| Recall | 0.0100 | 0.9080 | +0.8980 |
| F1 Score | 0.0196 | 0.9068 | +0.8872 |
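In both runs the Exact Match and Recall columns are identical, which is what support-weighted averaging produces:

- Weighting each label's recall by its share of the gold labels reduces to plain accuracy.
- Weighted precision can stay high when a model rarely emits a valid label but is right when it does. That is consistent with the baseline's 0.518 precision against 0.010 recall.

The card does not state the averaging mode. The sketch below therefore assumes weighted averaging, which is the behaviour of scikit-learn's `average="weighted"`. The product names in the usage lines are illustrative.

```python
from collections import Counter


def exact_match(y_true: list[str], y_pred: list[str]) -> float:
    # Fraction of complaints whose predicted product string equals the gold label
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_prf(y_true: list[str], y_pred: list[str]) -> tuple[float, float, float]:
    """Support-weighted precision, recall and F1 over the gold labels."""
    support = Counter(y_true)
    predicted = Counter(y_pred)
    n = len(y_true)
    precision_sum = recall_sum = f1_sum = 0.0
    for label, count in support.items():
        tp = sum(t == p == label for t, p in zip(y_true, y_pred))
        precision = tp / predicted[label] if predicted[label] else 0.0
        recall = tp / count
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        weight = count / n  # each label weighted by its share of gold examples
        precision_sum += weight * precision
        recall_sum += weight * recall
        f1_sum += weight * f1
    return precision_sum, recall_sum, f1_sum


gold = ["Mortgage", "Mortgage", "Credit card", "Debt collection"]
pred = ["Mortgage", "Credit card", "Credit card", "Student loan"]
p, r, f = weighted_prf(gold, pred)
print(exact_match(gold, pred), r)  # 0.5 0.5
```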
Sub-Product Semantic Similarity
| Metric | Baseline | Fine-Tuned | Improvement |
|---|---|---|---|
| ROUGE-1 | 0.0041 | 0.7122 | +0.7081 |
| ROUGE-2 | 0.0030 | 0.6452 | +0.6422 |
| ROUGE-L | 0.0041 | 0.7122 | +0.7081 |
| BLEU | 0.0000 | 0.5026 | +0.5026 |
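ROUGE-1 scores a predicted label against the gold label by clipped unigram overlap. A near-miss such as "Savings account" for "Checking account" therefore earns partial credit, where exact match would give zero.

The sketch below computes the ROUGE-1 F-measure. It simplifies the `rouge-score` package listed under Dependencies, which applies its own tokenizer and optional stemming. The card does not say which settings were used.

```python
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    # Lowercase alphanumeric tokens, a simplification of rouge-score's tokenizer
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1_f(prediction: str, reference: str) -> float:
    pred, ref = Counter(tokens(prediction)), Counter(tokens(reference))
    # Each shared word counts min(times in prediction, times in reference)
    overlap = sum((pred & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


print(rouge1_f("Savings account", "Checking account"))  # 0.5
```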
Issue Semantic Similarity
| Metric | Baseline | Fine-Tuned | Improvement |
|---|---|---|---|
| ROUGE-1 | 0.0018 | 0.4018 | +0.4000 |
| ROUGE-2 | 0.0000 | 0.3463 | +0.3463 |
| ROUGE-L | 0.0018 | 0.4013 | +0.3995 |
| BLEU | 0.0000 | 0.3368 | +0.3368 |
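ROUGE-L replaces unigram overlap with the longest common subsequence (LCS) of the two token sequences, so it also rewards word order. The LCS can never be longer than the clipped unigram overlap. ROUGE-L is therefore at most ROUGE-1 for the same pair, which matches the Issue table above (0.4013 vs. 0.4018).

The sketch below computes ROUGE-L. Its tokenization is a simplification of the `rouge-score` package's.

```python
import re


def tokens(text: str) -> list[str]:
    # Lowercase alphanumeric tokens, a simplification of rouge-score's tokenizer
    return re.findall(r"[a-z0-9]+", text.lower())


def lcs_length(a: list[str], b: list[str]) -> int:
    # Dynamic programming over prefixes, keeping only the previous row
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f(prediction: str, reference: str) -> float:
    pred, ref = tokens(prediction), tokens(reference)
    lcs = lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(pred), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Same words in a different order: ROUGE-1 would be 1.0, ROUGE-L is lower
print(round(rouge_l_f("card debit issue", "Debit card issue"), 4))  # 0.6667
```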
Sub-Issue Semantic Similarity
| Metric | Baseline | Fine-Tuned | Improvement |
|---|---|---|---|
| ROUGE-1 | 0.0004 | 0.5215 | +0.5211 |
| ROUGE-2 | 0.0000 | 0.4895 | +0.4895 |
| ROUGE-L | 0.0004 | 0.5207 | +0.5203 |
| BLEU | 0.0000 | 0.2283 | +0.2283 |
Final Results Summary
| Category | Base Qwen2.5-7B | Fine-Tuned CFPB Model |
|---|---|---|
| Product Classification (Exact Match) | 1.0% | 90.8% |
| Product F1 Score | 1.96% | 90.7% |
| Sub-Product ROUGE-L | 0.004 | 0.712 |
| Issue ROUGE-L | 0.002 | 0.401 |
| Sub-Issue ROUGE-L | 0.000 | 0.521 |
| Output Structure | Inconsistent | Reliable CFPB JSON |
| Taxonomy Alignment | Poor | High |
| Training Time | N/A (not fine-tuned) | ~45 Minutes |
| Inference Latency | Baseline | Near Identical |
| Additional GPU Memory | Baseline | ~50 MB Adapter |
Run inference
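```python
import json

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL = "Qwen/Qwen2.5-7B-Instruct"
# Placeholder: replace with the local path or Hub repo id of this adapter
ADAPTER_PATH = "path/to/cfpb-lora-adapter"

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, ADAPTER_PATH)  # attach the LoRA weights
model.eval()


def categorise_complaint(complaint_text: str, model, tokenizer) -> dict:
    messages = [
        {
            "role": "system",
            "content": (
                "You are a banking complaint classification assistant. "
                "Given a consumer complaint narrative, extract the CFPB ticket fields "
                "as a JSON object with keys: product, sub_product, issue, sub_issue."
            ),
        },
        {"role": "user", "content": complaint_text},
    ]
    # Render the chat with the model's template and open the assistant turn
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=128,
            do_sample=False,  # greedy decoding for deterministic labels
            pad_token_id=tokenizer.eos_token_id,
        )

    # Decode only the newly generated tokens, not the echoed prompt
    prompt_len = inputs["input_ids"].shape[1]
    generated = tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True)
    # Raises json.JSONDecodeError if the model emits malformed JSON
    return json.loads(generated.strip())


complaint = """I reported fraudulent transactions on my debit card and the bank reversed
my provisional credit without explaining the investigation outcome."""

result = categorise_complaint(complaint, model, tokenizer)
print(json.dumps(result))
# {"product": "Checking or savings account", "sub_product": "Checking account",
# "issue": "Unauthorized transactions or other transaction problem",
# "sub_issue": "Debit card issue"}
```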
Dependencies
```text
transformers==4.44.0
peft==0.12.0
accelerate==0.34.0
datasets==2.21.0
torch  # ROCm-compatible build for AMD, or standard CUDA build
scikit-learn
rouge-score
sacrebleu
nltk
```
Limitations
- CFPB taxonomy only. The model is trained on and constrained to CFPB Consumer Complaint Database labels. It is not a general-purpose complaint classifier and should not be used with complaint taxonomies from other regulatory bodies or internal systems without retraining.
- Issue field accuracy. The `issue` field (33.6% accuracy; ROUGE-L 0.401, the lowest of the three similarity-scored fields) is the weakest link. The CFPB issue taxonomy contains 80+ canonical strings with overlapping phrasing. Expanding the training data and further tuning the constrained decoder are the most direct paths to improvement.
- English language only. All training data is in English. Performance on non-English complaints is untested and likely poor.
- Context length. Complaints longer than 1024 tokens will be truncated. Most CFPB complaints are well within this limit, but very long narratives may lose relevant context.
Intended Use
This model is intended for use by:
- Banking operations teams automating first-touch complaint categorisation
- Compliance teams processing regulatory complaint filings
- Contact centre platforms routing incoming complaints before agent assignment
- Research teams studying LLM adaptation for financial NLP tasks
It is not intended for consumer-facing deployment without human review of its outputs. Nor is it intended for use without appropriate oversight in jurisdictions where automated complaint classification decisions carry legal or regulatory implications.
Training Infrastructure
Trained on an AMD Instinct MI300X GPU (192 GB HBM3) running ROCm 7.2.4. The training stack is fully ROCm-native and does not use bitsandbytes, whose quantization kernels have historically targeted CUDA. Model precision is bfloat16, which the CDNA3 architecture's matrix cores support natively.
Citation
If you use this model in research or production, please cite the CFPB Consumer Complaint Database as the data source:
```text
Consumer Financial Protection Bureau (CFPB)
Consumer Complaint Database
https://www.consumerfinance.gov/data-research/consumer-complaints/
```
Model provider: aryachakraborty
Model tree: LoRA adapter on base model Qwen/Qwen2.5-7B-Instruct
Modalities: text input, text output