aryachakraborty/arya-cfpb-qwen_2.5-7b-lora-V2 API & Inference Endpoint

The Problem

Financial institutions process thousands of customer complaints daily across mobile apps, websites, contact centres, email, and regulatory portals. These complaints arrive as free-form text and must be manually categorized before investigation and resolution can begin.

The result is predictable:

Complaints are routed to the wrong operational teams
Manual review effort increases
Resolution times become longer
Customer experience deteriorates
Regulatory complaint handling becomes more expensive

Traditional classifiers typically predict one label at a time and struggle with the nuanced language used in consumer finance complaints.

This project treats the problem as a structured generation task: a single model call extracts all four complaint taxonomy fields at once.

What the Model Does

Given a customer complaint narrative, the model generates:

json
{
  "product": "Checking or savings account",
  "sub_product": "Checking account",
  "issue": "Unauthorized transactions or other transaction problem",
  "sub_issue": "Debit card issue"
}

These fields map directly to the CFPB complaint taxonomy and can be consumed by routing systems, workflow engines, complaint management platforms, and analytics pipelines.
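A downstream consumer would typically check the response before acting on it. The following is a minimal sketch, assuming the raw model response arrives as a JSON string; `parse_ticket` and `REQUIRED_KEYS` are illustrative names and not part of this project.

```python
import json

# The four taxonomy fields named in this README.
REQUIRED_KEYS = ("product", "sub_product", "issue", "sub_issue")


def parse_ticket(raw: str) -> dict:
    """Parse a model response and verify it carries all four fields."""
    ticket = json.loads(raw)  # raises json.JSONDecodeError on malformed output
    if not isinstance(ticket, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in ticket]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    # Keep only the expected keys so extra fields never reach the router.
    return {key: str(ticket[key]) for key in REQUIRED_KEYS}


example = (
    '{"product": "Checking or savings account", '
    '"sub_product": "Checking account", '
    '"issue": "Unauthorized transactions or other transaction problem", '
    '"sub_issue": "Debit card issue"}'
)
ticket = parse_ticket(example)
print(ticket["product"])
```

Since `json.JSONDecodeError` is a subclass of `ValueError`, a caller can catch `ValueError` alone to route both malformed and incomplete responses to manual review.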

Model Details

Property	Value
Base Model	Qwen/Qwen2.5-7B-Instruct
Fine-Tuning Method	LoRA (PEFT)
Training Hardware	AMD Instinct MI300X
Precision	bfloat16
Task Type	Structured JSON Generation
Output Format	CFPB Taxonomy JSON

Training Configuration

LoRA Adapter

Parameter	Value
Rank (r)	16
Alpha	32
Dropout	0.05
Target Modules	q_proj, k_proj, v_proj, o_proj

Only LoRA adapter weights were updated during training.
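To give a sense of how small the trainable portion is, the sketch below estimates the number of LoRA parameters implied by the table above. The attention dimensions are assumptions about the Qwen2.5-7B architecture (hidden size 3584, 28 layers, 4 key/value heads of 128 dimensions); they are not stated in this README and should be checked against the base model's config.

```python
# Assumed Qwen2.5-7B attention dimensions (not stated in this README).
HIDDEN = 3584        # model hidden size
KV_DIM = 4 * 128     # 4 key/value heads x 128 dimensions per head
NUM_LAYERS = 28
RANK = 16            # LoRA rank from the table above
ALPHA = 32           # LoRA alpha from the table above

# (input features, output features) of each targeted projection.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}

# Each adapted projection adds two matrices: A (rank x in) and B (out x rank).
per_layer = sum(RANK * (d_in + d_out) for d_in, d_out in TARGETS.values())
trainable = per_layer * NUM_LAYERS

# Standard LoRA scales the low-rank update by alpha / r.
scaling = ALPHA / RANK

print(f"trainable LoRA parameters: {trainable:,}")
print(f"LoRA scaling factor: {scaling}")
```

Under these assumptions the adapter holds roughly ten million parameters, a small fraction of a 7B-parameter base model.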

Training Hyperparameters

Parameter	Value
Epochs	5
Batch Size per Device	8
Gradient Accumulation	4
Effective Batch Size	32
Learning Rate	1e-4
Optimizer	AdamW
Scheduler	Linear
Precision	bf16
Max Sequence Length	1024
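The effective batch size follows from the per-device batch size and gradient accumulation. A short worked example, assuming a single GPU (as described under Training Infrastructure) and assuming the logged steps in the convergence table are optimizer steps:

```python
per_device_batch = 8
grad_accum_steps = 4
num_devices = 1  # assumption: one MI300X

# Sequences contributing to each optimizer update.
effective_batch = per_device_batch * grad_accum_steps * num_devices

# Sequences processed by the last logged checkpoint (step 1100),
# assuming each logged step is one optimizer update.
last_logged_step = 1100
sequences_seen = effective_batch * last_logged_step

print(effective_batch, sequences_seen)
```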

Training Convergence

Step	Training Loss	Validation Loss
100	1.7377	1.7092
200	1.6316	1.6485
300	1.6508	1.6295
400	1.6078	1.6204
500	1.6090	1.6145
600	1.6191	1.6101
700	1.5926	1.6058
800	1.6128	1.6034
900	1.6076	1.6012
1000	1.5874	1.5997
1100	1.6001	1.5984

Validation loss fell at every checkpoint, from 1.7092 at step 100 to 1.5984 at step 1100, a drop of about 0.11 (6.5%). Training loss fluctuates more from step to step but follows the same downward trend.
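The trend in the table can be checked directly. The sketch below recomputes the validation-loss drop and converts the endpoints to perplexity, assuming the reported loss is mean token-level cross-entropy in nats (the README does not say so explicitly).

```python
import math

# (step, training loss, validation loss) copied from the table above.
LOG = [
    (100, 1.7377, 1.7092),
    (200, 1.6316, 1.6485),
    (300, 1.6508, 1.6295),
    (400, 1.6078, 1.6204),
    (500, 1.6090, 1.6145),
    (600, 1.6191, 1.6101),
    (700, 1.5926, 1.6058),
    (800, 1.6128, 1.6034),
    (900, 1.6076, 1.6012),
    (1000, 1.5874, 1.5997),
    (1100, 1.6001, 1.5984),
]

val = [v for _, _, v in LOG]
absolute_drop = val[0] - val[-1]
relative_drop = absolute_drop / val[0]

# True when every checkpoint improves on the previous one.
monotonic = all(a > b for a, b in zip(val, val[1:]))

# Perplexity = exp(cross-entropy), valid only under the assumption above.
ppl_start, ppl_end = math.exp(val[0]), math.exp(val[-1])

print(f"drop: {absolute_drop:.4f} ({relative_drop:.1%}), monotonic: {monotonic}")
print(f"perplexity: {ppl_start:.2f} -> {ppl_end:.2f}")
```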

Dataset

Source: CFPB Consumer Complaint Database

The model was trained to predict four operational complaint fields:

Product
Sub-Product
Issue
Sub-Issue

The task is formulated as structured JSON generation rather than independent classification.

Inference with Constrained Decoding

Inference uses a two-stage approach:

Stage 1

The fine-tuned model generates structured JSON.

Stage 2

Generated values are aligned to the nearest canonical CFPB label using TF-IDF similarity matching.

This improves robustness when the model generates labels that are semantically correct but differ slightly from official CFPB terminology.
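Stage 2 can be illustrated with a small, dependency-free sketch of TF-IDF matching. This is not the project's implementation: the label list is a hypothetical subset of the taxonomy, the word-level tokenizer and smoothed IDF are assumptions, and `align_label` is an illustrative name.

```python
import math
import re
from collections import Counter

# Hypothetical subset of canonical labels, for illustration only.
CANONICAL_PRODUCTS = [
    "Checking or savings account",
    "Credit card",
    "Mortgage",
    "Debt collection",
]


def tokenize(text):
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


def to_vector(counts, idf):
    """L2-normalised TF-IDF vector; tokens outside the vocabulary are ignored."""
    vec = {tok: tf * idf[tok] for tok, tf in counts.items() if tok in idf}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {tok: w / norm for tok, w in vec.items()} if norm else {}


def build_index(labels):
    """Compute smoothed IDF weights and one vector per canonical label."""
    docs = [Counter(tokenize(label)) for label in labels]
    n = len(docs)
    df = Counter(tok for doc in docs for tok in doc)
    idf = {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}
    return idf, [to_vector(doc, idf) for doc in docs]


def cosine(a, b):
    """Dot product of two already-normalised sparse vectors."""
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def align_label(generated, labels, idf, vectors):
    """Return (closest canonical label, score), or (None, 0.0) if nothing overlaps."""
    query = to_vector(Counter(tokenize(generated)), idf)
    scores = [cosine(query, vec) for vec in vectors]
    best = max(range(len(labels)), key=scores.__getitem__)
    if scores[best] == 0.0:
        return None, 0.0
    return labels[best], scores[best]


idf, vectors = build_index(CANONICAL_PRODUCTS)
label, score = align_label("checking and savings accounts", CANONICAL_PRODUCTS, idf, vectors)
print(label, round(score, 3))
```

In practice, scikit-learn's `TfidfVectorizer` with cosine similarity would be a more typical choice, and character n-grams may cope better with plural or misspelled labels than the word tokens used here.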

Evaluation Results

All metrics below were computed on 250 held-out CFPB complaints.

Baseline refers to the original Qwen2.5-7B-Instruct model without fine-tuning.

Product Classification Performance

Metric	Baseline	Fine-Tuned	Improvement
Exact Match	0.0100	0.9080	+0.8980
Precision	0.5180	0.9082	+0.3902
Recall	0.0100	0.9080	+0.8980
F1 Score	0.0196	0.9068	+0.8872
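In the table above, recall equals exact match for both models, which is what support-weighted averaging over classes produces. The sketch below shows that computation on toy data; that the reported figures use weighted averaging is an inference from the numbers, not something the README states, and `weighted_prf` is an illustrative name.

```python
from collections import Counter


def weighted_prf(y_true, y_pred):
    """Exact match plus support-weighted precision, recall, and F1."""
    support = Counter(y_true)
    total = len(y_true)
    precision = recall = f1 = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        p_c = tp / n_pred if n_pred else 0.0
        r_c = tp / n_true
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = n_true / total  # share of examples with this true label
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    exact = sum(t == p for t, p in zip(y_true, y_pred)) / total
    return {"exact_match": exact, "precision": precision, "recall": recall, "f1": f1}


# Toy labels, not taken from the evaluation set.
y_true = ["a", "a", "b", "c"]
y_pred = ["a", "b", "b", "x"]
scores = weighted_prf(y_true, y_pred)
print(scores)
```

Weighted recall always reduces to overall accuracy, while weighted precision can be much higher when a model is right on the few occasions it emits a valid label. That pattern matches the baseline row (precision 0.5180, recall 0.0100).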

Sub-Product Semantic Similarity

Metric	Baseline	Fine-Tuned	Improvement
ROUGE-1	0.0041	0.7122	+0.7081
ROUGE-2	0.0030	0.6452	+0.6422
ROUGE-L	0.0041	0.7122	+0.7081
BLEU	0.0000	0.5026	+0.5026
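ROUGE-L gives partial credit when a generated label shares an in-order word sequence with the reference. A simplified sketch of the F-measure, assuming whitespace tokenization and no stemming (the `rouge-score` package listed under Dependencies tokenizes differently, so values may not match exactly):

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, prediction):
    """ROUGE-L F-measure over lowercase whitespace tokens."""
    ref = reference.lower().split()
    pred = prediction.lower().split()
    lcs = lcs_length(ref, pred)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


print(rouge_l_f1("Checking account", "Checking account"))
print(rouge_l_f1("Checking account", "Savings account"))
```

A label that is wrong but shares a word with the reference still scores above zero, so these overlap metrics are more forgiving than exact match.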

Issue Semantic Similarity

Metric	Baseline	Fine-Tuned	Improvement
ROUGE-1	0.0018	0.4018	+0.4000
ROUGE-2	0.0000	0.3463	+0.3463
ROUGE-L	0.0018	0.4013	+0.3995
BLEU	0.0000	0.3368	+0.3368

Sub-Issue Semantic Similarity

Metric	Baseline	Fine-Tuned	Improvement
ROUGE-1	0.0004	0.5215	+0.5211
ROUGE-2	0.0000	0.4895	+0.4895
ROUGE-L	0.0004	0.5207	+0.5203
BLEU	0.0000	0.2283	+0.2283

Final Results Summary

Category	Base Qwen2.5-7B	Fine-Tuned CFPB Model
Product Classification (Exact Match)	1.0%	90.8%
Product F1 Score	1.96%	90.7%
Sub-Product ROUGE-L	0.004	0.712
Issue ROUGE-L	0.002	0.401
Sub-Issue ROUGE-L	0.000	0.521
Output Structure	Inconsistent	Reliable CFPB JSON
Taxonomy Alignment	Poor	High
Training Time	Not applicable	~45 Minutes
Inference Latency	Baseline	Near Identical
Additional GPU Memory	Baseline	~50 MB Adapter

Run inference

python
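import json

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: the repository hosts LoRA adapter weights that are applied
# on top of the base model listed under Model Details.
BASE_ID = "Qwen/Qwen2.5-7B-Instruct"
ADAPTER_ID = "aryachakraborty/arya-cfpb-qwen_2.5-7b-lora-V2"

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
model = AutoModelForCausalLM.from_pretrained(
    BASE_ID, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, ADAPTER_ID)
model.eval()


def categorise_complaint(complaint_text: str, model, tokenizer) -> dict:
    messages = [
        {
            "role": "system",
            "content": (
                "You are a banking complaint classification assistant. "
                "Given a consumer complaint narrative, extract the CFPB ticket fields "
                "as a JSON object with keys: product, sub_product, issue, sub_issue."
            ),
        },
        {
            "role": "user",
            "content": complaint_text,
        },
    ]

    # Render the chat messages into the prompt format the model was trained on.
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Greedy decoding keeps the output deterministic.
    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=128,
            do_sample=False,
            pad_token_id=tokenizer.eos_token_id,
        )

    # Decode only the newly generated tokens, not the prompt.
    prompt_len = inputs["input_ids"].shape[1]
    generated = tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True)

    try:
        return json.loads(generated)
    except json.JSONDecodeError:
        # Return the raw text so the caller can inspect malformed output.
        return {"raw_output": generated}


complaint = """
I reported fraudulent transactions on my debit card and the bank reversed
my provisional credit without explaining the investigation outcome.
"""

result = categorise_complaint(complaint, model, tokenizer)
print(json.dumps(result))
# Expected output (wrapped here for readability):
# {"product": "Checking or savings account", "sub_product": "Checking account",
#  "issue": "Unauthorized transactions or other transaction problem",
#  "sub_issue": "Debit card issue"}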

Dependencies

markdown
transformers==4.44.0
peft==0.12.0
accelerate==0.34.0
datasets==2.21.0
torch (ROCm-compatible build for AMD, or standard CUDA build)
scikit-learn
rouge-score
sacrebleu
nltk

Limitations

CFPB taxonomy only. The model is trained on and constrained to CFPB Consumer Complaint Database labels. It is not a general-purpose complaint classifier and should not be used with complaint taxonomies from other regulatory bodies or internal systems without retraining.
Issue field accuracy. The issue field is the weakest of the four (ROUGE-L 0.4013, BLEU 0.3368 on the held-out set). The CFPB issue taxonomy contains 80+ canonical strings with overlapping phrasing. Expanding training data and further tuning the constrained decoder are the most direct paths to improvement.
English language only. All training data is in English. Performance on non-English complaints is untested and likely poor.
Context length. Complaints longer than 1024 tokens will be truncated. Most CFPB complaints are well within this limit, but very long narratives may lose relevant context.

Intended Use

This model is intended for use by:

Banking operations teams automating first-touch complaint categorisation
Compliance teams processing regulatory complaint filings
Contact centre platforms routing incoming complaints before agent assignment
Research teams studying LLM adaptation for financial NLP tasks

It is not intended for consumer-facing deployment without human review of outputs, or for use in jurisdictions where automated complaint classification decisions have legal or regulatory implications without appropriate oversight.

Training Infrastructure

Trained on an AMD Instinct MI300X GPU (192 GB HBM3 VRAM) running ROCm 7.2.4. The training stack is fully ROCm-native and does not use bitsandbytes, which primarily targets CUDA. Model precision is bfloat16, which the CDNA3 architecture supports natively.

Citation

If you use this model in research or production, please cite the CFPB Consumer Complaint Database as the data source:

markdown
Consumer Financial Protection Bureau (CFPB)
Consumer Complaint Database
https://www.consumerfinance.gov/data-research/consumer-complaints/
