emirhuseyin/finllm-qlora-qwen-7b-finance-adapter

Model

Base model: unsloth/Qwen2.5-7B-Instruct-bnb-4bit
Adapter type: LoRA
Fine-tuning method: QLoRA
Dataset: gbharti/finance-alpaca
Runtime: Kaggle Tesla T4
Max sequence length: 1024
Training steps: 300
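
As a rough back-of-the-envelope check on why 4-bit loading matters on a T4, the sketch below estimates weight memory from parameter count and bit width. It assumes a nominal 7 billion parameters, and it ignores quantization constants, activations, the KV cache, and LoRA/optimizer state, so real usage will be higher. The function name is illustrative and not part of the notebook.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


# Nominal "7B"; an assumption, not the exact parameter count of the base model.
N_PARAMS = 7e9

fp16_gb = weight_memory_gb(N_PARAMS, 16)
int4_gb = weight_memory_gb(N_PARAMS, 4)

print(f"16-bit weights: ~{fp16_gb:.1f} GB")
print(f" 4-bit weights: ~{int4_gb:.1f} GB")
```

Under these assumptions the 16-bit weights alone would take most of a 16 GB T4, while the 4-bit weights leave headroom for activations and adapter training.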

Run Summary

Metric	Value
Train rows	68,546
Eval rows	300
Max steps	300
Train loss	1.8094
First eval loss	1.5393
Final eval loss	1.5303
Eval loss delta	-0.0089
First eval perplexity	4.6612
Final eval perplexity	4.6197

Eval loss fell from 1.5393 to 1.5303 and eval perplexity from 4.6612 to 4.6197 between the first and final evaluation, a small but measurable improvement during the 300-step run.
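
The perplexity figures in the table are consistent with the usual definition, perplexity = exp(mean cross-entropy loss), assuming the loss is reported in nats. A minimal sketch that recomputes them from the rounded losses above (small differences in the last digit are expected because the table values are rounded):

```python
import math


def perplexity(loss: float) -> float:
    """Perplexity from a mean cross-entropy loss measured in nats."""
    return math.exp(loss)


# Values copied from the run summary table.
first_eval_loss = 1.5393
final_eval_loss = 1.5303

delta = final_eval_loss - first_eval_loss
print(f"eval loss delta:       {delta:.4f}")
print(f"first eval perplexity: {perplexity(first_eval_loss):.4f}")
print(f"final eval perplexity: {perplexity(final_eval_loss):.4f}")
```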

What This Project Demonstrates

This project demonstrates:

Loading a 7B instruction model under limited VRAM
4-bit quantized model loading
LoRA adapter fine-tuning
Gradient accumulation for effective batch sizing
Completion-only supervised fine-tuning
Base model vs fine-tuned model comparison
Exporting reusable adapter weights
Producing reproducible training metrics and reports
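
Two of the items above can be illustrated without a GPU. The sketch below shows the idea behind completion-only fine-tuning (prompt tokens get the label -100 so the loss covers only the response) and how gradient accumulation sets the effective batch size. The helper names, toy token ids, and batch settings are hypothetical and are not taken from the notebook.

```python
# -100 is the default ignore_index of PyTorch's CrossEntropyLoss, so tokens
# carrying this label contribute nothing to the loss.
IGNORE_INDEX = -100


def mask_prompt_labels(input_ids, prompt_len):
    """Return labels that train only on the tokens after the prompt."""
    return [IGNORE_INDEX] * prompt_len + list(input_ids[prompt_len:])


def effective_batch_size(per_device_batch, grad_accum_steps, n_devices=1):
    """Number of examples that contribute to each optimizer step."""
    return per_device_batch * grad_accum_steps * n_devices


# Toy token ids: the first 4 stand for the prompt, the last 3 for the response.
ids = [11, 12, 13, 14, 21, 22, 23]
labels = mask_prompt_labels(ids, prompt_len=4)
print(labels)  # [-100, -100, -100, -100, 21, 22, 23]

# Hypothetical settings chosen only for illustration.
print(effective_batch_size(per_device_batch=2, grad_accum_steps=8))  # 16
```

In a real training loop a data collator would typically do the masking by locating the response marker in each tokenized example, rather than taking a fixed prompt length.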

Engineering Note

The notebook uses a single controlled model loader so the primary and fallback models cannot both be loaded into the same runtime by accident.

This was added after an earlier experiment showed that loading both a primary model and a fallback model into the same Kaggle runtime can waste VRAM and make debugging harder.
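
One way such a loader could be structured is sketched below. This is not the notebook's actual code: the class name, the injected `load_fn`, and the model names are assumptions made for illustration. The point is that the fallback is attempted only after the primary fails, and a second load is refused while a model is still held.

```python
from typing import Any, Callable, Optional, Sequence


class SingleModelLoader:
    """Holds at most one loaded model and tries candidates in order."""

    def __init__(self, load_fn: Callable[[str], Any]):
        self._load_fn = load_fn  # e.g. a thin wrapper around from_pretrained
        self.model: Optional[Any] = None
        self.loaded_name: Optional[str] = None

    def load(self, candidates: Sequence[str]) -> Any:
        if self.model is not None:
            raise RuntimeError(
                f"{self.loaded_name!r} is already loaded; call unload() first"
            )
        errors = []
        for name in candidates:
            try:
                self.model = self._load_fn(name)
                self.loaded_name = name
                return self.model
            except Exception as exc:  # the next candidate is tried only on failure
                errors.append(f"{name}: {exc}")
        raise RuntimeError("no candidate could be loaded: " + "; ".join(errors))

    def unload(self) -> None:
        """Drop the reference so the model can be garbage-collected."""
        self.model = None
        self.loaded_name = None


# Stand-in loader that simulates the primary model failing to fit in memory.
def fake_load(name: str) -> dict:
    if name == "primary-model":
        raise MemoryError("simulated out-of-memory")
    return {"name": name}


loader = SingleModelLoader(fake_load)
loader.load(["primary-model", "fallback-model"])
print(loader.loaded_name)  # fallback-model
```

On a GPU runtime, `unload()` would likely also need `gc.collect()` and `torch.cuda.empty_cache()` to actually release VRAM.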

Files

Adapter weights and tokenizer files are stored in the repository root.

Additional reports are available under reports/:

training_metrics.csv
base_vs_finetuned_comparison.csv
loss_curve.png
README_report.md
run_summary.json

Example Usage

```python
from unsloth import FastLanguageModel
from peft import PeftModel
import torch

base_model = "unsloth/Qwen2.5-7B-Instruct-bnb-4bit"
adapter_repo = "emirhuseyin/finllm-qlora-qwen-7b-finance-adapter"

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=base_model,
    max_seq_length=1024,
    dtype=None,
    load_in_4bit=True,
)

model = PeftModel.from_pretrained(model, adapter_repo)
FastLanguageModel.for_inference(model)

prompt = """FinLLM is a finance-focused language model for risk and sentiment analysis. It explains uncertainty, trade-offs, and assumptions clearly. It does not provide personalized investment advice without sufficient context.

### Instruction:
Analyze the financial risk and sentiment of the following market news.

### Input:
A technology company reported stronger-than-expected revenue growth, but management warned that margins may contract next quarter due to rising AI infrastructure costs.

### Response:
"""

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=200,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
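
Because `generate` returns the prompt tokens followed by the new tokens, the decoded string above includes the full prompt. The helpers below are a small sketch for building prompts in the same layout and keeping only the text after the response marker. The function names are hypothetical, and the model output is replaced by a stand-in string so the snippet runs without a GPU.

```python
SYSTEM_TEXT = (
    "FinLLM is a finance-focused language model for risk and sentiment analysis. "
    "It explains uncertainty, trade-offs, and assumptions clearly. "
    "It does not provide personalized investment advice without sufficient context."
)
RESPONSE_MARKER = "### Response:"


def build_prompt(instruction: str, input_text: str) -> str:
    """Assemble a prompt in the same layout as the usage example."""
    return (
        f"{SYSTEM_TEXT}\n\n"
        f"### Instruction:\n{instruction}\n\n"
        f"### Input:\n{input_text}\n\n"
        f"{RESPONSE_MARKER}\n"
    )


def extract_response(decoded: str) -> str:
    """Return only the text after the last response marker."""
    return decoded.rsplit(RESPONSE_MARKER, 1)[-1].strip()


prompt = build_prompt(
    "Analyze the financial risk and sentiment of the following market news.",
    "A retailer raised full-year guidance but flagged higher freight costs.",
)

# Stand-in for tokenizer.decode(outputs[0], ...); not real model output.
decoded = prompt + "Sentiment: mixed."
print(extract_response(decoded))  # Sentiment: mixed.
```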

Limitations

This is a constrained QLoRA domain-alignment demo, not a production financial advisor.

It should not be used for personalized investment advice. A production workflow would require broader benchmarks, hallucination testing, safety evaluation, financial-domain validation, retrieval grounding, and human review.

Intended Use

This adapter is intended for research, portfolio demonstration, and experimentation with memory-efficient financial instruction tuning.

It is not intended for trading automation, regulated financial advice, or high-stakes financial decision-making.
