License: apache-2.0

Model

  • Base model: unsloth/Qwen2.5-7B-Instruct-bnb-4bit
  • Adapter type: LoRA
  • Fine-tuning method: QLoRA
  • Dataset: gbharti/finance-alpaca
  • Runtime: Kaggle Tesla T4
  • Max sequence length: 1024
  • Training steps: 300

Run Summary

Metric                    Value
Train rows                68,546
Eval rows                 300
Max steps                 300
Train loss                1.8094
First eval loss           1.5393
Final eval loss           1.5303
Eval loss delta           -0.0089
First eval perplexity     4.6612
Final eval perplexity     4.6197

Between the first and final evaluations, loss fell from 1.5393 to 1.5303 and perplexity from 4.6612 to 4.6197, a modest but measurable improvement.
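The perplexity figures follow from the eval losses, assuming the loss is the mean per-token cross-entropy in nats (the usual convention, though the notebook itself is not shown here). A minimal sketch of that relationship, using only the values from the table:

```python
import math


def perplexity(mean_token_loss: float) -> float:
    """Perplexity is the exponential of the mean per-token cross-entropy (in nats)."""
    return math.exp(mean_token_loss)


# Eval losses as reported in the Run Summary.
first_eval_loss = 1.5393
final_eval_loss = 1.5303

first_ppl = perplexity(first_eval_loss)
final_ppl = perplexity(final_eval_loss)

print(f"first eval perplexity ~ {first_ppl:.4f}")
print(f"final eval perplexity ~ {final_ppl:.4f}")
print(f"relative perplexity change: {(final_ppl - first_ppl) / first_ppl:.2%}")
```

Because the table lists losses rounded to four decimals, the recomputed perplexities can differ from the reported ones in the last digit.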

What This Project Demonstrates

This project demonstrates:

  • Loading a 7B instruction model under limited VRAM
  • 4-bit quantized model loading
  • LoRA adapter fine-tuning
  • Gradient accumulation for effective batch sizing
  • Completion-only supervised fine-tuning
  • Base model vs fine-tuned model comparison
  • Exporting reusable adapter weights
  • Producing reproducible training metrics and reports
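Two of the items above, completion-only fine-tuning and gradient accumulation, can be illustrated without a GPU. The sketch below is not the notebook's code: the helper names, token ids, and batch settings are hypothetical. It relies only on the convention that a label of -100 is ignored by PyTorch's cross-entropy loss, so masking prompt tokens means the loss is computed on the response alone.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def mask_prompt_labels(input_ids, prompt_len):
    """Return labels that ignore the prompt tokens and keep the completion tokens."""
    if not 0 <= prompt_len <= len(input_ids):
        raise ValueError("prompt_len must lie within the sequence")
    return [IGNORE_INDEX] * prompt_len + list(input_ids[prompt_len:])


def effective_batch_size(per_device_batch, accumulation_steps, num_devices=1):
    """Number of examples contributing to each optimizer step."""
    return per_device_batch * accumulation_steps * num_devices


# Hypothetical token ids: 4 prompt tokens followed by 3 response tokens.
example_ids = [101, 7, 8, 9, 42, 43, 2]
labels = mask_prompt_labels(example_ids, prompt_len=4)
print(labels)  # [-100, -100, -100, -100, 42, 43, 2]

# Illustrative values only; the notebook's actual batch settings are not stated here.
print(effective_batch_size(per_device_batch=2, accumulation_steps=4))  # 8
```

Gradient accumulation trades wall-clock time for memory: the micro-batch stays small enough for limited VRAM, while each optimizer step still sees a larger effective batch.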

Engineering Note

The notebook uses a single controlled model loader so the primary and fallback models cannot both be loaded into the same runtime by accident.

This was added after an earlier experiment showed that loading both a primary model and a fallback model into the same Kaggle runtime can waste VRAM and make debugging harder.
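One way such a loader could be structured is sketched below. The class name, the `load_fn` callable, and the model names are hypothetical and stand in for the notebook's real loading code, which is not shown here. The key property is that the fallback is attempted only after the primary has failed and its partial state has been released.

```python
import gc


class SingleModelLoader:
    """Holds at most one loaded model; the fallback is tried only if the primary fails."""

    def __init__(self, load_fn, primary, fallback):
        self._load_fn = load_fn  # hypothetical callable: model name -> model object
        self._candidates = (primary, fallback)
        self.model = None
        self.loaded_name = None

    def load(self):
        if self.model is not None:
            return self.model  # already loaded: never load a second model
        last_error = None
        for name in self._candidates:
            try:
                self.model = self._load_fn(name)
                self.loaded_name = name
                return self.model
            except Exception as err:
                last_error = err
                self.model = None
                gc.collect()  # release partially built objects before retrying
        raise RuntimeError("no model could be loaded") from last_error


def fake_load(name):
    """Stand-in loader that simulates the primary model running out of memory."""
    if name == "primary-model":
        raise MemoryError("simulated out-of-memory")
    return {"name": name}


loader = SingleModelLoader(fake_load, "primary-model", "fallback-model")
model = loader.load()
print(loader.loaded_name)
```

In a real GPU runtime the retry path would typically also call `torch.cuda.empty_cache()` after `gc.collect()` so that freed memory is returned before the fallback loads.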

Files

Adapter weights and tokenizer files are stored in the repository root.

Additional reports are available under reports/:

  • training_metrics.csv
  • base_vs_finetuned_comparison.csv
  • loss_curve.png
  • README_report.md
  • run_summary.json

Example Usage

python

from unsloth import FastLanguageModel
from peft import PeftModel
import torch

base_model = "unsloth/Qwen2.5-7B-Instruct-bnb-4bit"
adapter_repo = "emirhuseyin/finllm-qlora-qwen-7b-finance-adapter"

# Load the 4-bit quantized base model and its tokenizer.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=base_model,
    max_seq_length=1024,
    dtype=None,
    load_in_4bit=True,
)

# Attach the LoRA adapter weights, then switch to inference mode.
model = PeftModel.from_pretrained(model, adapter_repo)
FastLanguageModel.for_inference(model)

prompt = """FinLLM is a finance-focused language model for risk and sentiment analysis. It explains uncertainty, trade-offs, and assumptions clearly. It does not provide personalized investment advice without sufficient context.
### Instruction:
Analyze the financial risk and sentiment of the following market news.
### Input:
A technology company reported stronger-than-expected revenue growth, but management warned that margins may contract next quarter due to rising AI infrastructure costs.
### Response:
"""

# Tokenize the prompt and move the tensors to the GPU (a CUDA device is required).
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=200,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )

# The decoded text contains the prompt followed by the generated response.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Limitations

This is a constrained QLoRA domain-alignment demo, not a production financial advisor.

It should not be used for personalized investment advice. A production workflow would require broader benchmarks, hallucination testing, safety evaluation, financial-domain validation, retrieval grounding, and human review.

Intended Use

This adapter is intended for research, portfolio demonstration, and experimentation with memory-efficient financial instruction tuning.

It is not intended for trading automation, regulated financial advice, or high-stakes financial decision-making.

Model provider

emirhuseyin

