Run this model inference on single tenant GPU with unmatched speed and reliability at scale.
Run this model inference with full control and performance in your environment.
Get help setting up a custom Dedicated Endpoints.
Talk with our engineer to get a quote for reserved GPU instances with discounts.
README
License: apache-2.0Model
- Base model:
unsloth/Qwen2.5-7B-Instruct-bnb-4bit - Adapter type: LoRA
- Fine-tuning method: QLoRA
- Dataset:
gbharti/finance-alpaca - Runtime: Kaggle Tesla T4
- Max sequence length: 1024
- Training steps: 300
Run Summary
| Metric | Value |
|---|---|
| Train rows | 68,546 |
| Eval rows | 300 |
| Max steps | 300 |
| Train loss | 1.8094 |
| First eval loss | 1.5393 |
| Final eval loss | 1.5303 |
| Eval loss delta | -0.0089 |
| First eval perplexity | 4.6612 |
| Final eval perplexity | 4.6197 |
The evaluation loss and perplexity show a modest but measurable improvement after fine-tuning.
What This Project Demonstrates
This project demonstrates:
- Loading a 7B instruction model under limited VRAM
- 4-bit quantized model loading
- LoRA adapter fine-tuning
- Gradient accumulation for effective batch sizing
- Completion-only supervised fine-tuning
- Base model vs fine-tuned model comparison
- Exporting reusable adapter weights
- Producing reproducible training metrics and reports
Engineering Note
The notebook uses a single controlled model loader so the primary and fallback models cannot both be loaded into the same runtime by accident.
This was added after an earlier experiment showed that loading both a primary model and a fallback model into the same Kaggle runtime can waste VRAM and make debugging harder.
Files
Adapter weights and tokenizer files are stored in the repository root.
Additional reports are available under reports/:
training_metrics.csvbase_vs_finetuned_comparison.csvloss_curve.pngREADME_report.mdrun_summary.json
Example Usage
python
from unsloth import FastLanguageModelfrom peft import PeftModelimport torchbase_model = "unsloth/Qwen2.5-7B-Instruct-bnb-4bit"adapter_repo = "emirhuseyin/finllm-qlora-qwen-7b-finance-adapter"model, tokenizer = FastLanguageModel.from_pretrained(model_name=base_model,max_seq_length=1024,dtype=None,load_in_4bit=True,)model = PeftModel.from_pretrained(model, adapter_repo)FastLanguageModel.for_inference(model)prompt = """FinLLM is a finance-focused language model for risk and sentiment analysis. It explains uncertainty, trade-offs, and assumptions clearly. It does not provide personalized investment advice without sufficient context.### Instruction:Analyze the financial risk and sentiment of the following market news.### Input:A technology company reported stronger-than-expected revenue growth, but management warned that margins may contract next quarter due to rising AI infrastructure costs.### Response:"""inputs = tokenizer(prompt, return_tensors="pt").to("cuda")with torch.no_grad():outputs = model.generate(**inputs,max_new_tokens=200,temperature=0.7,top_p=0.9,do_sample=True,)print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Limitations
This is a constrained QLoRA domain-alignment demo, not a production financial advisor.
It should not be used for personalized investment advice. A production workflow would require broader benchmarks, hallucination testing, safety evaluation, financial-domain validation, retrieval grounding, and human review.
Intended Use
This adapter is intended for research, portfolio demonstration, and experimentation with memory-efficient financial instruction tuning.
It is not intended for trading automation, regulated financial advice, or high-stakes financial decision-making.
Model provider
emirhuseyin
Model tree
Base
unsloth/Qwen2.5-7B-Instruct-bnb-4bit
Adapter
this model
Modalities
Input
Text
Output
Text
Pricing
Dedicated Endpoints
View detailsSupported Functionality
Model APIs
Dedicated Endpoints
Container
More information