Highlights
- Base model: google/gemma-4-E4B-it (Gemma 4, ~4B params)
- Method: 4-bit QLoRA SFT with response-only loss masking
- Hardware: 1× NVIDIA L4 (24 GB VRAM) on a GCP instance
- Final train loss: 0.0864
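Response-only loss masking means the loss is computed on the assistant's tokens and not on the prompt. The following is a minimal sketch of that idea, assuming the common convention of setting prompt labels to -100 (the default ignore index of PyTorch's cross-entropy loss). The function name and token ids are hypothetical, and this is not the training code used for this adapter.

```python
IGNORE_INDEX = -100  # label value that PyTorch's CrossEntropyLoss skips by default


def build_labels(prompt_ids, response_ids):
    """Return (input_ids, labels) with prompt positions excluded from the loss."""
    input_ids = list(prompt_ids) + list(response_ids)
    # Prompt tokens get the ignore index; response tokens keep their own ids.
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels


# Hypothetical token ids, for illustration only.
input_ids, labels = build_labels([101, 102, 103], [7, 8, 9])
print(labels)  # [-100, -100, -100, 7, 8, 9]
```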
Training details
| Parameter | Value |
|---|---|
| Base model | unsloth/gemma-4-E4B-it (4-bit quantized) |
| LoRA rank (r) | 32 |
| LoRA alpha | 64 |
| LoRA dropout | 0 |
| Target modules | attention + MLP (all language layers) |
| Max sequence length | 4096 |
| Effective batch size | 16 (micro-batch 1 × grad accum 16) |
| Epochs | 2 |
| Total steps | 3,738 |
| Learning rate | 1e-4 |
| LR scheduler | cosine (3% warmup) |
| Optimizer | AdamW 8-bit |
| Precision | bf16 |
| Weight decay | 0.01 |
| Training runtime | 59,813s (~16.6 hours) |
| Peak VRAM | 16.4 GB |
| Loss (start → end) | 0.587 → 0.085 |
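The schedule and step counts in the table can be sanity-checked with a few lines of arithmetic. This sketch assumes linear warmup followed by cosine decay to zero, with the warmup step count rounded up. Those details are assumptions, not something the table confirms, and `lr_at` is a hypothetical helper name.

```python
import math

TOTAL_STEPS = 3738
PEAK_LR = 1e-4
EFFECTIVE_BATCH = 16
EPOCHS = 2
RUNTIME_SECONDS = 59813

# Assumption: 3% warmup, rounded up to a whole number of steps.
WARMUP_STEPS = math.ceil(0.03 * TOTAL_STEPS)


def lr_at(step):
    """Learning rate at a given optimizer step (linear warmup, then cosine decay)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Approximate number of examples seen per epoch, and average time per step.
samples_per_epoch = TOTAL_STEPS * EFFECTIVE_BATCH / EPOCHS
seconds_per_step = RUNTIME_SECONDS / TOTAL_STEPS

print(WARMUP_STEPS, samples_per_epoch, round(seconds_per_step, 1))
```

The per-epoch sample count works out to roughly 29,900, which is consistent with the ~30,000 pairs described for the training corpus.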
Training data
~30,000 instruction-response pairs built from two open-source finance datasets.
The corpus is diversity-sampled (per-source and per-task-type caps) and 10-gram decontaminated against FLARE evaluation inputs (FPB, FiQA_SA, Headline) to prevent benchmark leakage.
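A rough illustration of n-gram decontamination follows. It assumes lowercase whitespace tokenization and drops any training example that shares at least one 10-gram with an evaluation input. The actual tokenization and matching rules used for this corpus are not stated, so treat the helper names and example texts as hypothetical.

```python
def ngrams(text, n=10):
    """Set of word-level n-grams in a text (lowercased, whitespace-tokenized)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def decontaminate(train_texts, eval_texts, n=10):
    """Keep only training texts that share no n-gram with any evaluation text."""
    banned = set()
    for text in eval_texts:
        banned |= ngrams(text, n)
    return [t for t in train_texts if not (ngrams(t, n) & banned)]


# Made-up examples, for illustration only.
eval_inputs = [
    "the company said operating profit rose to 12 million euros in the third quarter"
]
train_inputs = [
    "News: the company said operating profit rose to 12 million euros in the third quarter. Sentiment?",
    "What does a widening credit spread usually signal about default risk?",
]
kept = decontaminate(train_inputs, eval_inputs)
print(len(kept))  # 1
```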
Evaluation
FLARE-style multiple-choice accuracy on AdaptLLM/finance-tasks (greedy decoding, temp=0):
| Task | n | Accuracy |
|---|---|---|
| FPB | 970 | 78.4% |
| FiQA_SA | 235 | 67.2% |
| Headline | 20,547 | 69.2% |
| Macro avg | — | 71.6% |
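The macro average is the unweighted mean of the three task accuracies, so the small FiQA_SA set counts as much as the much larger Headline set. The sketch below recomputes it from the table and, for contrast, also computes a sample-weighted average. The weighted figure is an illustration and is not reported in the table.

```python
# Accuracy (%) and sample count per task, copied from the table above.
results = {
    "FPB": (78.4, 970),
    "FiQA_SA": (67.2, 235),
    "Headline": (69.2, 20547),
}

# Macro average: every task contributes equally.
macro_avg = sum(acc for acc, _ in results.values()) / len(results)

# Sample-weighted average: every example contributes equally.
total_n = sum(n for _, n in results.values())
weighted_avg = sum(acc * n for acc, n in results.values()) / total_n

print(round(macro_avg, 1))  # 71.6
print(round(weighted_avg, 1))
```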
Usage
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, then attach the LoRA adapter on top of it.
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-4-E4B-it",
    device_map="auto",
    torch_dtype="auto",
)
model = PeftModel.from_pretrained(base_model, "naazimsnh02/FinanceGemma-E4B-lora")
tokenizer = AutoTokenizer.from_pretrained("naazimsnh02/FinanceGemma-E4B-lora")

prompt = "Classify the sentiment of this financial news: 'Apple reported record Q4 earnings, beating analyst estimates by 15%.'"
messages = [{"role": "user", "content": [{"type": "text", "text": prompt}]}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Generate, then decode only the new tokens so the prompt is not echoed back.
output = model.generate(**inputs, max_new_tokens=128)
new_tokens = output[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
With Unsloth (faster inference)
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    "naazimsnh02/FinanceGemma-E4B-lora",
    max_seq_length=4096,
    load_in_4bit=True,
)
FastModel.for_inference(model)
Intended use
Financial text analysis tasks including:
- Sentiment classification (positive / negative / neutral)
- Headline interpretation (price up / down / neutral signals)
- Financial QA and reasoning over financial documents
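For the classification tasks above, a thin wrapper that builds the chat message and maps the free-text generation back to a label can be useful. This is a hedged sketch: the prompt wording, the `build_messages` and `parse_label` helpers, and the earliest-mention heuristic are all assumptions for illustration, not the template the adapter was trained on.

```python
LABELS = ("positive", "negative", "neutral")


def build_messages(text):
    """Wrap a news snippet in a single-turn chat message (hypothetical prompt wording)."""
    prompt = (
        "Classify the sentiment of this financial news as "
        f"positive, negative, or neutral: '{text}'"
    )
    return [{"role": "user", "content": [{"type": "text", "text": prompt}]}]


def parse_label(generation):
    """Return the label mentioned earliest in the generation, or None if none appears."""
    lowered = generation.lower()
    hits = [(lowered.find(label), label) for label in LABELS if label in lowered]
    return min(hits)[1] if hits else None


messages = build_messages("Apple reported record Q4 earnings.")
print(parse_label("Positive."))      # positive
print(parse_label("I am not sure"))  # None
```

The returned `messages` list has the same shape as the one passed to `apply_chat_template` in the Usage section.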
Limitations
- Trained on English-language finance data only
- Optimized for classification and short-answer tasks, not long-form financial report generation
- 4-bit QLoRA trades some quality for memory compared with full fine-tuning; it was chosen so that training fits on a single 24 GB GPU
- No reinforcement learning applied (SFT only)
License
Apache 2.0 (same as the base Gemma 4 model)