naazimsnh02

FinanceGemma-E4B-lora

Highlights

Base model: google/gemma-4-E4B-it (Gemma 4, ~4B params)
Method: 4-bit QLoRA SFT with response-only loss masking
Hardware: 1× NVIDIA L4 (24 GB VRAM) on a GCP instance
Final train loss: 0.0864
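Response-only loss masking means the training loss is computed on the model's reply tokens and not on the prompt. Below is a minimal sketch of the idea. It assumes a single-turn example and a hypothetical `marker` token sequence that opens the model's turn; the exact marker depends on the chat template and is not given in this card. Label positions set to `-100` are skipped by PyTorch's cross-entropy loss, whose default `ignore_index` is `-100`. Unsloth also ships a `train_on_responses_only` helper in `unsloth.chat_templates` for this purpose, but the card does not say which mechanism was used.

```python
from typing import Sequence

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def find_response_start(input_ids: Sequence[int], marker: Sequence[int]) -> int:
    """Index of the first token after the LAST occurrence of `marker`, or -1 if absent."""
    m = len(marker)
    for i in range(len(input_ids) - m, -1, -1):
        if list(input_ids[i:i + m]) == list(marker):
            return i + m
    return -1


def response_only_labels(input_ids: Sequence[int], marker: Sequence[int]) -> list:
    """Labels equal input_ids for response tokens and IGNORE_INDEX for everything before."""
    start = find_response_start(input_ids, marker)
    if start < 0:
        # No model turn found: the example contributes no loss at all.
        return [IGNORE_INDEX] * len(input_ids)
    return [IGNORE_INDEX] * start + list(input_ids[start:])


# Toy ids: prompt [1, 2, 3], hypothetical marker [9, 9], response [5, 6, 7]
print(response_only_labels([1, 2, 3, 9, 9, 5, 6, 7], marker=[9, 9]))
```

Searching for the last marker keeps the sketch correct for single-turn data only; multi-turn conversations would need every model turn unmasked.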

Training details

| Parameter | Value |
|---|---|
| Base model | `unsloth/gemma-4-E4B-it` (4-bit quantized) |
| LoRA rank (r) | 32 |
| LoRA alpha | 64 |
| LoRA dropout | 0 |
| Target modules | attention + MLP (all language layers) |
| Max sequence length | 4096 tokens |
| Effective batch size | 16 (micro-batch 1 × grad accum 16) |
| Epochs | 2 |
| Total steps | 3,738 |
| Learning rate | 1e-4 |
| LR scheduler | cosine (3% warmup) |
| Optimizer | AdamW 8-bit |
| Precision | bf16 |
| Weight decay | 0.01 |
| Training runtime | 59,813 s (~16.6 hours) |
| Peak VRAM | 16.4 GB |
| Loss (start → end) | 0.587 → 0.085 |
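The numbers in the table fit together arithmetically. The sketch below assumes that "steps" are optimizer updates (one per 16 sequences), that the 3% warmup is rounded up to a whole number of steps, and that the schedule is linear warmup followed by cosine decay to zero, which is the shape of Hugging Face's `get_cosine_schedule_with_warmup`. The decay-to-zero floor and the rounding are assumptions, not details stated in the card.

```python
import math

# Values taken from the training-details table
MICRO_BATCH = 1
GRAD_ACCUM = 16
EPOCHS = 2
TOTAL_STEPS = 3738
PEAK_LR = 1e-4
WARMUP_FRAC = 0.03
LORA_R, LORA_ALPHA = 32, 64
RUNTIME_S = 59_813

effective_batch = MICRO_BATCH * GRAD_ACCUM          # sequences per optimizer step
steps_per_epoch = TOTAL_STEPS // EPOCHS              # optimizer steps per epoch
approx_examples = steps_per_epoch * effective_batch  # implied size of the training set
warmup_steps = math.ceil(WARMUP_FRAC * TOTAL_STEPS)  # 3% of total steps, rounded up (assumed)
lora_scale = LORA_ALPHA / LORA_R                     # LoRA update is scaled by alpha / r
sec_per_step = RUNTIME_S / TOTAL_STEPS               # average wall-clock time per optimizer step


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to 0 (assumed schedule shape)."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, TOTAL_STEPS - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"effective batch: {effective_batch}, steps/epoch: {steps_per_epoch}")
print(f"implied examples: ~{approx_examples}, warmup steps: {warmup_steps}")
print(f"LoRA scale: {lora_scale}, seconds/step: {sec_per_step:.1f}")
```

The implied ~29,900 examples per epoch agrees with the "~30,000 instruction-response pairs" reported in the training-data section, and the alpha / r ratio of 2 is the factor applied to the low-rank update.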

Training data

~30,000 instruction-response pairs built from two open-source finance datasets:

The corpus is diversity-sampled (per-source and per-task-type caps) and 10-gram decontaminated against FLARE evaluation inputs (FPB, FiQA_SA, Headline) to prevent benchmark leakage.
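The card names "10-gram" decontamination but does not describe the procedure further. A common reading is to drop any training example that shares at least one word 10-gram with an evaluation input. The sketch below follows that reading; the lowercasing and `\w+` word tokenization are assumptions.

```python
import re
from typing import Iterable

N = 10  # n-gram length named in the card


def ngrams(text: str, n: int = N) -> set:
    """Set of lowercased word n-grams (tokenization is an assumption)."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def decontaminate(train: Iterable[str], eval_inputs: Iterable[str], n: int = N) -> list:
    """Keep only training texts that share no n-gram with any evaluation input."""
    banned = set()
    for text in eval_inputs:
        banned |= ngrams(text, n)
    return [t for t in train if not (ngrams(t, n) & banned)]


# Made-up strings for illustration only
eval_inputs = ["Net sales increased by 12 percent to 150 million euros in the third quarter of the year"]
train = [
    "Reuters: net sales increased by 12 percent to 150 million euros in the third quarter, the firm said.",
    "The central bank held interest rates steady on Tuesday.",
]
print(decontaminate(train, eval_inputs))  # only the second sentence survives
```

Texts shorter than 10 tokens produce no 10-grams, so a check like this can never flag them.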

Evaluation

FLARE-style multiple-choice accuracy on AdaptLLM/finance-tasks (greedy decoding, temp=0):

| Task | n | Accuracy |
|---|---:|---:|
| FPB | 970 | 78.4% |
| FiQA_SA | 235 | 67.2% |
| Headline | 20,547 | 69.2% |
| Macro avg | | 71.6% |
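The macro average weights each task equally. It therefore differs from a sample-weighted (micro) average when task sizes are as uneven as they are here, since Headline accounts for about 94% of the 21,752 examples. The short sketch below computes both from the reported, rounded numbers. The micro figure is derived here and is not reported in the card; it is approximate because the inputs are rounded percentages.

```python
# Reported per-task results: (n, accuracy in %)
results = {
    "FPB": (970, 78.4),
    "FiQA_SA": (235, 67.2),
    "Headline": (20_547, 69.2),
}


def macro_avg(res: dict) -> float:
    """Unweighted mean of per-task accuracies: every task counts equally."""
    return sum(acc for _, acc in res.values()) / len(res)


def micro_avg(res: dict) -> float:
    """Sample-weighted mean: (approximate) total correct divided by total examples."""
    total = sum(n for n, _ in res.values())
    return sum(n * acc for n, acc in res.values()) / total


print(f"macro: {macro_avg(results):.1f}%")  # macro: 71.6%
print(f"micro: {micro_avg(results):.1f}%")  # micro: 69.6%
```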

Usage

With PEFT + Transformers

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, then attach the LoRA adapter on top of it
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-4-E4B-it",
    device_map="auto",
    torch_dtype="auto",
)
model = PeftModel.from_pretrained(base_model, "naazimsnh02/FinanceGemma-E4B-lora")
tokenizer = AutoTokenizer.from_pretrained("naazimsnh02/FinanceGemma-E4B-lora")

prompt = "Classify the sentiment of this financial news: 'Apple reported record Q4 earnings, beating analyst estimates by 15%.'"
messages = [{"role": "user", "content": [{"type": "text", "text": prompt}]}]

# Apply the chat template and append the prefix that opens the model's turn
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

# Greedy decoding, matching the evaluation setup
output = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Decode only the newly generated tokens, not the echoed prompt
new_tokens = output[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

With Unsloth (faster inference)

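```python
from unsloth import FastModel

# Loads the adapter repo; Unsloth resolves and loads the 4-bit base model it was trained on
model, tokenizer = FastModel.from_pretrained(
    "naazimsnh02/FinanceGemma-E4B-lora",
    max_seq_length=4096,
    load_in_4bit=True,
)
FastModel.for_inference(model)  # switch Unsloth into its inference mode

prompt = "Classify the sentiment of this financial news: 'Apple reported record Q4 earnings, beating analyst estimates by 15%.'"
messages = [{"role": "user", "content": [{"type": "text", "text": prompt}]}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=True, return_tensors="pt", return_dict=True).to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```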

Intended use

Financial text analysis tasks including:

Sentiment classification (positive / negative / neutral)
Headline interpretation (price up / down / neutral signals)
Financial QA and reasoning over financial documents
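For the two classification tasks above, the model replies in free text, so a downstream caller has to map each reply onto a fixed label set. The card does not specify how outputs are parsed, so the helper below is hypothetical. It returns the first allowed label that appears as a whole word.

```python
import re
from typing import Optional, Tuple

# Label sets named in the intended-use list above
SENTIMENT_LABELS = ("positive", "negative", "neutral")
HEADLINE_LABELS = ("up", "down", "neutral")


def parse_label(generation: str, labels: Tuple[str, ...]) -> Optional[str]:
    """Hypothetical parser: first whole-word match from `labels`, else None."""
    for word in re.findall(r"[a-z]+", generation.lower()):
        if word in labels:
            return word
    return None


print(parse_label("Sentiment: Positive.", SENTIMENT_LABELS))             # positive
print(parse_label("The price will likely go down.", HEADLINE_LABELS))    # down
```

First-match parsing is crude. A reply such as "not positive" would be labeled `positive`, so constraining the prompt to answer with a single label word is the safer pairing.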

Limitations

Trained on English-language finance data only
Optimized for classification and short-answer tasks, not long-form financial report generation
4-bit QLoRA implies some quality loss vs. full fine-tuning, accepted as a trade-off to fit the target budget constraints
No reinforcement learning applied (SFT only)

License

Apache 2.0 (same as the base Gemma 4 model)

Model provider: naazimsnh02

Model tree: base google/gemma-4-E4B-it; adapter: this model

Modalities: input Text, Image; output Text
