README

License: apache-2.0

Model description

Base model: Qwen/Qwen2.5-1.5B-Instruct
Method: 4-bit QLoRA SFT → LoRA merged into full weights (bf16 safetensors)
Classes: phishing, legitimate
Max sequence length: 512 tokens (training default)
Parameters: ~1.5B (full merged checkpoint)

Intended uses

  • Research and education on phishing URL / email text classification
  • Prototyping security tooling with explicit human review

Out-of-scope uses

  • Sole automated decision-making for blocking users or transactions without review
  • Spam campaigns, social engineering, or evading security systems
  • Languages or domains far from the training distribution

Inference format

Use this exact instruction and layout (training and eval depend on it):

text

### Instruction:
Classify the email or URL as phishing or legitimate.

### Input:
<your email body or URL here>

### Response:

The model should complete ### Response: with phishing or legitimate. Prefer temperature 0 / greedy decoding.
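The template can be wrapped in a small helper so that every caller builds the prompt string the same way. The sketch below is illustrative only: `build_prompt` and `parse_label` are hypothetical names that do not come from the repository, and the blank lines between sections follow the usage snippets in this README. Adjust them if your training script formats the prompt differently.

```python
from typing import Optional

INSTRUCTION = "Classify the email or URL as phishing or legitimate."
LABELS = ("phishing", "legitimate")


def build_prompt(text: str) -> str:
    """Format one email body or URL with the instruction template."""
    return (
        "### Instruction:\n"
        f"{INSTRUCTION}\n\n"
        f"### Input:\n{text}\n\n"
        "### Response:\n"
    )


def parse_label(generated: str) -> Optional[str]:
    """Map raw generated text to a class label, or None if neither label matches."""
    # If the decoded text still contains the prompt, keep only what
    # follows the last "### Response:" marker.
    completion = generated.rsplit("### Response:", 1)[-1]
    words = completion.strip().lower().split()
    first = words[0].strip(".,:;!") if words else ""
    return first if first in LABELS else None
```

Returning `None` for anything outside the two labels keeps unparseable generations visible, so a caller can route them to human review instead of silently picking a class.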

Evaluation (held-out test set)

Metrics on 1,000 held-out test examples (500 per class), evaluated with the training adapter loaded through PEFT (evaluation script, left-padded batch inference). These numbers were measured before the merge; the merged weights are expected to match closely.

Accuracy: 96.5%
Macro F1: 0.9820

Confusion matrix (rows = true, columns = predicted):

              phishing  legitimate
phishing           469           0
legitimate           0         496

markdown

              precision    recall  f1-score   support

    phishing       1.00      0.94      0.97       500
  legitimate       1.00      0.99      1.00       500

   micro avg       1.00      0.96      0.98      1000
   macro avg       1.00      0.96      0.98      1000
weighted avg       1.00      0.96      0.98      1000
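As a cross-check, the headline numbers can be recomputed from the confusion matrix. The sketch below assumes that the 35 test examples missing from the matrix (31 phishing, 4 legitimate) were generations that matched neither label. That reading is inferred from the row totals and is not confirmed by the evaluation script.

```python
# Counts taken from the confusion matrix above (rows = true, columns = predicted).
support = {"phishing": 500, "legitimate": 500}
confusion = {
    "phishing": {"phishing": 469, "legitimate": 0},
    "legitimate": {"phishing": 0, "legitimate": 496},
}


def per_class_scores(label):
    """Return (precision, recall, f1) for one class."""
    tp = confusion[label][label]
    predicted = sum(row[label] for row in confusion.values())
    precision = tp / predicted if predicted else 0.0
    # Recall uses the full support, so unmatched generations count as misses.
    recall = tp / support[label]
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


correct = sum(confusion[c][c] for c in confusion)
accuracy = correct / sum(support.values())
macro_f1 = sum(per_class_scores(c)[2] for c in confusion) / len(confusion)

print(f"accuracy = {accuracy:.3f}")   # accuracy = 0.965
print(f"macro F1 = {macro_f1:.4f}")   # macro F1 = 0.9820
```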

Training data

  • Source: Synthetic dataset (5,000 rows, balanced 2,500 / 2,500)
  • Columns: model_input (email or URL text), label (phishing | legitimate)
  • Split: 80/20 stratified → 4,000 train / 1,000 test
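An 80/20 stratified split of a balanced 5,000-row dataset yields 2,000 + 2,000 training rows and 500 + 500 test rows. The following is a minimal standard-library sketch of that idea, not the authors' actual split code. `stratified_split` and the placeholder rows are hypothetical, and reusing seed 42 from the training settings is an assumption.

```python
import random
from collections import defaultdict


def stratified_split(rows, test_fraction=0.2, seed=42):
    """Split (model_input, label) rows so each label keeps the same test share."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[1]].append(row)

    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Placeholder rows standing in for the unreleased dataset.
rows = [(f"sample {i}", "phishing") for i in range(2500)]
rows += [(f"sample {i}", "legitimate") for i in range(2500)]

train, test = stratified_split(rows)
print(len(train), len(test))  # 4000 1000
```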

📌 Dataset availability: The training dataset is not publicly released as part of this repository. If you are interested in the data for research collaboration or reproducibility purposes, please contact the authors directly via Hugging Face.

Training procedure

Base model: Qwen/Qwen2.5-1.5B-Instruct
Quantization: 4-bit NF4 (QLoRA training only)
LoRA rank / alpha: 16 / 32
Epochs: 1
Learning rate: 2e-4
Batch (effective): 16 (8 × grad accum 2)
Max seq length: 512
Seed: 42
Merge: bf16 full weights for this checkpoint
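For orientation, the settings above imply the step count worked out below. This is plain arithmetic on the listed values and the 4,000-row training split. It assumes single-device training with no dropped last batch, which the source does not state, and the alpha / rank scaling refers to the standard LoRA formulation rather than anything confirmed about the training script.

```python
import math

train_rows = 4_000       # training side of the 80/20 split
per_device_batch = 8     # assumed to be the per-device batch size
grad_accum_steps = 2
epochs = 1

effective_batch = per_device_batch * grad_accum_steps
optimizer_steps = math.ceil(train_rows / effective_batch) * epochs

# In standard LoRA the low-rank update is scaled by alpha / rank.
lora_rank, lora_alpha = 16, 32
lora_scale = lora_alpha / lora_rank

print(effective_batch, optimizer_steps, lora_scale)  # 16 250 2.0
```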

Usage

Transformers

python

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "Lovely2209/Qwen2.5-1.5B-Phishing-Email-Detector"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

text = "http://paypa1-secure.example/verify?id=abc"
# Build the prompt in the same layout used for training and evaluation.
prompt = (
    "### Instruction:\n"
    "Classify the email or URL as phishing or legitimate.\n\n"
    f"### Input:\n{text}\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    # Greedy decoding; the label needs only a few new tokens.
    out = model.generate(
        **inputs,
        max_new_tokens=8,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )

# The decoded string contains the prompt followed by the predicted label.
print(tokenizer.decode(out[0], skip_special_tokens=True))

vLLM (merged — no LoRA)

bash

export VLLM_USE_FLASHINFER_SAMPLER=0 # if nvcc / CUDA toolkit is not installed

python

from vllm import LLM, SamplingParams
from vllm.sampling_params import StructuredOutputsParams

repo = "Lovely2209/Qwen2.5-1.5B-Phishing-Email-Detector"
llm = LLM(
    model=repo,
    dtype="bfloat16",
    max_model_len=512,
    enforce_eager=True,
)

# Build the prompt in the same layout used for training and evaluation.
prompt = (
    "### Instruction:\n"
    "Classify the email or URL as phishing or legitimate.\n\n"
    "### Input:\n"
    "http://paypa1-secure.example/verify\n\n"
    "### Response:\n"
)

# Greedy decoding, with the output constrained to one of the two labels.
sampling = SamplingParams(
    temperature=0.0,
    max_tokens=8,
    structured_outputs=StructuredOutputsParams(
        choice=["phishing", "legitimate"]
    ),
)

outputs = llm.generate([prompt], sampling)
print(outputs[0].outputs[0].text.strip())

License

This model is released under the Apache License 2.0, consistent with Qwen/Qwen2.5-1.5B-Instruct. See LICENSE and NOTICE in this repository.

This project is not affiliated with, endorsed by, or sponsored by Alibaba Cloud or the Qwen Team.

Citation

If you use this model, please cite the base Qwen2.5 work:

bibtex

@misc{qwen2.5,
  title = {Qwen2.5: A Party of Foundation Models},
  url = {https://qwenlm.github.io/blog/qwen2.5/},
  author = {Qwen Team},
  month = {September},
  year = {2024}
}

Fine-tune attribution:

bibtex

@misc{qwen25-phishing-email-detector-2026,
  title = {Qwen2.5-1.5B Phishing Email Detector (merged QLoRA SFT)},
  author = {Jagriti Singh and Lovely Kumari},
  year = {2026},
  howpublished = {\url{https://huggingface.co/Lovely2209/Qwen2.5-1.5B-Phishing-Email-Detector}},
  note = {Fine-tuned from Qwen/Qwen2.5-1.5B-Instruct}
}

Model provider

Lovely2209

Model tree

Base

Qwen/Qwen2.5-1.5B-Instruct

Fine-tuned

this model

Modalities

Input

Text

Output

Text
