Lovely2209

Qwen2.5-1.5B-Phishing-Email-Detector

Model description

Base model	Qwen/Qwen2.5-1.5B-Instruct
Method	4-bit QLoRA SFT → LoRA merged into full weights (bf16 `safetensors`)
Classes	`phishing`, `legitimate`
Max sequence length	512 tokens (training default)
Parameters	~1.5B (full merged checkpoint)

Intended uses

Research and education on phishing URL / email text classification
Prototyping security tooling with explicit human review

Out-of-scope uses

Sole automated decision-making for blocking users or transactions without review
Spam campaigns, social engineering, or evading security systems
Languages or domains far from the training distribution

Inference format

Use this exact instruction and layout (training and eval depend on it):

text
### Instruction:
Classify the email or URL as phishing or legitimate.

### Input:
<your email body or URL here>

### Response:

The model should complete ### Response: with phishing or legitimate. Prefer temperature 0 / greedy decoding.
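A minimal sketch of building the prompt and normalizing the completion. The helper names `build_prompt` and `parse_label` are hypothetical and not part of this repository. Anything that is not one of the two class names comes back as `None`, so it can be routed to human review rather than silently counted as a class.

```python
from typing import Optional

INSTRUCTION = "Classify the email or URL as phishing or legitimate."
LABELS = ("phishing", "legitimate")


def build_prompt(text: str) -> str:
    """Return the prompt in the layout shown above (hypothetical helper)."""
    return (
        "### Instruction:\n"
        f"{INSTRUCTION}\n\n"
        f"### Input:\n{text}\n\n"
        "### Response:\n"
    )


def parse_label(completion: str) -> Optional[str]:
    """Map raw generated text to a class name, or None if it matches neither."""
    stripped = completion.strip()
    if not stripped:
        return None
    # Look only at the first word and ignore trailing punctuation and case.
    first_word = stripped.split()[0].strip(".,:;!\"'").lower()
    return first_word if first_word in LABELS else None


print(parse_label(" phishing\n"))     # phishing
print(parse_label("I am not sure"))   # None
```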

Evaluation (held-out test set)

Metrics on 1,000 held-out test examples (500 per class), evaluated with the training adapter via PEFT (evaluation script, left-padded batch inference). These numbers were reported before the merge; the merged weights are expected to match closely.
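Left padding matters for batched generation with a decoder-only model: new tokens are appended after the last position of each row, so padding on the left keeps that last position a real prompt token. A toy illustration with made-up token IDs and a hypothetical `left_pad` helper; in practice the tokenizer does this itself when `padding_side="left"` is set.

```python
from typing import List, Tuple


def left_pad(batch: List[List[int]], pad_id: int) -> Tuple[List[List[int]], List[List[int]]]:
    """Pad every sequence on the left to the longest length in the batch."""
    width = max(len(seq) for seq in batch)
    input_ids = [[pad_id] * (width - len(seq)) + seq for seq in batch]
    # 0 marks padding positions, 1 marks real tokens.
    attention_mask = [[0] * (width - len(seq)) + [1] * len(seq) for seq in batch]
    return input_ids, attention_mask


ids, mask = left_pad([[11, 12, 13], [21]], pad_id=0)
print(ids)   # [[11, 12, 13], [0, 0, 21]]
print(mask)  # [[1, 1, 1], [0, 0, 1]]
```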

Metric	Value
Accuracy	96.5%
Macro F1	0.9820

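Confusion matrix (rows = true, columns = predicted). Each row sums to less than its support of 500: the remaining 31 phishing and 4 legitimate examples appear to have produced an output matching neither label, which is consistent with the recall values in the report below.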

	phishing	legitimate
phishing	469	0
legitimate	0	496

markdown
              precision    recall  f1-score   support

    phishing       1.00      0.94      0.97       500
  legitimate       1.00      0.99      1.00       500

   micro avg       1.00      0.96      0.98      1000
   macro avg       1.00      0.96      0.98      1000
weighted avg       1.00      0.96      0.98      1000
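The headline numbers can be re-derived from the confusion matrix and the per-class support. A small sketch in plain Python; it assumes that the examples missing from each row are outputs that matched neither label, which the card does not state explicitly.

```python
SUPPORT = {"phishing": 500, "legitimate": 500}
# CONFUSION[true][predicted], copied from the matrix above.
CONFUSION = {
    "phishing": {"phishing": 469, "legitimate": 0},
    "legitimate": {"phishing": 0, "legitimate": 496},
}


def per_class_scores(label: str) -> tuple:
    """Return (precision, recall, f1) for one class."""
    tp = CONFUSION[label][label]
    predicted = sum(row[label] for row in CONFUSION.values())
    precision = tp / predicted if predicted else 0.0
    recall = tp / SUPPORT[label]
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


total = sum(SUPPORT.values())
correct = sum(CONFUSION[label][label] for label in SUPPORT)
accuracy = correct / total
macro_f1 = sum(per_class_scores(label)[2] for label in SUPPORT) / len(SUPPORT)
# Examples per class that were assigned to neither label (assumption, see above).
unlabeled = {label: SUPPORT[label] - sum(CONFUSION[label].values()) for label in SUPPORT}

print(f"accuracy = {accuracy:.3f}")   # accuracy = 0.965
print(f"macro F1 = {macro_f1:.4f}")   # macro F1 = 0.9820
print(unlabeled)                      # {'phishing': 31, 'legitimate': 4}
```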

Training data

Source: Synthetic dataset (5,000 rows, balanced 2,500 / 2,500)
Columns: model_input (email or URL text), label (phishing | legitimate)
Split: 80/20 stratified → 4,000 train / 1,000 test
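One way an 80/20 stratified split like this could be produced, sketched with the standard library only. The rows are placeholders standing in for the unreleased dataset, `stratified_split` is a hypothetical helper, and reusing seed 42 for the split is an assumption (the card lists it only as the training seed).

```python
import random
from collections import defaultdict

# Placeholder rows with the same columns as the dataset (hypothetical content).
rows = [
    {"model_input": f"sample {i}", "label": "phishing" if i % 2 == 0 else "legitimate"}
    for i in range(5000)
]


def stratified_split(rows, test_fraction=0.2, seed=42):
    """Split rows so each label keeps the same share in train and test."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row["label"]].append(row)

    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


train, test = stratified_split(rows)
print(len(train), len(test))  # 4000 1000
```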

📌 Dataset availability: The training dataset is not publicly released as part of this repository. If you are interested in the data for research collaboration or reproducibility purposes, please contact the authors directly via Hugging Face.

Training procedure

Setting	Value
Base model	Qwen/Qwen2.5-1.5B-Instruct
Quantization	4-bit NF4 (QLoRA training only)
LoRA rank / alpha	16 / 32
Epochs	1
Learning rate	2e-4
Batch (effective)	16 (8 × grad accum 2)
Max seq length	512
Seed	42
Merge	bf16 full weights for this checkpoint
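A short worked example of what these settings imply, using the 4,000-example training split. The field names are hypothetical, and the step count assumes a single device; the LoRA scaling factor follows the standard alpha / rank convention.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainSettings:
    """Values copied from the table above; field names are hypothetical."""
    train_examples: int = 4000
    per_device_batch: int = 8
    grad_accum_steps: int = 2
    epochs: int = 1
    lora_rank: int = 16
    lora_alpha: int = 32

    @property
    def effective_batch(self) -> int:
        return self.per_device_batch * self.grad_accum_steps

    @property
    def optimizer_steps(self) -> int:
        # Assumes one device; a partial final batch would count as a step.
        return math.ceil(self.train_examples / self.effective_batch) * self.epochs

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA scales the low-rank update by alpha / rank.
        return self.lora_alpha / self.lora_rank


s = TrainSettings()
print(s.effective_batch, s.optimizer_steps, s.lora_scaling)  # 16 250 2.0
```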

Usage

Transformers

python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "Lovely2209/Qwen2.5-1.5B-Phishing-Email-Detector"

tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

text = "http://paypa1-secure.example/verify?id=abc"
prompt = (
    "### Instruction:\n"
    "Classify the email or URL as phishing or legitimate.\n\n"
    f"### Input:\n{text}\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=8,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )
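# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = out[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True).strip())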

vLLM (merged — no LoRA)

bash
export VLLM_USE_FLASHINFER_SAMPLER=0   # if nvcc / CUDA toolkit is not installed

python
from vllm import LLM, SamplingParams
from vllm.sampling_params import StructuredOutputsParams

repo = "Lovely2209/Qwen2.5-1.5B-Phishing-Email-Detector"

llm = LLM(
    model=repo,
    dtype="bfloat16",
    max_model_len=512,
    enforce_eager=True,
)

prompt = (
    "### Instruction:\n"
    "Classify the email or URL as phishing or legitimate.\n\n"
    "### Input:\n"
    "http://paypa1-secure.example/verify\n\n"
    "### Response:\n"
)

sampling = SamplingParams(
    temperature=0.0,
    max_tokens=8,
    structured_outputs=StructuredOutputsParams(
        choice=["phishing", "legitimate"]
    ),
)

outputs = llm.generate([prompt], sampling)
print(outputs[0].outputs[0].text.strip())

License

This model is released under the Apache License 2.0, consistent with Qwen/Qwen2.5-1.5B-Instruct. See LICENSE and NOTICE in this repository.

This project is not affiliated with, endorsed by, or sponsored by Alibaba Cloud or the Qwen Team.

Citation

If you use this model, please cite the base Qwen2.5 work:

bibtex
@misc{qwen2.5,
  title = {Qwen2.5: A Party of Foundation Models},
  url = {https://qwenlm.github.io/blog/qwen2.5/},
  author = {Qwen Team},
  month = {September},
  year = {2024}
}

Fine-tune attribution:

bibtex
@misc{qwen25-phishing-email-detector-2026,
  title = {Qwen2.5-1.5B Phishing Email Detector (merged QLoRA SFT)},
  author = {Jagriti Singh and Lovely Kumari},
  year = {2026},
  howpublished = {\url{https://huggingface.co/Lovely2209/Qwen2.5-1.5B-Phishing-Email-Detector}},
  note = {Fine-tuned from Qwen/Qwen2.5-1.5B-Instruct}
}
