License: apache-2.0

Model description
| Property | Value |
|---|---|
| Base model | Qwen/Qwen2.5-1.5B-Instruct |
| Method | 4-bit QLoRA SFT → LoRA merged into full weights (bf16 safetensors) |
| Classes | phishing, legitimate |
| Max sequence length | 512 tokens (training default) |
| Parameters | ~1.5B (full merged checkpoint) |
Intended uses
- Research and education on phishing URL / email text classification
- Prototyping security tooling with explicit human review
Out-of-scope uses
- Sole automated decision-making for blocking users or transactions without review
- Spam campaigns, social engineering, or evading security systems
- Languages or domains far from the training distribution
Inference format
Use this exact instruction and layout (training and eval depend on it):
```text
### Instruction:
Classify the email or URL as phishing or legitimate.

### Input:
<your email body or URL here>

### Response:
```
The model is expected to complete `### Response:` with either `phishing` or `legitimate`. Use temperature 0 (greedy decoding).
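A small helper keeps the prompt identical to the layout above and maps the completion back to a label. This is a sketch: `build_prompt` and `parse_label` are hypothetical names, the newline placement mirrors the strings built in the Usage examples, and returning `None` for any output that is neither class name is an assumption about how unparseable completions should be handled.

```python
from typing import Optional

LABELS = ("phishing", "legitimate")


def build_prompt(text: str) -> str:
    """Wrap an email body or URL in the instruction layout used for training."""
    return (
        "### Instruction:\n"
        "Classify the email or URL as phishing or legitimate.\n\n"
        f"### Input:\n{text}\n\n"
        "### Response:\n"
    )


def parse_label(completion: str) -> Optional[str]:
    """Map a generated completion to a class name, or None if it matches neither.

    Assumption: only the first word is inspected, so trailing tokens such as a
    newline or leftover text after the label are ignored.
    """
    words = completion.strip().split()
    if not words:
        return None
    first = words[0].strip(".,:;!").lower()
    return first if first in LABELS else None


prompt = build_prompt("http://paypa1-secure.example/verify?id=abc")
print(prompt)
print(parse_label(" phishing\n"))    # phishing
print(parse_label("Legitimate."))    # legitimate
print(parse_label("I am not sure"))  # None
```

Treating anything outside the two labels as `None` rather than guessing a class keeps uncertain outputs visible, which fits the human-review use case described above.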
Evaluation (held-out test set)
Metrics on 1,000 held-out test examples (500 per class), evaluated with the training adapter loaded through PEFT (evaluation script with left-padded batch inference). These figures were measured before merging; the merged weights are expected to match closely.
| Metric | Value |
|---|---|
| Accuracy | 96.5% |
| Macro F1 | 0.9820 |
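Confusion matrix (rows = true, columns = predicted). Each class has 500 test examples, but the rows sum to 469 and 496: the remaining 31 phishing and 4 legitimate examples received an output that was neither label, so they count against accuracy and recall without adding false positives.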
| True \ Predicted | phishing | legitimate |
|---|---|---|
| phishing | 469 | 0 |
| legitimate | 0 | 496 |
```text
              precision    recall  f1-score   support

    phishing       1.00      0.94      0.97       500
  legitimate       1.00      0.99      1.00       500

   micro avg       1.00      0.96      0.98      1000
   macro avg       1.00      0.96      0.98      1000
weighted avg       1.00      0.96      0.98      1000
```
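The headline numbers can be re-derived from the confusion matrix. This sketch assumes the 35 test examples missing from the matrix were predicted as neither class; the `other` bucket is a hypothetical name for them. They count as false negatives for their true class but as false positives for no class, which is why precision is 1.00 for both labels while accuracy is 96.5%.

```python
# Counts from the confusion matrix above. "other" is a hypothetical bucket for
# test examples whose prediction was neither phishing nor legitimate.
counts = {
    "phishing": {"phishing": 469, "legitimate": 0, "other": 31},
    "legitimate": {"phishing": 0, "legitimate": 496, "other": 4},
}
labels = ["phishing", "legitimate"]

total = sum(sum(row.values()) for row in counts.values())
correct = sum(counts[c][c] for c in labels)
accuracy = correct / total

f1_scores = {}
for c in labels:
    tp = counts[c][c]
    # False positives: examples of another true class predicted as c.
    fp = sum(counts[t][c] for t in labels if t != c)
    # False negatives: true c predicted as anything else, including "other".
    fn = sum(n for pred, n in counts[c].items() if pred != c)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1_scores[c] = 2 * precision * recall / (precision + recall)

macro_f1 = sum(f1_scores.values()) / len(labels)

print(f"accuracy = {accuracy:.3f}")  # accuracy = 0.965
print(f"macro F1 = {macro_f1:.4f}")  # macro F1 = 0.9820
```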
Training data
- Source: Synthetic dataset (5,000 rows, balanced 2,500 / 2,500)
- Columns: `model_input` (email or URL text), `label` (`phishing` | `legitimate`)
- Split: 80/20 stratified → 4,000 train / 1,000 test
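An 80/20 stratified split of this kind can be sketched in plain Python by shuffling each class separately and holding out 20% of each. This is not the authors' script: the per-class shuffle, seeding `random.Random` with 42 (borrowed from the training seed), the `stratified_split` name, and the placeholder rows are all assumptions, so it will not reproduce the exact rows of the original split.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key="label", test_frac=0.2, seed=42):
    """Split rows so each label keeps the same proportion in train and test."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    train, test = [], []
    for label in sorted(by_label):  # sorted so the result is deterministic
        group = by_label[label][:]
        rng.shuffle(group)
        n_test = round(len(group) * test_frac)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    # Mix the classes so batches are not ordered by label.
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test


# Placeholder rows with the same shape as the real dataset (2,500 per class).
rows = [
    {"model_input": f"example {i}", "label": "phishing" if i % 2 else "legitimate"}
    for i in range(5000)
]
train, test = stratified_split(rows)
print(len(train), len(test))                        # 4000 1000
print(sum(r["label"] == "phishing" for r in test))  # 500
```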
📌 Dataset availability: The training dataset is not publicly released as part of this repository. If you are interested in the data for research collaboration or reproducibility purposes, please contact the authors directly via Hugging Face.
Training procedure
| Setting | Value |
|---|---|
| Base model | Qwen/Qwen2.5-1.5B-Instruct |
| Quantization | 4-bit NF4 (QLoRA training only) |
| LoRA rank / alpha | 16 / 32 |
| Epochs | 1 |
| Learning rate | 2e-4 |
| Batch (effective) | 16 (8 × grad accum 2) |
| Max seq length | 512 |
| Seed | 42 |
| Merge | bf16 full weights for this checkpoint |
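A few derived quantities follow from the table. The arithmetic below is a sketch under stated assumptions: training runs on a single device over the 4,000-example training split, examples are not packed into shared sequences, a final partial batch still triggers an optimizer step, and the adapter uses the standard LoRA scaling of alpha / r (rank-stabilized LoRA would use alpha / sqrt(r) instead).

```python
import math

# Values from the training table; train_examples comes from the 80/20 split.
train_examples = 4000
per_device_batch = 8
grad_accum_steps = 2
epochs = 1
lora_rank, lora_alpha = 16, 32

# Assumes a single device, so the effective batch is batch x accumulation.
effective_batch = per_device_batch * grad_accum_steps
# Assumes no sequence packing; a final partial batch still counts as a step.
optimizer_steps = math.ceil(train_examples / effective_batch) * epochs
# Standard LoRA scales the low-rank update by alpha / r.
lora_scaling = lora_alpha / lora_rank

print(effective_batch, optimizer_steps, lora_scaling)  # 16 250 2.0
```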
Usage
Transformers
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "Lovely2209/Qwen2.5-1.5B-Phishing-Email-Detector"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

text = "http://paypa1-secure.example/verify?id=abc"
# The prompt must match the training layout exactly.
prompt = (
    "### Instruction:\n"
    "Classify the email or URL as phishing or legitimate.\n\n"
    f"### Input:\n{text}\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=8,
        do_sample=False,  # greedy decoding
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens so the prompt is not echoed back.
new_tokens = out[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True).strip())
```
vLLM (merged weights, no LoRA adapter)
```bash
export VLLM_USE_FLASHINFER_SAMPLER=0  # if nvcc / CUDA toolkit is not installed
```

```python
from vllm import LLM, SamplingParams
from vllm.sampling_params import StructuredOutputsParams

repo = "Lovely2209/Qwen2.5-1.5B-Phishing-Email-Detector"
llm = LLM(
    model=repo,
    dtype="bfloat16",
    max_model_len=512,
    enforce_eager=True,
)

prompt = (
    "### Instruction:\n"
    "Classify the email or URL as phishing or legitimate.\n\n"
    "### Input:\n"
    "http://paypa1-secure.example/verify\n\n"
    "### Response:\n"
)

sampling = SamplingParams(
    temperature=0.0,
    max_tokens=8,
    # Restrict the output to the two class names.
    structured_outputs=StructuredOutputsParams(choice=["phishing", "legitimate"]),
)
outputs = llm.generate([prompt], sampling)
print(outputs[0].outputs[0].text.strip())
```
License
This model is released under the Apache License 2.0, consistent with Qwen/Qwen2.5-1.5B-Instruct. See LICENSE and NOTICE in this repository.
This project is not affiliated with, endorsed by, or sponsored by Alibaba Cloud or the Qwen Team.
Citation
If you use this model, please cite the base Qwen2.5 work:
```bibtex
@misc{qwen2.5,
  title  = {Qwen2.5: A Party of Foundation Models},
  url    = {https://qwenlm.github.io/blog/qwen2.5/},
  author = {Qwen Team},
  month  = {September},
  year   = {2024}
}
```
Fine-tune attribution:
```bibtex
@misc{qwen25-phishing-email-detector-2026,
  title        = {Qwen2.5-1.5B Phishing Email Detector (merged QLoRA SFT)},
  author       = {Jagriti Singh and Lovely Kumari},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/Lovely2209/Qwen2.5-1.5B-Phishing-Email-Detector}},
  note         = {Fine-tuned from Qwen/Qwen2.5-1.5B-Instruct}
}
```